Overview
Welcome to Arctic, Arctic is a streaming lake warehouse system open sourced by NetEase. Arctic adds more real-time capabilities on top of Iceberg and Hive, and provides stream-batch unified, out-of-the-box metadata services for DataOps, allowing Data Lakes much more usable and practical.
Introduction
Arctic is a streaming lakehouse service built on top of apache Iceberg table format.
Through Arctic, users could benefit optimized CDC、streaming update、fresh olap etc. on engines like Flink, Spark, and Trino.
Combined with efficient offline processing capabilities of data lakes, Arctic can serve more scenarios where streaming and batch are fused.
At the same time, the function of self-optimization、concurrent conflict resolution and standard management tools could effectively reduce the burden on users in data lake management and optimization.
Arctic service is presented by deploying AMS, which can be considered as a replacement for HMS (Hive Metastore), or HMS for Iceberg.
Arctic uses Iceberg as the base table format, but instead of hacking the Iceberg implementation, it uses Iceberg as a lib.
Arctic's open overlay architecture can help large-scale offline data lakes quickly upgraded to real-time data lakes, without worrying about compatibility issues with the original data lakes,
allowing data lakes to meet more real-time analysis, real-time risk control, Real-time training, feature engineering and other scenarios.
Arctic features
- Efficient streaming updates based on primary keys
- Data auto bucketing and self-optimized for performance and efficiency
- Encapsulating data lake and message queue into a unified table to achieve lower-latency computing
- Provide standardized metrics, dashboard and related management tools for streaming LakeHouse
- Support Spark and Flink to read and write data, support Trino to query data
- 100% compatible with Iceberg / Hive table format and syntax
- Provide transactional guarantee for streaming and batch concurrent writing