Utilizing Delta tables and Databricks SQL Warehouse for cost-effective, high-scale query capacity
To manage growing volumes of blockchain data and provide fast, flexible analytics while keeping cloud costs under control, a team has moved its data architecture away from an index-heavy AWS DynamoDB setup to a dual-layer design. The new design uses Structured Streaming, Delta Lake, and Databricks SQL Warehouse to process and analyze blockchain data efficiently.
At the heart of this transformation is Structured Streaming, Apache Spark's stream-processing engine, which handles the high volume and high velocity of blockchain transactions. It reads from multiple real-time sources, such as Amazon Kinesis streams, and organizes the incoming records into structured datasets that are ready for further analysis.
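The micro-batch idea behind this step can be illustrated with a toy, pure-Python model. It is not the Spark API: the `Transfer` schema, its field names, and the helper functions are hypothetical. The sketch shows raw payloads being parsed into typed rows batch by batch, with malformed records dropped and the consumed offset advanced only after each batch reaches the sink, which is roughly what a checkpointed streaming query does.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    # Hypothetical schema for one parsed transfer; field names are illustrative.
    tx_hash: str
    block_number: int
    from_address: str
    to_address: str
    usd_value: float


def parse_record(raw: bytes) -> Optional[Transfer]:
    """Turn one raw JSON payload into a typed row, or None if it is malformed."""
    try:
        obj = json.loads(raw)
        return Transfer(
            tx_hash=str(obj["hash"]),
            block_number=int(obj["block"]),
            from_address=str(obj["from"]).lower(),
            to_address=str(obj["to"]).lower(),
            usd_value=float(obj["usdValue"]),
        )
    except (ValueError, KeyError, TypeError):
        return None


def process_stream(records: list[bytes], batch_size: int) -> tuple[list[list[Transfer]], int]:
    """Consume records in fixed-size micro-batches.

    Each batch is parsed and handed to the sink before the offset advances,
    mimicking a checkpointed stream that records progress only after a write.
    """
    sink: list[list[Transfer]] = []
    committed_offset = 0
    for start in range(0, len(records), batch_size):
        batch = records[start : start + batch_size]
        rows = [t for t in map(parse_record, batch) if t is not None]
        sink.append(rows)  # "write" the structured batch
        committed_offset = start + len(batch)  # checkpoint after the write
    return sink, committed_offset
```

In production this role is played by a Structured Streaming query with a checkpoint location; the toy only makes the ordering of parse, write, and checkpoint visible.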
Data processed by Structured Streaming is stored in Delta Lake tables, a storage layer that supports both batch and streaming workloads and offers ACID transactions and schema evolution. This matters for blockchain data, which often requires frequent updates and has evolving schemas. Delta Lake's versioning also simplifies data management and makes it possible to query a table as it existed at a specific point in time.
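Versioning, atomic commits, and schema evolution all follow from keeping an append-only log of commits. The sketch below is a hypothetical in-memory toy, not the Delta Lake protocol or its API: each commit records added files, removed files, and the resulting schema, and a snapshot at version N is rebuilt by replaying commits 0 through N.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Commit:
    version: int
    added: tuple[str, ...]    # data files added by this commit
    removed: tuple[str, ...]  # data files logically removed by this commit
    schema: tuple[str, ...]   # column names after this commit


class ToyDeltaLog:
    """In-memory toy of a versioned transaction log (not the Delta protocol)."""

    def __init__(self, schema: Sequence[str]) -> None:
        self.commits: list[Commit] = []
        self._commit(added=(), removed=(), schema=tuple(schema))

    def _commit(self, added: Sequence[str], removed: Sequence[str], schema: Sequence[str]) -> int:
        version = len(self.commits)
        self.commits.append(Commit(version, tuple(added), tuple(removed), tuple(schema)))
        return version

    def snapshot(self, version: Optional[int] = None) -> tuple[set[str], tuple[str, ...]]:
        """Return (live data files, schema) as of `version` (default: latest)."""
        if version is None:
            version = len(self.commits) - 1
        if not 0 <= version < len(self.commits):
            raise ValueError(f"version {version} does not exist")
        files: set[str] = set()
        schema: tuple[str, ...] = ()
        for commit in self.commits[: version + 1]:  # replay the log
            files -= set(commit.removed)
            files |= set(commit.added)
            schema = commit.schema
        return files, schema

    def append(self, files: Sequence[str], new_columns: Sequence[str] = ()) -> int:
        """Add files; unseen columns are appended to the schema (schema evolution)."""
        _, schema = self.snapshot()
        evolved = schema + tuple(c for c in new_columns if c not in schema)
        return self._commit(added=files, removed=(), schema=evolved)

    def replace(self, old_files: Sequence[str], new_files: Sequence[str]) -> int:
        """Swap files in one commit, so any snapshot sees both changes or neither."""
        _, schema = self.snapshot()
        return self._commit(added=new_files, removed=old_files, schema=schema)
```

Reading `snapshot(1)` after later commits returns the table exactly as it was at version 1, which is the essence of querying at a point in time.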
Databricks SQL Warehouse, powered by the Photon engine, provides fast SQL querying on top of these Delta tables, which keeps querying of blockchain data both scalable and cost-effective. In its serverless form, the warehouse scales automatically to match query demand, which helps reduce costs.
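The source does not describe how the warehouse decides when to scale. The following is a generic, hypothetical scaling rule, not Databricks' algorithm, that captures the idea of matching capacity to demand: size the pool of query clusters to the number of running and queued queries, then clamp it to configured bounds. The parameter names and the per-cluster concurrency are made up for illustration.

```python
import math


def target_clusters(
    running: int,
    queued: int,
    queries_per_cluster: int,
    min_clusters: int,
    max_clusters: int,
) -> int:
    """Hypothetical scaling rule for a pool of query clusters.

    Size the pool so every running and queued query gets a slot at
    `queries_per_cluster` concurrency, then clamp to [min_clusters, max_clusters].
    With min_clusters=0, an idle pool shrinks to nothing, so idle time is not billed.
    """
    demand = running + queued
    needed = math.ceil(demand / queries_per_cluster)
    return max(min_clusters, min(max_clusters, needed))
```

For example, 15 running and 30 queued queries at 10 queries per cluster would ask for 5 clusters, while an idle pool with `min_clusters=0` drops to 0.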
The dual-layer design has delivered three main benefits. Scalability: the architecture handles large volumes of blockchain data efficiently. Lower cost: compute is consumed mainly while queries are running rather than being provisioned around the clock. Flexibility: Delta tables and Databricks SQL Warehouse support a wide range of analytics use cases without the need for additional indexing.
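Why paying only for query time saves money can be shown with simple arithmetic. The sketch compares a fleet sized for peak load and left running with usage-based billing over the same hourly demand. The load profile and the rate of 1.0 per cluster-hour are invented for illustration and are unrelated to the cost figures reported for this migration.

```python
from __future__ import annotations


def compare_costs(
    clusters_needed_per_hour: list[int], rate_per_cluster_hour: float
) -> tuple[float, float, float]:
    """Compare an always-on fleet sized for peak with pay-per-use billing.

    Returns (always_on_cost, on_demand_cost, fractional_savings).
    """
    hours = len(clusters_needed_per_hour)
    peak = max(clusters_needed_per_hour)
    always_on = peak * hours * rate_per_cluster_hour
    on_demand = sum(clusters_needed_per_hour) * rate_per_cluster_hour
    savings = 1 - on_demand / always_on if always_on else 0.0
    return always_on, on_demand, savings


# Hypothetical 24-hour load: idle overnight, a morning ramp, an afternoon peak.
load = [0] * 8 + [2] * 4 + [4] * 4 + [1] * 8
always_on, on_demand, savings = compare_costs(load, rate_per_cluster_hour=1.0)
print(always_on, on_demand, f"{savings:.0%}")  # 96.0 32.0 67%
```

The spikier the workload, the larger the gap, because an always-on fleet pays for peak capacity during every idle hour.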
Key components of this transformation include Deletion Vectors, Liquid Clustering, and optimized Delta table layouts.
Deletion Vectors, a recent Delta Lake feature, improve write-heavy workflows. Instead of rewriting an entire data file when a few rows are deleted or updated, Delta records the affected row positions in a deletion vector. Readers use this metadata to filter those rows out at query time (merge-on-read), and the physical clean-up is deferred until later (lazy clean-up).
Liquid Clustering organizes data by the columns most used in query filters, without the rigid folder structure of traditional partitioning. This improves point-lookup performance for filters that rely heavily on specific columns. It also handles scale, reduces layout skew, and stays performant without requiring full-table rewrites.
The Delta table layout was also optimized with a two-pronged approach: Hive-style partitioning combined with Z-Ordering on the usdValue field.
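The merge-on-read behavior of Deletion Vectors can be modeled in a few lines. This is a hypothetical toy, not Delta's implementation: a Python set stands in for the compressed bitmap that Delta stores alongside an immutable data file, deletes only mark row positions, reads filter them out, and a separate `purge` step performs the deferred rewrite. On Delta tables the feature itself is enabled through the `delta.enableDeletionVectors` table property.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataFile:
    """Toy immutable data file plus a deletion vector of removed row positions."""

    rows: tuple
    deleted: set[int] = field(default_factory=set)  # stand-in for a bitmap

    def delete_where(self, predicate: Callable[[tuple], bool]) -> int:
        """Mark matching rows as deleted without rewriting `rows`; return the count."""
        hits = {
            i for i, row in enumerate(self.rows)
            if i not in self.deleted and predicate(row)
        }
        self.deleted |= hits
        return len(hits)

    def read(self) -> list[tuple]:
        """Merge-on-read: drop deleted positions at scan time."""
        return [row for i, row in enumerate(self.rows) if i not in self.deleted]

    def purge(self) -> DataFile:
        """Lazy clean-up: a later maintenance pass rewrites only the surviving rows."""
        return DataFile(rows=tuple(self.read()))
```

The write path touches only the small set of positions, which is why deletes and updates get cheaper; the cost moves to reads and to the occasional purge.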
The read side of the new architecture had to support fast lookups at massive scale, across high-cardinality columns, skewed data, and a broad spectrum of access patterns. Traditional storage and query solutions struggled to keep pace with this growth in both performance and cost. Liquid Clustering addresses this by clustering data according to how it is queried: performance stays consistent as newly written data is clustered incrementally, skew is reduced, and MERGE jobs run faster.
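Clustering speeds up point lookups mainly through data skipping: when rows with similar key values share a file, per-file min/max statistics let a query ignore almost every file. The toy below sorts on a single integer key as a simplified stand-in for clustering. Real Liquid Clustering (declared on Databricks with a `CLUSTER BY` clause) works incrementally and across multiple columns, which this sketch does not model.

```python
from __future__ import annotations

import random


def make_files(keys: list[int], rows_per_file: int) -> list[list[int]]:
    """Split keys into fixed-size 'files' in the order they were written."""
    return [keys[i : i + rows_per_file] for i in range(0, len(keys), rows_per_file)]


def files_to_scan(files: list[list[int]], key: int) -> int:
    """Count files whose [min, max] range could contain `key`.

    Every file whose range excludes the key is skipped without being read.
    """
    stats = [(min(f), max(f)) for f in files]
    return sum(1 for lo, hi in stats if lo <= key <= hi)


keys = list(range(10_000))
random.Random(42).shuffle(keys)  # arrival order: effectively random keys

unclustered = make_files(keys, rows_per_file=100)
clustered = make_files(sorted(keys), rows_per_file=100)  # stand-in for clustering

print(files_to_scan(unclustered, 5_000))  # nearly all 100 files
print(files_to_scan(clustered, 5_000))    # 1
```

With random placement every file's range spans most of the key space, so nothing can be skipped; after clustering, the same lookup reads a single file.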
Moving from a low-latency, index-heavy AWS DynamoDB approach to the dual-layer design cut cost per million processed transfers by 43% while improving performance and future-proofing the architecture. Over 18 months, the company's on-chain data processing volume tripled, to more than three billion blockchain transactions per month. The new design enables scalable querying of structured blockchain data and gives both customers and internal developers more flexibility.
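The headline figures can be sanity-checked with back-of-envelope arithmetic. The sketch assumes steady compound growth and a 30-day month, neither of which the source states. The cost function is relative because absolute costs are not given: a 43% reduction leaves 0.57 of the previous cost per million transfers.

```python
def monthly_growth_rate(growth_factor: float, months: int) -> float:
    """Constant compound rate r such that (1 + r) ** months == growth_factor."""
    return growth_factor ** (1 / months) - 1


def average_per_second(per_month: float, days_per_month: float = 30.0) -> float:
    """Average event rate, assuming a 30-day month by default."""
    return per_month / (days_per_month * 24 * 3600)


def cost_after_reduction(cost_before: float, reduction: float) -> float:
    """Unit cost after a fractional reduction (0.43 for 43%)."""
    return cost_before * (1 - reduction)


print(f"{monthly_growth_rate(3.0, 18):.1%}")          # 6.3%
print(f"{average_per_second(3e9):,.0f} tx/s")         # 1,157 tx/s
print(f"{cost_after_reduction(1.0, 0.43):.2f}")       # 0.57
```

Tripling over 18 months therefore corresponds to roughly 6.3% growth per month, and three billion transactions per month averages to about 1,157 per second before accounting for peaks.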