Databricks Tool Aims to Cut Streaming Data Costs
Databricks has launched Zerobus Ingest, a new tool designed to reduce the cost and complexity of streaming data ingestion for its lakehouse platform. The tool automates and consolidates data pipelines, offering a "zero-ops" approach intended to streamline the development of real-time machine learning workflows.
- Zerobus Ingest is designed for single-destination ingestion directly into the Databricks lakehouse. For this use case, it simplifies the architecture by removing the need for an intermediate message bus such as Apache Kafka, which is built to route data to many endpoints (a multi-sink architecture).
- The tool provides "at-least-once" delivery guarantees, so downstream applications must handle potential duplicate records. This trade-off favors architectural simplicity and performance by avoiding the complexity of the "exactly-once" semantics offered by systems like Kafka, which are critical for use cases such as financial transactions.
- Zerobus Ingest does not automatically handle schema evolution: the target Delta table's schema must align with the incoming data. It does, however, support adding nullable columns to the target table without interrupting ingestion.
- For developers, it offers SDKs in languages including Python, Java, and Go, and exposes both a high-performance gRPC API and a REST API for broader compatibility. Applications and IoT devices can therefore push data directly into Delta tables.
- Simpler ingestion is a foundational element for real-time AI applications. By reducing latency and complexity, it makes fresh data available sooner for training models and for powering AI-driven go-to-market (GTM) tools, such as those used for sales prospecting and pipeline analytics.
- Databricks' go-to-market strategy involves deep partnerships with major cloud and hardware players. The company uses AWS Trainium and has expanded its collaboration with NVIDIA to bring CUDA-accelerated computing to its platform, improving performance for AI workloads.
- Databricks' revamped "Brickbuilder" partner program offers a case study in scaling a deep-tech company. It focuses on enabling partners to develop specialized, industry-specific IP and solutions, reflecting the view that enterprise AI adoption is driven by a broad ecosystem rather than a single vendor.
- The launch of tools like Zerobus reflects a broader venture capital and market trend in which "applications inspire infrastructure." The complexity of building real-time AI applications has created demand for simpler, more efficient data infrastructure, and this space is attracting significant investment.
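At-least-once delivery means a retried write can land the same record twice, so consumers typically deduplicate on a stable key. The sketch below is a minimal illustration that assumes each producer attaches a unique `event_id` field. That field name and the `dedupe` helper are hypothetical and are not part of any Zerobus SDK. It keeps seen keys in memory, which is only practical for bounded batches. In a lakehouse, the same idea is often applied with a Delta Lake `MERGE` keyed on the ID.

```python
from typing import Hashable, Iterable, Iterator


def dedupe(records: Iterable[dict], key: str) -> Iterator[dict]:
    """Yield each record the first time its key is seen; drop redeliveries.

    Hypothetical helper: assumes every record carries a unique, stable ID
    under `key` that the producer assigned before sending.
    """
    seen: set[Hashable] = set()  # unbounded: suitable for a bounded batch only
    for record in records:
        k = record[key]
        if k in seen:
            continue  # duplicate produced by an at-least-once retry
        seen.add(k)
        yield record


# Usage: the third event is a redelivery of the first.
events = [
    {"event_id": "a", "value": 1},
    {"event_id": "b", "value": 2},
    {"event_id": "a", "value": 1},
]
unique = list(dedupe(events, "event_id"))
```

The first occurrence wins, which is harmless when duplicates are exact copies, as they are with retries.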
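The sketch below shows why adding a nullable column is non-breaking while other schema changes are not. It uses a simplified, hypothetical compatibility rule, not Zerobus's actual validation logic. Under this rule, every incoming field must map to an existing column, and every non-nullable column must receive a value. The `Column` type and the `record_fits` function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical, simplified description of a target-table column."""
    name: str
    nullable: bool


def record_fits(record: dict, columns: list[Column]) -> bool:
    """Assumed rule: no unknown fields, and no missing non-nullable values."""
    names = {c.name for c in columns}
    if not set(record) <= names:
        return False  # record has a field the table lacks: schemas do not align
    for c in columns:
        if not c.nullable and record.get(c.name) is None:
            return False  # required column left empty
    return True


table = [Column("device_id", nullable=False), Column("temp", nullable=True)]
reading = {"device_id": "d1", "temp": 21.5}

fits_before = record_fits(reading, table)
# Adding a nullable column: existing producers' records still fit.
fits_after_nullable = record_fits(reading, table + [Column("humidity", True)])
# Adding a required column, or sending an unknown field, breaks the fit.
fits_after_required = record_fits(reading, table + [Column("site", False)])
fits_extra_field = record_fits({**reading, "pressure": 1.0}, table)
```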