Apache Doris for Real-Time Analytics
For those working with large-scale data, Apache Doris is being highlighted as an MPP-based, real-time data warehousing solution. It supports sub-second queries on large datasets, making it applicable for fintech and quant projects that require high-throughput data workloads.
Originally developed within Baidu as "Palo," Apache Doris was created to simplify a complex hybrid architecture that previously relied on sharded MySQL and other tools to handle both low-latency serving queries and high-throughput ad-hoc analysis. The system was designed from the ground up as a single, tightly-coupled Massively Parallel Processing (MPP) database to avoid the complexity of managing multiple disparate systems. Its architecture consists of two main components: Frontend (FE) nodes that manage metadata and plan queries, and Backend (BE) nodes that handle data storage and query execution. This separation allows for horizontal scaling. Doris employs a columnar storage engine and a vectorized query execution engine, which significantly boosts performance in OLAP workloads by optimizing CPU and I/O resource usage. A key design choice is its high compatibility with the MySQL protocol, allowing users to connect to Doris using standard MySQL clients and BI tools like Tableau or Superset without custom connectors. This feature lowers the barrier to adoption for development teams already familiar with the MySQL ecosystem. In performance benchmarks, Apache Doris has demonstrated significant advantages over alternatives like ClickHouse, particularly in scenarios involving real-time data updates and complex multi-table joins. One benchmark showed Doris to be 18-34 times faster than ClickHouse in the Star Schema Benchmark (SSB) with update-intensive workloads. Companies like Tencent Music and Kwai replaced ClickHouse with Doris to improve multi-table join performance by up to 10x and simplify their data architecture. Major technology firms have adopted Doris for high-stakes applications. The short video platform Kwai handles nearly one billion daily queries with Doris. A fintech service provider replaced a ClickHouse and MySQL combination, speeding up average query response times by 50% and achieving over 99% SLA compliance for data synchronization. In financial anti-fraud systems, a retail bank uses Doris to run multi-dimensional analysis with over 10 filtering conditions, receiving alerts on high-risk transactions in under five seconds. Doris supports federated queries across external data sources like Hive, Iceberg, and Hudi, allowing it to act as a unified query gateway for data lakes. It offers distinct data models for different use cases: an Aggregate Model for pre-aggregated reporting, a Unique Key model for real-time upserts, and a Duplicate Key model for detailed ad-hoc queries. The core developers of Apache Doris founded SelectDB to provide a cloud-native, fully-managed version of the database. This commercial entity offers an enterprise edition with additional features and long-term support, building on the open-source core.