ClickHouse
ClickHouse is an open-source columnar OLAP database management system designed for real-time analytical queries over billions of rows, often with sub-second response times. It was originally developed at Yandex for web analytics (Yandex.Metrica processes over 20 billion events per day). It combines columnar storage with aggressive compression, vectorized query execution, and sparse indexing, which can make analytical queries orders of magnitude faster than on row-oriented databases. ClickHouse supports SQL with extensions for approximate query processing, materialized views, and array/map operations.
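To make the sparse indexing idea concrete, here is a minimal Python sketch, not ClickHouse's actual implementation. It assumes rows are already sorted by the primary key and keeps only the first key of each fixed-size granule. The names `build_sparse_index` and `granules_for_range` are hypothetical, and the granule size of 4 is chosen for readability (ClickHouse's default `index_granularity` is 8192 rows).

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; ClickHouse's default index_granularity is 8192


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule (one "mark" per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(marks, lo, hi):
    """Return indices of granules that may contain keys in [lo, hi]."""
    # The granule just before the first mark >= lo may still hold keys >= lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose first key is greater than hi cannot contain a match.
    last = bisect_right(marks, hi) - 1
    return list(range(first, last + 1))


keys = list(range(0, 40, 2))       # 20 sorted keys: 0, 2, ..., 38
marks = build_sparse_index(keys)   # one entry per 4 rows: [0, 8, 16, 24, 32]
print(granules_for_range(marks, 10, 18))  # [1, 2]
```

The index holds one entry per granule rather than one per row, so it stays small enough to keep in memory. The trade-off is that a lookup selects whole granules, which then have to be scanned.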
Strengths
Weaknesses
Ideal Workloads
- Real-time analytics dashboards over billions of events (web analytics, product analytics, ad tech)
- Log and event storage with time-based partitioning and TTL-based automatic cleanup
- Financial and business intelligence reporting with complex aggregations and window functions
- Metrics and observability backends as an alternative to specialized time-series databases
Scaling Model
Scales vertically by leveraging all CPU cores via parallel query execution. Horizontal scaling via sharding with a Distributed table engine that routes queries across shards. Each shard is typically a ReplicatedMergeTree with 2-3 replicas using ClickHouse Keeper for coordination. Data is inserted into local tables and queried via distributed tables. SharedMergeTree (ClickHouse Cloud) decouples compute from storage using object storage.
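The routing step of a Distributed table can be sketched roughly as follows. This is an illustrative Python model, not ClickHouse code. It assumes rows are assigned by taking the sharding-key value modulo the total shard weight and picking the shard whose weight interval contains the remainder. `pick_shard` and `route_rows` are hypothetical names, and `crc32` stands in for whatever hash function the sharding expression would use.

```python
from bisect import bisect_right
from itertools import accumulate
from zlib import crc32


def pick_shard(sharding_value, weights):
    """Map a sharding-key value to a shard index using shard weights."""
    bounds = list(accumulate(weights))       # e.g. [1, 3, 4] for weights [1, 2, 1]
    remainder = sharding_value % bounds[-1]  # remainder modulo the total weight
    return bisect_right(bounds, remainder)   # shard whose half-interval holds it


def route_rows(rows, key, weights):
    """Group rows per shard; each group would be written to that shard's local table."""
    shards = [[] for _ in weights]
    for row in rows:
        value = crc32(str(row[key]).encode())  # stand-in hash, an assumption
        shards[pick_shard(value, weights)].append(row)
    return shards


# With weights [1, 2, 1], shard 1 receives roughly half of all remainders.
print([pick_shard(v, [1, 2, 1]) for v in range(8)])  # [0, 1, 1, 2, 0, 1, 1, 2]
```

Hashing the sharding key keeps all rows for one key on the same shard, so per-key aggregations can run locally on each shard before the results are merged.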
Consistency Model
Strong consistency within a single node for reads (read-your-own-writes). Replicated tables use a log-based replication model in which all replicas converge to the same state by replaying the same sequence of operations. Mutations (UPDATEs/DELETEs) are applied asynchronously and are eventually consistent. INSERT operations are atomic at the block level: an insert is applied as a unit only if it fits in one block (by default about 1M rows) and targets a single partition. No multi-statement transactions.
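The block-level atomicity described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the server's real write path: an insert is cut into blocks of at most `MAX_INSERT_BLOCK_ROWS` rows, each block is split by partition, and each resulting piece is treated as one atomic unit. `partition_of`, `split_into_atomic_units`, and the monthly partition key are hypothetical.

```python
from collections import defaultdict
from datetime import date

MAX_INSERT_BLOCK_ROWS = 3  # illustrative; the text cites a default of about 1M rows


def partition_of(row):
    """Hypothetical monthly partition key derived from the row's date."""
    d = row["day"]
    return d.year * 100 + d.month


def split_into_atomic_units(rows, max_rows=MAX_INSERT_BLOCK_ROWS):
    """Cut an insert into blocks, then split each block by partition."""
    units = []
    for start in range(0, len(rows), max_rows):
        block = rows[start:start + max_rows]
        by_partition = defaultdict(list)
        for row in block:
            by_partition[partition_of(row)].append(row)
        # Each per-partition piece succeeds or fails on its own.
        units.extend(by_partition.values())
    return units


rows = [
    {"day": date(2024, 1, 1)},
    {"day": date(2024, 1, 2)},
    {"day": date(2024, 2, 1)},
    {"day": date(2024, 1, 3)},
]
print([len(u) for u in split_into_atomic_units(rows)])  # [2, 1, 1]
```

In this model, an insert that spans several partitions or exceeds the block size becomes several independent units. This is why keeping each insert within one partition and one block matters when all-or-nothing behavior is expected.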
When to Use
- You need sub-second analytical queries over billions of rows without pre-aggregation
- Your workload is append-heavy with rare updates or deletes (event logs, metrics, clickstreams)
- You want high compression ratios to minimize storage costs for large analytical datasets
- You need real-time materialized views for dashboard acceleration
- You are building an observability or product analytics platform
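The real-time materialized views mentioned in the list above are updated at insert time rather than by periodic full recomputation. The toy Python model below illustrates that idea only. `IncrementalView` is a hypothetical class, and a simple per-key count stands in for whatever aggregation a real view would define.

```python
from collections import Counter


class IncrementalView:
    """Toy model of an insert-triggered materialized view (count per key)."""

    def __init__(self, key):
        self.key = key
        self.source = []         # stands in for the raw events table
        self.totals = Counter()  # stands in for the pre-aggregated target table

    def insert(self, block):
        self.source.extend(block)
        # Only the newly inserted block is aggregated, never the whole table.
        self.totals.update(row[self.key] for row in block)


view = IncrementalView("page")
view.insert([{"page": "/home"}, {"page": "/docs"}])
view.insert([{"page": "/home"}])
print(view.totals["/home"])  # 2
```

A dashboard query can then read the small `totals` structure instead of scanning `source`, which is where the acceleration comes from.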
When Not to Use
- You need OLTP capabilities with frequent point updates and deletes
- Your workload requires multi-row ACID transactions
- You need low-latency key-value lookups by primary key
- Your queries involve complex many-to-many joins across large tables
- You want a fully managed, zero-ops experience (self-hosted clusters need hands-on operations, though ClickHouse Cloud is narrowing this gap)
Source: editorial — Based on ClickHouse documentation, Yandex.Metrica architecture, and OLAP benchmarking data