
Apache Cassandra

Apache Cassandra is a distributed wide-column store designed for high availability and linear scalability with no single point of failure. Originally developed at Facebook for inbox search, it uses a partitioned ring architecture with consistent hashing and tunable consistency to handle massive write throughput across data centers. Cassandra excels at workloads requiring always-on availability with the ability to survive entire data center outages without downtime.

Strengths

  • Linear horizontal scalability: doubling nodes roughly doubles throughput with no resharding downtime
  • Multi-datacenter and multi-region replication is a first-class feature with configurable replication strategies
  • No single point of failure; every node is equal in a peer-to-peer architecture
  • Optimized for write-heavy workloads with LSM-tree storage and an append-only commit log
  • Tunable consistency per query (ONE, QUORUM, ALL) allows balancing latency vs. correctness

Weaknesses

  • No support for joins, subqueries, or ad-hoc aggregations; requires denormalized data modeling
  • Eventual consistency by default can lead to stale reads and requires careful conflict resolution
  • Read performance degrades without a proper compaction strategy (STCS vs. LCS vs. TWCS)
  • Tombstone accumulation from deletes can cause read latency spikes and heap pressure
  • Data modeling is partition-key driven; query patterns must be known upfront, and changing them requires new tables
  • Operational complexity is high: compaction tuning, repair scheduling, and garbage collection require expertise
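The "denormalized, query-first" constraint above can be sketched in plain Python: because Cassandra cannot join or filter outside the partition key, the same record is written to one table per query pattern. The table and field names here are hypothetical, for illustration only.

```python
# Hypothetical sketch: query-first (denormalized) modeling, Cassandra-style.
# One logical write fans out to two "tables", one per query pattern,
# since there are no joins across partition keys.
from collections import defaultdict

# "messages_by_user": partition key = user_id (query: all messages for a user)
messages_by_user = defaultdict(list)
# "messages_by_channel": partition key = channel_id (query: all messages in a channel)
messages_by_channel = defaultdict(list)

def write_message(user_id: str, channel_id: str, body: str) -> None:
    """Denormalized write: the application fans out to every table."""
    row = {"user_id": user_id, "channel_id": channel_id, "body": body}
    messages_by_user[user_id].append(row)        # serves "by user" reads
    messages_by_channel[channel_id].append(row)  # serves "by channel" reads

write_message("alice", "general", "hello")
write_message("alice", "random", "hi there")

# Each query reads exactly one partition -- no joins, no scatter-gather.
assert len(messages_by_user["alice"]) == 2
assert len(messages_by_channel["general"]) == 1
```

This is also why changing a query pattern means creating (and backfilling) a new table rather than adding an index.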

Ideal Workloads

  • Time-series data ingestion (metrics, logs, IoT events) using TimeWindowCompactionStrategy
  • Messaging and activity feed systems requiring high write throughput across regions
  • User profile and session stores where availability trumps strict consistency
  • Systems requiring active-active multi-region deployments with local read/write latency
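For the time-series workload above, partition keys typically include a time bucket so that a single sensor's data does not grow into one unbounded partition; bounded, time-ordered partitions also pair naturally with TimeWindowCompactionStrategy. A minimal sketch, assuming a daily bucket (the granularity is an assumption, not a fixed rule):

```python
# Hypothetical sketch: time-bucketed partition keys for time-series ingestion.
# Bucketing by day bounds partition size; clustering within the partition
# would be the event timestamp.
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    # Partition = (sensor, day bucket)
    return (sensor_id, ts.strftime("%Y-%m-%d"))

t1 = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)
t2 = datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
t3 = datetime(2024, 5, 2, 0, 15, tzinfo=timezone.utc)

assert partition_key("sensor-7", t1) == partition_key("sensor-7", t2)  # same day -> same partition
assert partition_key("sensor-7", t1) != partition_key("sensor-7", t3)  # next day rolls over
```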

Scaling Model

Linearly scalable by adding nodes to the cluster ring. Data is distributed via consistent hashing on the partition key. Vnodes (virtual nodes) ensure even data distribution. Replication factor determines how many copies exist across the cluster. Multi-DC replication uses NetworkTopologyStrategy to place replicas in different racks and data centers. No master node or coordinator bottleneck.
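The ring mechanics above can be illustrated with a toy consistent-hashing ring. Cassandra's real partitioner uses Murmur3 tokens; this sketch uses MD5 purely as a stable stand-in, and the vnode count is arbitrary.

```python
# Illustrative sketch of a consistent-hashing ring with virtual nodes (vnodes).
# Each physical node owns several positions on the ring; a key's replicas are
# found by walking clockwise from the key's token.
import bisect
import hashlib

def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each physical node owns `vnodes` tokens, spreading load evenly.
        self.ring = sorted(
            (token(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str, rf: int):
        """Walk clockwise from the key's token, collecting `rf` distinct nodes."""
        i = bisect.bisect(self.tokens, token(partition_key))
        owners = []
        while len(owners) < rf:
            node = self.ring[i % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners

ring = Ring(["node-a", "node-b", "node-c"])
assert len(ring.replicas("user:42", rf=3)) == 3  # RF=3 -> three distinct nodes
```

Note that NetworkTopologyStrategy adds rack and data-center awareness on top of this walk, skipping candidates that would place two replicas in the same rack.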

Consistency Model

Tunable consistency per operation. Write and read consistency levels (ONE, QUORUM, LOCAL_QUORUM, ALL) determine how many replicas must respond. Strong consistency is achievable when R + W > N (read consistency + write consistency > replication factor), but this sacrifices availability during node failures. Default behavior is eventual consistency with last-write-wins conflict resolution using timestamps. Lightweight transactions (LWT) provide linearizable consistency via Paxos but at significant performance cost.
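The quorum arithmetic and last-write-wins behavior described above fit in a few lines of Python. Values here are illustrative.

```python
# Sketch of tunable-consistency arithmetic: reads are guaranteed to see the
# latest write only when read + write quorums must overlap (R + W > N).

def overlaps(r: int, w: int, n: int) -> bool:
    """True if any read quorum must intersect any write quorum."""
    return r + w > n

N = 3                 # replication factor
QUORUM = N // 2 + 1   # 2 of 3

assert overlaps(QUORUM, QUORUM, N)   # QUORUM reads + QUORUM writes: strong
assert not overlaps(1, 1, N)         # ONE/ONE: stale reads are possible

# Last-write-wins conflict resolution: the highest write timestamp wins.
def lww(versions):
    return max(versions, key=lambda v: v["ts"])["value"]

assert lww([{"ts": 10, "value": "old"}, {"ts": 20, "value": "new"}]) == "new"
```

The same arithmetic shows the availability cost: with R = W = QUORUM and N = 3, losing two replicas blocks both reads and writes, which is exactly the trade-off LWT pushes further by paying Paxos round trips for linearizability.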

When to Use

  • You need to handle hundreds of thousands of writes per second with predictable latency
  • Your application requires multi-region active-active deployments with local read/write performance
  • Availability is the top priority and you can tolerate eventual consistency
  • Your access patterns are well-defined and you can design denormalized tables around them
  • You need linear scalability from 3 nodes to hundreds without architectural changes

When Not to Use

  • You need ad-hoc queries, joins, or aggregations across different partition keys
  • Your application requires strong consistency or multi-row ACID transactions
  • Your data model is highly relational with many-to-many relationships
  • Your team lacks experience with distributed systems operations (compaction, repair, GC tuning)
  • Read-heavy workloads with complex filtering that doesn't align with partition key design

Source: editorial — Based on Apache Cassandra 4.x/5.x documentation and production deployment patterns at scale
