Apache Cassandra
Apache Cassandra is a distributed wide-column store designed for high availability and linear scalability with no single point of failure. Originally developed at Facebook for inbox search, it uses a partitioned ring architecture with consistent hashing and tunable consistency to handle massive write throughput across data centers. Cassandra excels at workloads requiring always-on availability with the ability to survive entire data center outages without downtime.
Ideal Workloads
- Time-series data ingestion (metrics, logs, IoT events) using TimeWindowCompactionStrategy
- Messaging and activity feed systems requiring high write throughput across regions
- User profile and session stores where availability trumps strict consistency
- Systems requiring active-active multi-region deployments with local read/write latency
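For time-series workloads, partitions are typically bucketed by a time window so they stay bounded in size and align with TimeWindowCompactionStrategy's time-windowed SSTables. A minimal sketch of that bucketing idea (the `partition_key` helper and sensor naming are hypothetical, not part of Cassandra's API):

```python
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime, bucket_hours: int = 24):
    """Derive a (sensor_id, time_bucket) composite partition key so each
    partition holds one time window's worth of events."""
    epoch_hours = int(ts.timestamp()) // 3600
    bucket = (epoch_hours // bucket_hours) * bucket_hours
    return (sensor_id, bucket)

# Events on the same UTC day land in the same partition; the next day
# starts a fresh partition that TWCS can compact and expire as a unit.
a = partition_key("sensor-1", datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc))
b = partition_key("sensor-1", datetime(2024, 5, 1, 22, 0, tzinfo=timezone.utc))
c = partition_key("sensor-1", datetime(2024, 5, 2, 1, 0, tzinfo=timezone.utc))
print(a == b, a == c)  # → True False
```

In a real schema the bucket would be part of the `PRIMARY KEY` and the window size would be tuned so a partition stays well under a few hundred MB.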
Scaling Model
Linearly scalable by adding nodes to the cluster ring. Data is distributed via consistent hashing on the partition key, and vnodes (virtual nodes) give each physical node many small token ranges so data spreads evenly and rebalances quickly. The replication factor determines how many copies of each partition exist across the cluster. Multi-DC replication uses NetworkTopologyStrategy to place replicas in different racks and data centers. There is no dedicated master: any node can coordinate a request, so there is no single coordinator bottleneck.
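The ring mechanics above can be sketched in a few lines. This is a toy model, not Cassandra's implementation: it stands in MD5 for the Murmur3 partitioner, and the node names, vnode count, and `replicas` walk are illustrative assumptions:

```python
import bisect
import hashlib

def token(key: str) -> int:
    # Stand-in hash for Cassandra's Murmur3 partitioner.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes: int = 8):
    # Each physical node owns `vnodes` positions on the token ring.
    return sorted((token(f"{n}-{v}"), n) for n in nodes for v in range(vnodes))

def replicas(ring, key: str, rf: int = 3):
    # Walk clockwise from the key's token, collecting `rf` distinct nodes.
    tokens = [t for t, _ in ring]
    i = bisect.bisect(tokens, token(key)) % len(ring)
    owners = []
    while len(owners) < rf:
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners

ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(replicas(ring, "user:42"))  # rf=3 distinct owners for this partition key
```

Because placement is a pure function of the key and the ring, any node can answer "who owns this partition?" locally, which is what lets every node act as a coordinator.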
Consistency Model
Tunable consistency per operation. Write and read consistency levels (ONE, QUORUM, LOCAL_QUORUM, ALL) determine how many replicas must respond. Strong consistency is achievable when R + W > N (read consistency level + write consistency level > replication factor), because the read and write replica sets must then overlap, but this sacrifices availability during node failures. The default behavior is eventual consistency with last-write-wins conflict resolution based on write timestamps. Lightweight transactions (LWT) provide linearizable consistency via Paxos, but at a significant performance cost.
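The R + W > N rule and last-write-wins resolution are simple enough to state directly in code. A minimal sketch (the helper names and the `(timestamp, value)` tuples are illustrative assumptions):

```python
def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """R + W > N guarantees the read and write quorums share >= 1 replica,
    so a read always sees the latest acknowledged write."""
    return r + w > n

N = 3                  # replication factor
QUORUM = N // 2 + 1    # 2 of 3
print(is_strongly_consistent(QUORUM, QUORUM, N))  # → True  (QUORUM/QUORUM)
print(is_strongly_consistent(1, 1, N))            # → False (ONE/ONE: eventual)

def resolve(replica_values):
    """Last-write-wins: the cell with the highest write timestamp wins
    when replicas disagree."""
    return max(replica_values, key=lambda v: v[0])[1]

# Three replicas return divergent (timestamp, value) pairs for one cell.
print(resolve([(100, "old"), (250, "new"), (180, "mid")]))  # → new
```

Note the overlap argument also explains the availability trade-off: with QUORUM/QUORUM and N = 3, losing two replicas of a partition makes both reads and writes fail, whereas ONE/ONE keeps working but only eventually converges.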
When to Use
- You need to handle hundreds of thousands of writes per second with predictable latency
- Your application requires multi-region active-active deployments with local read/write performance
- Availability is the top priority and you can tolerate eventual consistency
- Your access patterns are well-defined and you can design denormalized tables around them
- You need linear scalability from 3 nodes to hundreds without architectural changes
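Query-first, denormalized modeling means writing the same data to one table per access pattern instead of joining at read time. A sketch of that dual-write pattern using in-memory dicts as stand-ins for two Cassandra tables (the table and function names are hypothetical):

```python
# Two "tables" for the same messages, each keyed by a different partition key.
messages_by_user = {}     # partition key: user_id   -> query "messages I sent"
messages_by_channel = {}  # partition key: channel_id -> query "channel history"

def send_message(user_id: str, channel_id: str, ts: int, body: str) -> None:
    # Denormalize at write time: duplicate the row under both partition keys,
    # the way a Cassandra app issues two INSERTs (often in a logged batch).
    messages_by_user.setdefault(user_id, []).append((ts, channel_id, body))
    messages_by_channel.setdefault(channel_id, []).append((ts, user_id, body))

send_message("alice", "general", 1, "hi")
send_message("alice", "random", 2, "hello")
# Each read pattern is now a single-partition lookup, no join needed.
print(len(messages_by_user["alice"]), len(messages_by_channel["general"]))  # → 2 1
```

Writes are cheap in Cassandra, so duplicating data per query pattern is the idiomatic trade: storage and write amplification in exchange for single-partition reads.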
When Not to Use
- You need ad-hoc queries, joins, or aggregations across different partition keys
- Your application requires strong consistency or multi-row ACID transactions
- Your data model is highly relational with many-to-many relationships
- Your team lacks experience with distributed systems operations (compaction, repair, GC tuning)
- Read-heavy workloads with complex filtering that doesn't align with partition key design
Source: editorial — Based on Apache Cassandra 4.x/5.x documentation and production deployment patterns at scale