Twitter / X
Social media platform handling 500M+ tweets per day with real-time timeline delivery, trend detection, and search across billions of tweets.
- 500M+ daily tweets
- ~300K requests/sec timeline reads
- ~5M timeline writes/sec during peak fanout
- Billions of tweets in the search index
- ~150TB of Redis for the timeline cache
Architecture Diagram
Data Flow
Client → API Gateway: Post Tweet
User submits a tweet. Gateway validates auth and routes to Tweet Service.
API Gateway → Tweet Service: Create
Tweet Service validates content, stores tweet, and publishes to Kafka.
Tweet Service → Tweet Store: Persist
Tweet stored durably in Manhattan (distributed key-value store).
Tweet Service → Kafka: Publish Event
Tweet creation event published to Kafka for async processing.
Kafka → Fanout Service: Trigger Fanout
Fanout Service consumes events and distributes to follower timelines.
Fanout Service → Social Graph: Get Followers
Queries FlockDB for the tweeter's follower list.
Fanout Service → Timeline Cache: Write Timelines
Prepends tweet ID to each follower's cached timeline in Redis.
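The prepend-and-trim fanout write can be sketched as follows. This is a minimal in-memory stand-in for the Redis timeline cache (the real system issues Redis list or sorted-set commands); the 800-entry cap and the helper names are illustrative assumptions, not confirmed production values.

```python
from collections import defaultdict

TIMELINE_CAP = 800  # hypothetical per-user cache size; the actual limit is not public

# In-memory stand-in for the Redis timeline cache: user_id -> tweet IDs, newest first
timeline_cache = defaultdict(list)

def fanout_on_write(tweet_id, follower_ids):
    """Prepend the new tweet ID to each follower's cached timeline."""
    for follower_id in follower_ids:
        timeline = timeline_cache[follower_id]
        timeline.insert(0, tweet_id)   # Redis equivalent: LPUSH timeline:<id> <tweet_id>
        del timeline[TIMELINE_CAP:]    # Redis equivalent: LTRIM timeline:<id> 0 799

fanout_on_write("t1", ["alice", "bob"])
fanout_on_write("t2", ["alice"])
# alice's cached timeline is now newest-first: ["t2", "t1"]
```

The trim keeps each cached timeline bounded, which is what makes ~150TB of Redis sufficient despite millions of writes per second.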
Client → Timeline Service: Load Timeline
User opens app. Timeline Service reads from cache + merges real-time tweets.
Timeline Service → Timeline Cache: Read Cache
Pre-computed timeline read from Redis. For celebrities, fan-out on read is used instead.
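The hybrid read path merges the pre-computed cached timeline with celebrity tweets fetched at read time. A minimal sketch, assuming each entry is a `(timestamp, tweet_id)` pair kept newest-first (the data shapes and function names here are illustrative):

```python
import heapq

# Hypothetical data shapes: (timestamp, tweet_id), sorted newest-first
cached_timeline = [(104, "t9"), (101, "t5"), (100, "t3")]   # built by fanout on write
celebrity_tweets = [(103, "c2"), (99, "c1")]                # fetched at read time

def read_timeline(cached, celebrity, limit=10):
    """Merge the cached timeline with fan-out-on-read celebrity tweets, newest first."""
    merged = heapq.merge(cached, celebrity, key=lambda e: e[0], reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]

read_timeline(cached_timeline, celebrity_tweets)
# → ["t9", "c2", "t5", "t3", "c1"]
```

Because both inputs are already sorted, the merge is linear in the number of entries served, so the celebrity lookup adds little latency to the cached read.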
Kafka → Search Service: Index
Tweets indexed in near-real-time for search (Earlybird inverted index).
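At its core, Earlybird maintains an inverted index mapping terms to the tweets that contain them. The toy version below shows the idea only; it omits Earlybird's Lucene segments, relevance scoring, and time-ordered posting lists.

```python
from collections import defaultdict

# Minimal in-memory inverted index: term -> tweet IDs containing it (append-only)
index = defaultdict(list)

def index_tweet(tweet_id, text):
    """Add a tweet's unique terms to the index as soon as it arrives off the stream."""
    for term in set(text.lower().split()):
        index[term].append(tweet_id)

def search(query):
    """Return tweet IDs containing every query term (simple AND semantics)."""
    postings = [set(index[term]) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

index_tweet("t1", "the cat sat")
index_tweet("t2", "the cat ran")
search("the cat")   # → {"t1", "t2"}
search("cat sat")   # → {"t1"}
```

Because indexing is just an append per term, new tweets become searchable within seconds of being consumed from Kafka.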
Kafka → Trends Service: Detect Trends
Streaming algorithms detect emerging topics from tweet velocity.
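Velocity-based trend detection compares a topic's current frequency against its recent baseline. A toy two-window sketch (a real streaming system would use continuous windows and probabilistic sketches; the thresholds below are illustrative assumptions):

```python
from collections import Counter

def detect_trends(current_window, previous_window, min_count=3, min_ratio=2.0):
    """Flag hashtags whose frequency jumped between two time windows.

    min_count filters out noise from rare tags; min_ratio requires the count
    to have at least doubled relative to the previous window's baseline.
    """
    curr = Counter(current_window)
    prev = Counter(previous_window)
    trends = []
    for tag, count in curr.items():
        baseline = prev.get(tag, 0)
        if count >= min_count and count >= min_ratio * max(baseline, 1):
            trends.append(tag)
    return sorted(trends)

prev = ["#news"] * 5 + ["#cats"] * 2
curr = ["#news"] * 5 + ["#cats"] * 8
detect_trends(curr, prev)
# → ["#cats"]: steady #news volume is not a trend, but #cats quadrupled
```

Ratio-based detection is what distinguishes an emerging topic from a perennially popular one: high absolute volume alone does not trend.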
Key Architectural Decisions
- Hybrid fan-out: fan-out on write for regular users (<10K followers), fan-out on read for celebrities — balances write amplification vs read latency
- Redis for timeline cache — O(1) prepend operations and sorted sets make it ideal for timeline assembly
- Manhattan (custom distributed DB) over Cassandra for tweet storage — optimized for Twitter's specific access patterns
- Earlybird (custom Lucene-based search) for real-time tweet indexing within seconds of posting
- Separate read and write paths to independently scale timeline reads vs tweet ingestion
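The hybrid fanout decision above reduces to a threshold check at write time. A minimal sketch; the 10K cutoff matches the figure quoted in this document, but the exact production threshold and how it is tuned are not public.

```python
CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff from the hybrid-fanout decision above

def fanout_strategy(follower_count):
    """Hybrid fanout: write for regular users, read for high-follower accounts."""
    return "fanout_on_read" if follower_count >= CELEBRITY_THRESHOLD else "fanout_on_write"

fanout_strategy(250)         # → "fanout_on_write": cheap to push to 250 timelines
fanout_strategy(80_000_000)  # → "fanout_on_read": avoid 80M cache writes per tweet
```

The threshold trades write amplification (pushing to every follower) against read latency (fetching celebrity tweets on every timeline load), which is why choosing it is a classic interview follow-up.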
Tradeoffs
Strengths
- Hybrid fanout elegantly handles the celebrity follower problem (Lady Gaga has 80M+ followers)
- Pre-computed timelines in Redis provide sub-100ms timeline loads
- Real-time search indexing means tweets are searchable within seconds
- Kafka decouples tweet ingestion from all downstream processing
Weaknesses
- Fan-out on write creates massive write amplification — one celebrity tweet generates millions of cache writes
- Timeline cache requires ~150TB of Redis, a significant operational cost
- Hybrid approach adds complexity — must determine user's fanout strategy dynamically
- Delete/edit propagation across all cached timelines is complex and eventually consistent
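One common mitigation for the delete-propagation weakness (not confirmed as Twitter's actual approach) is lazy deletion: record the delete in a tombstone set and filter cached timelines at read time instead of eagerly rewriting millions of cache entries.

```python
# Tombstone set of deleted tweet IDs; in production this would itself be a
# shared cache or bloom-filter-backed store, not a local Python set.
deleted_tweets = set()

def delete_tweet(tweet_id):
    """O(1) delete: no immediate fanout of delete operations to cached timelines."""
    deleted_tweets.add(tweet_id)

def serve_timeline(cached_ids):
    """Filter tombstoned tweets out at read time; the cache heals lazily."""
    return [t for t in cached_ids if t not in deleted_tweets]

delete_tweet("t2")
serve_timeline(["t3", "t2", "t1"])  # → ["t3", "t1"]
```

This keeps deletes cheap at the cost of an extra membership check per served tweet, and it is eventually consistent by construction, matching the weakness described above.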
Interview Drilldown Questions
- How do you decide the follower threshold between fan-out on write and fan-out on read?
- How would you handle tweet deletion across millions of pre-computed timelines?
- How does the trending algorithm distinguish genuine trends from spam/bots?
- How would you design the notification system for @mentions and replies?
- What's the strategy for timeline ranking vs chronological ordering?
Components
Fanout Service
Distributes tweets to follower timelines — fan-out on write for most users
Trends Service
Detects trending topics using streaming algorithms
Source: editorial — Synthesized from Twitter engineering blog, public architecture talks, and system design references