Twitter / X
Social media platform handling 500M+ tweets per day with real-time timeline delivery, trend detection, and search across billions of tweets.
- 500M+ daily tweets
- ~300K requests/sec timeline reads
- ~5M timeline writes/sec during peak fanout
- Billions of tweets in the search index
- ~150TB of Redis for the timeline cache
Architecture Diagram
Data Flow
Client → API Gateway: Post Tweet
User submits a tweet. Gateway validates auth and routes to Tweet Service.
API Gateway → Tweet Service: Create
Tweet Service validates content, stores tweet, and publishes to Kafka.
Tweet Service → Tweet Store: Persist
Tweet stored durably in Manhattan (distributed key-value store).
Tweet Service → Kafka: Publish Event
Tweet creation event published to Kafka for async processing.
Kafka → Fanout Service: Trigger Fanout
Fanout Service consumes events and distributes to follower timelines.
Fanout Service → Social Graph: Get Followers
Queries FlockDB for the tweeter's follower list.
Fanout Service → Timeline Cache: Write Timelines
Prepends tweet ID to each follower's cached timeline in Redis.
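The prepend-and-trim fanout write can be sketched as follows. This is a minimal in-memory stand-in for the Redis timeline cache (the real system issues Redis list or sorted-set commands); the 800-entry cap and the helper names are illustrative assumptions, not confirmed production values.

```python
from collections import defaultdict

TIMELINE_CAP = 800  # hypothetical per-user cache size; the actual limit is not public

# In-memory stand-in for the Redis timeline cache: user_id -> tweet IDs, newest first
timeline_cache = defaultdict(list)

def fanout_on_write(tweet_id, follower_ids):
    """Prepend the new tweet ID to each follower's cached timeline."""
    for follower_id in follower_ids:
        timeline = timeline_cache[follower_id]
        timeline.insert(0, tweet_id)   # Redis equivalent: LPUSH timeline:<id> <tweet_id>
        del timeline[TIMELINE_CAP:]    # Redis equivalent: LTRIM timeline:<id> 0 799

fanout_on_write("t1", ["alice", "bob"])
fanout_on_write("t2", ["alice"])
# alice's cached timeline is now newest-first: ["t2", "t1"]
```

The trim keeps each cached timeline bounded, which is what makes ~150TB of Redis sufficient despite millions of writes per second.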
Client → Timeline Service: Load Timeline
User opens app. Timeline Service reads from cache + merges real-time tweets.
Timeline Service → Timeline Cache: Read Cache
Pre-computed timeline read from Redis. For celebrities, fan-out on read is used instead.
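The hybrid read path merges the pre-computed cached timeline with celebrity tweets fetched at read time. A minimal sketch, assuming each entry is a `(timestamp, tweet_id)` pair kept newest-first (the data shapes and function names here are illustrative):

```python
import heapq

# Hypothetical data shapes: (timestamp, tweet_id), sorted newest-first
cached_timeline = [(104, "t9"), (101, "t5"), (100, "t3")]   # built by fanout on write
celebrity_tweets = [(103, "c2"), (99, "c1")]                # fetched at read time

def read_timeline(cached, celebrity, limit=10):
    """Merge the cached timeline with fan-out-on-read celebrity tweets, newest first."""
    merged = heapq.merge(cached, celebrity, key=lambda e: e[0], reverse=True)
    return [tweet_id for _, tweet_id in merged][:limit]

read_timeline(cached_timeline, celebrity_tweets)
# → ["t9", "c2", "t5", "t3", "c1"]
```

Because both inputs are already sorted, the merge is linear in the number of entries served, so the celebrity lookup adds little latency to the cached read.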
Kafka → Search Service: Index
Tweets indexed in near-real-time for search (Earlybird inverted index).
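At its core, Earlybird maintains an inverted index mapping terms to the tweets that contain them. The toy version below shows the idea only; it omits Earlybird's Lucene segments, relevance scoring, and time-ordered posting lists.

```python
from collections import defaultdict

# Minimal in-memory inverted index: term -> tweet IDs containing it (append-only)
index = defaultdict(list)

def index_tweet(tweet_id, text):
    """Add a tweet's unique terms to the index as soon as it arrives off the stream."""
    for term in set(text.lower().split()):
        index[term].append(tweet_id)

def search(query):
    """Return tweet IDs containing every query term (simple AND semantics)."""
    postings = [set(index[term]) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

index_tweet("t1", "the cat sat")
index_tweet("t2", "the cat ran")
search("the cat")   # → {"t1", "t2"}
search("cat sat")   # → {"t1"}
```

Because indexing is just an append per term, new tweets become searchable within seconds of being consumed from Kafka.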
Kafka → Trends Service: Detect Trends
Streaming algorithms detect emerging topics from tweet velocity.
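Velocity-based trend detection compares a topic's current frequency against its recent baseline. A toy two-window sketch (a real streaming system would use continuous windows and probabilistic sketches; the thresholds below are illustrative assumptions):

```python
from collections import Counter

def detect_trends(current_window, previous_window, min_count=3, min_ratio=2.0):
    """Flag hashtags whose frequency jumped between two time windows.

    min_count filters out noise from rare tags; min_ratio requires the count
    to have at least doubled relative to the previous window's baseline.
    """
    curr = Counter(current_window)
    prev = Counter(previous_window)
    trends = []
    for tag, count in curr.items():
        baseline = prev.get(tag, 0)
        if count >= min_count and count >= min_ratio * max(baseline, 1):
            trends.append(tag)
    return sorted(trends)

prev = ["#news"] * 5 + ["#cats"] * 2
curr = ["#news"] * 5 + ["#cats"] * 8
detect_trends(curr, prev)
# → ["#cats"]: steady #news volume is not a trend, but #cats quadrupled
```

Ratio-based detection is what distinguishes an emerging topic from a perennially popular one: high absolute volume alone does not trend.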
Key Architectural Decisions
- Hybrid fan-out: fan-out on write for regular users (<10K followers), fan-out on read for celebrities — balances write amplification vs read latency
- Redis for timeline cache — O(1) prepend operations and sorted sets make it ideal for timeline assembly
- Manhattan (custom distributed DB) over Cassandra for tweet storage — optimized for Twitter's specific access patterns
- Earlybird (custom Lucene-based search) for real-time tweet indexing within seconds of posting
- Separate read and write paths to independently scale timeline reads vs tweet ingestion
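The hybrid fanout decision above reduces to a threshold check at write time. A minimal sketch; the 10K cutoff matches the figure quoted in this document, but the exact production threshold and how it is tuned are not public.

```python
CELEBRITY_THRESHOLD = 10_000  # illustrative cutoff from the hybrid-fanout decision above

def fanout_strategy(follower_count):
    """Hybrid fanout: write for regular users, read for high-follower accounts."""
    return "fanout_on_read" if follower_count >= CELEBRITY_THRESHOLD else "fanout_on_write"

fanout_strategy(250)         # → "fanout_on_write": cheap to push to 250 timelines
fanout_strategy(80_000_000)  # → "fanout_on_read": avoid 80M cache writes per tweet
```

The threshold trades write amplification (pushing to every follower) against read latency (fetching celebrity tweets on every timeline load), which is why choosing it is a classic interview follow-up.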
Tradeoffs
Strengths
- Hybrid fanout elegantly handles the celebrity follower problem (Lady Gaga has 80M+ followers)
- Pre-computed timelines in Redis provide sub-100ms timeline loads
- Real-time search indexing means tweets are searchable within seconds
- Kafka decouples tweet ingestion from all downstream processing
Weaknesses
- Fan-out on write creates massive write amplification — one celebrity tweet generates millions of cache writes
- Timeline cache requires ~150TB of Redis, a significant operational cost
- Hybrid approach adds complexity — must determine user's fanout strategy dynamically
- Delete/edit propagation across all cached timelines is complex and eventually consistent
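One common mitigation for the delete-propagation weakness (not confirmed as Twitter's actual approach) is lazy deletion: record the delete in a tombstone set and filter cached timelines at read time instead of eagerly rewriting millions of cache entries.

```python
# Tombstone set of deleted tweet IDs; in production this would itself be a
# shared cache or bloom-filter-backed store, not a local Python set.
deleted_tweets = set()

def delete_tweet(tweet_id):
    """O(1) delete: no immediate fanout of delete operations to cached timelines."""
    deleted_tweets.add(tweet_id)

def serve_timeline(cached_ids):
    """Filter tombstoned tweets out at read time; the cache heals lazily."""
    return [t for t in cached_ids if t not in deleted_tweets]

delete_tweet("t2")
serve_timeline(["t3", "t2", "t1"])  # → ["t3", "t1"]
```

This keeps deletes cheap at the cost of an extra membership check per served tweet, and it is eventually consistent by construction, matching the weakness described above.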
Interview Drilldown Questions
- How do you decide the follower threshold between fan-out on write and fan-out on read?
- How would you handle tweet deletion across millions of pre-computed timelines?
- How does the trending algorithm distinguish genuine trends from spam/bots?
- How would you design the notification system for @mentions and replies?
- What's the strategy for timeline ranking vs chronological ordering?
Components
Fanout Service
Distributes tweets to follower timelines — fan-out on write for most users
Trends Service
Detects trending topics using streaming algorithms
Source: editorial — Synthesized from Twitter engineering blog, public architecture talks, and system design references