WhatsApp

Real-time messaging system handling 100B+ messages per day with end-to-end encryption, presence tracking, and media delivery across billions of devices.

  • Daily messages: 100B+
  • Concurrent connections: ~1B
  • Media per day: ~7B photos, 1B videos
  • Servers: ~10,000 Erlang servers (historically)
  • Latency: <200 ms message delivery globally

Architecture Diagram

Components shown: Mobile Client (client), Load Balancer (gateway), Chat Server (service), Presence Service (service), Message Queue (queue), Message Store (database), User DB (database), Media Storage (storage), CDN (cdn), Push Notification (service)

Data Flow

1. Mobile Client → Load Balancer: Connect

Client establishes a persistent WebSocket connection through the load balancer.

2. Load Balancer → Chat Server: Route

Load balancer assigns connection to a chat server, maintaining session affinity.

3. Mobile Client → Chat Server: Send Message

Client sends the encrypted message over the WebSocket. The chat server validates the envelope (sender, recipient) and routes it without decrypting the payload.

4. Chat Server → Presence Service: Check Status

Chat server checks if recipient is online and which chat server holds their connection.

5. Chat Server → Chat Server: Forward

If recipient is online, message is forwarded directly to their chat server for immediate delivery.

6. Chat Server → Message Queue: Queue

If recipient is offline, message is queued for later delivery.

7. Chat Server → Push Notification: Notify

For an offline recipient, a push notification is sent via APNs/FCM to wake the device.

8. Mobile Client → Media Storage: Upload Media

Client uploads encrypted media directly to storage, receives a media ID.

9. Mobile Client → CDN: Download Media

Recipient downloads encrypted media from nearest CDN edge.
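
The delivery decision in steps 4 to 7 can be sketched as a small in-memory model. This is an illustration only, not WhatsApp's implementation: the `Router` class, its field names, and the `"forwarded"`/`"queued"` return values are hypothetical, and the real APNs/FCM call is replaced by a list that records which recipients would be notified.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy stand-in for a chat server's delivery decision (hypothetical names)."""
    # user_id -> chat server currently holding that user's connection
    presence: dict = field(default_factory=dict)
    # user_id -> ciphertexts waiting for an offline recipient
    offline_queue: dict = field(default_factory=lambda: defaultdict(deque))
    # Records of side effects, so the sketch is easy to inspect
    forwarded: list = field(default_factory=list)
    pushes: list = field(default_factory=list)

    def deliver(self, recipient: str, ciphertext: bytes) -> str:
        # Step 4: ask presence which server (if any) holds the recipient's socket.
        server = self.presence.get(recipient)
        if server is not None:
            # Step 5: recipient is online, so forward to that chat server.
            self.forwarded.append((server, recipient, ciphertext))
            return "forwarded"
        # Step 6: recipient is offline, so keep the ciphertext for later delivery.
        self.offline_queue[recipient].append(ciphertext)
        # Step 7: record that a push (APNs/FCM, not modelled here) should be sent.
        self.pushes.append(recipient)
        return "queued"


router = Router()
router.presence["alice"] = "chat-7"
online_result = router.deliver("alice", b"ciphertext-1")
offline_result = router.deliver("bob", b"ciphertext-2")
```

The router only ever handles opaque bytes, which mirrors the point that the servers route ciphertext without reading it.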

Key Architectural Decisions

  • End-to-end encryption using Signal Protocol means servers never see plaintext — limits server-side features but maximizes privacy
  • Erlang/BEAM VM chosen for extreme concurrency: a single tuned server can hold millions of concurrent connections
  • Messages stored only until delivered, then deleted from servers — reduces storage but complicates multi-device sync
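  • Fan-out on write for small groups (the message is copied to each member's queue at send time); fan-out on read for broadcast lists (the message is stored once and pulled by recipients)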
  • WebSocket for persistent connections with fallback to long polling
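
The difference between the two fan-out strategies above can be shown with a minimal sketch. It assumes plain in-memory dictionaries in place of real queues and storage; `FanOutStore` and all of its method names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class FanOutStore:
    """Contrasts fan-out on write (groups) with fan-out on read (broadcasts)."""

    def __init__(self):
        self.inboxes = defaultdict(list)        # user -> messages copied at write time
        self.broadcast_log = defaultdict(list)  # list_id -> messages stored once
        self.subscriptions = defaultdict(set)   # user -> broadcast list ids
        self.read_cursor = defaultdict(int)     # (user, list_id) -> messages already read

    def send_to_group(self, members, sender, msg):
        # Fan-out on write: one copy per recipient, paid for at send time.
        for member in members:
            if member != sender:
                self.inboxes[member].append(msg)

    def subscribe(self, user, list_id):
        self.subscriptions[user].add(list_id)

    def publish_broadcast(self, list_id, msg):
        # Fan-out on read: a single stored copy, regardless of audience size.
        self.broadcast_log[list_id].append(msg)

    def read(self, user):
        # Drain the per-user inbox, then pull unread broadcast messages.
        out = self.inboxes.pop(user, [])
        for list_id in sorted(self.subscriptions[user]):
            log = self.broadcast_log[list_id]
            start = self.read_cursor[(user, list_id)]
            out.extend(log[start:])
            self.read_cursor[(user, list_id)] = len(log)
        return out


store = FanOutStore()
store.send_to_group(["a", "b", "c"], sender="a", msg="hi")
store.subscribe("b", "news")
store.publish_broadcast("news", "update")
first_read = store.read("b")
second_read = store.read("b")
```

Write cost grows with group size in `send_to_group`, while `publish_broadcast` stays constant and moves the work to `read`, which is why the write path suits small groups and the read path suits large audiences.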

Tradeoffs

Strengths

  • Extreme efficiency: ~50 engineers supported ~900M users (as of 2015, the year after the acquisition)
  • End-to-end encryption provides strong privacy guarantees
  • Erlang's actor model naturally maps to per-connection processes
  • Minimal server-side storage reduces data liability

Weaknesses

  • Multi-device support is complicated by the encryption model, since each linked device needs its own keys
  • No server-side search or message history (by design)
  • Group size limits due to fan-out costs
  • Media transcoding happens client-side, increasing battery usage

Interview Drilldown Questions

  • How would you handle message ordering in a distributed chat system?
  • What happens when a user comes online and has 10,000 pending messages?
  • How does end-to-end encryption work with group chats?
  • How would you design the read receipt system?
  • What's the strategy for handling media in regions with poor connectivity?

Components

client

Mobile Client

iOS/Android app maintaining a persistent connection

gateway

Load Balancer

Distributes WebSocket connections across chat servers

service

Chat Server

Manages WebSocket connections, message routing, and presence

service

Presence Service

Tracks online/offline/typing status with heartbeats
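
Heartbeat-based presence can be sketched as a last-seen table with a time-to-live. This is a minimal sketch under stated assumptions: the `PresenceTracker` name, the 30-second TTL, and the injectable `clock` parameter are all illustrative choices, and typing status is left out.

```python
import time


class PresenceTracker:
    """Marks a user online while heartbeats keep arriving within the TTL."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the sketch can be tested without sleeping
        self._last_seen = {}    # user_id -> (timestamp, chat_server)

    def heartbeat(self, user_id, chat_server):
        # Each heartbeat refreshes the timestamp and records the owning server.
        self._last_seen[user_id] = (self.clock(), chat_server)

    def lookup(self, user_id):
        """Return the chat server holding the user's connection, or None if offline."""
        entry = self._last_seen.get(user_id)
        if entry is None:
            return None
        seen_at, server = entry
        if self.clock() - seen_at > self.ttl:
            # Heartbeats stopped: treat the user as offline and drop the entry.
            del self._last_seen[user_id]
            return None
        return server


fake_now = [0.0]
tracker = PresenceTracker(ttl_seconds=30.0, clock=lambda: fake_now[0])
tracker.heartbeat("alice", "chat-3")
fake_now[0] = 10.0
while_fresh = tracker.lookup("alice")
fake_now[0] = 45.0
after_expiry = tracker.lookup("alice")
```

Expiring entries lazily on lookup keeps the sketch short; a production service would more likely rely on a store with built-in key expiry or a background sweep.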

queue

Message Queue

Buffers messages for offline users and ensures delivery ordering
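
One way to buffer messages for offline users while preserving order is a per-recipient sequence number plus acknowledgement-based deletion. The sketch below assumes an in-memory store; `OfflineQueue`, the `limit` paging parameter, and the cumulative `ack` semantics are hypothetical and not taken from WhatsApp's design.

```python
from collections import defaultdict


class OfflineQueue:
    """Per-recipient buffer: ordered by sequence number, deleted once acknowledged."""

    def __init__(self):
        self._next_seq = defaultdict(int)  # recipient -> next sequence number
        self._pending = defaultdict(dict)  # recipient -> {seq: ciphertext}

    def enqueue(self, recipient, ciphertext):
        # Sequence numbers are assigned per recipient, so ordering is local to one inbox.
        seq = self._next_seq[recipient]
        self._next_seq[recipient] = seq + 1
        self._pending[recipient][seq] = ciphertext
        return seq

    def pending(self, recipient, limit=100):
        # Return the oldest messages first, in pages, so a large backlog
        # is not pushed to a reconnecting client all at once.
        items = sorted(self._pending[recipient].items())
        return items[:limit]

    def ack(self, recipient, up_to_seq):
        # Cumulative ack: everything up to and including up_to_seq is deleted.
        box = self._pending[recipient]
        for seq in [s for s in box if s <= up_to_seq]:
            del box[seq]


queue = OfflineQueue()
for body in (b"m0", b"m1", b"m2"):
    queue.enqueue("bob", body)
first_page = queue.pending("bob", limit=2)
queue.ack("bob", up_to_seq=1)
remaining = queue.pending("bob")
```

Deleting only after an acknowledgement matches the store-until-delivered policy described earlier: a message survives a dropped connection because it stays pending until the client confirms receipt.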

database

Message Store

Stores undelivered messages temporarily (Mnesia/custom)

database

User DB

User profiles, contacts, and group membership

storage

Media Storage

Encrypted photos/videos/voice with CDN distribution

cdn

CDN

Serves encrypted media files with regional edge caching

service

Push Notification

APNs/FCM for waking dormant clients

Source: editorial — Synthesized from public engineering talks, Signal Protocol documentation, and system design references
