Message Queues
Message queues decouple producers and consumers by buffering messages in a durable store, enabling asynchronous processing, load leveling, and fault-tolerant communication between services.
A message queue sits between a producer (sender) and a consumer (processor), storing messages until the consumer is ready to process them. This decouples services in time and space: the producer doesn't need to know who consumes, and the consumer doesn't need to be available when the message is sent. The key patterns are point-to-point (each message is handled by exactly one consumer) and pub/sub (each message is delivered to every subscriber). Major systems include Kafka (distributed log), RabbitMQ (AMQP broker), and SQS (managed AWS queue).
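To make the two delivery patterns concrete, here is a minimal in-memory Python sketch. It is not durable and ignores acknowledgements; the class names `PointToPointQueue` and `PubSubTopic` are hypothetical and do not correspond to any broker's API. In point-to-point, consumers compete for messages and each message is removed once received; in pub/sub, publishing copies the message into a separate queue for every subscriber.

```python
from collections import deque


class PointToPointQueue:
    """Hypothetical toy queue: each message goes to exactly one competing consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Whichever consumer calls receive() first gets the message; nobody else sees it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Hypothetical toy topic: each message is copied to every subscriber."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name):
        # Each subscriber gets its own private queue.
        self._subscribers[name] = deque()

    def publish(self, message):
        # Fan-out: every subscriber receives its own copy.
        for queue in self._subscribers.values():
            queue.append(message)

    def receive(self, name):
        queue = self._subscribers[name]
        return queue.popleft() if queue else None


# Point-to-point: two workers compete for two messages.
orders = PointToPointQueue()
orders.send("order-1")
orders.send("order-2")
worker_a = orders.receive()  # "order-1"
worker_b = orders.receive()  # "order-2"

# Pub/sub: both subscribers see the same event.
events = PubSubTopic()
events.subscribe("billing")
events.subscribe("email")
events.publish("user-signed-up")
billing_copy = events.receive("billing")  # "user-signed-up"
email_copy = events.receive("email")      # "user-signed-up"
```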
Tradeoffs
Strengths
- Decoupling: Producers and consumers evolve independently; services can be deployed, scaled, and updated separately.
- Resilience: Messages stay in the queue until a consumer acknowledges them, so work in flight when a consumer crashes is redelivered and processed after recovery.
- Load leveling: Absorbs traffic spikes, protecting downstream services from overload.
- Scalability: Kafka scales to millions of messages/sec by spreading a topic's partitions across brokers and consumers; SQS standard queues offer nearly unlimited throughput.
- Replay: Kafka's log retention enables reprocessing historical events and bootstrapping new consumers.
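The partitioning and replay properties come from treating a topic as an append-only log that consumers read by offset instead of deleting from. The toy `PartitionedLog` below is a hypothetical illustration in that spirit, not Kafka's client API: records with the same key hash to the same partition (using `zlib.crc32` as a stand-in; Kafka's default partitioner uses murmur2), reading only advances a per-consumer-group offset, and resetting that offset replays retained history.

```python
import zlib


class PartitionedLog:
    """Hypothetical toy log in the spirit of a Kafka topic (not Kafka's actual API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offsets per (consumer group, partition); stands in for broker-side storage.
        self.offsets = {}

    def partition_for(self, key):
        # Same key -> same partition, so order is preserved per key.
        # crc32 is an illustrative choice; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group, partition, max_records=10):
        start = self.offsets.get((group, partition), 0)
        records = self.partitions[partition][start:start + max_records]
        # Reading never deletes records; only this group's offset advances (auto-commit).
        self.offsets[(group, partition)] = start + len(records)
        return records

    def seek(self, group, partition, offset):
        # Replay: rewind a group's position to re-read retained history.
        self.offsets[(group, partition)] = offset


log = PartitionedLog(num_partitions=3)
for i in range(3):
    log.append("user-42", f"event-{i}")
p = log.partition_for("user-42")

first = log.poll("analytics", p)           # all three events, in order
again = log.poll("analytics", p)           # [] because the offset is past the end
log.seek("analytics", p, 0)
replayed = log.poll("analytics", p)        # the same three events again
new_group = log.poll("search-indexer", p)  # a new group starts from offset 0
```

Because retention, not consumption, decides when records disappear, a brand-new consumer group can bootstrap from the beginning of the log without affecting existing groups.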
Weaknesses
- Added latency: Asynchronous processing means results are not immediate, so queues are a poor fit for user-facing request paths that need an instant response.
- Complexity: Debugging distributed async flows is harder than tracing a synchronous call chain.
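- Ordering challenges: Most systems guarantee order only within a single queue or partition (Kafka orders messages per partition), so maintaining a global order means funneling everything through one partition and sacrificing parallelism.
- Exactly-once is hard: Most systems provide at-least-once delivery, so consumers must be idempotent to tolerate duplicates.
- Operational overhead: Self-managed Kafka clusters require careful tuning, monitoring, and capacity planning; managed services such as SQS shift most of this work to the provider.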
- Data consistency: Eventual consistency between services can create user-visible anomalies if not handled carefully.
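Because most systems deliver at least once, the same message can arrive twice. One common way to make a consumer idempotent, sketched here under stated assumptions, is to record each processed message ID in the same database transaction as the side effect, so a redelivery hits a uniqueness violation and is skipped. The sketch assumes every message carries a unique `id`; the schema and the `handle` function are hypothetical, and Python's built-in `sqlite3` stands in for the consumer's real database.

```python
import sqlite3


def make_db():
    # Hypothetical schema: a dedup table plus the state the consumer mutates.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
    db.execute("INSERT INTO accounts VALUES ('acct-1', 0)")
    db.commit()
    return db


def handle(db, message):
    """Apply a deposit at most once per message id, even if the broker redelivers it."""
    try:
        # One transaction: the dedup marker and the side effect commit or roll back together.
        with db:
            db.execute("INSERT INTO processed VALUES (?)", (message["id"],))
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (message["amount"], message["account"]),
            )
    except sqlite3.IntegrityError:
        # Duplicate delivery: already processed, so it is safe to acknowledge and skip.
        return False
    return True


db = make_db()
msg = {"id": "m-1", "account": "acct-1", "amount": 100}
first_delivery = handle(db, msg)  # True: the deposit is applied
redelivery = handle(db, msg)      # False: the duplicate is ignored
balance = db.execute("SELECT balance FROM accounts WHERE id = 'acct-1'").fetchone()[0]
# balance is 100, not 200
```

The guarantee covers only writes made in the same database as the dedup table.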
Likely Follow-Up Questions
- How would you ensure exactly-once processing in a system that writes to an external database?
- What is the outbox pattern and when would you use it?
- How do you handle message ordering when scaling consumers horizontally?
- When would you choose Kafka over RabbitMQ?
- How would you design a dead-letter queue strategy?
- What is consumer lag and how do you monitor and respond to it?
Source: editorial — Synthesized from Apache Kafka documentation, RabbitMQ guides, AWS SQS documentation, and Uber/LinkedIn engineering blogs.