Message brokers (Kafka, RabbitMQ) in Microservices - System Design Guide

Problem Statement
When multiple microservices need to communicate, direct calls create tight coupling, and failures can cascade when one service is down. Without a reliable messaging layer, messages can be lost or delayed, or a slow service can become a bottleneck, leading to inconsistent data and a poor user experience.
Solution
Message brokers act as intermediaries that receive, store, and forward messages between services asynchronously. They decouple services by allowing producers to send messages without waiting for consumers, ensuring reliable delivery, buffering, and load leveling.
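The decoupling described above can be sketched in a single process, with Python's standard `queue.Queue` standing in for the broker. This is only an illustration under that assumption: a real broker runs as a separate service and persists messages, and the names `broker`, `producer`, and `consumer` are hypothetical.

```python
import queue
import threading
import time

# Stand-in for a broker queue: an in-process, thread-safe FIFO buffer.
broker = queue.Queue()
processed = []


def producer(messages):
    """Publish messages without waiting for the consumer to handle them."""
    for msg in messages:
        broker.put(msg)  # returns immediately; the buffer absorbs the burst


def consumer():
    """Drain the buffer at the consumer's own (slower) pace."""
    while True:
        msg = broker.get()
        if msg is None:  # sentinel: no more messages will arrive
            break
        time.sleep(0.01)  # simulate slow processing
        processed.append(msg)


worker = threading.Thread(target=consumer)
worker.start()

start = time.perf_counter()
producer([f"order-{i}" for i in range(5)])
publish_seconds = time.perf_counter() - start

broker.put(None)  # ask the consumer to stop once the queue is drained
worker.join()

print(f"published in {publish_seconds:.4f}s, processed {len(processed)} messages")
```

The producer finishes publishing long before the consumer has processed everything, which is the load-leveling effect: the queue absorbs the burst and the consumer catches up in order.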
Architecture
[Diagram: Producer 1 and Producer 2 send messages to the message broker (Kafka or RabbitMQ), which forwards them to consumers.]

This diagram shows producers sending messages to a message broker, which then routes messages asynchronously to multiple consumers.

Trade-offs
✓ Pros
Decouples microservices, allowing independent scaling and deployment.
Provides reliable message delivery with persistence and retries.
Enables asynchronous communication, improving system responsiveness.
Supports load balancing and buffering to handle traffic spikes.
✗ Cons
Adds operational complexity and requires managing broker infrastructure.
Introduces eventual consistency, which may complicate data synchronization.
Can increase latency compared to direct synchronous calls.
Use when microservices need reliable, asynchronous communication at scale, especially with high message volumes or when services have different processing speeds.
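Avoid when the system requires strict synchronous interactions, or when message volume is very low (as a rough rule of thumb, under 100 messages per second) and no buffering or decoupling is needed, since the broker's operational overhead may outweigh its benefits.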
Real World Examples
Netflix
Uses Kafka to decouple microservices for event-driven data pipelines, ensuring reliable streaming data delivery across services.
Uber
Employs Kafka to handle high-throughput event streams for real-time analytics and dispatching, preventing service bottlenecks.
Spotify
Uses RabbitMQ for asynchronous task queues to manage background jobs and improve system resilience.
Code Example
The "before" code shows a direct call from ServiceA to ServiceB, which couples the two services tightly and blocks the caller. The "after" code uses RabbitMQ through the pika Python client: ServiceA publishes messages to a queue and ServiceB consumes them independently, which improves decoupling and reliability.
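### Before: Direct synchronous call (no message broker)
class ServiceA:
    def send_data(self, data):
        # ServiceA creates and calls ServiceB directly, so it blocks until
        # process_data returns and fails if ServiceB is unavailable.
        service_b = ServiceB()
        service_b.process_data(data)

class ServiceB:
    def process_data(self, data):
        print(f"Processing {data}")


### After: Using RabbitMQ message broker
import pika

class ServiceA:
    def send_data(self, data):
        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        try:
            channel = connection.channel()
            # durable=True keeps the queue definition across broker restarts.
            channel.queue_declare(queue='task_queue', durable=True)
            channel.basic_publish(
                exchange='',  # default exchange routes by queue name
                routing_key='task_queue',
                body=data.encode(),
                # delivery_mode=2 marks the message as persistent.
                properties=pika.BasicProperties(delivery_mode=2))
        finally:
            # Close the connection even if publishing fails.
            connection.close()

class ServiceB:
    def start_consuming(self):
        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        channel = connection.channel()
        # Declaring the queue here as well makes startup order irrelevant.
        channel.queue_declare(queue='task_queue', durable=True)

        def callback(ch, method, properties, body):
            print(f"Processing {body.decode()}")
            # Acknowledge only after processing so unfinished work is redelivered.
            ch.basic_ack(delivery_tag=method.delivery_tag)

        # Deliver at most one unacknowledged message to this consumer at a time.
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(queue='task_queue', on_message_callback=callback)
        # Blocks and dispatches incoming messages to callback.
        channel.start_consuming()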
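The publisher above calls `data.encode()`, which assumes `data` is already a string. For structured payloads, one common option is to serialize to JSON before publishing and parse it again in the consumer callback. The sketch below shows that round trip without a broker; `encode_message` and `decode_message` are hypothetical helper names, not part of pika.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Serialize a dict to UTF-8 JSON bytes suitable for a message body."""
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode_message(body: bytes) -> dict:
    """Parse the bytes received by the consumer back into a dict."""
    return json.loads(body.decode("utf-8"))


body = encode_message({"order_id": 42, "status": "created"})
print(body)
print(decode_message(body))
```

Under this assumption, the producer would pass `encode_message(payload)` as `body`, and the consumer would call `decode_message(body)` inside its callback.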
Alternatives
Direct synchronous API calls
Services call each other directly and wait for responses, creating tight coupling.
Use when: low latency and an immediate response are critical and service dependencies are stable.
Event streaming platforms (e.g., Apache Pulsar)
Similar to Kafka, but with built-in multi-tenancy and geo-replication.
Use when: advanced multi-region replication and topic management are required.
Shared database polling
Services read/write to a shared database to communicate, polling for changes.
Use when: the system is very simple or legacy and introducing a broker is not feasible.
Summary
Message brokers decouple microservices by enabling asynchronous communication through reliable message passing.
They improve system resilience and scalability by buffering and retrying messages between producers and consumers.
Choosing the right broker depends on message volume, delivery guarantees, and system requirements.

Practice

1. What is the primary role of a message broker like Kafka or RabbitMQ in a microservices architecture?
easy
A. To store large amounts of user data permanently
B. To enable services to communicate asynchronously by passing messages
C. To replace the database in microservices
D. To directly execute business logic in services

Solution

  1. Step 1: Understand message broker function

    Message brokers act as middlemen that help services send and receive messages without waiting for each other.
  2. Step 2: Identify correct role in microservices

    They enable asynchronous communication, improving scalability and fault tolerance.
  3. Final Answer:

    To enable services to communicate asynchronously by passing messages -> Option B
  4. Quick Check:

    Message broker = asynchronous communication [OK]
Hint: Message brokers pass messages between services asynchronously [OK]
Common Mistakes:
  • Confusing brokers with databases
  • Thinking brokers execute business logic
  • Assuming brokers store permanent user data
2. Which of the following is the correct way to declare a durable RabbitMQ queue using the Java client?
easy
A. channel.queueDeclare("task_queue", true, false, false, null);
B. channel.createQueue("task_queue", durable=true);
C. queue.declare("task_queue", persistent=True);
D. rabbitmq.queue("task_queue", durable=True);

Solution

  1. Step 1: Recall RabbitMQ queue declaration syntax

    In the RabbitMQ Java client, channel.queueDeclare takes five parameters in this order: queue name, durable, exclusive, autoDelete, and arguments.
  2. Step 2: Match correct syntax

    channel.queueDeclare("task_queue", true, false, false, null); matches that method signature and parameter order.
  3. Final Answer:

    channel.queueDeclare("task_queue", true, false, false, null); -> Option A
  4. Quick Check:

    RabbitMQ queueDeclare syntax = channel.queueDeclare("task_queue", true, false, false, null); [OK]
Hint: In the Java client, channel.queueDeclare takes 5 parameters [OK]
Common Mistakes:
  • Using incorrect method names like createQueue
  • Passing parameters with wrong names or order
  • Confusing RabbitMQ syntax with other brokers
3. Given the following Kafka consumer code snippet (Python, kafka-python client), what will happen if the topic has 3 messages and auto-commit is enabled?
consumer.subscribe(['orders'])
for message in consumer.poll(timeout_ms=1000).values():
    print(message.value.decode('utf-8'))
medium
A. Prints nothing because poll returns a dict of lists
B. Prints only the first message and stops
C. Prints all 3 messages from the 'orders' topic
D. Raises an error due to wrong method usage

Solution

  1. Step 1: Analyze Kafka consumer.poll() return type

    The poll() method returns a dictionary whose keys are topic partitions and whose values are lists of records fetched from each partition.
  2. Step 2: Understand iteration over poll().values()

    Iterating over values() yields one list of records per partition, not individual records. Accessing message.value therefore raises an AttributeError, because message is a list, not a record object.
  3. Final Answer:

    Raises an error due to wrong method usage -> Option D
  4. Quick Check:

    poll() returns dict of lists; iterating directly over values and accessing message.value causes error [OK]
Hint: poll() returns dict of lists, not single messages [OK]
Common Mistakes:
  • Assuming poll() returns a flat list of messages
  • Not decoding message values properly
  • Ignoring that poll() returns per-partition batches
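To make the shape of the poll() result concrete, here is a small sketch that runs without a Kafka cluster. `TopicPartition` and `Record` are hypothetical namedtuple stand-ins that only mimic the structure described above, and `batch` is simulated data, not output from a real consumer.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic the shape of the poll() result.
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])
Record = namedtuple("Record", ["offset", "value"])

# Simulated poll() result: one list of records per partition.
batch = {
    TopicPartition("orders", 0): [Record(0, b"order-1"), Record(1, b"order-2")],
    TopicPartition("orders", 1): [Record(0, b"order-3")],
}

# The pattern from the question: each item is a list, so .value fails.
error_name = None
try:
    for message in batch.values():
        message.value
except AttributeError as exc:
    error_name = type(exc).__name__

# One way to fix it: loop over partitions, then over each partition's records.
decoded = []
for partition, records in batch.items():
    for record in records:
        decoded.append(record.value.decode("utf-8"))

print(error_name)
print(decoded)
```

The nested loop is the key change: the outer loop walks the per-partition batches and the inner loop reaches the individual records.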
4. A developer wrote this RabbitMQ consumer code (Python, pika) but it never receives messages:
channel.basic_consume(queue='task_queue', on_message_callback=process_message, auto_ack=False)

What is the likely issue?
medium
A. The consumer must call channel.start_consuming() to begin receiving messages
B. The callback function name should be 'on_message' instead of 'process_message'
C. auto_ack must be set to True for messages to be received
D. The queue name 'task_queue' is invalid and must be changed

Solution

  1. Step 1: Understand RabbitMQ consumer lifecycle

    basic_consume only registers the callback. With a blocking connection, the consumer must then start the event loop with channel.start_consuming() to receive messages.
  2. Step 2: Identify missing call

    The code lacks start_consuming(), so no messages are delivered to the callback.
  3. Final Answer:

    The consumer must call channel.start_consuming() to begin receiving messages -> Option A
  4. Quick Check:

    Missing start_consuming() = The consumer must call channel.start_consuming() to begin receiving messages [OK]
Hint: Remember to call start_consuming() after basic_consume [OK]
Common Mistakes:
  • Thinking callback function name must be fixed
  • Believing auto_ack controls message receipt
  • Assuming queue name is invalid without evidence
5. You need to design a scalable order processing system using Kafka. Which approach best ensures message order per customer while allowing parallel processing across customers?
hard
A. Use a single Kafka partition for all orders to keep global order
B. Use multiple topics, one per customer, to isolate order streams
C. Partition messages by customer ID so each customer's orders stay ordered in their partition
D. Send all orders to a single consumer instance to maintain order

Solution

  1. Step 1: Understand Kafka partitioning and ordering

    Kafka guarantees order only within a partition. Using the customer ID as the message key sends all of a customer's orders to the same partition, which keeps them in order.
  2. Step 2: Evaluate options for scalability and ordering

    Keying by customer ID allows consumers to process different partitions in parallel while preserving order per customer. Several customers may share a partition, but each customer's messages never span more than one.
  3. Final Answer:

    Partition messages by customer ID so each customer's orders stay ordered in their partition -> Option C
  4. Quick Check:

    Partition by key for order + parallelism = Partition messages by customer ID so each customer's orders stay ordered in their partition [OK]
Hint: Partition by customer ID to keep order and scale processing [OK]
Common Mistakes:
  • Using single partition limits scalability
  • Creating many topics adds unnecessary complexity
  • Using single consumer blocks parallelism
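The partition-by-key idea can be sketched in plain Python. This is an illustration only: `zlib.crc32` is a stand-in hash chosen because it is deterministic (Kafka's default partitioner in the Java client uses a murmur2 hash of the key), and `NUM_PARTITIONS`, `partition_for`, and the sample orders are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(customer_id: str) -> int:
    """Map a key to a partition deterministically (crc32 is a stand-in hash)."""
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_PARTITIONS


# Interleaved orders from several customers, in arrival order.
orders = [
    ("alice", "a1"), ("bob", "b1"), ("alice", "a2"),
    ("carol", "c1"), ("bob", "b2"), ("alice", "a3"),
]

# Append each order to the partition chosen by its customer ID.
partitions = defaultdict(list)
for customer_id, order in orders:
    partitions[partition_for(customer_id)].append((customer_id, order))

for index in sorted(partitions):
    print(index, partitions[index])
```

Because the same key always maps to the same partition, each customer's orders keep their arrival order within that partition, while different partitions can be consumed in parallel.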