Message ordering guarantees in HLD - System Design Exercise

Design: Message Ordering Guarantee System

This design focuses on the messaging system's ordering guarantees and core architecture. It excludes detailed security, the user interface, and backup strategies for persisted data.

Functional Requirements

FR1: Ensure messages sent between distributed components are delivered in the order required by the selected ordering guarantee.

FR2: Support multiple ordering guarantees: no ordering, per-producer ordering, global ordering.

FR3: Handle message loss and retries without breaking ordering guarantees.

FR4: Allow scalable message throughput with low latency.

FR5: Provide APIs for producers to send messages and consumers to receive messages in order.

Non-Functional Requirements

NFR1: System must handle up to 100,000 messages per second.

NFR2: End-to-end message delivery latency should be under 200 ms at the 99th percentile.

NFR3: Availability target of 99.9% uptime.

NFR4: Support horizontal scaling of producers and consumers.

NFR5: Ordering guarantees must be maintained even under failures and retries.

Think Before You Design

Key Components

Message producers

Message brokers or queues

Partitions or shards for scaling

Message consumers

Ordering metadata (sequence numbers, timestamps)

Retry and dead-letter handling

Design Patterns

Partitioned queues with per-partition ordering

Sequence numbers for ordering enforcement

Idempotent message processing

Leader election for global ordering

Exactly-once or at-least-once delivery semantics
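Idempotent processing is what makes at-least-once delivery safe: a redelivered message has to be recognized and skipped. The following is a minimal sketch, not a definitive implementation. The class name `IdempotentConsumer` and its fields are hypothetical, and it assumes sequence numbers start at 1 and arrive in order per producer.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Applies each (producer_id, sequence_number) at most once."""

    # Highest sequence number already applied, per producer. This in-memory
    # dict is an assumption; a real system would persist it together with
    # the side effects of processing.
    last_applied: dict[str, int] = field(default_factory=dict)
    applied: list[str] = field(default_factory=list)

    def handle(self, producer_id: str, sequence_number: int, payload: str) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if sequence_number <= self.last_applied.get(producer_id, 0):
            return False  # redelivery of a message that was already processed
        self.applied.append(payload)
        self.last_applied[producer_id] = sequence_number
        return True
```

With this in place, a retry that delivers sequence number 1 twice changes state only once, so retries do not duplicate effects or disturb the applied order.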

Reference Architecture

 +-------------+       +----------------+       +--------------+       +--------------+
 |  Producers  | ----> | Message Broker | ----> | Partitions   | ----> | Consumers    |
 +-------------+       +----------------+       +--------------+       +--------------+
                           |                        |                      |
                           |                        |                      |
                           +----> Ordering Metadata (sequence numbers) <----+

Notes:
- Producers send messages with sequence numbers per partition.
- Broker routes messages to partitions to maintain order.
- Consumers read from partitions in order.
- Retries and duplicate detection must not reorder messages within a partition.
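Routing by key is what keeps related messages in one partition. As a rough sketch, assuming a fixed partition count and a string key, the broker (or the producer library) could pick a partition as shown below. The function name `partition_for` is hypothetical, and SHA-256 is used only because it is stable across processes; it is not the hash any particular broker uses.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Python's built-in hash() is randomized per process for strings, so a
    # cryptographic digest is used here to get the same answer on every node.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Every message with the same key lands in the same partition, so per-key order follows from per-partition order. Changing `num_partitions` remaps keys, which is one reason partition counts are usually chosen with room to grow.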

Components

Producers

Any client application or service

Send messages with ordering metadata (e.g., sequence numbers) to the broker.

Message Broker

Kafka, RabbitMQ, or custom queue system

Receive messages, assign them to partitions, and maintain order per partition.

Partitions

Logical or physical queues

Divide message stream to enable parallelism while preserving order within each partition.

Consumers

Client applications or services

Consume messages from partitions in order and process them reliably.

Ordering Metadata

Sequence numbers or timestamps

Track message order per producer or globally to enforce ordering guarantees.
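To make the role of sequence numbers concrete, here is a small sketch of a consumer-side reorder buffer for a single producer. It is an illustration under assumptions, not part of the original design: the class name `ReorderBuffer` is hypothetical, sequence numbers are assumed to be consecutive integers starting at 1, and the buffer is unbounded.

```python
class ReorderBuffer:
    """Releases payloads in sequence order, holding back early arrivals."""

    def __init__(self, first_expected: int = 1) -> None:
        self.next_expected = first_expected
        self.pending: dict[int, str] = {}  # early arrivals, keyed by sequence number

    def accept(self, sequence_number: int, payload: str) -> list[str]:
        """Return the payloads that become deliverable, in order."""
        if sequence_number < self.next_expected or sequence_number in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[sequence_number] = payload
        ready = []
        # Drain every consecutive message starting at the expected number.
        while self.next_expected in self.pending:
            ready.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return ready

    def missing(self) -> list[int]:
        """Sequence numbers below the highest buffered one that have not arrived."""
        if not self.pending:
            return []
        return [n for n in range(self.next_expected, max(self.pending))
                if n not in self.pending]
```

If message 2 arrives before message 1, `accept(2, ...)` returns nothing and `missing()` reports `[1]`; once message 1 arrives, both are released together in order.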

Request Flow

1. Producer assigns a sequence number to each message per partition and sends it to the broker.

2. Broker receives messages and routes them to the correct partition based on key or producer ID.

3. Within each partition, messages are stored in the order received.

4. Consumers subscribe to partitions and receive messages in the exact order they were stored.

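5. If a message fails processing, the consumer retries it in place before moving on. If retries are exhausted, the message goes to a dead-letter queue and the consumer continues with the next message, so later messages are never processed ahead of an earlier one that is still being retried.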

6. Ordering metadata is used to detect missing or out-of-order messages and handle them appropriately.
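The flow above can be sketched with an in-memory stand-in for the broker. Everything here is hypothetical and simplified: `InMemoryBroker`, `consume`, and `max_attempts` are illustrative names, the caller picks the partition directly, and there is no persistence, replication, or concurrency.

```python
from typing import Callable


class InMemoryBroker:
    """Keeps one append-only log per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.logs: list[list[str]] = [[] for _ in range(num_partitions)]

    def send(self, partition: int, payload: str) -> int:
        """Append to the partition log and return the message's offset."""
        self.logs[partition].append(payload)
        return len(self.logs[partition]) - 1


def consume(log: list[str], handler: Callable[[str], None],
            max_attempts: int = 3) -> tuple[list[str], list[tuple[int, str]]]:
    """Process one partition log strictly in offset order.

    A failing message is retried in place; after max_attempts it is
    dead-lettered and the consumer moves on to the next offset.
    Returns (processed payloads, dead-lettered (offset, payload) pairs).
    """
    processed: list[str] = []
    dead_letters: list[tuple[int, str]] = []
    for offset, payload in enumerate(log):
        for attempt in range(1, max_attempts + 1):
            try:
                handler(payload)
                processed.append(payload)
                break
            except Exception:
                if attempt == max_attempts:
                    dead_letters.append((offset, payload))
    return processed, dead_letters
```

Because the loop never starts offset n+1 until offset n has either succeeded or been dead-lettered, retries cannot reorder the messages that do get processed. The cost is head-of-line blocking while a message is being retried.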

Database Schema

Entities:
- Message: {message_id (PK), partition_id, producer_id, sequence_number, payload, timestamp, status}
- Partition: {partition_id (PK), broker_id}
- Producer: {producer_id (PK), metadata}

Relationships:
- Each Message belongs to one Partition.
- Each Message is produced by one Producer.
- sequence_number is unique per (producer_id, partition_id) pair to maintain order.
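One way to express this schema, shown here as a sketch using SQLite through Python's standard `sqlite3` module. The column types, the plural table names, the `'stored'` status value, and the helper names `open_db` and `insert_message` are assumptions made for illustration; the source only names the fields and the uniqueness rule.

```python
import sqlite3

SCHEMA = """
CREATE TABLE producers (
    producer_id TEXT PRIMARY KEY,
    metadata    TEXT
);
CREATE TABLE partitions (
    partition_id INTEGER PRIMARY KEY,
    broker_id    TEXT NOT NULL
);
CREATE TABLE messages (
    message_id      TEXT PRIMARY KEY,
    partition_id    INTEGER NOT NULL REFERENCES partitions(partition_id),
    producer_id     TEXT NOT NULL REFERENCES producers(producer_id),
    sequence_number INTEGER NOT NULL,
    payload         BLOB,
    timestamp       TEXT NOT NULL,
    status          TEXT NOT NULL,
    UNIQUE (producer_id, partition_id, sequence_number)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces REFERENCES only when PRAGMA foreign_keys is ON;
    # this sketch leaves it off to stay short.
    conn.executescript(SCHEMA)
    return conn


def insert_message(conn: sqlite3.Connection, message_id: str, partition_id: int,
                   producer_id: str, sequence_number: int, payload: bytes) -> bool:
    """Insert a message; return False if it violates a uniqueness constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (message_id, partition_id, producer_id, "
                "sequence_number, payload, timestamp, status) "
                "VALUES (?, ?, ?, ?, ?, datetime('now'), 'stored')",
                (message_id, partition_id, producer_id, sequence_number, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The `UNIQUE` constraint turns a retried send with the same (producer_id, partition_id, sequence_number) into a rejected insert, which gives the store a simple form of duplicate detection.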

Scaling Discussion

Bottlenecks

Single partition can become a throughput bottleneck limiting parallelism.

Global ordering requires coordination, which can increase latency and reduce availability.

Message broker storage and network bandwidth can be overwhelmed at high message rates.

Consumer processing speed can limit end-to-end latency.
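The single-partition bottleneck can be turned into a rough sizing calculation. The sketch below is back-of-the-envelope arithmetic only: the function name `partitions_needed` is hypothetical, and the per-partition throughput of 10,000 messages per second used in the usage note is an assumed figure for illustration, not a measured one.

```python
import math


def partitions_needed(target_msgs_per_sec: float,
                      per_partition_msgs_per_sec: float,
                      headroom: float = 0.5) -> int:
    """Estimate the partition count for a target throughput.

    headroom is the fraction of each partition's capacity kept free
    for bursts and rebalancing (0.5 means run partitions at half load).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if per_partition_msgs_per_sec <= 0:
        raise ValueError("per_partition_msgs_per_sec must be positive")
    usable = per_partition_msgs_per_sec * (1 - headroom)
    return math.ceil(target_msgs_per_sec / usable)
```

For the 100,000 messages per second target in NFR1, and assuming each partition sustains 10,000 messages per second, `partitions_needed(100_000, 10_000, 0.5)` gives 20 partitions, or 10 with no headroom.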

Solutions

Increase number of partitions to allow parallel processing and higher throughput.

Use partition keys to shard messages so ordering is guaranteed per partition, not globally.

Implement leader election and consensus protocols only when global ordering is strictly required.

Use efficient storage and replication strategies in the broker to handle load.

Scale consumers horizontally and implement backpressure to handle bursts.
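To see why global ordering is expensive, consider a sketch of the simplest form it can take: a single sequencer that every producer must go through. The class `GlobalSequencer` is hypothetical, and a local lock stands in for the elected leader; a real deployment would need leader election and replication, which this sketch does not attempt.

```python
import threading


class GlobalSequencer:
    """Assigns one total order to all messages by serializing appends."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next = 1
        self.log: list[tuple[int, str]] = []

    def append(self, payload: str) -> int:
        """Assign the next global sequence number and record the message."""
        # Every producer contends on this one lock, which is the
        # coordination cost that per-partition ordering avoids.
        with self._lock:
            seq = self._next
            self._next += 1
            self.log.append((seq, payload))
            return seq
```

All appends pass through one critical section, so throughput is bounded by a single node regardless of how many producers are added. That is the trade-off behind reserving global ordering for cases that strictly require it.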

Interview Tips

Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing architecture and data flow, 10 minutes discussing scaling and trade-offs, and 5 minutes summarizing.

Clarify types of ordering guarantees and their impact on design.

Explain partitioning strategy to balance ordering and scalability.

Describe how sequence numbers or metadata enforce order.

Discuss failure handling to maintain ordering guarantees.

Highlight trade-offs between global ordering and system performance.