
Exactly-once processing challenges in HLD - System Design Exercise

Design: Exactly-once Processing System
The design focuses on the core processing pipeline that ensures exactly-once semantics. UI components, detailed security policies, and the specific business logic of message contents are out of scope.
Functional Requirements
FR1: Process each message or event exactly once without duplication or loss
FR2: Support high throughput of at least 10,000 messages per second
FR3: Ensure system recovers gracefully from failures without reprocessing messages
FR4: Provide end-to-end data consistency between producers and consumers
FR5: Allow horizontal scaling to handle increased load
Non-Functional Requirements
NFR1: Maximum processing latency of 200ms per message
NFR2: Availability target of 99.9% uptime
NFR3: Support distributed environment with multiple producers and consumers
NFR4: Use industry-standard messaging and storage technologies
Think Before You Design
Key Components
Message queue or event streaming platform (e.g., Kafka, RabbitMQ)
Idempotent processing logic or deduplication store
State store or database with transactional support
Checkpointing or offset tracking mechanism
Distributed coordination service (e.g., ZooKeeper, etcd)
Design Patterns
Idempotent consumer pattern
Transactional outbox pattern
Two-phase commit or distributed transactions
Exactly-once delivery with offset commit
Event sourcing with deduplication
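The idempotent consumer pattern above can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory set stands in for a durable deduplication store, and the class and message shape are hypothetical.

```python
# Minimal sketch of the idempotent consumer pattern.
# The in-memory set stands in for a durable, transactional dedup store.

class IdempotentConsumer:
    def __init__(self):
        self.processed_ids = set()  # stand-in for the deduplication store
        self.state = {}             # stand-in for the application state store

    def handle(self, message_id, payload):
        """Apply a message at most once; redeliveries become no-ops."""
        if message_id in self.processed_ids:
            return False            # duplicate delivery, ignore
        self.state[message_id] = payload   # apply the effect
        self.processed_ids.add(message_id) # record as processed
        return True

consumer = IdempotentConsumer()
assert consumer.handle("m-1", "order created") is True
assert consumer.handle("m-1", "order created") is False  # redelivery ignored
```

The key property is that handling the same message twice leaves the state unchanged, which turns at-least-once delivery into effectively-once processing.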
Reference Architecture
  +-------------+       +----------------+       +----------------+
  |  Producers  | ----> |  Message Queue | ----> |  Consumers     |
  +-------------+       +----------------+       +----------------+
                                |                        |
                                |                        v
                                |               +----------------+
                                |               | State Store /  |
                                |               | Deduplication  |
                                |               +----------------+
                                |                        |
                                |                        v
                                |               +----------------+
                                |               | Checkpointing  |
                                |               | & Offset Store |
                                |               +----------------+
Components
Producers
Any client application or service
Send messages/events to the message queue reliably
Message Queue
Apache Kafka or RabbitMQ
Durably store messages and support ordered delivery with offset tracking
Consumers
Stateless or stateful processing service
Process messages exactly once using idempotent logic and transactional updates
State Store / Deduplication Store
Distributed database with transactional support (e.g., PostgreSQL, Cassandra)
Track processed message IDs or offsets to avoid duplicates
Checkpointing & Offset Store
Kafka offset commit or external store
Record progress of processed messages to enable recovery without reprocessing
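The checkpointing component's recovery behavior can be illustrated with a small sketch. All names here are illustrative assumptions: a dict stands in for the offset store and a list for one partition's log.

```python
# Sketch of recovery from the checkpoint/offset store: on restart the
# consumer resumes at last_committed_offset + 1, so committed messages
# are not re-read. The dict and list are stand-ins for real stores.

log = ["m-0", "m-1", "m-2", "m-3"]   # stand-in for a partition's message log
offset_store = {"consumer-a": 1}     # last committed offset per consumer

def resume_position(consumer_id: str) -> int:
    """First log position to read after a restart (new consumers start at 0)."""
    return offset_store.get(consumer_id, -1) + 1

assert resume_position("consumer-a") == 2                    # skips m-0, m-1
assert log[resume_position("consumer-a"):] == ["m-2", "m-3"]
assert resume_position("consumer-b") == 0                    # fresh consumer
```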
Request Flow
1. Producer sends message to the message queue with unique message ID.
2. Message queue stores message durably and assigns an offset.
3. Consumer reads message from queue and checks deduplication store for message ID.
4. If the message ID has not been recorded, the consumer processes the message and updates the state store transactionally.
5. Consumer commits offset/checkpoint only after successful processing and state update.
6. If a failure occurs, the consumer restarts from the last committed offset; messages that were processed but not yet committed are caught by the deduplication check rather than reapplied.
7. Deduplication store ensures repeated messages are ignored, achieving exactly-once processing.
Database Schema
Entities:
- MessagesProcessed(message_id PRIMARY KEY, processed_timestamp)
- ConsumerOffsets(consumer_id PRIMARY KEY, last_committed_offset)
Relationships:
- MessagesProcessed tracks unique message IDs to prevent duplicates.
- ConsumerOffsets tracks progress per consumer group for recovery.
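Assuming the state store is a relational database, steps 3-7 of the request flow can be sketched against this schema using SQLite. The `State` table and the function names are illustrative additions, not part of the schema above.

```python
import sqlite3

# Sketch of the request flow: the dedup check, state update, and offset
# commit share one local transaction, so a crash leaves either all or
# none of them applied. Table names follow the Database Schema section;
# the State table is an illustrative stand-in for application state.

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MessagesProcessed(message_id TEXT PRIMARY KEY,
                                   processed_timestamp TEXT);
    CREATE TABLE ConsumerOffsets(consumer_id TEXT PRIMARY KEY,
                                 last_committed_offset INTEGER);
    CREATE TABLE State(key TEXT PRIMARY KEY, value TEXT);
""")

def process(consumer_id, offset, message_id, payload):
    """Return True if the message was applied, False if it was a duplicate."""
    with conn:  # one transaction: commits on success, rolls back on error
        dup = conn.execute(
            "SELECT 1 FROM MessagesProcessed WHERE message_id = ?",
            (message_id,)).fetchone()
        if dup is None:
            conn.execute(
                "INSERT INTO MessagesProcessed VALUES (?, datetime('now'))",
                (message_id,))
            conn.execute("INSERT OR REPLACE INTO State VALUES (?, ?)",
                         (message_id, payload))
        # Commit the offset last, in the same transaction (step 5).
        conn.execute("INSERT OR REPLACE INTO ConsumerOffsets VALUES (?, ?)",
                     (consumer_id, offset))
        return dup is None

assert process("c1", 0, "m-1", "order created") is True
assert process("c1", 1, "m-1", "order created") is False  # duplicate ignored
```

Because the dedup record, the state change, and the offset advance atomically together, a replayed message after a crash advances the offset but leaves the state untouched.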
Scaling Discussion
Bottlenecks
State store becoming a write bottleneck due to high volume of deduplication checks
Message queue throughput limits under very high load
Checkpointing latency delaying offset commits
Network latency between distributed components affecting consistency
Solutions
Partition state store by consumer or message key to distribute load
Use scalable message queues like Kafka with partitioning and replication
Batch checkpoint commits to reduce overhead while balancing latency
Deploy components in the same data center or use faster network links
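Partitioning the deduplication store by message key, as suggested above, amounts to stable hash routing. The shard count and the in-memory shard sets below are assumptions for illustration.

```python
import hashlib

# Sketch of sharding the dedup store by message key: a stable hash routes
# every key to the same shard, so writes spread across NUM_SHARDS stores.
# The sets are stand-ins for per-shard databases.

NUM_SHARDS = 4
shards = [set() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Stable hash -> shard index; the same key always maps to one shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def seen_before(message_id: str) -> bool:
    """Check-and-record against only the owning shard."""
    shard = shards[shard_for(message_id)]
    if message_id in shard:
        return True
    shard.add(message_id)
    return False

assert seen_before("m-1") is False
assert seen_before("m-1") is True                 # duplicate detected
assert shard_for("m-1") == shard_for("m-1")       # deterministic routing
```

Since each check touches a single shard, dedup write throughput scales roughly linearly with the number of shards, at the cost of rebalancing work when shards are added.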
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 20 minutes designing the architecture and data flow, 10 minutes discussing scaling and failure handling, and 5 minutes summarizing.
Explain the difference between at-least-once, at-most-once, and exactly-once processing
Discuss failure scenarios and how the design handles them
Describe how idempotency and transactional updates prevent duplicates
Highlight the role of checkpointing and offset management
Mention trade-offs between complexity and guarantees