Read replicas in HLD - System Design Exercise

Design: Read Replica System for Database Scaling

Design focuses on read replica architecture, replication mechanism, failover strategy, and load balancing for read queries. Write path design and application logic are out of scope.

Functional Requirements

FR1: Support high read throughput by distributing read requests across multiple replicas

FR2: Keep replicas consistent with the primary database, with minimal replication lag (eventual consistency under asynchronous replication)

FR3: Allow failover to replicas in case the primary database fails

FR4: Support read queries with p99 latency under 100ms

FR5: Handle up to 100,000 read requests per second

FR6: Maintain 99.9% availability for read operations

Non-Functional Requirements

NFR1: Replication lag should be less than 2 seconds

NFR2: Writes must go to the primary database only

NFR3: System should be horizontally scalable by adding more read replicas

NFR4: Failover should not cause more than 5 seconds downtime

NFR5: Use industry-standard replication technologies
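
The throughput and availability targets above can be turned into rough sizing numbers. The sketch below is a back-of-the-envelope calculation, not a benchmark: the per-replica capacity of 10,000 reads per second, the 30% headroom, and the single spare replica are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def replicas_needed(peak_read_rps: int, per_replica_rps: int,
                    headroom: float = 0.3, spares: int = 1) -> int:
    """Replicas required to serve peak reads while keeping some headroom.

    per_replica_rps, headroom and spares are assumed inputs; measure the
    real per-replica capacity under a representative query mix.
    """
    usable_rps = per_replica_rps * (1 - headroom)
    return math.ceil(peak_read_rps / usable_rps) + spares


def monthly_downtime_budget_seconds(availability: float, days: int = 30) -> float:
    """Seconds of allowed downtime per month for a given availability target."""
    return (1 - availability) * days * 24 * 3600


if __name__ == "__main__":
    # FR5: 100,000 reads/s; assumed 10,000 reads/s per replica.
    print("replicas:", replicas_needed(100_000, 10_000))
    # FR6: 99.9% availability over a 30-day month.
    print("downtime budget (s):", round(monthly_downtime_budget_seconds(0.999)))
```

Under these assumptions the read fleet needs 16 replicas, and a 99.9% target leaves roughly 43 minutes of downtime per month, which is why a 5-second failover (NFR4) fits comfortably inside the budget.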

Think Before You Design

Key Components

Primary database server

Read replica database servers

Replication mechanism (e.g., asynchronous or semi-synchronous replication)

Load balancer for read traffic

Monitoring and alerting system for replication lag and failover

Failover manager or orchestrator

Design Patterns

Primary-replica (master-slave) replication pattern

Read-write splitting

Failover and recovery pattern

Load balancing with health checks

Eventual consistency vs strong consistency trade-offs
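
Read-write splitting, one of the patterns listed above, can be sketched as a small router that sends writes to the primary and rotates reads across replicas. This is a minimal illustration, not a production SQL classifier: the `ReadWriteRouter` class and the endpoint names are hypothetical, and the statement check is deliberately naive.

```python
import itertools


class ReadWriteRouter:
    """Routes statements to the primary or to a replica (illustrative only)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = list(replicas)
        self._cycle = itertools.cycle(self._replicas)

    def route(self, sql: str) -> str:
        text = sql.strip().upper()
        # Naive classification: only plain SELECTs go to replicas.
        # Locking reads (SELECT ... FOR UPDATE) must run on the primary.
        is_read = text.startswith("SELECT") and "FOR UPDATE" not in text
        if is_read and self._replicas:
            return next(self._cycle)
        return self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
    print(router.route("SELECT * FROM orders WHERE id = 1"))
    print(router.route("UPDATE orders SET status = 'paid' WHERE id = 1"))
```

A real router also has to account for transactions and for reads that must observe a write just made by the same client; sending such reads to the primary is one common way to handle the consistency trade-off.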

Reference Architecture

          +-------------------+
          |    Application    |
          +---------+---------+
                    |
          +---------v---------+
          | Read Load         |
          | Balancer          |
          +--+-------------+--+
             |             |
+------------v---+      +--v-------------+
| Read Replica 1 |      | Read Replica 2 |
+------------^---+      +--^-------------+
             | replication |
          +--+-------------+--+
          |    Primary DB     |
          +-------------------+

Components

Primary Database

Relational DBMS with replication support (e.g., PostgreSQL, MySQL)

Handles all write operations and propagates changes to read replicas

Read Replicas

Same DBMS as the primary, configured as read-only replicas

Serve read-only queries to reduce load on primary and improve read throughput

Replication Mechanism

Asynchronous or semi-synchronous replication

Keep replicas updated with changes from primary with minimal lag

Read Load Balancer

Software or hardware load balancer (e.g., HAProxy, Envoy)

Distribute read requests evenly across healthy replicas

Monitoring System

Monitoring tools (e.g., Prometheus, Grafana)

Track replication lag, replica health, and trigger alerts

Failover Manager

Automated failover tool or orchestrator (e.g., Patroni, MHA)

Detect primary failure and promote a replica to primary
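
When the failover manager promotes a replica, it usually prefers the one that has applied the most changes, so that the least data is lost. The sketch below shows that selection rule only. It is not how Patroni or MHA are implemented; `ReplicaState` and its `replayed_position` field (a monotonically increasing log position) are hypothetical names used for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class ReplicaState(NamedTuple):
    name: str
    reachable: bool
    replayed_position: int  # assumed: higher means more changes applied


def choose_promotion_candidate(
    replicas: Sequence[ReplicaState],
) -> Optional[ReplicaState]:
    """Return the reachable replica that has applied the most changes."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        return None  # no safe candidate; manual intervention is needed
    return max(candidates, key=lambda r: r.replayed_position)


if __name__ == "__main__":
    fleet = [
        ReplicaState("replica-1", True, 1_000),
        ReplicaState("replica-2", True, 1_250),
        ReplicaState("replica-3", False, 1_400),  # most advanced but unreachable
    ]
    print(choose_promotion_candidate(fleet))
```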

Request Flow

1. Application sends write requests to the primary database.

2. Primary database processes writes and asynchronously replicates changes to read replicas.

3. Application sends read requests to the read load balancer.

4. Load balancer routes read requests to one of the healthy read replicas.

5. Read replica serves the read query and returns the result to the application.

6. Monitoring system continuously checks replication lag and replica health.

7. If primary database fails, failover manager promotes a read replica to primary.

8. Application updates configuration to send writes to the new primary.
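
The read path in the flow above (routing to healthy replicas while monitoring lag) can be sketched as a balancer that skips replicas that are unhealthy or too far behind. This is an illustrative model, not the behavior of HAProxy or Envoy: the `Replica` and `ReadBalancer` names are hypothetical, and the 2-second threshold is taken from NFR1.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LAG_SECONDS = 2.0  # NFR1: replication lag should stay under 2 seconds


@dataclass
class Replica:
    name: str
    healthy: bool        # result of the latest health check
    lag_seconds: float   # replication lag reported by monitoring


class ReadBalancer:
    """Round-robin over replicas that are healthy and within the lag limit."""

    def __init__(self, replicas: list[Replica], max_lag: float = MAX_LAG_SECONDS):
        self.replicas = list(replicas)
        self.max_lag = max_lag
        self._counter = 0

    def eligible(self) -> list[Replica]:
        return [r for r in self.replicas
                if r.healthy and r.lag_seconds <= self.max_lag]

    def pick(self) -> Optional[Replica]:
        pool = self.eligible()
        if not pool:
            return None  # caller may fall back to the primary or fail the read
        choice = pool[self._counter % len(pool)]
        self._counter += 1
        return choice


if __name__ == "__main__":
    balancer = ReadBalancer([
        Replica("replica-1", healthy=True, lag_seconds=0.4),
        Replica("replica-2", healthy=True, lag_seconds=5.0),   # too stale
        Replica("replica-3", healthy=False, lag_seconds=0.1),  # failed check
    ])
    print(balancer.pick())
```

Excluding lagging replicas trades capacity for freshness: during a write burst the eligible pool can shrink, so the remaining replicas must be able to absorb the extra read load.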

Database Schema

Replicas use the same schema as the primary database. Replication operates at the data storage level, so no schema changes are required, and relationships and constraints are carried over to the replicas unchanged.

Scaling Discussion

Bottlenecks

Replication lag increases with high write volume causing stale reads

Primary database becomes a write bottleneck under heavy write load

Load balancer becomes a single point of failure or bottleneck

Failover delay causes downtime impacting availability

Network bandwidth limits replication speed
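
The first bottleneck can be made concrete with a simplified model: if the primary generates changes faster than a replica can apply them, lag grows steadily. The sketch below assumes constant rates and zero initial lag, which real workloads rarely satisfy; the function name and the example rates are hypothetical.

```python
from typing import Optional


def seconds_until_lag_exceeds(write_rate: float, apply_rate: float,
                              lag_limit_s: float) -> Optional[float]:
    """Time until replication lag passes lag_limit_s under sustained load.

    After t seconds the replica has applied apply_rate * t changes, which the
    primary produced during the first (apply_rate / write_rate) * t seconds.
    So lag(t) = t * (1 - apply_rate / write_rate).
    """
    if write_rate <= apply_rate:
        return None  # the replica keeps up; lag does not grow in this model
    return lag_limit_s / (1 - apply_rate / write_rate)


if __name__ == "__main__":
    # Assumed burst: 12,000 changes/s written, 10,000 changes/s applied.
    print(seconds_until_lag_exceeds(12_000, 10_000, lag_limit_s=2.0))
```

With these assumed rates the 2-second limit from NFR1 is crossed after about 12 seconds of sustained burst, which suggests alerting on the trend of lag rather than only on the threshold.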

Solutions

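Use semi-synchronous replication so that at least one replica has received every committed change before the primary acknowledges the write, which limits how far that replica can fall behind and reduces data loss on failover, at the cost of higher write latency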

Scale writes by sharding or partitioning data across multiple primaries

Deploy multiple load balancers with health checks and failover

Automate failover with fast detection and promotion tools

Use compression and efficient protocols for replication traffic

Interview Tips

Time: Spend 10 minutes clarifying requirements and constraints, 15 minutes designing architecture and data flow, 10 minutes discussing scaling and failover strategies, and 10 minutes answering questions.

Explain the difference between primary and read replicas and their roles

Discuss replication methods and trade-offs between consistency and latency

Describe how load balancing improves read throughput and availability

Highlight monitoring and failover mechanisms to maintain uptime

Address scaling challenges and realistic solutions