
Read replicas in HLD - System Design Exercise

Design: Read Replica System for Database Scaling
This design focuses on the read replica architecture, the replication mechanism, the failover strategy, and load balancing for read queries. Write path design and application logic are out of scope.
Functional Requirements
FR1: Support high read throughput by distributing read requests across multiple replicas
FR2: Ensure data consistency between primary database and replicas with minimal lag
FR3: Allow failover to replicas in case the primary database fails
FR4: Support read queries with p99 latency under 100ms
FR5: Handle up to 100,000 read requests per second
FR6: Maintain 99.9% availability for read operations
Non-Functional Requirements
NFR1: Replication lag should be less than 2 seconds
NFR2: Writes must go to the primary database only
NFR3: System should be horizontally scalable by adding more read replicas
NFR4: Failover should not cause more than 5 seconds downtime
NFR5: Use industry-standard replication technologies
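The numeric targets above can be turned into rough budgets before any design work starts. The sketch below is a back-of-the-envelope calculation only: the per-replica throughput of 10,000 reads per second, the 30% headroom, and the single spare replica are assumptions for illustration and do not come from the requirements.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def downtime_budget_seconds(availability_percent: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Seconds of allowed unavailability over the period for a given target."""
    return (100.0 - availability_percent) / 100.0 * period_seconds


def replicas_needed(target_qps: int,
                    per_replica_qps: int,
                    headroom_percent: int = 30,
                    spares: int = 1) -> int:
    """Replica count for a read target.

    per_replica_qps, headroom_percent and spares are assumed inputs;
    measure real replica capacity with a load test before relying on them.
    """
    usable_qps = per_replica_qps * (100 - headroom_percent) / 100
    return math.ceil(target_qps / usable_qps) + spares


yearly_budget = downtime_budget_seconds(99.9)        # FR6: 99.9% availability
replica_count = replicas_needed(100_000, 10_000)     # FR5: 100,000 reads/sec
```

Under these assumptions, 99.9% availability leaves about 8.76 hours of read downtime per year, and 100,000 reads per second needs 15 serving replicas plus one spare.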
Think Before You Design
Key Components
Primary database server
Read replica database servers
Replication mechanism (e.g., asynchronous or semi-synchronous replication)
Load balancer for read traffic
Monitoring and alerting system for replication lag and failover
Failover manager or orchestrator
Design Patterns
Primary-replica (master-slave) replication pattern
Read-write splitting
Failover and recovery pattern
Load balancing with health checks
Eventual consistency vs strong consistency trade-offs
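Read-write splitting, listed above, can be illustrated with a small router that sends writes and transactional reads to the primary and spreads other reads across replicas. This is a minimal sketch, not a production classifier: the `ReadWriteRouter` class, the host names, and the prefix-based statement check are all hypothetical, and real SQL needs more careful classification (for example, CTEs that modify data).

```python
import itertools
from typing import Sequence

# Statement prefixes treated as plain reads in this sketch (an assumption).
READ_PREFIXES = ("select", "show")


class ReadWriteRouter:
    """Routes SQL statements to the primary or to a replica (round-robin)."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # cycle() yields replicas in order, forever; None if there are none.
        self._replica_cycle = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        normalized = sql.strip().lower()
        is_read = normalized.startswith(READ_PREFIXES)
        # SELECT ... FOR UPDATE takes row locks, so it belongs on the primary.
        takes_locks = "for update" in normalized
        if in_transaction or not is_read or takes_locks:
            return self.primary
        if self._replica_cycle is None:
            return self.primary  # no replicas configured: fall back
        return next(self._replica_cycle)


router = ReadWriteRouter("primary:5432", ["replica1:5432", "replica2:5432"])
```

Keeping reads that occur inside a transaction on the primary is one way to sidestep the eventual-consistency trade-off for flows that must see their own writes.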
Reference Architecture
          +-------------------+
          |    Application    |
          +---------+---------+
                    |
          +---------v---------+
          |     Read Load     |
          |     Balancer      |
          +----+---------+----+
               |         |
    +----------v---+ +---v----------+
    | Read Replica | | Read Replica |
    |      1       | |      2       |
    +----------^---+ +---^----------+
               |         |
               |         |  (replication)
          +----+---------+----+
          |    Primary DB     |
          +-------------------+
Components
Primary Database
Relational DBMS with replication support (e.g., PostgreSQL, MySQL)
Handles all write operations and propagates changes to read replicas
Read Replicas
Same DBMS as primary configured as replicas
Serve read-only queries to reduce load on primary and improve read throughput
Replication Mechanism
Asynchronous or semi-synchronous replication
Keep replicas updated with changes from primary with minimal lag
Read Load Balancer
Software or hardware load balancer (e.g., HAProxy, Envoy)
Distribute read requests evenly across healthy replicas
Monitoring System
Monitoring tools (e.g., Prometheus, Grafana)
Track replication lag, replica health, and trigger alerts
Failover Manager
Automated failover tool or orchestrator (e.g., Patroni, MHA)
Detect primary failure and promote a replica to primary
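The failover manager's promotion step can be sketched as choosing the reachable replica that has replayed the most of the primary's log, which minimizes lost writes under asynchronous replication. This illustrates the idea only and is not how Patroni or MHA are implemented; `ReplicaState` and `replayed_position` are hypothetical names standing in for whatever log position the DBMS exposes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReplicaState:
    name: str
    reachable: bool
    replayed_position: int  # hypothetical, monotonically increasing log position


def choose_promotion_candidate(replicas: Iterable[ReplicaState]) -> ReplicaState:
    """Pick the reachable replica that is furthest ahead in the log."""
    candidates = [r for r in replicas if r.reachable]
    if not candidates:
        raise RuntimeError("no reachable replica to promote")
    # Ties on position are broken by name so the choice is deterministic.
    return max(candidates, key=lambda r: (r.replayed_position, r.name))


fleet = [
    ReplicaState("replica1", reachable=True, replayed_position=1_000),
    ReplicaState("replica2", reachable=True, replayed_position=1_250),
    ReplicaState("replica3", reachable=False, replayed_position=1_300),
]
new_primary = choose_promotion_candidate(fleet)
```

Here `replica3` is the furthest ahead but unreachable, so `replica2` is selected.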
Request Flow
1. Application sends write requests to the primary database.
2. Primary database processes writes and asynchronously replicates changes to read replicas.
3. Application sends read requests to the read load balancer.
4. Load balancer routes read requests to one of the healthy read replicas.
5. Read replica serves the read query and returns the result to the application.
6. Monitoring system continuously checks replication lag and replica health.
7. If primary database fails, failover manager promotes a read replica to primary.
8. Application updates configuration to send writes to the new primary.
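Steps 4 and 6 of the flow above can be combined in a small sketch: the monitoring system reports each replica's health and lag, and the balancer only routes reads to replicas that are healthy and within the 2-second lag limit from NFR1. The `ReadBalancer` and `ReplicaStatus` names are hypothetical, and falling back to the primary when no replica qualifies is an assumed policy; rejecting the read would be an equally valid choice.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_LAG_SECONDS = 2.0  # taken from NFR1


@dataclass
class ReplicaStatus:
    name: str
    healthy: bool
    lag_seconds: float


class ReadBalancer:
    """Round-robin over replicas that are healthy and not too far behind."""

    def __init__(self, primary: str, max_lag_seconds: float = MAX_LAG_SECONDS):
        self.primary = primary
        self.max_lag_seconds = max_lag_seconds
        self._statuses: Dict[str, ReplicaStatus] = {}
        self._counter = 0

    def report(self, status: ReplicaStatus) -> None:
        """Called by the monitoring system with the latest measurement."""
        self._statuses[status.name] = status

    def eligible(self) -> List[str]:
        return sorted(
            s.name for s in self._statuses.values()
            if s.healthy and s.lag_seconds <= self.max_lag_seconds
        )

    def pick(self) -> str:
        candidates = self.eligible()
        if not candidates:
            return self.primary  # assumed fallback policy
        choice = candidates[self._counter % len(candidates)]
        self._counter += 1
        return choice


balancer = ReadBalancer(primary="primary")
balancer.report(ReplicaStatus("replica1", healthy=True, lag_seconds=0.3))
balancer.report(ReplicaStatus("replica2", healthy=True, lag_seconds=5.0))
```

With these reports, only `replica1` is eligible because `replica2` exceeds the lag limit; once a later report shows `replica2` back under 2 seconds, it rejoins the rotation.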
Database Schema
Entities remain the same as primary database schema. Replication is at the data storage level, so no schema changes are required. Relationships and constraints are preserved on replicas to ensure data integrity.
Scaling Discussion
Bottlenecks
Replication lag increases with high write volume causing stale reads
Primary database becomes a write bottleneck under heavy write load
Load balancer becomes a single point of failure or bottleneck
Failover delay causes downtime impacting availability
Network bandwidth limits replication speed
Solutions
Use semi-synchronous replication so that at least one replica has received each committed write, which limits how far replicas fall behind at the cost of slightly higher write latency
Scale writes by sharding or partitioning data across multiple primaries
Deploy multiple load balancers with health checks and failover
Automate failover with fast detection and promotion tools
Use compression and efficient protocols for replication traffic
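Whether automated failover fits the 5-second limit in NFR4 can be checked with simple arithmetic: worst-case downtime is roughly detection time (heartbeat interval times the number of missed heartbeats) plus promotion time plus the time for clients to be rerouted. The sketch below encodes that sum; every input value is an assumption chosen for illustration, not a measurement of any particular tool.

```python
def worst_case_failover_seconds(heartbeat_interval: float,
                                missed_heartbeats: int,
                                promotion_seconds: float,
                                reroute_seconds: float) -> float:
    """Detection time plus promotion time plus client rerouting time."""
    detection_seconds = heartbeat_interval * missed_heartbeats
    return detection_seconds + promotion_seconds + reroute_seconds


def meets_budget(total_seconds: float, budget_seconds: float = 5.0) -> bool:
    """budget_seconds defaults to the 5-second limit from NFR4."""
    return total_seconds <= budget_seconds


# Assumed values: 1 s heartbeats, 3 misses to declare failure,
# 1 s to promote a replica, 0.5 s for clients to pick up the new primary.
fast = worst_case_failover_seconds(1.0, 3, 1.0, 0.5)
# Same setup with 2 s heartbeats.
slow = worst_case_failover_seconds(2.0, 3, 1.0, 0.5)
```

With the assumed values, 1-second heartbeats give 4.5 seconds and fit the budget, while 2-second heartbeats give 7.5 seconds and do not, which shows how much of the budget detection alone consumes.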
Interview Tips
Time: Spend 10 minutes clarifying requirements and constraints, 15 minutes designing architecture and data flow, 10 minutes discussing scaling and failover strategies, and 10 minutes answering questions.
Explain the difference between primary and read replicas and their roles
Discuss replication methods and trade-offs between consistency and latency
Describe how load balancing improves read throughput and availability
Highlight monitoring and failover mechanisms to maintain uptime
Address scaling challenges and realistic solutions