
Read replicas in HLD - Deep Dive

Overview - Read replicas
What is it?
Read replicas are copies of a primary database that handle read-only queries. They help spread out the load by letting many users read data without slowing down the main database. These replicas stay updated by copying changes from the primary database. This setup improves performance and availability for applications that read data often.
Why it matters
Without read replicas, all users would query the main database, causing slow responses and possible crashes during high traffic. Read replicas solve this by sharing the reading work, making apps faster and more reliable. This is crucial for websites and services with many users who mostly read data, like social media or online stores.
Where it fits
Before learning about read replicas, you should understand basic database concepts like primary databases and replication. After this, you can explore advanced topics like load balancing, caching, and multi-region database setups.
Mental Model
Core Idea
Read replicas are copies of a main database that handle read requests to reduce load and improve speed without affecting writes.
Think of it like...
Imagine a popular library with one main librarian who handles all book requests. To avoid long waits, the library makes several copies of popular books and places them on shelves around the building. Visitors can read these copies anytime without bothering the librarian, who focuses on adding new books and managing the collection.
Primary Database (Write + Read)
       │
       ├──> Read Replica 1 (Read-only)
       ├──> Read Replica 2 (Read-only)
       └──> Read Replica 3 (Read-only)

All write operations go to the Primary Database.
Read operations are distributed among Read Replicas.
Build-Up - 7 Steps
1. Foundation: Understanding Primary Database Role
Concept: Learn what a primary database does and why it handles both reads and writes.
A primary database stores all the data and processes both read and write requests. It ensures data is accurate and consistent. However, when many users read data, the primary can become slow because it handles all requests.
Result
You understand that the primary database is the main source of truth but can become a bottleneck under heavy read load.
Knowing the primary database's dual role helps you see why separating reads can improve performance.
2. Foundation: Basics of Database Replication
Concept: Introduce replication as copying data from one database to another to keep them in sync.
Replication means the primary database sends changes to other databases called replicas. These replicas get updated copies of data but usually do not accept writes. This keeps data consistent across systems.
Result
You grasp how data can be copied safely to other databases to share the load.
Understanding replication is key to knowing how read replicas stay updated without manual copying.
3. Intermediate: How Read Replicas Handle Read Traffic
🤔 Before reading on: do you think read replicas can handle write requests? Commit to yes or no.
Concept: Read replicas only process read requests, offloading this work from the primary database.
Applications send read queries to replicas and write queries to the primary. This separation means the primary focuses on writes and critical tasks, while replicas serve many readers quickly.
Result
You see how read replicas improve performance by sharing read traffic.
Knowing that read replicas are read-only prevents confusion about data consistency and write conflicts.
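The read/write split described in this step can be sketched in a few lines. This is a minimal illustration, not a production router: the class name `ReadWriteRouter`, the node names, and the rule "anything starting with SELECT is a read" are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Simplified classification (an assumption): only SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("INSERT INTO users (name) VALUES ('Alice')"),
    router.route("SELECT * FROM users"),
    router.route("SELECT * FROM users"),
]
print(targets)  # ['replica-1', 'primary', 'replica-2', 'replica-1']
```

Round-robin is only one possible policy; a real deployment might instead weight replicas by load or skip unhealthy ones.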
4. Intermediate: Data Consistency and Replication Lag
🤔 Before reading on: do you think read replicas always have the exact same data as the primary? Commit to yes or no.
Concept: Understand that replicas may be slightly behind the primary due to replication delay.
Changes made on the primary take time to reach replicas. This delay is called replication lag. During lag, replicas might show older data, which can affect applications needing the latest information.
Result
You learn that read replicas improve speed but may not always have the newest data instantly.
Recognizing replication lag helps design systems that tolerate slight delays or choose when to read from primary vs replicas.
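One way to act on replication lag is to let each query state how much staleness it can tolerate and fall back to the primary when no replica is fresh enough. The sketch below assumes the lag per replica has already been measured somewhere else; the function name `choose_read_target` and the sample numbers are hypothetical.

```python
def choose_read_target(replica_lag_seconds, max_staleness_seconds):
    """Return the least-lagged acceptable replica, or "primary" if none qualifies.

    replica_lag_seconds: dict mapping replica name -> measured lag in seconds.
    """
    fresh = {name: lag for name, lag in replica_lag_seconds.items()
             if lag <= max_staleness_seconds}
    if not fresh:
        return "primary"
    return min(fresh, key=fresh.get)


lags = {"replica-1": 0.4, "replica-2": 7.5}
# A news feed can tolerate data that is half a minute old.
feed_target = choose_read_target(lags, max_staleness_seconds=30.0)
# An account balance cannot, so this read falls back to the primary.
balance_target = choose_read_target(lags, max_staleness_seconds=0.1)
print(feed_target, balance_target)  # replica-1 primary
```

Always picking the least-lagged replica concentrates traffic on it; combining a staleness filter with load balancing across the qualifying replicas is a reasonable refinement.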
5. Intermediate: Scaling with Multiple Read Replicas
🤔 Before reading on: do you think adding more read replicas always improves performance linearly? Commit to yes or no.
Concept: Learn how adding replicas can increase read capacity but has limits.
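More replicas mean more read queries can be handled simultaneously. However, each replica adds replication overhead, because the primary must ship every change to it and the replica must apply that change, and it adds maintenance work. The network and application logic must also route reads to the replicas correctly.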
Result
You understand that scaling reads with replicas improves performance but requires careful management.
Knowing the tradeoffs of adding replicas prevents overcomplicating systems without real benefit.
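A rough sizing calculation shows why write load limits what replicas can deliver. The model below is deliberately simple and rests on stated assumptions: every replica replays every write, a read and a write cost the same, and a fixed share of each node's capacity is usable. The function name and all numbers are illustrative.

```python
def replicas_needed(read_qps, write_qps, node_capacity_qps, usable_percent=70):
    """Estimate the replica count under a deliberately simple capacity model."""
    # Capacity left for reads after reserving headroom and replaying all writes.
    usable = node_capacity_qps * usable_percent // 100 - write_qps
    if usable <= 0:
        raise ValueError("replicas are saturated by replaying writes alone")
    return -(-read_qps // usable)  # integer ceiling division


light_writes = replicas_needed(read_qps=9000, write_qps=500, node_capacity_qps=5000)
heavy_writes = replicas_needed(read_qps=9000, write_qps=2500, node_capacity_qps=5000)
print(light_writes, heavy_writes)  # 3 9
```

With the same read demand, the write-heavy workload needs three times as many replicas, because each one spends most of its capacity keeping up with the primary.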
6. Advanced: Read Replica Failover and High Availability
🤔 Before reading on: do you think read replicas can replace the primary database if it fails? Commit to yes or no.
Concept: Explore how replicas can help keep systems running if the primary fails, but with limitations.
In some setups, a read replica can be promoted to primary if the original primary fails. This helps keep the system available. However, promotion takes time, and with asynchronous replication any writes the replica had not yet received when the primary failed may be lost.
Result
You see how read replicas contribute to system resilience but are not full replacements without extra steps.
Understanding failover helps design systems that balance availability and data safety.
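The promotion decision can be illustrated by choosing the replica that has applied the most changes and counting what it is missing. This sketch assumes each node exposes a monotonically increasing log position; the function name and positions are hypothetical, and real failover also involves fencing the old primary and redirecting clients.

```python
def promote_most_advanced(primary_log_position, replica_positions):
    """Pick the replica that has applied the most changes and count what is lost.

    replica_positions: dict mapping replica name -> last applied log position.
    """
    if not replica_positions:
        raise ValueError("no replica available to promote")
    candidate = max(replica_positions, key=replica_positions.get)
    lost_writes = primary_log_position - replica_positions[candidate]
    return candidate, lost_writes


new_primary, lost = promote_most_advanced(
    primary_log_position=1050,
    replica_positions={"replica-1": 1042, "replica-2": 1047},
)
print(new_primary, lost)  # replica-2 3
```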
7. Expert: Advanced Replication Techniques and Tradeoffs
🤔 Before reading on: do you think synchronous replication is always better than asynchronous? Commit to yes or no.
Concept: Dive into replication modes and their impact on performance and consistency.
Synchronous replication waits for replicas to confirm each change before the write completes, so an acknowledged write is not lost if the primary fails, but every write becomes slower. Asynchronous replication lets writes complete as soon as the primary commits them, which is faster but risks losing recent writes if the primary fails before replicas receive them. Choosing between them depends on application needs.
Result
You understand the deep tradeoffs between speed, consistency, and safety in replication.
Knowing replication modes lets you tailor read replica setups to real-world requirements and risks.
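The latency side of this tradeoff can be made concrete with a small model. It assumes the synchronous mode waits for every replica and that replicas are contacted in parallel; many real systems wait for only one replica or a quorum. The function name and the millisecond figures are made up for illustration.

```python
def commit_latency_ms(local_write_ms, replica_ack_ms, synchronous):
    """Client-visible write latency under a simplified model."""
    if synchronous and replica_ack_ms:
        # Replicas are contacted in parallel, so the slowest ack dominates.
        return local_write_ms + max(replica_ack_ms)
    # Asynchronous: the write returns once the primary has committed locally.
    return local_write_ms


acks = [1, 40]  # one replica nearby, one in a distant region
sync_ms = commit_latency_ms(2, acks, synchronous=True)
async_ms = commit_latency_ms(2, acks, synchronous=False)
print(sync_ms, async_ms)  # 42 2
```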
Under the Hood
Read replicas work by copying the write operations from the primary database through a replication process. This can be done by streaming changes (like logs of updates) or by periodic snapshots. The replicas apply these changes to their own data stores, keeping them nearly in sync. The replication can be asynchronous, where the primary does not wait for replicas to confirm, or synchronous, where it does. Applications route read queries to replicas using load balancers or client logic, while writes always go to the primary.
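The change-streaming mechanism can be sketched with an in-memory log. This is a toy model, assuming a pull-based replica and a log of key/value changes; the classes `Primary` and `Replica`, the `pull` method, and its `max_entries` parameter are hypothetical names chosen for the example.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered change log of (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def pull(self, primary, max_entries=None):
        """Apply pending log entries in order; return how many were applied."""
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary = Primary()
replica = Replica()
primary.write("user:1", "Alice")
primary.write("user:1", "Alicia")

replica.pull(primary, max_entries=1)
stale = replica.data["user:1"]  # only the first change has been applied
replica.pull(primary)
fresh = replica.data["user:1"]  # the replica has caught up
print(stale, fresh)  # Alice Alicia
```

The gap between `len(primary.log)` and `replica.applied` is the replication lag measured in log entries, and the stale read in the middle is exactly what an application sees during that gap.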
Why designed this way?
Read replicas were designed to solve the problem of scaling database reads without overloading the primary. Early databases handled all reads and writes on one server, which limited performance. Replication allowed distributing read load cheaply and simply. Asynchronous replication is the usual choice because it maximizes write speed, at the cost of some delay in data freshness. Synchronous replication exists for cases that need strict consistency and can accept slower writes.
┌─────────────────────┐        ┌─────────────────────┐
│  Primary Database   │───────▶│   Read Replica 1    │
│  (Writes + Reads)   │        │   (Read-only)       │
└─────────────────────┘        └─────────────────────┘
          │
          ▼
┌─────────────────────┐
│   Read Replica 2    │
│   (Read-only)       │
└─────────────────────┘

Replication stream flows from Primary to Replicas.
Reads are distributed to replicas.
Writes go only to Primary.
Myth Busters - 4 Common Misconceptions
Quick: Do read replicas handle write requests? Commit to yes or no.
Common Belief: Read replicas can handle both read and write requests just like the primary.
Reality: Read replicas are read-only and do not accept write operations to avoid conflicts and maintain data integrity.
Why it matters: Trying to write to replicas can cause errors and data inconsistency, breaking the application.
Quick: Are read replicas always perfectly up-to-date with the primary? Commit to yes or no.
Common Belief: Read replicas always have the exact same data as the primary at all times.
Reality: There is usually a delay called replication lag, so replicas may show slightly older data.
Why it matters: Assuming perfect freshness can cause bugs in applications that need the latest data, like financial transactions.
Quick: Does adding more read replicas always improve performance linearly? Commit to yes or no.
Common Belief: More read replicas always mean proportionally better read performance.
Reality: Adding replicas improves capacity but with diminishing returns due to replication overhead and network limits.
Why it matters: Over-provisioning replicas wastes resources and complicates system management without real gains.
Quick: Can a read replica instantly replace the primary if it fails? Commit to yes or no.
Common Belief: Read replicas can immediately take over as primary without any delay or data loss.
Reality: Failover requires promotion steps and may cause temporary downtime or data loss if replicas lag.
Why it matters: Expecting instant failover can lead to poor disaster recovery planning and unexpected outages.
Expert Zone
1
Some applications provide read-after-write consistency by directing a user's reads to the primary for a short time after that user writes, and sending all other reads to replicas, balancing freshness and performance.
2
Network topology and geographic distance affect replication lag; placing replicas closer to users can improve read latency but complicate synchronization.
3
Monitoring replication lag and automating failover decisions are critical in production to avoid stale reads and downtime.
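The read-after-write pattern from the first point can be sketched as a router that remembers when each user last wrote. This is one possible approach under stated assumptions: the class name `StickyReadRouter`, the `pin_seconds` window, and the injectable `clock` are hypothetical, and the window would need to exceed the typical replication lag to be effective.

```python
import time


class StickyReadRouter:
    """After a user writes, send that user's reads to the primary for a short window."""

    def __init__(self, pin_seconds, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock  # injectable so the example can control time
        self._last_write_at = {}

    def record_write(self, user_id):
        self._last_write_at[user_id] = self.clock()

    def read_target(self, user_id):
        last = self._last_write_at.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


now = [100.0]  # fake clock value, advanced manually below
router = StickyReadRouter(pin_seconds=5.0, clock=lambda: now[0])
router.record_write("alice")
first = router.read_target("alice")  # inside the window
other = router.read_target("bob")    # bob has not written
now[0] += 6.0
later = router.read_target("alice")  # window has expired
print(first, other, later)  # primary replica replica
```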
When NOT to use
Read replicas are not suitable when applications require strict, immediate consistency for all reads and writes. In such cases, consider a single primary with strong consistency or a distributed database built on a consensus protocol such as Paxos or Raft.
Production Patterns
In production, read replicas are combined with load balancers or proxy layers that route read queries intelligently. Systems often use multiple replicas across regions for disaster recovery and low latency. Monitoring tools track replication health and lag to trigger alerts or automated failover.
Connections
Caching
Both caching and read replicas reduce load on primary data sources by serving repeated reads from faster or distributed stores.
Understanding read replicas helps grasp caching strategies since both aim to improve read performance but differ in data freshness and complexity.
Content Delivery Networks (CDNs)
CDNs and read replicas both replicate data closer to users to reduce latency and improve availability.
Knowing how read replicas work clarifies CDN design, as both balance freshness, consistency, and performance tradeoffs.
Supply Chain Management
Read replicas resemble inventory distribution centers that hold copies of products to serve customers faster without waiting for the main warehouse.
This cross-domain link shows how distributing copies strategically improves service speed and reliability in both tech and logistics.
Common Pitfalls
#1 Sending write queries to read replicas, causing errors.
Wrong approach: INSERT INTO users (name) VALUES ('Alice'); -- sent to read replica
Correct approach: INSERT INTO users (name) VALUES ('Alice'); -- sent to primary database
Root cause: Not realizing that read replicas are read-only and cannot process writes.
#2 Ignoring replication lag and reading stale data from replicas.
Wrong approach: SELECT balance FROM accounts; -- always from read replica without checking freshness
Correct approach: SELECT balance FROM accounts; -- from primary or replica with lag monitoring
Root cause: Assuming replicas are always perfectly up-to-date leads to incorrect application behavior.
#3 Adding too many read replicas without managing replication overhead.
Wrong approach: Deploying 20 replicas for a small app expecting linear performance gains.
Correct approach: Deploying a balanced number of replicas based on load and monitoring replication health.
Root cause: Believing more replicas always equal better performance without considering system limits.
Key Takeaways
Read replicas copy data from a primary database to handle read requests and reduce load on the main system.
They improve performance and availability but may show slightly outdated data due to replication lag.
Writes always go to the primary database to maintain data consistency and avoid conflicts.
Choosing replication modes and the number of replicas involves tradeoffs between speed, consistency, and resource use.
Proper monitoring and routing logic are essential to use read replicas effectively in production.