Overview - Read replicas

What is it?

Read replicas are copies of a primary database that serve read-only queries. They spread the read workload so the primary is not overloaded, which means faster responses when users request data. Each replica continuously applies changes from the primary to stay in step with it.

Why it matters

Without read replicas, all requests would go to one database, causing slowdowns and possible crashes when many users access it. Read replicas let many users get data quickly without waiting. This improves user experience and keeps apps running smoothly even when busy.

Where it fits

Before learning about read replicas, you should understand basic databases and how data is stored and retrieved. After this, you can learn about database scaling, caching, and high availability to make systems even stronger.

Mental Model

Core Idea

Read replicas are like extra copies of a book that many people can read at the same time without waiting for the original.

Think of it like...

Imagine a popular library book that many people want to read. Instead of everyone waiting for the single original copy, the library makes several photocopies. Readers can pick any copy to read, so no one waits long. The library keeps updating the copies when the original changes.

Primary Database
   │
   ├── Read Replica 1 (handles read requests)
   ├── Read Replica 2 (handles read requests)
   └── Read Replica 3 (handles read requests)

Updates flow from Primary to Replicas to keep data fresh.

Build-Up - 7 Steps

1

Foundation: What is a database replica?

Concept: A replica is a copy of a database that holds the same data as the original.

A database stores information. A replica is just another database that copies this information. It doesn't change data on its own but follows the original to stay updated.

Result

You get a second database with the same data as the first.

Understanding replicas is key because they form the base for read replicas and help spread out data access.

2

Foundation: Difference between read and write operations

3

Intermediate: How read replicas improve performance

4

Intermediate: Data synchronization between primary and replicas

5

Intermediate: Read replicas in Google Cloud SQL

6

Advanced: Handling replication lag and consistency

7

Expert: Scaling read replicas and failover strategies

Under the Hood

Read replicas work by copying the write operations from the primary database to themselves. This is done through a process called replication: the primary logs each change and sends the log to the replicas, and the replicas apply those changes in order to stay updated. Replication is usually asynchronous, meaning the primary does not wait for replicas, so replicas may lag behind it. Read queries are routed to replicas and write queries to the primary, typically by the application or by a proxy in front of the databases.
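The log-and-apply process described above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the `Primary` and `Replica` classes, the `pull` method, and the `max_entries` parameter are hypothetical names invented here, and real databases ship changes over the network rather than sharing a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: applies writes and appends them to a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class Replica:
    """Toy replica: a read-only copy that replays the primary's log."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def pull(self, primary, max_entries=None):
        """Apply pending log entries; max_entries simulates a slow replica."""
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary):
        """Lag measured as log entries not yet applied."""
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()

primary.write("user:1", "Ada")
primary.write("user:2", "Grace")

replica.pull(primary, max_entries=1)      # replica only catches up partway
stale_read = replica.data.get("user:2")   # the second write is not visible yet
lag_before = replica.lag(primary)

replica.pull(primary)                     # replica catches up fully
fresh_read = replica.data.get("user:2")
lag_after = replica.lag(primary)
```

Between the two `pull` calls the replica is one entry behind, which is the window in which a read from it would miss the second write.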

Why designed this way?

This design separates read and write workloads to improve performance and availability. Synchronous replication would make every write wait for the replicas to confirm it, which slows writes down; asynchronous replication keeps writes fast at the cost of slightly stale data on the replicas. Alternatives such as sharding split the data across databases but add complexity, so replication is the simpler way to scale reads.

┌───────────────┐       replication logs       ┌───────────────┐
│ Primary DB    │─────────────────────────────▶│ Read Replica 1│
│ (writes +     │                              └───────────────┘
│  reads)       │
│               │       replication logs       ┌───────────────┐
│               │─────────────────────────────▶│ Read Replica 2│
└───────────────┘                              └───────────────┘

Client queries:
  Writes ──▶ Primary DB
  Reads ──▶ Read Replicas
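The routing rule in the diagram can be sketched as a small router, assuming routing happens in application code. `QueryRouter` and the endpoint names (`primary-db`, `replica-1`, `replica-2`) are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools


class QueryRouter:
    """Sends plain SELECTs to replicas in turn and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, sql):
        stripped = sql.strip()
        verb = stripped.split(None, 1)[0].upper() if stripped else ""
        if verb == "SELECT":
            return next(self._read_targets)
        # Anything that is not a plain SELECT is treated as a write.
        return self.primary


router = QueryRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("INSERT INTO users VALUES (1)"),
    router.route("select name from users"),
    router.route("SELECT 1"),
]
```

A keyword check like this would misroute statements that start with SELECT but still need the primary, such as locking reads, so a real router would need a more careful rule.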

Myth Busters - 4 Common Misconceptions

Quick: Do read replicas handle write requests? Commit to yes or no.

Common Belief: Read replicas can handle both reads and writes just like the primary database.

Reality: No. Read replicas are read-only; write commands sent to a replica fail, so all writes must go to the primary.

Quick: Do read replicas always have the latest data instantly? Commit to yes or no.

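Common Belief: Read replicas always show the most up-to-date data immediately after a write.

Reality: No. Replication is asynchronous, so a replica may lag behind the primary and return stale data for a short time after a write.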

Quick: Does adding more read replicas always make the system infinitely faster? Commit to yes or no.

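Common Belief: Adding many read replicas will always improve read performance linearly.

Reality: No. More replicas do not guarantee proportional gains; every replica must still be kept in sync, so replication lag needs monitoring as the replica count grows.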

Quick: Can a read replica replace the primary database automatically if it fails? Commit to yes or no.

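Common Belief: Read replicas automatically become primary if the main database fails without extra setup.

Reality: Not by default. Promoting a replica takes extra setup, such as failover automation that promotes a replica to primary during an outage.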

Expert Zone

1

Replication lag varies with workload and network; monitoring it is essential for data freshness.

2

Read replicas can be used for backup and analytics workloads to reduce load on the primary.

3

Some cloud providers offer read replicas with different consistency models; choosing the right one affects app behavior.
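The first point, monitoring lag to protect data freshness, can be turned into a routing decision. The sketch below assumes lag has already been measured per replica, in seconds; how that measurement is obtained depends on the database engine and is not shown. `pick_read_target`, the replica names, and the lag values are made up for illustration.

```python
def pick_read_target(replica_lag_seconds, max_lag_seconds, primary="primary"):
    """Return the least-lagged replica within the limit, else the primary.

    replica_lag_seconds maps replica name -> last measured lag in seconds,
    or None when the lag is unknown (for example, replication has stopped).
    """
    healthy = {
        name: lag
        for name, lag in replica_lag_seconds.items()
        if lag is not None and lag <= max_lag_seconds
    }
    if not healthy:
        return primary
    return min(healthy, key=healthy.get)


measured = {"replica-1": 0.4, "replica-2": 12.0, "replica-3": None}
chosen = pick_read_target(measured, max_lag_seconds=2.0)
fallback = pick_read_target({"replica-2": 12.0}, max_lag_seconds=2.0)
```

Falling back to the primary keeps reads fresh but shifts load back onto it, so the lag limit is a trade-off rather than a fixed rule.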

When NOT to use

Read replicas are not suitable when an application requires strongly consistent reads immediately after every write. In such cases, consider synchronous replication or scaling up the single primary instead. For write-heavy workloads, sharding or partitioning may be better alternatives, because replicas add read capacity only.

Production Patterns

In production, read replicas are often combined with load balancers that route read queries automatically. Applications may use read replicas for reporting and analytics to avoid slowing down transactional workloads. Failover automation scripts promote replicas to primary during outages to maintain availability.
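The promotion step of a failover script can be sketched as choosing the replica that has applied the most changes. This is a simplified model, not any provider's actual failover procedure: `promote_most_current`, the cluster dictionary layout, and the "entries applied" metric are assumptions made for the example.

```python
def promote_most_current(cluster):
    """Promote the replica that has applied the most changes.

    cluster is a dict with a "primary" name and a "replicas" mapping of
    replica name -> number of log entries applied (a hypothetical metric).
    Returns a new cluster description; the input is not modified.
    """
    replicas = cluster["replicas"]
    if not replicas:
        raise RuntimeError("no replica available to promote")
    new_primary = max(replicas, key=replicas.get)
    remaining = {name: pos for name, pos in replicas.items() if name != new_primary}
    return {"primary": new_primary, "replicas": remaining}


cluster = {"primary": "db-main", "replicas": {"replica-1": 980, "replica-2": 1000}}
after_failover = promote_most_current(cluster)
```

With asynchronous replication, any writes the failed primary accepted but had not yet shipped are missing from the promoted replica, and the remaining replicas and the clients still have to be pointed at the new primary.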

Connections

Caching

Builds-on

Both caching and read replicas reduce load on the primary database by serving repeated read requests faster, but caching stores data temporarily while replicas hold full database copies.

Eventual consistency

Shares principles

Read replicas demonstrate eventual consistency because they update after the primary, showing how systems can tolerate slight delays in data synchronization.

Supply chain inventory management

Analogous process

Just like warehouses keep stock updated with some delay from the main factory, read replicas keep data updated with some lag, balancing availability and freshness.

Common Pitfalls

#1 Sending write queries to read replicas causes errors.

Wrong approach: INSERT INTO read_replica_table VALUES ('data');  -- sent to a read replica

Correct approach: INSERT INTO primary_database_table VALUES ('data');  -- sent to the primary

Root cause: Assuming that replicas can handle writes leads to sending write commands to read-only replicas.

#2 Assuming replicas always have the latest data and using them for critical reads.

Wrong approach: SELECT * FROM read_replica_table WHERE immediate_freshness_needed = TRUE;  -- sent to a read replica

Correct approach: SELECT * FROM primary_database_table WHERE immediate_freshness_needed = TRUE;  -- sent to the primary

Root cause: Ignoring replication lag means reading stale data from a replica when fresh data is required.

#3 Creating too many read replicas without monitoring replication lag.

Wrong approach: Deploy 20 read replicas without checking synchronization status.

Correct approach: Deploy a reasonable number of read replicas and monitor replication lag to balance performance and freshness.

Root cause: Believing that more replicas always improve performance, while unmonitored lag lets replicas serve stale data unnoticed.
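One way to avoid pitfall #2 without sending every read to the primary is to pin a session's reads to the primary for a short window after that session writes. The sketch below is one possible approach under stated assumptions: `SessionRouter`, `pin_seconds`, and the session ids are hypothetical, and the window has to be chosen to exceed the replication lag actually observed.

```python
import time


class SessionRouter:
    """Pins a session's reads to the primary for a short time after it writes."""

    def __init__(self, pin_seconds, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = {}  # session id -> time of that session's last write

    def record_write(self, session_id):
        self._last_write[session_id] = self._clock()

    def read_target(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"  # a replica might not have this session's write yet
        return "replica"


# A fake clock keeps the example deterministic.
now = [100.0]
router = SessionRouter(pin_seconds=5.0, clock=lambda: now[0])

before_write = router.read_target("alice")
router.record_write("alice")
right_after = router.read_target("alice")
other_user = router.read_target("bob")
now[0] += 6.0
later = router.read_target("alice")
```

Only the session that wrote is sent to the primary, so other users keep reading from replicas and the primary absorbs just the reads that need fresh data.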

Key Takeaways

Read replicas are copies of a primary database that handle only read requests to improve performance.

They update asynchronously, so replicas may lag behind the primary database slightly.

Using read replicas reduces load on the primary, making applications faster and more reliable.

Replication lag and failover require careful handling to maintain data accuracy and availability.

Managed database services such as Google Cloud SQL offer read replicas to simplify scaling and maintenance.