HLDsystem_design~7 mins

Data replication strategies in HLD - System Design Guide

Choose your learning style9 modes available

Learn Why Deep Arch Practice Challenge Design Recall Scale

Problem Statement

When a single database instance handles all read and write requests, it becomes a bottleneck causing slow response times and risking data loss if the server fails. Without replication, data availability and fault tolerance are severely limited, leading to downtime and poor user experience.

Solution

Data replication copies data across multiple servers to distribute load and increase availability. Writes are propagated to replicas either synchronously or asynchronously, allowing reads to be served from multiple nodes and providing fault tolerance if one node fails.

Architecture

Primary DB

↓

Replica 1

This diagram shows a primary database replicating data to two replicas. Clients send writes to the primary and reads to replicas to balance load and improve availability.

Trade-offs

✓ Pros

→

Improves read scalability by distributing read traffic across replicas.

→

Increases fault tolerance by providing data copies if the primary fails.

→

Reduces latency for geographically distributed users by placing replicas closer to them.

✗ Cons

→

Synchronous replication can increase write latency due to waiting for replicas to confirm.

→

Asynchronous replication risks data loss if the primary fails before replicas catch up.

→

Complexity in handling data consistency and conflict resolution increases.

Use when read traffic is significantly higher than write traffic (e.g., 10x reads), or when high availability and disaster recovery are required for critical data.

Avoid when write latency must be minimal and strict consistency is required, or when system scale is small (under 1000 writes per second) where replication overhead outweighs benefits.

Real World Examples

Amazon

Uses asynchronous replication across multiple regions to ensure data durability and low-latency reads for global customers.

Netflix

Employs multi-region replication to serve streaming metadata with low latency and high availability.

Google

Uses synchronous replication in Spanner to provide strong consistency across distributed nodes.

Alternatives

Sharding

Splits data horizontally across multiple databases instead of copying full data sets.

Use when: Use when write throughput is very high and data can be partitioned by key.

Caching

Stores frequently accessed data temporarily in fast storage instead of replicating full data sets.

Use when: Use when read latency is critical and data freshness can be relaxed.

Summary

Data replication copies data across multiple servers to improve availability and read scalability.

It balances trade-offs between consistency, latency, and fault tolerance depending on replication type.

Choosing the right replication strategy depends on system scale, read/write ratio, and consistency needs.