0
0
HLDsystem_design~7 mins

Data replication strategies in HLD - System Design Guide

Choose your learning style9 modes available
Problem Statement
When a single database instance handles all read and write requests, it becomes a bottleneck causing slow response times and risking data loss if the server fails. Without replication, data availability and fault tolerance are severely limited, leading to downtime and poor user experience.
Solution
Data replication copies data across multiple servers to distribute load and increase availability. Writes are propagated to replicas either synchronously or asynchronously, allowing reads to be served from multiple nodes and providing fault tolerance if one node fails.
Architecture
Primary DB
Primary DB
Replica 1

This diagram shows a primary database replicating data to two replicas. Clients send writes to the primary and reads to replicas to balance load and improve availability.

Trade-offs
✓ Pros
Improves read scalability by distributing read traffic across replicas.
Increases fault tolerance by providing data copies if the primary fails.
Reduces latency for geographically distributed users by placing replicas closer to them.
✗ Cons
Synchronous replication can increase write latency due to waiting for replicas to confirm.
Asynchronous replication risks data loss if the primary fails before replicas catch up.
Complexity in handling data consistency and conflict resolution increases.
Use when read traffic is significantly higher than write traffic (e.g., 10x reads), or when high availability and disaster recovery are required for critical data.
Avoid when write latency must be minimal and strict consistency is required, or when system scale is small (under 1000 writes per second) where replication overhead outweighs benefits.
Real World Examples
Amazon
Uses asynchronous replication across multiple regions to ensure data durability and low-latency reads for global customers.
Netflix
Employs multi-region replication to serve streaming metadata with low latency and high availability.
Google
Uses synchronous replication in Spanner to provide strong consistency across distributed nodes.
Alternatives
Sharding
Splits data horizontally across multiple databases instead of copying full data sets.
Use when: Use when write throughput is very high and data can be partitioned by key.
Caching
Stores frequently accessed data temporarily in fast storage instead of replicating full data sets.
Use when: Use when read latency is critical and data freshness can be relaxed.
Summary
Data replication copies data across multiple servers to improve availability and read scalability.
It balances trade-offs between consistency, latency, and fault tolerance depending on replication type.
Choosing the right replication strategy depends on system scale, read/write ratio, and consistency needs.