
Write-through and write-back caching in HLD - System Design Guide

Problem Statement
When a system writes data directly to main storage on every operation, each write pays the latency of a slow disk access. Conversely, if writes are delayed or stored only in the cache, data can be lost or become inconsistent during failures or crashes.
Solution
Write-through caching immediately writes data to both the cache and the main storage, ensuring consistency but with some latency. Write-back caching writes data only to the cache first and updates the main storage later, improving write performance but requiring mechanisms to handle data loss and consistency.
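The two policies can be contrasted in a minimal sketch. This is illustrative only: a plain dict stands in for slow main storage, and the class and method names are assumptions, not a specific library's API.

```python
class WriteThroughCache:
    """Every write goes to the cache AND the backing store synchronously."""

    def __init__(self, store):
        self.store = store   # stand-in for slow main storage
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value  # synchronous: store is never stale

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]  # fill on miss
        return self.cache[key]


class WriteBackCache:
    """Writes land in the cache only; dirty entries are flushed later."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()   # keys not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # store is stale until flush()

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Note the trade-off in miniature: `WriteThroughCache.write` costs a store access every time, while `WriteBackCache.write` is memory-only but leaves the store stale until `flush()` runs.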
Architecture
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│  Application  │──────▶│     Cache     │──────▶│ Main Storage  │
└───────────────┘       └───────────────┘       └───────────────┘

Write-through: write to Cache and Main Storage synchronously
Write-back:    write to Cache first, then asynchronously to Main Storage

This diagram shows the data flow for write-through and write-back caching. The application writes data to the cache, which either immediately writes to main storage (write-through) or delays writing to main storage (write-back).
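The asynchronous leg of the write-back path can be sketched with a background flusher thread. This is a toy model under stated assumptions (a dict as the store, a fixed flush interval); real systems add batching, ordering, and crash recovery, which this sketch deliberately omits.

```python
import threading


class AsyncWriteBackCache:
    """Write-back cache whose dirty entries are flushed by a background thread."""

    def __init__(self, store, flush_interval=0.05):
        self.store = store
        self.cache = {}
        self.dirty = set()
        self.lock = threading.Lock()
        self.flush_interval = flush_interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def write(self, key, value):
        with self.lock:
            self.cache[key] = value
            self.dirty.add(key)      # acknowledged before it is durable

    def _flush_loop(self):
        # Wake every flush_interval seconds until close() is called.
        while not self._stop.wait(self.flush_interval):
            self.flush()

    def flush(self):
        with self.lock:
            for key in self.dirty:
                self.store[key] = self.cache[key]
            self.dirty.clear()

    def close(self):
        self._stop.set()
        self._thread.join()
        self.flush()                 # drain anything still pending
```

The window between `write()` returning and the next flush is exactly the data-loss exposure the Cons list describes: if the process dies inside that window, the store never sees the write.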

Trade-offs
✓ Pros
Write-through ensures strong data consistency between cache and storage.
Write-back improves write performance by reducing main storage writes.
Write-back reduces wear on storage devices by batching writes.
Write-through simplifies recovery since data is always in storage.
✗ Cons
Write-through has higher write latency due to synchronous storage writes.
Write-back risks data loss if cache is lost before flushing to storage.
Write-back requires complex mechanisms for cache eviction and consistency.
Use write-through caching when data consistency and durability are critical and write latency is acceptable, typically in systems with moderate write load. Use write-back caching when write performance is a priority and the system can tolerate eventual consistency, especially in high write throughput scenarios.
Avoid write-back caching in systems where data loss cannot be tolerated, such as financial transactions or critical logs. Avoid write-through caching when write latency severely impacts user experience or system throughput.
Real World Examples
Amazon
Amazon DynamoDB Accelerator (DAX) is a write-through cache: writes go to both the cache and DynamoDB synchronously, so subsequent reads through DAX see recently written items without a cache miss.
Netflix
Netflix uses write-through caching in its edge caches to ensure that user session data is always consistent with the backend storage.
Google
Google's Bigtable buffers writes in an in-memory memtable and flushes them to disk as SSTables later, a write-back-style design that appends every mutation to a commit log first to guarantee durability.
Alternatives
Write-around caching
Writes bypass the cache and go directly to main storage, only reads use the cache.
Use when: write operations are rare, or when caching writes would pollute the cache with data that is never read.
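A write-around sketch, under the same toy assumptions (dict as backing store, illustrative names): writes skip the cache entirely, and only reads populate it.

```python
class WriteAroundCache:
    """Writes bypass the cache; only reads populate it."""

    def __init__(self, store):
        self.store = store   # stand-in for slow main storage
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]  # fill on first read
        return self.cache[key]
```

Because `write()` never inserts into the cache, write-once data that is never read back cannot evict hot read entries, which is the cache-pollution point above.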
Read-through caching
Cache automatically loads data from storage on cache misses during reads, writes may be handled differently.
Use when: read latency is critical and cache misses should be handled transparently by the cache layer.
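Read-through can be sketched by giving the cache a loader callback it invokes on a miss, so callers never touch the backing store directly. The `loader` parameter is an assumption for illustration; in practice it would wrap a database query or an RPC.

```python
class ReadThroughCache:
    """On a miss, the cache itself loads the value from the backing store."""

    def __init__(self, loader):
        self.loader = loader   # callable: key -> value (e.g. a DB query)
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            # Miss handled transparently: the caller never sees the store.
            self.cache[key] = self.loader(key)
        return self.cache[key]
```

The difference from a plain cache-aside pattern is ownership: here the cache, not the application, is responsible for fetching on a miss.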
Summary
Write-through caching writes data synchronously to both cache and storage to ensure consistency.
Write-back caching writes data to cache first and updates storage later to improve performance.
Choosing between them depends on the system's tolerance for latency, consistency, and data loss.