
Storage access patterns in HLD - Deep Dive

Overview - Storage access patterns
What is it?
Storage access patterns describe the common ways systems read and write data to storage. They explain how data is organized, retrieved, and updated efficiently. These patterns help design systems that handle data smoothly and quickly. Understanding them helps avoid slow or costly data operations.
Why it matters
Without good storage access patterns, systems become slow, unreliable, or expensive to run. Imagine a library where books are scattered randomly versus one where books are organized by topic and author. The right pattern saves time and resources, making applications faster and more scalable. Poor patterns cause delays, data loss, or high costs.
Where it fits
Learners should know basic storage types like databases and file systems before this. After this, they can study caching, indexing, and distributed storage systems. This topic connects foundational storage knowledge to advanced system design choices.
Mental Model
Core Idea
Storage access patterns are the common, repeatable ways systems organize and retrieve data to balance speed, cost, and complexity.
Think of it like...
It's like organizing your kitchen: you can store items by type, frequency of use, or meal preparation steps. How you arrange things affects how fast you can cook and clean.
┌───────────────────────────────┐
│    Storage Access Patterns    │
├───────────────┬───────────────┤
│ Sequential    │ Random        │
│ Access        │ Access        │
├───────────────┼───────────────┤
│ Read/Write    │ Read/Write    │
│ in order      │ at any point  │
├───────────────┼───────────────┤
│ Batch jobs,   │ Databases,    │
│ Logs          │ Index lookups │
├───────────────┴───────────────┤
│ Other patterns:               │
│ - Temporal locality           │
│ - Spatial locality            │
│ - Append-only                 │
│ - Random writes               │
└───────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Sequential Access
Concept: Sequential access means reading or writing data in a continuous, ordered way.
Imagine reading a book page by page from start to finish. In storage, sequential access reads or writes data blocks one after another. This is efficient because the storage device can predict where the next data is and prepare to access it quickly. Examples include log files or streaming video data.
Result
Operations are fast and use less device movement or overhead.
Understanding sequential access helps design systems that maximize throughput by minimizing random jumps in storage.
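The idea above can be sketched in code. This is a minimal, hypothetical example of sequential access: the file is read front to back in fixed-size blocks, each read continuing exactly where the previous one ended. The file path and block size are illustrative choices, not from the text.

```python
BLOCK_SIZE = 4096  # bytes; a common storage page size (illustrative)

def read_sequential(path):
    """Read every block of a file in order, front to back."""
    blocks = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            blocks += 1  # each read continues where the last one ended
    return blocks
```

Because each read starts where the last one stopped, the device (or OS) can read ahead and keep throughput high.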
2
Foundation: Understanding Random Access
Concept: Random access means reading or writing data at any position without following order.
Think of looking up a word in a dictionary. You jump directly to the page you want. In storage, random access allows fetching or updating any data block quickly without reading everything before it. Databases use random access to find records fast.
Result
Systems can retrieve specific data quickly but may have more overhead than sequential access.
Knowing random access is key to supporting flexible queries and updates in systems.
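As a sketch of the dictionary-lookup idea, the snippet below uses `seek()` to jump directly to one record in a file of fixed-width records, without reading anything before it. The fixed record size is a hypothetical layout chosen for the example.

```python
RECORD_SIZE = 64  # bytes per record; an illustrative fixed-width layout

def read_record(path, index):
    """Fetch record `index` without reading the records before it."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)  # jump straight to the record's offset
        return f.read(RECORD_SIZE)
```

The jump is cheap because the record's position can be computed; this is the same trick a database index plays at a larger scale.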
3
Intermediate: Temporal Locality Pattern
🤔 Before reading on: do you think temporal locality means data accessed recently is likely to be accessed again soon, or not? Commit to your answer.
Concept: Temporal locality means recently accessed data is likely to be accessed again soon.
If you check your email inbox now, you might check it again shortly after. Systems use this pattern to keep recent data in fast storage like cache. This reduces slow access to main storage.
Result
Systems improve speed by caching hot data based on recent use.
Understanding temporal locality helps optimize caching strategies and reduce latency.
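Temporal locality is exactly what an LRU (least-recently-used) cache exploits: keep whatever was touched most recently, evict what has sat idle longest. Here is a minimal sketch; the capacity and class shape are illustrative, not a production cache.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache: recently used items stay, stale ones are evicted."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```

Every `get` refreshes an item's position, so data with strong temporal locality keeps getting served from the cache instead of slow storage.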
4
Intermediate: Spatial Locality Pattern
🤔 Before reading on: does spatial locality mean data near recently accessed data is likely to be accessed soon, or not? Commit to your answer.
Concept: Spatial locality means data near recently accessed data is likely to be accessed soon.
If you read one page of a book, you often read the next pages soon. Storage systems prefetch nearby data blocks anticipating future requests. This speeds up sequential reads and some random reads.
Result
Prefetching reduces wait times for nearby data access.
Knowing spatial locality guides how systems organize and prefetch data to improve performance.
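Prefetching can be sketched in a few lines: when block n is requested, also pull the next few blocks into the cache, betting they will be needed soon. The `storage` dict stands in for a slow device, and the prefetch depth is an illustrative choice.

```python
PREFETCH = 2  # neighbouring blocks to pull in alongside the request (illustrative)

def fetch_with_prefetch(storage, cache, block_id):
    """Serve `block_id`, prefetching the next PREFETCH blocks into the cache."""
    for b in range(block_id, block_id + 1 + PREFETCH):
        if b in storage and b not in cache:
            cache[b] = storage[b]  # one "trip" brings nearby blocks too
    return cache.get(block_id)
```

If the workload really does read neighbouring blocks next, those follow-up reads are served from the cache with no extra trip to storage.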
5
Intermediate: Append-Only Access Pattern
🤔 Before reading on: do you think append-only means data is only added at the end, or can be updated anywhere? Commit to your answer.
Concept: Append-only means data is only added at the end, never overwritten or deleted in place.
Imagine writing a diary where you only add new pages at the end. Systems like logs or event stores use append-only to simplify writes and improve durability. Old data stays unchanged, making recovery easier.
Result
Write operations are fast and safe, but storage grows continuously.
Understanding append-only helps design systems that are reliable and easy to recover.
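A minimal append-only log looks like this: writes only ever go to the end of the file, and recovery replays every event from the start. The one-JSON-line-per-event format is an illustrative choice.

```python
import json

def append_event(path, event):
    """Durably record an event by appending it; nothing is overwritten."""
    with open(path, "a") as f:  # "a" mode: writes always land at the end
        f.write(json.dumps(event) + "\n")

def replay(path):
    """Rebuild state by reading every event in order, oldest first."""
    state = {}
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            state[event["key"]] = event["value"]  # later events win
    return state
```

Because old lines are never touched, a crash mid-write can at worst lose the last partial line; everything before it replays cleanly.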
6
Advanced: Combining Patterns for Scalability
🤔 Before reading on: do you think combining access patterns complicates system design or simplifies it? Commit to your answer.
Concept: Real systems combine multiple access patterns to balance speed, cost, and complexity.
For example, a database may use random access for queries, append-only for transaction logs, and caching for temporal locality. Combining patterns requires careful design to avoid conflicts and optimize resources.
Result
Systems achieve high performance and reliability at scale.
Knowing how to combine patterns is essential for building scalable, real-world systems.
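To make the combination concrete, here is a hypothetical mini key-value store that layers three patterns: append-only writes for durability, an in-memory index enabling random access into the log, and a small cache for temporally hot reads. All names and the on-disk format are illustrative, not a real database's design.

```python
import json, os

class MiniStore:
    """Toy store combining append-only log, random-access index, and cache."""

    def __init__(self, path):
        self.path = path
        self.index = {}          # key -> byte offset in the log (random access)
        self.cache = {}          # recently used values (temporal locality)
        open(path, "a").close()  # ensure the log file exists

    def put(self, key, value):
        offset = os.path.getsize(self.path)   # new record goes at the end
        with open(self.path, "a") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")  # append-only
        self.index[key] = offset
        self.cache[key] = value

    def get(self, key):
        if key in self.cache:                 # fast path: cached hot data
            return self.cache[key]
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path) as f:
            f.seek(offset)                    # random access via the index
            value = json.loads(f.readline())["v"]
        self.cache[key] = value
        return value
```

Note the tension the step describes: the append-only log grows forever and the index must stay consistent with it, which is exactly why combining patterns demands careful design.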
7
Expert: Surprises in Storage Access Patterns
🤔 Before reading on: do you think sequential access is always faster than random access? Commit to your answer.
Concept: Some storage technologies blur the line between sequential and random access speeds.
For example, solid-state drives (SSDs) have no moving parts, so the gap between random and sequential read speeds is far smaller than on hard drives. This changes how systems optimize access patterns. Also, write amplification in append-only systems can cause hidden costs.
Result
Designers must rethink traditional assumptions about access patterns.
Understanding hardware nuances prevents outdated designs and unlocks better performance.
Under the Hood
Storage devices organize data in blocks or pages. Sequential access reads or writes these blocks in order, minimizing mechanical movement or controller overhead. Random access jumps to specific blocks, requiring address translation and sometimes more latency. Caches store copies of data based on recent or nearby access to speed up reads. Append-only systems write new data at the end, avoiding costly overwrites and enabling simple recovery by replaying logs.
Why designed this way?
These patterns evolved to match hardware capabilities and application needs. Early hard drives were slow at random access due to mechanical parts, so sequential access was favored. Caching emerged to hide slow storage latency. Append-only was designed for durability and simplicity in logging. As hardware evolved, patterns adapted but kept core principles to balance speed, cost, and complexity.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Application   │──────▶│ Cache Layer   │──────▶│ Storage Device│
└───────────────┘       └───────────────┘       └───────────────┘
       ▲                      ▲                      ▲
       │                      │                      │
       │                      │                      │
       │                      │                      │
       └──────────────────────┴──────────────────────┘

Cache uses temporal and spatial locality to serve requests fast.
Storage device handles sequential and random access differently.
Myth Busters - 4 Common Misconceptions
Quick: Is sequential access always faster than random access? Commit yes or no.
Common Belief: Sequential access is always faster than random access.
Reality: With modern SSDs, random and sequential access speeds are often similar.
Why it matters: Assuming sequential is always faster can lead to poor design choices that don't leverage SSD strengths.
Quick: Does append-only mean data cannot be deleted? Commit yes or no.
Common Belief: Append-only means data is never deleted or changed.
Reality: Append-only means new data is added at the end, but old data can be logically deleted or compacted later.
Why it matters: Misunderstanding this can cause confusion about storage growth and maintenance strategies.
Quick: Does caching always improve performance? Commit yes or no.
Common Belief: Caching always speeds up data access.
Reality: Caching helps only if access patterns have locality; random, one-time accesses may not benefit.
Why it matters: Blindly adding cache can waste resources without improving performance.
Quick: Is random access always more expensive than sequential? Commit yes or no.
Common Belief: Random access always costs more time and resources than sequential access.
Reality: In some systems, random access is optimized and can be as cheap as sequential, especially in memory or SSDs.
Why it matters: Overestimating random access cost can limit system flexibility and design options.
Expert Zone
1
Some systems use hybrid storage devices that blur access pattern boundaries, requiring adaptive strategies.
2
Write amplification in append-only systems can degrade performance and storage life if not managed carefully.
3
Temporal and spatial locality patterns can conflict, forcing trade-offs in cache design and prefetching.
When NOT to use
Avoid append-only patterns when data must be frequently updated or deleted in place; use in-place update or versioned storage instead. Sequential access patterns are less effective on SSDs with uniform random access speed; consider random access optimizations. Caching is ineffective for purely random, one-time data access workloads; consider direct access.
Production Patterns
Real-world systems combine append-only logs for durability, random access for queries, and caching for hot data. Distributed databases use partitioning to optimize access patterns per shard. Content delivery networks use spatial locality to prefetch related content. Event sourcing systems rely heavily on append-only patterns for auditability.
Connections
Caching
builds-on
Understanding storage access patterns clarifies why caching works best with temporal and spatial locality.
Database Indexing
same pattern
Indexing structures optimize random access patterns to speed up data retrieval.
Human Memory Recall
analogy in cognitive science
Temporal and spatial locality in storage access patterns mirror how humans recall recent and related memories efficiently.
Common Pitfalls
#1 Assuming sequential access is always best for performance.
Wrong approach: Designing a system that forces all data reads to be sequential even on SSDs, ignoring random access benefits.
Correct approach: Designing access patterns that leverage SSD strengths by allowing efficient random reads and writes.
Root cause: Outdated assumptions based on hard drive characteristics.
#2 Using append-only pattern without cleanup.
Wrong approach: Continuously appending data without compaction or deletion, causing unlimited storage growth.
Correct approach: Implementing periodic compaction or garbage collection to reclaim space.
Root cause: Misunderstanding append-only as permanent data storage without maintenance.
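A compaction pass can be sketched in a few lines: rewrite the log keeping only the latest value per key, then atomically swap the compacted file in. The one-JSON-line-per-entry format and function name are hypothetical choices for illustration.

```python
import json, os

def compact(path):
    """Rewrite an append-only log, keeping only the newest entry per key."""
    latest = {}
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            latest[entry["key"]] = entry["value"]   # later entries win
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        for key, value in latest.items():
            f.write(json.dumps({"key": key, "value": value}) + "\n")
    os.replace(tmp, path)                           # atomic swap-in
```

Writing to a temporary file first means a crash mid-compaction leaves the original log untouched, preserving the recovery guarantees the append-only pattern provides.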
#3 Caching data without considering access patterns.
Wrong approach: Caching all data indiscriminately regardless of usage frequency or locality.
Correct approach: Caching only hot data identified by temporal and spatial locality patterns.
Root cause: Ignoring the importance of access patterns in cache effectiveness.
Key Takeaways
Storage access patterns guide how data is read and written to optimize speed and cost.
Sequential and random access are fundamental patterns with different trade-offs depending on hardware.
Temporal and spatial locality explain why caching and prefetching improve performance.
Append-only patterns simplify writes and recovery but require maintenance to control storage growth.
Modern hardware changes traditional assumptions, so designs must adapt to current storage technologies.