Overview - Key-value and document store model

What is it?

The key-value and document store model is a way to organize data where each item is stored as a pair of a unique key and its associated value. In key-value stores, the value is usually a simple piece of data, while document stores allow the value to be a complex, structured document like JSON. This model helps store and retrieve data quickly by using the key as a direct address to the data.

Why it matters

This model exists because many applications need fast access to data without complex relationships. Without it, retrieving data would be slower and more complicated, especially for flexible or changing data structures. It allows developers to build scalable, high-performance applications like shopping carts, user profiles, or session stores that respond to user actions with low latency.

Where it fits

Before learning this, you should understand basic database concepts like tables and rows. After this, you can explore other data models, such as graph databases, or relational databases with joins. This model is a foundation for understanding how modern cloud databases like DynamoDB work.

Mental Model

Core Idea

Data is stored as unique keys pointing directly to values or documents, enabling fast and flexible retrieval without complex structure.

Think of it like...

It's like a library where each book has a unique call number (key), and you use that number to find the exact book (value or document) quickly without searching through shelves.

┌───────────────┐
│   Key-Value   │
├───────────────┤
│ Key: User123  │──▶ Value: {"name":"Anna", "age":30}
│ Key: Order456 │──▶ Value: {"items":["book","pen"], "total":25}
│ Key: Session1 │──▶ Value: "active"
└───────────────┘

Build-Up - 7 Steps

1. Foundation: Understanding Key-Value Basics

Concept: Introduce the simplest form of data storage using unique keys and simple values.

In a key-value store, each piece of data is stored with a unique key. For example, a key could be 'user123' and the value could be 'Anna'. You retrieve data by asking for the key, and the store returns the value directly, without searching through other entries.

Result

You can quickly get 'Anna' by asking for 'user123'.

Understanding that keys act like direct addresses helps grasp why this model is very fast for lookups.
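A minimal sketch of this lookup, assuming an in-memory Python dict stands in for the store. The class name `KeyValueStore` and its `put`/`get` methods are hypothetical and chosen for illustration; they are not the API of any real database.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Keys are unique: writing to an existing key overwrites its value.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # The lookup goes straight to the entry for this key, with no scan.
        return self._data.get(key)


store = KeyValueStore()
store.put("user123", "Anna")
print(store.get("user123"))  # prints Anna
```

A missing key returns `None` here; a real store would define its own behavior for that case.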

2. Foundation: Introducing Document Stores

3. Intermediate: How Keys Ensure Fast Access

4. Intermediate: Differences Between Key-Value and Document Stores

5. Intermediate: Using DynamoDB for Key-Value and Document Storage

6. Advanced: Handling Data Consistency and Scaling

7. Expert: Surprising Limits and Internal Optimizations

Under the Hood

DynamoDB stores data in partitions distributed across multiple servers. Each item has a primary key, and the system hashes its partition key to decide which partition holds the item. This hashing allows direct access to the right partition without scanning the others. Data is stored on SSDs and replicated for durability. Secondary indexes maintain additional mappings for querying non-key attributes. Reads are eventually consistent by default, meaning a read may briefly miss a recent write; a strongly consistent read can be requested when the latest value is required.
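The routing idea can be sketched in a few lines. This is a simplified model, not DynamoDB's actual implementation: the hash function (SHA-256), the fixed `NUM_PARTITIONS`, and the names `partition_for` and `PartitionedStore` are all assumptions made for illustration.

```python
import hashlib
from typing import Any, Dict, List, Optional

NUM_PARTITIONS = 4  # hypothetical; a managed service picks this itself


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition index by hashing it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Toy store that keeps one dict per partition."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self._partitions: List[Dict[str, Any]] = [{} for _ in range(num_partitions)]

    def put(self, key: str, item: Any) -> None:
        index = partition_for(key, len(self._partitions))
        self._partitions[index][key] = item

    def get(self, key: str) -> Optional[Any]:
        # Only the single partition chosen by the hash is consulted.
        index = partition_for(key, len(self._partitions))
        return self._partitions[index].get(key)


table = PartitionedStore()
table.put("User123", {"name": "Anna", "age": 30})
table.put("Order456", {"items": ["book", "pen"], "total": 25})
print(partition_for("User123"), table.get("User123"))
```

Because the same key always hashes to the same index, a read never has to look in more than one partition.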

Why designed this way?

This design balances speed, scalability, and reliability. Hashing keys to partitions avoids bottlenecks and allows horizontal scaling. Replication ensures data is safe even if servers fail. Secondary indexes add query flexibility, although each index must be updated on writes and consumes additional write capacity. Features like relational joins were left out to keep simple lookups fast and predictable.

┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Client      │──────▶│ Partition 1   │──────▶│ SSD Storage   │
│ (Query by key)│       │ (Hash of key) │       │ (Replicated)  │
└───────────────┘       └───────────────┘       └───────────────┘
         │                      │                      ▲
         │                      │                      │
         │                      ▼                      │
         │               ┌───────────────┐            │
         │               │ Secondary     │────────────┘
         │               │ Indexes       │
         │               └───────────────┘
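To make the difference between eventual and strong consistency concrete, here is a toy model with one leader copy and one follower copy. It is a deliberately simplified sketch, not how DynamoDB replicates data; the class `ReplicatedItemStore`, the `consistent` flag, and the explicit `replicate()` step are assumptions for illustration.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class ReplicatedItemStore:
    """Toy model: writes hit the leader first and reach the follower later."""

    def __init__(self) -> None:
        self._leader: Dict[str, Any] = {}
        self._follower: Dict[str, Any] = {}
        self._pending: Deque[Tuple[str, Any]] = deque()  # not yet on the follower

    def put(self, key: str, value: Any) -> None:
        self._leader[key] = value
        self._pending.append((key, value))

    def get(self, key: str, consistent: bool = False) -> Optional[Any]:
        # consistent=True reads the leader; otherwise the possibly lagging follower.
        source = self._leader if consistent else self._follower
        return source.get(key)

    def replicate(self) -> None:
        # Apply every pending write to the follower, in the order it was made.
        while self._pending:
            key, value = self._pending.popleft()
            self._follower[key] = value


replicated = ReplicatedItemStore()
replicated.put("Session1", "active")
stale = replicated.get("Session1")                   # follower has not caught up
fresh = replicated.get("Session1", consistent=True)  # read from the leader
replicated.replicate()
converged = replicated.get("Session1")               # follower now agrees
print(stale, fresh, converged)
```

The eventually consistent read is only stale during the window before replication runs; afterwards both kinds of read return the same value.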

Myth Busters - 4 Common Misconceptions

Quick: Do you think key-value stores can perform complex queries like SQL joins? Commit yes or no.

Common Belief: Key-value stores can do complex queries like relational databases.

Quick: Do you think document stores always store data in a fixed schema? Commit yes or no.

Common Belief: Document stores require a fixed schema like relational tables.

Quick: Do you think DynamoDB automatically indexes all document fields? Commit yes or no.

Common Belief: DynamoDB indexes every attribute inside documents automatically.

Quick: Do you think eventual consistency means data is always outdated? Commit yes or no.

Common Belief: Eventual consistency means data is unreliable and always stale.

Expert Zone

1. Secondary indexes in DynamoDB have their own throughput limits separate from the main table, which can cause throttling if not managed.

2. DynamoDB's partitioning is based on hashed keys, so uneven key distribution can cause hot partitions and degrade performance.

3. Document size limits (400 KB per item) require careful design to avoid exceeding storage constraints in DynamoDB.

When NOT to use

Avoid key-value and document stores when your application requires complex transactions, multi-item joins, or strong relational integrity. In such cases, use relational databases like PostgreSQL or graph databases for connected data.

Production Patterns

In production, DynamoDB is used for session management, user profiles, shopping carts, and real-time leaderboards. Developers design keys to distribute load evenly and use secondary indexes for common queries. They also implement caching layers to reduce read costs and handle eventual consistency carefully.
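One way the caching layer mentioned above might look is a read-through cache with a time-to-live. This is a sketch under assumptions: `ReadThroughCache`, `load_from_table`, and the in-memory `backing` dict are hypothetical stand-ins, and a production system would more likely use a dedicated cache service.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no read against the table
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value


backing = {"User123": {"name": "Anna", "age": 30}}  # stands in for the table
reads: List[str] = []  # records every read that reaches the table


def load_from_table(key: str) -> Any:
    reads.append(key)
    return backing.get(key)


cache = ReadThroughCache(load_from_table, ttl_seconds=60)
cache.get("User123")
cache.get("User123")
print(len(reads))  # prints 1: the second read was served from the cache
```

The trade-off is freshness: a cached value can be up to `ttl_seconds` old, which is another form of the eventual consistency the application already has to tolerate.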

Connections

Hash Tables

Key-value stores use hashing like hash tables to map keys to values efficiently.

Understanding hash tables from computer science helps grasp why key-value stores are so fast and how collisions are handled.
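For readers who have not seen collision handling before, here is a small hash table that uses separate chaining, one common strategy among several. The class `ChainedHashTable` is illustrative only and is not a claim about how any particular database resolves collisions.

```python
from typing import Any, Hashable, List, Tuple


class ChainedHashTable:
    """Hash table where each bucket holds a list of (key, value) pairs."""

    def __init__(self, num_buckets: int = 8) -> None:
        self._buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # same key: overwrite in place
                return
        bucket.append((key, value))  # new key, possibly colliding: extend the chain

    def get(self, key: Hashable, default: Any = None) -> Any:
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


# A single bucket forces every key to collide, so the chain does all the work.
crowded = ChainedHashTable(num_buckets=1)
crowded.put("User123", "Anna")
crowded.put("Session1", "active")
print(crowded.get("User123"), crowded.get("Session1"))  # prints Anna active
```

Lookups stay correct under collisions, but long chains make them slower, which is why an even spread of keys matters.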

REST APIs

Document stores often use JSON documents, the same format used in REST API data exchange.

Knowing REST API data formats helps understand how document stores naturally fit modern web applications.

Library Catalog Systems

Like a library catalog uses unique call numbers to find books, key-value stores use keys to find data.

This connection shows how organizing data by unique identifiers is a universal pattern for quick retrieval.

Common Pitfalls

#1 Trying to query data by non-key attributes without indexes.

Wrong approach: SELECT * FROM Users WHERE age = 30;

Correct approach: Create a secondary index on 'age' and query using that index.

Root cause: Not realizing that key-value stores need an index to query non-key fields efficiently; without one, the store has to scan every item.
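The difference between scanning and using a secondary index can be sketched with a toy store. The names `IndexedStore`, `query_by_attribute`, and `scan_by_attribute` are hypothetical, and the index here is updated synchronously on every write, which is a simplifying assumption rather than a description of DynamoDB.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


class IndexedStore:
    """Toy store with one secondary index on a chosen attribute."""

    def __init__(self, indexed_attribute: str) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}
        self._attr = indexed_attribute
        self._index: Dict[Any, Set[str]] = defaultdict(set)  # attribute value -> keys

    def put(self, key: str, item: Dict[str, Any]) -> None:
        old = self._items.get(key)
        if old is not None and self._attr in old:
            self._index[old[self._attr]].discard(key)  # drop the stale index entry
        self._items[key] = item
        if self._attr in item:
            self._index[item[self._attr]].add(key)

    def query_by_attribute(self, value: Any) -> List[Dict[str, Any]]:
        # Index lookup: touches only the matching keys.
        return [self._items[k] for k in sorted(self._index.get(value, ()))]

    def scan_by_attribute(self, value: Any) -> List[Dict[str, Any]]:
        # No index: every item has to be inspected.
        return [item for item in self._items.values() if item.get(self._attr) == value]


users = IndexedStore("age")
users.put("User123", {"name": "Anna", "age": 30})
users.put("User456", {"name": "Ben", "age": 41})
print(users.query_by_attribute(30))  # prints [{'name': 'Anna', 'age': 30}]
```

Both methods return the same items, but the scan's cost grows with the size of the table, while every `put` now pays a little extra to keep the index current.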

#2 Storing very large documents that exceed size limits.

Wrong approach: PutItem with a 1 MB JSON document.

Correct approach: Split large documents into multiple items, or store large blobs in S3 and keep a reference in the item.

Root cause: Ignoring DynamoDB's 400 KB item size limit; writes over the limit are rejected.
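Splitting a large document into several items might look like the sketch below. The attribute names (`pk`, `part`, `parts`, `data`), the `CHUNK_BYTES` value, and both helper functions are assumptions for illustration. The size check uses the JSON byte length as a rough stand-in; DynamoDB's own accounting also counts attribute names, so a real implementation should leave headroom.

```python
import json
from typing import Any, Dict, List

MAX_ITEM_BYTES = 400 * 1024  # item size limit cited in the text
CHUNK_BYTES = 300 * 1024     # hypothetical chunk size, leaving headroom under the limit


def split_document(key: str, document: Dict[str, Any],
                   chunk_bytes: int = CHUNK_BYTES) -> List[Dict[str, Any]]:
    """Serialize a document and cut it into numbered chunks, one per item."""
    payload = json.dumps(document).encode("utf-8")
    chunks = [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]
    return [
        {"pk": key, "part": number, "parts": len(chunks), "data": chunk}
        for number, chunk in enumerate(chunks)
    ]


def join_document(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reassemble the original document from its chunk items."""
    ordered = sorted(items, key=lambda item: item["part"])
    return json.loads(b"".join(item["data"] for item in ordered).decode("utf-8"))


big = {"notes": "x" * (1024 * 1024)}  # roughly 1 MB once serialized
items = split_document("Report1", big)
print(len(items), all(len(item["data"]) < MAX_ITEM_BYTES for item in items))
```

Reading the document back now takes several reads instead of one, which is the cost of staying under the per-item limit.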

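#3 Using date-based partition keys that cause hot partitions.

Wrong approach: Using timestamps as partition keys like '20240601', '20240602', ..., so that all of a day's writes share one key.

Correct approach: Use high-cardinality partition keys, or add a random or calculated suffix to the key, so load is distributed evenly.

Root cause: Not understanding that every item with the same partition key lands on the same partition, which concentrates load and causes throttling.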
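The effect of spreading one busy key over several suffixed keys can be simulated. Everything here is a sketch under assumptions: the partition count, the shard count, the SHA-256 routing function, and the `date#n` key format are illustrative choices, not DynamoDB internals.

```python
import hashlib
import random
from collections import Counter
from typing import List

NUM_PARTITIONS = 8  # hypothetical
NUM_SHARDS = 10     # hypothetical number of suffixes per date


def partition_for(key: str) -> int:
    """Route a key to a partition by hashing it (simplified model)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def sharded_key(date: str, rng: random.Random) -> str:
    """Append a random shard suffix so one date maps to several keys."""
    return f"{date}#{rng.randrange(NUM_SHARDS)}"


def all_shard_keys(date: str) -> List[str]:
    """Every key a reader must query to see all writes for a date."""
    return [f"{date}#{n}" for n in range(NUM_SHARDS)]


rng = random.Random(0)  # seeded so the simulation is repeatable
plain = Counter(partition_for("20240601") for _ in range(1000))
sharded = Counter(partition_for(sharded_key("20240601", rng)) for _ in range(1000))
print("partitions hit without suffix:", len(plain))
print("partitions hit with suffix:", len(sharded))
```

Writes now spread across partitions, but reading a full day means querying every key from `all_shard_keys` and merging the results.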

Key Takeaways

Key-value and document stores organize data by unique keys for fast, direct access.

Document stores allow flexible, nested data structures unlike simple key-value pairs.

DynamoDB uses hashing and partitioning to scale and keep key-based lookups fast as data grows.

Indexes are essential to query non-key attributes efficiently in document stores.

Understanding consistency models and partitioning is key to building reliable, scalable applications.