NoSQL database types (document, key-value, column, graph) in HLD - Scalability & System Analysis

Scalability Analysis - NoSQL database types (document, key-value, column, graph)

Growth Table: NoSQL Database Types Scaling

Users / Data Size	Key-Value Store	Document Store	Column Store	Graph Database
100 users	Single server, in-memory caching	Single server, simple queries	Single server, small tables	Single server, small graph
10K users	Sharding by key, cache layer	Replica sets, indexing	Partitioning columns, compression	Indexing nodes, caching paths
1M users	Multiple shards, distributed cache	Horizontal scaling, query optimization	Distributed storage, column pruning	Graph partitioning, query parallelism
100M users	Global sharding, multi-region clusters	Multi-region replication, CDN for static data	Massive parallel processing, tiered storage	Advanced graph partitioning, caching hot subgraphs

First Bottleneck

At small scale, the database server CPU and memory limits are the first bottleneck for all NoSQL types because they handle all queries and data in one place.

As users grow to 10K-1M, network bandwidth and disk I/O become bottlenecks due to increased data movement and query complexity.

For graph databases, complex traversals cause CPU bottlenecks earlier than others because graph queries are compute-intensive.
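To make the graph traversal cost concrete, here is a minimal sketch of a breadth-first k-hop query over an in-memory adjacency list. The helper names (`nodes_within_hops`, `build_tree`) and the fan-out of 10 are illustrative assumptions, not part of any particular graph database. Real graphs are not trees, but the growth pattern is similar whenever nodes have many neighbors.

```python
from collections import deque


def nodes_within_hops(adjacency, start, max_hops):
    """Return the set of nodes reachable from start in at most max_hops edges."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited


def build_tree(fan_out, depth):
    """Build a hypothetical tree-shaped graph: every node has fan_out children."""
    adjacency = {}
    next_id = 1
    level = [0]
    for _ in range(depth):
        next_level = []
        for node in level:
            children = list(range(next_id, next_id + fan_out))
            next_id += fan_out
            adjacency[node] = children
            next_level.extend(children)
        level = next_level
    return adjacency


graph = build_tree(fan_out=10, depth=3)
for hops in (1, 2, 3):
    print(hops, "hops ->", len(nodes_within_hops(graph, 0, hops)), "nodes visited")
```

Each extra hop multiplies the number of nodes the query must touch by roughly the fan-out, which is why deep traversals tend to saturate CPU before simple key lookups do.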

Scaling Solutions

Key-Value Stores: Use sharding by key, add distributed caching (e.g., Redis), and replicate data for availability.
Document Stores: Implement replica sets for fault tolerance, add indexes on query fields, and shard collections by document attributes.
Column Stores: Partition rows across nodes by partition key, store and compress data column by column to save space, and use distributed storage systems.
Graph Databases: Partition graphs into subgraphs, cache frequently accessed paths, and parallelize graph queries.
Across all types, use load balancers to distribute traffic and CDNs to serve static content.
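The "sharding by key" idea above can be sketched with a consistent hash ring, one common way to map keys to shards so that adding a shard moves only a fraction of the keys. This is an illustrative sketch, not how any specific database implements it. The class name `ConsistentHashRing`, the shard names, and the choice of 100 virtual nodes per shard are assumptions.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Hash a string to an integer that is stable across processes."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    """Maps keys to shard names using a hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Pick the first ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("user:42"))
```

With plain `hash(key) % shard_count`, changing the shard count remaps most keys. With the ring, keys either stay where they were or move to the newly added shard.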

Back-of-Envelope Cost Analysis

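At 1M users, assuming 1 request per second per user (an upper-bound assumption, since not all users are active at once), total requests = 1 million QPS.
Assuming a single server handles ~5,000 QPS, the database layer needs 1,000,000 / 5,000 = ~200 servers.
Storage: If the average document/record size is 1 KB, 1M users with 100 records each = 100M records = ~100 GB of data, before replication and indexes.
Network bandwidth: 1M QPS * 1 KB = ~1 GB/s (8 Gbps) in aggregate, so the traffic must be spread across many servers and network links (about 40 Mbps per server across ~200 servers).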
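The arithmetic above can be reproduced with a short script, which makes it easy to rerun the estimate with different assumptions. The function name `estimate_capacity` and its parameter names are illustrative, and 1 KB is treated as 1,000 bytes to match the rounded figures in the text.

```python
import math


def estimate_capacity(users, requests_per_user_per_sec, qps_per_server,
                      records_per_user, record_size_bytes):
    """Back-of-envelope sizing for the database layer (decimal units)."""
    total_qps = users * requests_per_user_per_sec
    return {
        "total_qps": total_qps,
        "servers": math.ceil(total_qps / qps_per_server),
        "storage_gb": users * records_per_user * record_size_bytes / 1e9,
        "bandwidth_gbps": total_qps * record_size_bytes * 8 / 1e9,
    }


estimate = estimate_capacity(
    users=1_000_000,
    requests_per_user_per_sec=1,
    qps_per_server=5_000,
    records_per_user=100,
    record_size_bytes=1_000,  # assumption: 1 KB = 1,000 bytes
)
print(estimate)
```

Storage here counts a single copy of the data. Multiply by the replication factor to estimate the raw disk actually needed.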

Interview Tip

Start by explaining the NoSQL type and its data model. Then discuss expected load and data size. Identify the first bottleneck logically (usually database CPU or disk). Propose scaling solutions matching the bottleneck. Mention trade-offs like consistency vs availability. Use real numbers to show understanding.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas and implement caching to reduce load on the primary database. If write load grows, shard data horizontally to distribute writes.
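As a rough illustration of that answer, the sketch below combines a cache-aside read path with round-robin reads from replicas. Plain dictionaries stand in for the primary, the replicas, and the cache. The class `ReadScaledStore` is hypothetical, and replication is done synchronously only to keep the example small (real replica sets usually replicate asynchronously, which introduces replication lag).

```python
import itertools


class ReadScaledStore:
    """Toy store: writes go to the primary, reads go to cache, then replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))
        self.cache = {}
        self.replica_reads = 0  # counts reads that reached a replica

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # assumption: synchronous replication
            replica[key] = value
        self.cache.pop(key, None)  # invalidate so readers do not see stale data

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit: no database work at all
        replica = self.replicas[next(self._next_replica)]
        self.replica_reads += 1
        value = replica.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value


store = ReadScaledStore()
store.write("user:1", {"name": "Ada"})
store.read("user:1")  # served by a replica, then cached
store.read("user:1")  # served by the cache
print(store.replica_reads)
```

Note that the primary serves no reads in this sketch, so its capacity is reserved for writes. If writes themselves become the bottleneck, sharding is the next step.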

Key Result

NoSQL databases scale by sharding and replication; the first bottleneck is usually database CPU and disk I/O, solved by horizontal scaling and caching.