NoSQL database types (document, key-value, column, graph) in HLD - Scalability & System Analysis

Scalability Analysis - NoSQL database types (document, key-value, column, graph)
Growth Table: NoSQL Database Types Scaling
| Users / Data Size | Key-Value Store | Document Store | Column Store | Graph Database |
|---|---|---|---|---|
| 100 users | Single server, in-memory caching | Single server, simple queries | Single server, small tables | Single server, small graph |
| 10K users | Sharding by key, cache layer | Replica sets, indexing | Partitioning columns, compression | Indexing nodes, caching paths |
| 1M users | Multiple shards, distributed cache | Horizontal scaling, query optimization | Distributed storage, column pruning | Graph partitioning, query parallelism |
| 100M users | Global sharding, multi-region clusters | Multi-region replication, CDN for static data | Massive parallel processing, tiered storage | Advanced graph partitioning, caching hot subgraphs |
First Bottleneck

At small scale, the CPU and memory limits of the database server are the first bottleneck for all NoSQL types, because a single machine handles every query and holds all the data.

As users grow to 10K-1M, network bandwidth and disk I/O become bottlenecks due to increased data movement and query complexity.

For graph databases, complex traversals cause CPU bottlenecks earlier than others because graph queries are compute-intensive.
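
To make the traversal cost concrete, here is a minimal sketch in Python. It assumes a toy in-memory adjacency list rather than a real graph database, and the names `build_tree` and `nodes_within_hops` are hypothetical helpers written only for this illustration. It counts how many nodes a breadth-first traversal touches as the hop limit grows.

```python
from collections import deque


def build_tree(branching, depth):
    """Build a toy graph: a tree where every node has `branching` children."""
    adjacency = {}
    next_id = 1
    level = [0]  # node 0 is the root
    for _ in range(depth):
        new_level = []
        for parent in level:
            children = list(range(next_id, next_id + branching))
            next_id += branching
            adjacency[parent] = children
            new_level.extend(children)
        level = new_level
    return adjacency


def nodes_within_hops(adjacency, start, max_hops):
    """Breadth-first traversal; returns how many nodes lie within max_hops of start."""
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return len(visited)


tree = build_tree(branching=3, depth=4)
for hops in range(1, 5):
    print(hops, nodes_within_hops(tree, 0, hops))
# prints: 1 4, 2 13, 3 40, 4 121 (one pair per line)
```

With only 3 neighbors per node, each extra hop roughly triples the work. Real social or recommendation graphs usually have far higher fan-out, which is why deep traversals tend to exhaust CPU before simple key lookups do.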

Scaling Solutions
  • Key-Value Stores: Use sharding by key, add distributed caching (e.g., Redis), and replicate data for availability.
  • Document Stores: Implement replica sets for fault tolerance, add indexes on query fields, and shard collections by document attributes.
  • Column Stores: Partition data across nodes (wide-column stores such as Cassandra partition rows by key), compress column data to save space, and use distributed storage systems.
  • Graph Databases: Partition graphs into subgraphs, cache frequently accessed paths, and parallelize graph queries.
  • Across all types, use load balancers to distribute traffic and CDNs to serve static content.
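
Sharding by key, the first technique in the list above, can be sketched as follows. This is a minimal illustration, not how any particular database implements it: plain Python dicts stand in for shard servers, and `shard_for_key` and `ShardedKV` are hypothetical names introduced for this example.

```python
import hashlib


def shard_for_key(key, num_shards):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() for
    strings is randomized per process and would route keys inconsistently.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedKV:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards):
        # Assumption: each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, key):
        return self.shards[shard_for_key(key, len(self.shards))]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key, default=None):
        return self._shard(key).get(key, default)


store = ShardedKV(num_shards=4)
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))  # {'name': 'Ada'}
```

Simple modulo routing like this remaps most keys whenever the shard count changes. Consistent hashing is a common alternative when shards are added or removed often.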
Back-of-Envelope Cost Analysis
  • At 1M users, assuming a peak of 1 request per second per user (a deliberately heavy assumption), total load = 1 million QPS.
  • If a single server handles ~5,000 QPS, the database layer needs ~200 servers (1,000,000 / 5,000), before counting replicas.
  • Storage: If the average document/record size is 1 KB, 1M users with 100 records each = 100 GB of data.
  • Network bandwidth: 1M QPS * 1 KB = ~1 GB/s (8 Gbps), which must be spread across multiple network interfaces and data centers.
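
The estimates above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the bullets (1 request per second per user, ~5,000 QPS per server, 1 KB records taken as 1,000 bytes). The function name `estimate_capacity` and its parameters are hypothetical, chosen for this example.

```python
import math


def estimate_capacity(users, requests_per_user_per_sec, qps_per_server,
                      records_per_user, record_size_bytes):
    """Back-of-envelope sizing for the database layer."""
    total_qps = users * requests_per_user_per_sec
    servers = math.ceil(total_qps / qps_per_server)
    storage_bytes = users * records_per_user * record_size_bytes
    bandwidth_bytes_per_sec = total_qps * record_size_bytes
    return {
        "total_qps": total_qps,
        "servers": servers,
        "storage_gb": storage_bytes / 1e9,                  # decimal GB
        "bandwidth_gbps": bandwidth_bytes_per_sec * 8 / 1e9,  # bits per second
    }


estimate = estimate_capacity(
    users=1_000_000,
    requests_per_user_per_sec=1,
    qps_per_server=5_000,
    records_per_user=100,
    record_size_bytes=1_000,
)
print(estimate)
# {'total_qps': 1000000, 'servers': 200, 'storage_gb': 100.0, 'bandwidth_gbps': 8.0}
```

Changing one input, for example 0.1 requests per second per user, shows how sensitive the server count is to the load assumption.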
Interview Tip

Start by explaining the NoSQL type and its data model. Then discuss expected load and data size. Identify the first bottleneck logically (usually database CPU or disk). Propose scaling solutions matching the bottleneck. Mention trade-offs like consistency vs availability. Use real numbers to show understanding.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas and implement caching to reduce load on the primary database. If write load grows, shard data horizontally to distribute writes.
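
The caching half of this answer can be sketched with the cache-aside pattern. This is a minimal illustration in which plain dicts stand in for both the cache and the primary database. `CacheAsideStore` and its `primary_reads` counter are hypothetical names used only here.

```python
class CacheAsideStore:
    """Toy cache-aside wrapper: read from the cache first, fall back to the primary."""

    def __init__(self, primary):
        self.primary = primary    # assumption: a dict standing in for the database
        self.cache = {}           # assumption: a dict standing in for a cache server
        self.primary_reads = 0    # counts how often the primary is actually hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: primary is not touched
        self.primary_reads += 1
        value = self.primary.get(key)     # cache miss: read from the primary
        if value is not None:
            self.cache[key] = value       # populate the cache for later reads
        return value

    def put(self, key, value):
        self.primary[key] = value
        self.cache.pop(key, None)         # invalidate so readers do not see stale data


store = CacheAsideStore(primary={"user:1": "Ada"})
for _ in range(10):
    store.get("user:1")
print(store.primary_reads)  # 1
```

Ten reads of the same key reach the primary only once. That is the effect the answer relies on when traffic is read-heavy. Writes still go to the primary, which is why sharding is the follow-up step when write load grows.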

Key Result
NoSQL databases scale by sharding and replication; the first bottleneck is usually database CPU and disk I/O, solved by horizontal scaling and caching.