0
0
HLDsystem_design~10 mins

Database indexing in HLD - Scalability & System Analysis

Choose your learning style9 modes available
Scalability Analysis - Database indexing
Growth Table: Database Indexing at Different Scales
Users / Queries100 Users10,000 Users1 Million Users100 Million Users
Query Volume (QPS)~100 QPS~1,000 QPS~10,000 QPS~100,000 QPS
Data SizeMBs to low GBsGBs to tens of GBsHundreds of GBs to TBsMultiple TBs to PBs
Index SizeSmall, fits in memoryLarger, partial in memoryLarge, needs optimized storageVery large, requires partitioning
Query LatencyLow (ms)Low to moderateModerate to high if unoptimizedHigh without advanced indexing
Index Maintenance CostLowModerateHigh, impacts write speedVery high, requires tuning
First Bottleneck

As user count and query volume grow, the first bottleneck is usually the database query performance. Without proper indexing, queries become slow because the database scans large tables. Indexes speed up reads but add overhead on writes. At medium scale, index maintenance and memory usage become bottlenecks. At very large scale, index size and disk I/O limit performance.

Scaling Solutions for Database Indexing
  • Index Optimization: Use appropriate index types (B-tree, hash, composite) based on query patterns.
  • Partial and Covering Indexes: Index only needed columns to reduce size and improve speed.
  • Index Maintenance: Schedule index rebuilds and updates during low traffic to reduce write impact.
  • Horizontal Scaling: Shard data to distribute indexes across multiple database instances.
  • Caching: Use in-memory caches (Redis, Memcached) to reduce database hits for frequent queries.
  • Read Replicas: Offload read queries to replicas with their own indexes to reduce load on primary.
  • Partitioning: Split large tables and indexes by range or hash to improve query and maintenance performance.
Back-of-Envelope Cost Analysis

Assuming 10,000 QPS at medium scale:

  • Each query reads indexed data in ~1-5 ms.
  • Index size can be 10-30% of data size; for 1 TB data, index ~100-300 GB.
  • Memory needed to hold hot indexes: tens of GBs to hundreds of GBs.
  • Write overhead: each insert/update requires index update, increasing write latency by ~10-30%.
  • Network bandwidth depends on replication and sharding; expect 1-10 Gbps for large scale.
Interview Tip

When discussing database indexing scalability, start by explaining how indexes improve read speed but add write overhead. Then describe how index size and maintenance become bottlenecks as data grows. Finally, outline solutions like index optimization, partitioning, caching, and sharding. Use clear examples and relate to real-world query patterns.

Self Check Question

Your database handles 1000 QPS with indexes. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: The first step is to analyze if the current indexes and hardware can handle the increased read load. Likely, the database CPU or disk I/O will become bottlenecks. To fix this, add read replicas to distribute read queries and reduce load on the primary. Also, consider caching frequent queries to reduce database hits. This approach improves read scalability without immediately changing indexes or hardware.

Key Result
Database indexing improves read speed but adds write overhead; as data and queries grow, index size and maintenance become bottlenecks requiring optimization, partitioning, caching, and horizontal scaling.