HLDsystem_design~10 mins

Database indexing in HLD - Scalability & System Analysis

Choose your learning style9 modes available

Learn Why Deep Arch Practice Challenge Design Recall Scale

Scalability Analysis - Database indexing

Growth Table: Database Indexing at Different Scales

Users / Queries	100 Users	10,000 Users	1 Million Users	100 Million Users
Query Volume (QPS)	~100 QPS	~1,000 QPS	~10,000 QPS	~100,000 QPS
Data Size	MBs to low GBs	GBs to tens of GBs	Hundreds of GBs to TBs	Multiple TBs to PBs
Index Size	Small, fits in memory	Larger, partial in memory	Large, needs optimized storage	Very large, requires partitioning
Query Latency	Low (ms)	Low to moderate	Moderate to high if unoptimized	High without advanced indexing
Index Maintenance Cost	Low	Moderate	High, impacts write speed	Very high, requires tuning

First Bottleneck

As user count and query volume grow, the first bottleneck is usually the database query performance. Without proper indexing, queries become slow because the database scans large tables. Indexes speed up reads but add overhead on writes. At medium scale, index maintenance and memory usage become bottlenecks. At very large scale, index size and disk I/O limit performance.

Scaling Solutions for Database Indexing

Index Optimization: Use appropriate index types (B-tree, hash, composite) based on query patterns.
Partial and Covering Indexes: Index only needed columns to reduce size and improve speed.
Index Maintenance: Schedule index rebuilds and updates during low traffic to reduce write impact.
Horizontal Scaling: Shard data to distribute indexes across multiple database instances.
Caching: Use in-memory caches (Redis, Memcached) to reduce database hits for frequent queries.
Read Replicas: Offload read queries to replicas with their own indexes to reduce load on primary.
Partitioning: Split large tables and indexes by range or hash to improve query and maintenance performance.

Back-of-Envelope Cost Analysis

Assuming 10,000 QPS at medium scale:

Each query reads indexed data in ~1-5 ms.
Index size can be 10-30% of data size; for 1 TB data, index ~100-300 GB.
Memory needed to hold hot indexes: tens of GBs to hundreds of GBs.
Write overhead: each insert/update requires index update, increasing write latency by ~10-30%.
Network bandwidth depends on replication and sharding; expect 1-10 Gbps for large scale.

Interview Tip

When discussing database indexing scalability, start by explaining how indexes improve read speed but add write overhead. Then describe how index size and maintenance become bottlenecks as data grows. Finally, outline solutions like index optimization, partitioning, caching, and sharding. Use clear examples and relate to real-world query patterns.

Self Check Question

Your database handles 1000 QPS with indexes. Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: The first step is to analyze if the current indexes and hardware can handle the increased read load. Likely, the database CPU or disk I/O will become bottlenecks. To fix this, add read replicas to distribute read queries and reduce load on the primary. Also, consider caching frequent queries to reduce database hits. This approach improves read scalability without immediately changing indexes or hardware.

Key Result

Database indexing improves read speed but adds write overhead; as data and queries grow, index size and maintenance become bottlenecks requiring optimization, partitioning, caching, and horizontal scaling.