HLDsystem_design~25 mins

Database indexing in HLD - System Design Exercise

Choose your learning style9 modes available

Learn Why Deep Arch Practice Challenge Design Recall Scale

Design: Database Indexing System

Design focuses on indexing mechanisms within a database system. Query parsing, transaction management, and full database engine design are out of scope.

Functional Requirements

FR1: Support fast lookup of records by indexed columns

FR2: Allow multiple types of indexes (e.g., B-tree, hash)

FR3: Support efficient insert, update, and delete operations with index maintenance

FR4: Enable queries to use indexes to reduce search time

FR5: Provide ability to create and drop indexes dynamically

FR6: Handle large datasets with millions of records

FR7: Support composite (multi-column) indexes

Non-Functional Requirements

NFR1: Index lookup latency should be under 10ms for 1 million records

NFR2: Index maintenance should not degrade write throughput by more than 20%

NFR3: System availability target is 99.9% uptime

NFR4: Indexes should not consume more than 30% additional storage compared to raw data

NFR5: Support concurrent read and write operations without blocking

Think Before You Design

Questions to Ask

❓ Question 1

❓ Question 2

❓ Question 3

❓ Question 4

❓ Question 5

❓ Question 6

❓ Question 7

Key Components

Index manager module

Data storage engine

Query optimizer to use indexes

Concurrency control for index updates

Index data structures (B-tree, hash, bitmap)

Background index maintenance tasks

Design Patterns

B-tree indexing for range queries

Hash indexing for equality lookups

Composite index design

Write-ahead logging for index durability

Lazy index updates or batch rebuilding

Caching frequently accessed index nodes

Reference Architecture

  +-------------------+       +-------------------+       +-------------------+
  |   Query Processor | <---> |   Index Manager   | <---> |   Storage Engine  |
  +-------------------+       +-------------------+       +-------------------+
           |                           |                           |
           |                           |                           |
           v                           v                           v
  +-------------------+       +-------------------+       +-------------------+
  |  Query Optimizer  |       |  Index Data Store |       |  Data File Store  |
  +-------------------+       +-------------------+       +-------------------+

Components

Query Processor

Database engine component

Receives queries and coordinates execution using indexes

Index Manager

Module within database engine

Creates, maintains, and queries indexes

Storage Engine

Database storage layer

Stores raw data and index data structures on disk

Query Optimizer

Database engine component

Decides when and which indexes to use for queries

Index Data Store

B-tree and hash structures on disk

Stores index nodes and metadata

Data File Store

Disk files or SSD storage

Stores actual table data records

Request Flow

1. Client sends query to Query Processor

2. Query Processor passes query to Query Optimizer

3. Query Optimizer checks available indexes in Index Manager

4. If suitable index exists, Query Optimizer plans index scan

5. Query Processor requests Index Manager to perform index lookup

6. Index Manager traverses index data structure (e.g., B-tree) in Index Data Store

7. Index Manager returns matching record pointers to Query Processor

8. Query Processor fetches records from Storage Engine's Data File Store

9. Results are returned to client

10. For write operations, Index Manager updates indexes concurrently with data writes

Database Schema

Entities: - Table: stores data records - Index: metadata about each index (name, columns, type) - IndexNode: nodes of B-tree or hash buckets storing keys and pointers Relationships: - Table has many Indexes - Index consists of many IndexNodes - IndexNodes link to Table records via pointers

Scaling Discussion

Bottlenecks

Index maintenance slows down write-heavy workloads

Index size grows large, increasing disk and memory usage

Concurrent index updates cause locking and contention

Query optimizer may choose suboptimal indexes as data grows

Index rebuilds take long time on large datasets

Solutions

Use asynchronous or lazy index updates to reduce write latency

Implement index compression and partial indexes to save space

Use fine-grained locking or lock-free data structures for concurrency

Collect statistics regularly to help optimizer choose best indexes

Support online index rebuilds and incremental index maintenance

Interview Tips

Time: Spend 10 minutes understanding indexing needs and constraints, 15 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 10 minutes for Q&A.

Explain why indexes improve read performance

Discuss trade-offs between read speed and write overhead

Describe common index data structures and their use cases

Show understanding of concurrency and consistency in index updates

Mention how query optimizer leverages indexes

Address scaling challenges and practical solutions