0
0
HLDsystem_design~25 mins

Database indexing in HLD - System Design Exercise

Choose your learning style9 modes available
Design: Database Indexing System
Design focuses on indexing mechanisms within a database system. Query parsing, transaction management, and full database engine design are out of scope.
Functional Requirements
FR1: Support fast lookup of records by indexed columns
FR2: Allow multiple types of indexes (e.g., B-tree, hash)
FR3: Support efficient insert, update, and delete operations with index maintenance
FR4: Enable queries to use indexes to reduce search time
FR5: Provide ability to create and drop indexes dynamically
FR6: Handle large datasets with millions of records
FR7: Support composite (multi-column) indexes
Non-Functional Requirements
NFR1: Index lookup latency should be under 10ms for 1 million records
NFR2: Index maintenance should not degrade write throughput by more than 20%
NFR3: System availability target is 99.9% uptime
NFR4: Indexes should not consume more than 30% additional storage compared to raw data
NFR5: Support concurrent read and write operations without blocking
Think Before You Design
Questions to Ask
❓ Question 1
❓ Question 2
❓ Question 3
❓ Question 4
❓ Question 5
❓ Question 6
❓ Question 7
Key Components
Index manager module
Data storage engine
Query optimizer to use indexes
Concurrency control for index updates
Index data structures (B-tree, hash, bitmap)
Background index maintenance tasks
Design Patterns
B-tree indexing for range queries
Hash indexing for equality lookups
Composite index design
Write-ahead logging for index durability
Lazy index updates or batch rebuilding
Caching frequently accessed index nodes
Reference Architecture
  +-------------------+       +-------------------+       +-------------------+
  |   Query Processor | <---> |   Index Manager   | <---> |   Storage Engine  |
  +-------------------+       +-------------------+       +-------------------+
           |                           |                           |
           |                           |                           |
           v                           v                           v
  +-------------------+       +-------------------+       +-------------------+
  |  Query Optimizer  |       |  Index Data Store |       |  Data File Store  |
  +-------------------+       +-------------------+       +-------------------+
Components
Query Processor
Database engine component
Receives queries and coordinates execution using indexes
Index Manager
Module within database engine
Creates, maintains, and queries indexes
Storage Engine
Database storage layer
Stores raw data and index data structures on disk
Query Optimizer
Database engine component
Decides when and which indexes to use for queries
Index Data Store
B-tree and hash structures on disk
Stores index nodes and metadata
Data File Store
Disk files or SSD storage
Stores actual table data records
Request Flow
1. Client sends query to Query Processor
2. Query Processor passes query to Query Optimizer
3. Query Optimizer checks available indexes in Index Manager
4. If suitable index exists, Query Optimizer plans index scan
5. Query Processor requests Index Manager to perform index lookup
6. Index Manager traverses index data structure (e.g., B-tree) in Index Data Store
7. Index Manager returns matching record pointers to Query Processor
8. Query Processor fetches records from Storage Engine's Data File Store
9. Results are returned to client
10. For write operations, Index Manager updates indexes concurrently with data writes
Database Schema
Entities: - Table: stores data records - Index: metadata about each index (name, columns, type) - IndexNode: nodes of B-tree or hash buckets storing keys and pointers Relationships: - Table has many Indexes - Index consists of many IndexNodes - IndexNodes link to Table records via pointers
Scaling Discussion
Bottlenecks
Index maintenance slows down write-heavy workloads
Index size grows large, increasing disk and memory usage
Concurrent index updates cause locking and contention
Query optimizer may choose suboptimal indexes as data grows
Index rebuilds take long time on large datasets
Solutions
Use asynchronous or lazy index updates to reduce write latency
Implement index compression and partial indexes to save space
Use fine-grained locking or lock-free data structures for concurrency
Collect statistics regularly to help optimizer choose best indexes
Support online index rebuilds and incremental index maintenance
Interview Tips
Time: Spend 10 minutes understanding indexing needs and constraints, 15 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 10 minutes for Q&A.
Explain why indexes improve read performance
Discuss trade-offs between read speed and write overhead
Describe common index data structures and their use cases
Show understanding of concurrency and consistency in index updates
Mention how query optimizer leverages indexes
Address scaling challenges and practical solutions