
Why choosing the right storage matters in HLD - Design It to Understand It

Design: Storage Selection for Scalable Systems
Focus on selecting storage types and explaining their impact on system design. Out of scope: detailed implementation of storage engines or hardware specifics.
Functional Requirements
FR1: Store and retrieve data efficiently based on use case
FR2: Support different data types: structured, semi-structured, unstructured
FR3: Ensure data durability and availability
FR4: Handle expected read and write loads
FR5: Provide appropriate latency for user experience
FR6: Support data consistency needs
FR7: Allow scaling as data and traffic grow
Non-Functional Requirements
NFR1: System must handle up to 1 million requests per second
NFR2: p99 latency for read operations should be under 100 ms
NFR3: Availability target of 99.9% uptime
NFR4: Data size expected to grow to multiple terabytes
NFR5: Budget constraints limit use of very expensive storage solutions
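The targets above translate into concrete budgets with a little arithmetic. The sketch below works through three of them; the function names and the fleet size of 50 servers are illustrative assumptions, not part of the requirements.

```python
# Back-of-the-envelope numbers derived from NFR1, NFR2 and NFR3.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability: float) -> float:
    """Hours per year the system may be down at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def requests_over_p99(total_rps: int) -> int:
    """Requests per second allowed to exceed the p99 latency target (1%)."""
    return total_rps // 100


def requests_per_server(total_rps: int, servers: int) -> float:
    """Average load per server if traffic is spread evenly."""
    return total_rps / servers


if __name__ == "__main__":
    print(f"Downtime budget at 99.9%: {downtime_budget_hours(0.999):.2f} hours/year")
    print(f"Requests/s that may exceed 100 ms: {requests_over_p99(1_000_000)}")
    # Assumption: a hypothetical fleet of 50 application servers.
    print(f"Load per server: {requests_per_server(1_000_000, 50):.0f} requests/s")
```

At 99.9% availability the system may be down for roughly 8.76 hours a year, and at 1 million requests per second a p99 target still leaves 10,000 requests each second that are allowed to be slower than 100 ms.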
Key Components
Relational databases (e.g., PostgreSQL, MySQL)
NoSQL databases (e.g., MongoDB, Cassandra)
Key-value stores (e.g., Redis, DynamoDB)
Object storage (e.g., S3, Azure Blob Storage)
Caching layers
Backup and archival storage
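One rough way to reason about which of these components fits a given dataset is to start from the data shape and the access pattern. The sketch below is only a rule of thumb under simplifying assumptions; `suggest_storage` and its parameters are hypothetical names, and real decisions also weigh cost, team experience, and operational load.

```python
from enum import Enum


class Storage(Enum):
    RELATIONAL = "relational database"
    NOSQL = "NoSQL database"
    KEY_VALUE = "key-value store"
    OBJECT = "object storage"


def suggest_storage(
    data_shape: str,
    needs_transactions: bool = False,
    lookup_by_key_only: bool = False,
) -> Storage:
    """Return a first-guess storage type (heuristic, not a definitive rule).

    data_shape is one of "structured", "semi-structured", "unstructured".
    """
    if data_shape == "unstructured":
        # Files, images, backups: large blobs addressed by name.
        return Storage.OBJECT
    if lookup_by_key_only:
        # Simple get/put by key with no ad hoc queries.
        return Storage.KEY_VALUE
    if data_shape == "structured" or needs_transactions:
        # Fixed schema, joins, or multi-row transactions.
        return Storage.RELATIONAL
    # Flexible or evolving schema without transactional needs.
    return Storage.NOSQL


if __name__ == "__main__":
    print(suggest_storage("unstructured").value)
    print(suggest_storage("structured", needs_transactions=True).value)
    print(suggest_storage("semi-structured").value)
```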
Design Patterns
Caching to reduce latency
Sharding and partitioning for scaling
Replication for availability and durability
Eventual consistency vs strong consistency trade-offs
Data tiering based on access frequency
CQRS (Command Query Responsibility Segregation) for read/write optimization
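As an illustration of the sharding pattern listed above, the following minimal sketch routes a record to a shard by hashing its key. The key format (`user:42`) and the shard count of 4 are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    A cryptographic hash is used here only because it is stable across
    processes and machines; Python's built-in hash() is randomized per
    process for strings and would route the same key differently.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for user_id in (1, 2, 3, 42):
        key = f"user:{user_id}"  # hypothetical key format
        print(key, "-> shard", shard_for(key, 4))
```

Simple modulo routing like this remaps most keys when the shard count changes. Consistent hashing is a common way to reduce that movement, at the cost of a more involved routing table.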
Reference Architecture
Client
  |
  v
Load Balancer
  |
  v
Application Servers
  |
  +-----------------------------+
  |                             |
  v                             v
Cache Layer (Redis)          Primary Storage
  |                             |
  v                             v
Fast Key-Value Store       Relational DB / NoSQL DB
  |
  v
Object Storage (for files, backups)
Components
Load Balancer
Nginx / AWS ELB
Distributes incoming requests evenly to application servers
Application Servers
Node.js / Java / Python
Handles business logic and interacts with storage layers
Cache Layer
Redis / Memcached
Stores frequently accessed data to reduce latency and load on databases
Primary Storage
PostgreSQL / MongoDB / Cassandra
Stores main application data with appropriate consistency and query capabilities
Object Storage
Amazon S3 / Azure Blob Storage
Stores large unstructured data such as files, images, and backups
Request Flow
1. Client sends request to Load Balancer
2. Load Balancer forwards request to Application Server
3. Application Server checks Cache Layer for data
4. On a cache hit, the data is returned immediately
5. On a cache miss, the Application Server queries Primary Storage
6. Primary Storage returns data to Application Server
7. Application Server updates Cache Layer with fresh data
8. For large files or backups, Application Server interacts with Object Storage
9. Application Server sends response back to Client
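Steps 3 to 7 of the flow above are commonly called the cache-aside pattern. The sketch below mimics them with in-memory stand-ins; `TTLCache`, `PrimaryStorage`, and `handle_read` are hypothetical names, and a real deployment would talk to a cache server and a database over the network.

```python
import time
from typing import Optional


class TTLCache:
    """In-memory stand-in for the cache layer, with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (time.monotonic() + self._ttl, value)


class PrimaryStorage:
    """In-memory stand-in for the database; counts reads for illustration."""

    def __init__(self, rows: dict) -> None:
        self._rows = dict(rows)
        self.reads = 0

    def query(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)


def handle_read(key: str, cache: TTLCache, db: PrimaryStorage) -> Optional[str]:
    value = cache.get(key)        # step 3: check the cache
    if value is not None:
        return value              # step 4: cache hit
    value = db.query(key)         # steps 5-6: cache miss, query primary storage
    if value is not None:
        cache.set(key, value)     # step 7: populate the cache
    return value


if __name__ == "__main__":
    db = PrimaryStorage({"user:1": "Ada"})
    cache = TTLCache(ttl_seconds=60)
    print(handle_read("user:1", cache, db), "db reads:", db.reads)
    print(handle_read("user:1", cache, db), "db reads:", db.reads)
```

The second read is served from the cache, so the database read counter does not increase. The TTL is one simple way to bound how stale a cached value can become.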
Database Schema
Entities depend on use case but generally include:
- User (id, name, email, created_at)
- Product (id, name, description, price, created_at)
- Order (id, user_id, product_id, quantity, status, created_at)
Relationships:
- User 1:N Order
- Product 1:N Order
Storage choice affects schema design, e.g., relational tables for structured data, document collections for flexible schemas.
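For the relational case, the entities and relationships above could be sketched as follows using SQLite from Python's standard library. The plural table names (chosen because `ORDER` is an SQL keyword), the column types, the unique email constraint, and storing the price in cents are all assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    email      TEXT NOT NULL UNIQUE,   -- assumption: one account per email
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    price       INTEGER NOT NULL,      -- assumption: stored in cents
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),     -- User 1:N Order
    product_id INTEGER NOT NULL REFERENCES products(id),  -- Product 1:N Order
    quantity   INTEGER NOT NULL,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ada', 'ada@example.com')")
conn.execute(
    "INSERT INTO products (id, name, description, price) "
    "VALUES (1, 'Keyboard', 'Mechanical', 4900)"
)
conn.execute(
    "INSERT INTO orders (user_id, product_id, quantity, status) "
    "VALUES (1, 1, 2, 'placed')"
)

# Join across both 1:N relationships to list who ordered what.
rows = conn.execute(
    "SELECT u.name, p.name, o.quantity "
    "FROM orders o "
    "JOIN users u ON u.id = o.user_id "
    "JOIN products p ON p.id = o.product_id"
).fetchall()
print(rows)
```

In a document store the same data might instead embed order lines inside a user or order document, trading join flexibility for single-read access.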
Scaling Discussion
Bottlenecks
Database write throughput limits
Cache size and eviction policies
Load balancer capacity
Network bandwidth for large object storage
Data consistency delays in distributed storage
Solutions
Implement database sharding and partitioning to distribute load
Use cache eviction strategies and scale cache clusters horizontally
Deploy multiple load balancers with health checks and failover
Use CDN and multipart upload for large files to optimize bandwidth
Choose the consistency model each dataset actually needs, and account for replication lag when reading from replicas
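To make the cache eviction point concrete, here is a minimal sketch of least-recently-used (LRU) eviction, one common strategy among several. `LRUCache` is a hypothetical class written for this example, not the API of any particular cache product.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")      # "a" becomes the most recently used entry
    cache.put("c", 3)   # capacity exceeded, so "b" is evicted
    print(cache.get("a"), cache.get("b"), cache.get("c"))
```

When a single node's memory is the limit, the same idea can be combined with key-based partitioning so that each cache node holds only a slice of the keyspace.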
Interview Tips
Time: In a 45-minute interview, spend 10 minutes understanding requirements and constraints, 15 minutes designing the storage architecture, 10 minutes discussing scaling and trade-offs, and 10 minutes answering questions.
Explain how different storage types fit different data and access patterns
Discuss trade-offs between consistency, availability, and partition tolerance
Highlight importance of caching for performance
Show awareness of scaling challenges and mitigation strategies
Use real-world analogies like choosing the right container for different items