Block storage vs object storage vs file storage in HLD - Design Approaches Compared

Design: Storage System Comparison
In scope: understanding and designing block storage, object storage, and file storage systems. Out of scope: detailed hardware design and vendor-specific implementations.
Functional Requirements
FR1: Support storing and retrieving data efficiently
FR2: Handle different data access patterns (raw block reads and writes, hierarchical file access, and key-based object access)
FR3: Provide scalability for growing data needs
FR4: Ensure data durability and availability
FR5: Allow easy integration with applications
Non-Functional Requirements
NFR1: Latency for data access should be under 100ms for common operations
NFR2: System should scale to petabytes of data
NFR3: Availability target of 99.9% uptime
NFR4: Support concurrent access by thousands of clients
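To make the availability target in NFR3 concrete, the short sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year, and the function name `allowed_downtime_hours` is illustrative only.

```python
# Downtime budget implied by an availability target.
# Assumption: a 365-day year (8760 hours).
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

Under this assumption, the 99.9% target allows roughly 8.76 hours of downtime per year.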
Key Components
Storage nodes or devices
Metadata management service
Access protocols (e.g., NFS, SMB, REST APIs)
Caching layers
Replication and backup mechanisms
Design Patterns
Distributed file system pattern
Key-value store pattern for object storage
Block device abstraction
Caching and tiering strategies
Replication and consistency models
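As one illustration of the caching pattern listed above, here is a minimal sketch of a read-through cache with least-recently-used (LRU) eviction placed in front of a slower store. The class name `ReadThroughCache` and the dictionary standing in for the backing store are assumptions made for this example, not part of any particular storage product.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny LRU cache in front of a slower backing store (hypothetical)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch            # called on a cache miss
        self._capacity = capacity      # maximum number of cached entries
        self._entries = OrderedDict()  # key -> bytes, oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)            # read through to the backing store
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


# A plain dict stands in for the slower storage tier.
backing_store = {"a": b"1", "b": b"2", "c": b"3"}
cache = ReadThroughCache(backing_store.__getitem__, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 4
```

With a capacity of two, only the second read of "a" is a hit: reading "c" evicts "b", so the final read of "b" goes back to the backing store.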
Reference Architecture
Client
  |
  |---> File Storage System (e.g., NFS, SMB)
  |         |
  |         |---> Metadata Server (manages directories, files)
  |         |---> Storage Nodes (store file blocks)
  |
  |---> Object Storage System (e.g., S3)
  |         |
  |         |---> Object API Gateway
  |         |---> Metadata Service (stores object metadata)
  |         |---> Storage Nodes (store objects as blobs)
  |
  |---> Block Storage System (e.g., SAN)
            |
            |---> Block Device Interface
            |---> Storage Arrays (store raw blocks)
            |---> Volume Manager
Components
- File Storage System (NFS, SMB, distributed file systems): stores and manages files with hierarchical directories and metadata
- Object Storage System (REST APIs, S3-compatible systems): stores data as objects with metadata; scales well for unstructured data
- Block Storage System (SAN, iSCSI, Fibre Channel): provides raw block-level storage for low-level data access
- Metadata Server (custom service or database): manages file or object metadata for quick lookup and organization
- Storage Nodes (disk arrays, SSDs): physically store data blocks, files, or objects
- Access Protocols (NFS and SMB for file; REST for object; iSCSI for block): enable clients to communicate with storage systems
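The difference between the three storage types is easiest to see in the interface each one exposes to clients. The sketch below uses in-memory stand-ins; the class and method names (`BlockVolume`, `FileStore`, `ObjectStore`, and so on) are hypothetical and chosen only to contrast the addressing models: block numbers, hierarchical paths, and flat keys with metadata.

```python
from typing import Dict, List, Optional, Tuple


class BlockVolume:
    """Block storage: fixed-size blocks addressed by number, no names or metadata."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self._blocks: List[bytes] = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, block_number: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self._blocks[block_number] = bytes(data)

    def read_block(self, block_number: int) -> bytes:
        return self._blocks[block_number]


class FileStore:
    """File storage: data addressed by a path inside a directory hierarchy."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def list_dir(self, directory: str) -> List[str]:
        # Return the immediate children (files or subdirectories) of a directory.
        prefix = directory.rstrip("/") + "/"
        children = {p[len(prefix):].split("/")[0]
                    for p in self._files if p.startswith(prefix)}
        return sorted(children)


class ObjectStore:
    """Object storage: whole objects in a flat key namespace, each with metadata."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key: str) -> Tuple[bytes, Dict[str, str]]:
        return self._objects[key]


volume = BlockVolume(num_blocks=8)
volume.write_block(3, b"x" * 512)

files = FileStore()
files.write("/logs/app/2024.txt", b"started")
files.write("/logs/readme.txt", b"notes")
print(files.list_dir("/logs"))  # ['app', 'readme.txt']

objects = ObjectStore()
objects.put("photos/cat.jpg", b"...", {"content-type": "image/jpeg"})
print(objects.get("photos/cat.jpg")[1])
```

Note that the slash in the object key is just part of the key; unlike the file store, the object store has no directory to list or traverse.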
Request Flow
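1. Client sends a request to read or write data through the interface of the chosen storage type.
2. For file storage, the client uses a file system protocol (such as NFS or SMB) to request a file or directory by path.
3. In a distributed file system, the metadata server resolves the path to file blocks and returns the storage nodes that hold them.
4. The client reads or writes the file blocks on those storage nodes.
5. For object storage, the client sends a REST API request that carries the object key.
6. The object API gateway authenticates the request and asks the metadata service where the object lives.
7. The metadata service returns the storage node for that key, and the object blob is read from or written to that node.
8. For block storage, the client attaches a volume through the block device interface (for example, over iSCSI).
9. The volume manager maps the volume's logical block addresses to physical locations on the storage arrays.
10. The client reads and writes raw blocks; any file or directory structure comes from the file system or database the client runs on top of the volume.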
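The object storage path in the flow above (authenticate at the gateway, look up placement in the metadata service, then touch the storage node) can be sketched as follows. Everything here is a simplified assumption for illustration: the class names, the token check, and the CRC32-based placement rule are hypothetical and do not describe how any specific object store works.

```python
import zlib
from typing import Dict, List, Set


class StorageNode:
    """Holds object blobs keyed by object key (in memory for this sketch)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}


class MetadataService:
    """Remembers which storage node holds each object key."""

    def __init__(self, node_ids: List[str]) -> None:
        self._node_ids = node_ids
        self._locations: Dict[str, str] = {}

    def assign(self, key: str) -> str:
        # Hypothetical placement rule: deterministic hash of the key.
        node_id = self._node_ids[zlib.crc32(key.encode()) % len(self._node_ids)]
        self._locations[key] = node_id
        return node_id

    def locate(self, key: str) -> str:
        return self._locations[key]  # raises KeyError for unknown objects


class ObjectGateway:
    """Entry point: authenticates, consults metadata, then reads or writes the blob."""

    def __init__(self, metadata: MetadataService,
                 nodes: Dict[str, StorageNode], valid_tokens: Set[str]) -> None:
        self._metadata = metadata
        self._nodes = nodes
        self._valid_tokens = valid_tokens

    def _authenticate(self, token: str) -> None:
        if token not in self._valid_tokens:
            raise PermissionError("invalid token")

    def put(self, token: str, key: str, blob: bytes) -> None:
        self._authenticate(token)
        node_id = self._metadata.assign(key)
        self._nodes[node_id].blobs[key] = blob

    def get(self, token: str, key: str) -> bytes:
        self._authenticate(token)
        node_id = self._metadata.locate(key)
        return self._nodes[node_id].blobs[key]


nodes = {"node-1": StorageNode(), "node-2": StorageNode()}
gateway = ObjectGateway(MetadataService(list(nodes)), nodes, {"secret-token"})
gateway.put("secret-token", "reports/q1.csv", b"a,b\n1,2\n")
print(gateway.get("secret-token", "reports/q1.csv"))
```

Keeping placement in the metadata service rather than in the client is what lets the system move or re-replicate blobs without clients noticing.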
Database Schema
Entities:
- File: id, name, path, size, timestamps, metadata_id
- Directory: id, name, parent_directory_id
- Object: id, key, size, metadata_id, storage_node_id
- Metadata: id, attributes (key-value pairs)
- StorageNode: id, type (block/file/object), capacity, status
Relationships:
- Directory contains Files and other Directories (1:N)
- File and Object link to Metadata (1:1)
- StorageNode stores Files, Objects, or Blocks (1:N)
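The entities above could be expressed as plain data classes, as in the sketch below. The field types, the split of "timestamps" into `created_at` and `modified_at`, the use of bytes for sizes and capacity, and the helper `directory_path` are assumptions added for illustration; the schema itself does not specify them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Metadata:
    id: int
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value pairs


@dataclass
class StorageNode:
    id: int
    type: str       # "block", "file", or "object"
    capacity: int   # assumed to be in bytes
    status: str


@dataclass
class Directory:
    id: int
    name: str
    parent_directory_id: Optional[int]  # None marks the root directory


@dataclass
class File:
    id: int
    name: str
    path: str
    size: int
    created_at: datetime   # assumed breakdown of "timestamps"
    modified_at: datetime
    metadata_id: int


@dataclass
class Object:
    id: int
    key: str
    size: int
    metadata_id: int
    storage_node_id: int


def directory_path(directory_id: int, directories: Dict[int, Directory]) -> str:
    """Rebuild a directory's full path by walking parent_directory_id up to the root."""
    parts = []
    current = directories[directory_id]
    while current.parent_directory_id is not None:
        parts.append(current.name)
        current = directories[current.parent_directory_id]
    return "/" + "/".join(reversed(parts))


directories = {
    1: Directory(id=1, name="", parent_directory_id=None),
    2: Directory(id=2, name="home", parent_directory_id=1),
    3: Directory(id=3, name="alice", parent_directory_id=2),
}
print(directory_path(3, directories))  # /home/alice
```

The self-referencing `parent_directory_id` is what gives file storage its hierarchy; the Object entity has no such link, which reflects the flat key namespace of object storage.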
Scaling Discussion
Bottlenecks
A single metadata server becomes both a single point of failure and a throughput bottleneck in file and object storage.
Storage nodes can run out of capacity or bandwidth under heavy load.
Network bandwidth limits data transfer speed between clients and storage.
Consistency and synchronization overhead increases with scale.
Solutions
Use distributed metadata services with sharding and replication to avoid bottlenecks.
Add more storage nodes and use load balancing to distribute data and requests.
Implement caching layers near clients to reduce repeated data transfers.
Adopt eventual consistency models where strict consistency is not required to improve performance.
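One common way to shard a metadata service is consistent hashing, which keeps most keys in place when a shard is added. The sketch below is a minimal version under stated assumptions: the class name `HashRing`, the shard names, the use of SHA-256, and 64 virtual nodes per shard are all illustrative choices, and replication is left out.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """Consistent-hash ring that maps metadata keys to shards (hypothetical sketch)."""

    def __init__(self, shards: List[str], vnodes: int = 64) -> None:
        # Each shard is placed on the ring at several points (virtual nodes)
        # so that keys spread more evenly across shards.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]


keys = [f"/data/file-{i}" for i in range(1000)]
before = HashRing(["meta-1", "meta-2", "meta-3"])
after = HashRing(["meta-1", "meta-2", "meta-3", "meta-4"])

moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding a shard")
```

Every key that changes owner moves to the new shard, so existing shards never exchange data with each other during a scale-out. A simple `hash(key) % shard_count` scheme does not have this property and typically reshuffles most keys when the shard count changes.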
Interview Tips
Time: For a 45-minute interview, spend 10 minutes understanding requirements and clarifying use cases, 20 minutes designing and explaining each storage type, 10 minutes discussing scaling and trade-offs, and 5 minutes on questions.
Explain differences in data access patterns and use cases for block, file, and object storage.
Describe how metadata management differs and why it matters.
Discuss protocols and client interaction models for each storage type.
Highlight scalability challenges and solutions for large-scale storage systems.
Show understanding of trade-offs between performance, complexity, and flexibility.