
Block storage vs object storage vs file storage in HLD - Design Approaches Compared


Design: Storage System Comparison

In scope: understanding and designing block storage, object storage, and file storage systems. Out of scope: detailed hardware design and specific vendor implementations.

Functional Requirements

FR1: Support storing and retrieving data efficiently

FR2: Handle different types of data access patterns

FR3: Provide scalability for growing data needs

FR4: Ensure data durability and availability

FR5: Allow easy integration with applications

Non-Functional Requirements

NFR1: Data access latency should stay under 100 ms for common operations

NFR2: System should scale to petabytes of data

NFR3: Availability target of 99.9% uptime

NFR4: Support concurrent access by thousands of clients
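To make the availability target concrete, the arithmetic below converts 99.9% uptime into a downtime budget. It is a rough sketch: the 365-day year and 30-day month are simplifying assumptions, and the function name is illustrative.

```python
# Downtime budget implied by an availability target (illustrative arithmetic).
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years (assumption)


def allowed_downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime permitted over the given period."""
    return (1.0 - availability) * period_hours


yearly_hours = allowed_downtime_hours(0.999)
monthly_minutes = allowed_downtime_hours(0.999, period_hours=30 * 24) * 60
print(f"99.9% uptime allows about {yearly_hours:.2f} h per year")
print(f"99.9% uptime allows about {monthly_minutes:.1f} min per 30-day month")
```

Roughly 8.76 hours per year is a small budget once planned maintenance and failovers are counted, which is one reason replication appears among the key components.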


Key Components

Storage nodes or devices

Metadata management service

Access protocols (e.g., NFS, SMB, REST APIs)

Caching layers

Replication and backup mechanisms

Design Patterns

Distributed file system pattern

Key-value store pattern for object storage

Block device abstraction

Caching and tiering strategies

Replication and consistency models
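As one illustration of the caching pattern listed above, here is a minimal read-through cache with least-recently-used eviction. The class name, the `fetch` callback, and the in-memory backing store are all hypothetical stand-ins for a real storage tier, not part of any specific system.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """LRU cache that falls back to a slower backing store on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch          # hypothetical call into the storage tier
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)               # read through to the backing store
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


# A dict stands in for the slow storage tier.
backing_store = {"a": b"alpha", "b": b"beta", "c": b"gamma"}
cache = ReadThroughCache(fetch=backing_store.__getitem__, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses)
```

A cache this simple ignores invalidation on writes, which is where the replication and consistency models in the same list come into play.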

Reference Architecture

Client
  |
  |---> File Storage System (e.g., NFS, SMB)
  |         |
  |         |---> Metadata Server (manages directories, files)
  |         |---> Storage Nodes (store file blocks)
  |
  |---> Object Storage System (e.g., S3)
  |         |
  |         |---> Object API Gateway
  |         |---> Metadata Service (stores object metadata)
  |         |---> Storage Nodes (store objects as blobs)
  |
  |---> Block Storage System (e.g., SAN)
            |
            |---> Block Device Interface
            |---> Storage Arrays (store raw blocks)
            |---> Volume Manager

Components

File Storage System
  Technology: NFS, SMB, distributed file systems
  Purpose: Store and manage files with hierarchical directories and metadata

Object Storage System
  Technology: REST APIs, S3-compatible systems
  Purpose: Store data as objects with metadata; scales well for unstructured data

Block Storage System
  Technology: SAN, iSCSI, Fibre Channel
  Purpose: Provide raw block-level storage for low-level data access

Metadata Server
  Technology: Custom service or database
  Purpose: Manage file or object metadata for quick lookup and organization

Storage Nodes
  Technology: Disk arrays, SSDs
  Purpose: Physically store data blocks, files, or objects

Access Protocols
  Technology: NFS and SMB for file; REST for object; iSCSI for block
  Purpose: Enable clients to communicate with storage systems
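The difference between the three storage types shows most clearly in the interface each one exposes. The toy in-memory classes below are a sketch of those access models only: the class names, method names, and the 4096-byte block size are assumptions for illustration, and none of this corresponds to a real NFS, S3, or iSCSI client API.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4096  # assumed block size in bytes


class BlockVolume:
    """Block storage: fixed-size blocks addressed by number, no names or metadata."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("block writes must be exactly BLOCK_SIZE bytes")
        self._blocks[lba] = data

    def read_block(self, lba: int) -> bytes:
        return self._blocks[lba]


class FileStore:
    """File storage: hierarchical paths with byte-range reads and in-place updates."""

    def __init__(self) -> None:
        self._files: Dict[str, bytearray] = {}

    def write(self, path: str, offset: int, data: bytes) -> None:
        buf = self._files.setdefault(path, bytearray())
        if len(buf) < offset + len(data):
            buf.extend(bytes(offset + len(data) - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data

    def read(self, path: str, offset: int, length: int) -> bytes:
        return bytes(self._files[path][offset:offset + length])

    def list_dir(self, directory: str) -> List[str]:
        # Returns every file below the directory, including nested ones.
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))


class ObjectStore:
    """Object storage: flat keys, whole-object put/get, metadata stored per object."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, key: str, data: bytes, metadata: Dict[str, str]) -> None:
        self._objects[key] = (data, dict(metadata))  # replaces the whole object

    def get(self, key: str) -> Tuple[bytes, Dict[str, str]]:
        return self._objects[key]


volume = BlockVolume(num_blocks=8)
volume.write_block(3, b"x" * BLOCK_SIZE)

files = FileStore()
files.write("/logs/app.log", 0, b"hello world")
files.write("/logs/app.log", 6, b"there")  # partial in-place update

objects = ObjectStore()
objects.put("photos/cat.jpg", b"\xff\xd8", {"content-type": "image/jpeg"})
```

Note what each interface leaves out: the block volume has no names at all, the file store has no per-file user metadata in this sketch, and the object store offers no partial update, so changing one byte means writing the whole object again.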

Request Flow

1. Client sends request to access data.

2. For file storage, client uses file system protocol to request file or directory.

3. Metadata server locates file blocks and returns storage node info.

4. Client reads/writes file blocks from storage nodes.

5. For object storage, client sends REST API request with object key.

6. Object API gateway authenticates and forwards request to metadata service.

7. Metadata service resolves the object key to a storage node, where the object blob is then read or written.

8. For block storage, client connects to block device interface.

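9. Block device interface, working with the volume manager, maps block requests to physical storage arrays.

10. Client reads and writes raw blocks directly; any file or record structure is imposed by the client's own file system or application, not by the storage system.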
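The file storage path in steps 2 to 4 can be sketched as follows. This is a toy model under stated assumptions: the class and function names are hypothetical, chunks are placed round-robin, and the 4-byte chunk size is chosen only to keep the example readable.

```python
from typing import Dict, List, Tuple


class StorageNode:
    """Holds opaque chunks keyed by an ID; stands in for a disk-backed node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: Dict[str, bytes] = {}


class MetadataServer:
    """Maps a file path to the ordered (node name, chunk ID) pairs holding its blocks."""

    def __init__(self) -> None:
        self.locations: Dict[str, List[Tuple[str, str]]] = {}

    def lookup(self, path: str) -> List[Tuple[str, str]]:
        return self.locations[path]


def write_file(path: str, data: bytes, meta: MetadataServer,
               nodes: Dict[str, StorageNode], chunk_size: int = 4) -> None:
    """Split data into chunks, spread them round-robin, and record where they went."""
    names = sorted(nodes)
    placement: List[Tuple[str, str]] = []
    for start in range(0, len(data), chunk_size):
        index = start // chunk_size
        node_name = names[index % len(names)]
        chunk_id = f"{path}#{index}"
        nodes[node_name].chunks[chunk_id] = data[start:start + chunk_size]
        placement.append((node_name, chunk_id))
    meta.locations[path] = placement


def read_file(path: str, meta: MetadataServer, nodes: Dict[str, StorageNode]) -> bytes:
    """Ask the metadata server for locations, then fetch each block from its node."""
    return b"".join(nodes[name].chunks[chunk_id] for name, chunk_id in meta.lookup(path))


nodes = {"node-a": StorageNode("node-a"), "node-b": StorageNode("node-b")}
meta = MetadataServer()
write_file("/docs/report.txt", b"block-by-block", meta, nodes)
content = read_file("/docs/report.txt", meta, nodes)
```

The object storage path in steps 5 to 7 has the same shape, with an object key in place of the path and an API gateway in front of the metadata lookup. The block storage path skips the metadata lookup entirely, which is why it has the least overhead per request.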

Database Schema

Entities:
- File: id, name, path, size, timestamps, metadata_id
- Directory: id, name, parent_directory_id
- Object: id, key, size, metadata_id, storage_node_id
- Metadata: id, attributes (key-value pairs)
- StorageNode: id, type (block/file/object), capacity, status

Relationships:
- Directory contains Files and other Directories (1:N)
- File and Object link to Metadata (1:1)
- StorageNode stores Files, Objects, or Blocks (1:N)
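One way to express these entities in code is with plain dataclasses, as in the sketch below. Several details are assumptions rather than part of the schema: "timestamps" is split into `created_at` and `modified_at`, capacity is taken to be in bytes, Object is renamed `StoredObject` to avoid clashing with Python's built-in `object`, and the `directory_path` helper is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Metadata:
    id: int
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value pairs


@dataclass
class Directory:
    id: int
    name: str
    parent_directory_id: Optional[int]  # None marks the root directory


@dataclass
class File:
    id: int
    name: str
    path: str
    size: int
    created_at: datetime    # assumed split of "timestamps"
    modified_at: datetime   # assumed split of "timestamps"
    metadata_id: int


@dataclass
class StoredObject:
    id: int
    key: str
    size: int
    metadata_id: int
    storage_node_id: int


@dataclass
class StorageNode:
    id: int
    type: str       # "block", "file", or "object"
    capacity: int   # bytes (unit is an assumption)
    status: str


def directory_path(directory_id: int, directories: Dict[int, Directory]) -> str:
    """Rebuild a full path by walking parent_directory_id links up to the root."""
    parts = []
    current = directories[directory_id]
    while current.parent_directory_id is not None:
        parts.append(current.name)
        current = directories[current.parent_directory_id]
    return "/" + "/".join(reversed(parts))


directories = {
    1: Directory(id=1, name="", parent_directory_id=None),
    2: Directory(id=2, name="home", parent_directory_id=1),
    3: Directory(id=3, name="alice", parent_directory_id=2),
}
print(directory_path(3, directories))
```

The self-referencing `parent_directory_id` is what gives file storage its hierarchy. Objects have no such link: the key is the whole address, so any apparent folder structure is just a naming convention inside the key.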

Scaling Discussion

Bottlenecks

Metadata server becomes a single point of failure or bottleneck in file and object storage.

Storage nodes can run out of capacity or bandwidth under heavy load.

Network bandwidth limits data transfer speed between clients and storage.

Consistency and synchronization overhead increases with scale.

Solutions

Use distributed metadata services with sharding and replication to avoid bottlenecks.

Add more storage nodes and use load balancing to distribute data and requests.

Implement caching layers near clients to reduce repeated data transfers.

Adopt eventual consistency models where strict consistency is not required to improve performance.
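The first solution, sharding and replicating metadata, can be sketched with a stable hash of the file path or object key. The function names, the shard count, and the "next shards in order" replica placement are illustrative assumptions, not a prescribed scheme.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Pick a metadata shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_shards(key: str, num_shards: int, replicas: int) -> List[int]:
    """Return the primary shard plus the following shards, wrapping around."""
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(min(replicas, num_shards))]


placement = replica_shards("photos/cat.jpg", num_shards=8, replicas=3)
print(f"metadata for photos/cat.jpg lives on shards {placement}")
```

Plain modulo placement like this remaps most keys whenever the shard count changes. Consistent hashing is a common alternative when shards are added or removed often, at the cost of a more involved lookup.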

Interview Tips

Time: For a 45-minute interview, spend about 10 minutes understanding requirements and clarifying use cases, 20 minutes designing and explaining each storage type, 10 minutes discussing scaling and trade-offs, and 5 minutes on questions.

Explain differences in data access patterns and use cases for block, file, and object storage.

Describe how metadata management differs and why it matters.

Discuss protocols and client interaction models for each storage type.

Highlight scalability challenges and solutions for large-scale storage systems.

Show understanding of trade-offs between performance, complexity, and flexibility.