
Why video streaming handles massive data in HLD - Design It to Understand It

Design: Video Streaming Data Handling
Focus on data handling aspects of video streaming including storage, delivery, and scaling. Exclude detailed UI design and content creation.
Functional Requirements
FR1: Support streaming of video content to millions of users simultaneously
FR2: Deliver video with minimal buffering and latency
FR3: Handle different video qualities and formats
FR4: Allow users to pause, rewind, and fast-forward videos
FR5: Support live streaming and on-demand videos
Non-Functional Requirements
NFR1: Scale to handle millions of concurrent viewers
NFR2: Keep p99 video start-up latency under 300 ms
NFR3: Ensure 99.9% availability for streaming service
NFR4: Efficiently store and serve large video files
NFR5: Optimize bandwidth usage to reduce costs
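To make the scale behind these requirements concrete, here is a rough back-of-envelope sketch. The viewer count, the 5 Mbit/s average bitrate, and the three-rendition ladder are illustrative assumptions, not figures from this design.

```python
def egress_gbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Aggregate outbound bandwidth in Gbit/s if every viewer streams at once."""
    return concurrent_viewers * avg_bitrate_mbps / 1000


def storage_gb_per_hour(rendition_bitrates_mbps: list[float]) -> float:
    """Storage in GB for one hour of video kept in every listed rendition."""
    total_mbps = sum(rendition_bitrates_mbps)
    megabits = total_mbps * 3600      # one hour of playback
    return megabits / 8 / 1000        # Mbit -> MB -> GB


if __name__ == "__main__":
    # Assumed: 1 million concurrent viewers at an average of 5 Mbit/s.
    print(egress_gbps(1_000_000, 5.0), "Gbit/s")
    # Assumed ladder: 1.0, 2.5 and 5.0 Mbit/s renditions.
    print(storage_gb_per_hour([1.0, 2.5, 5.0]), "GB per hour of content")
```

Under these assumptions the egress alone is about 5 Tbit/s, which is why serving everything from a single origin is not realistic.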
Think Before You Design
Key Components
Video storage system (object storage or distributed file system)
Content Delivery Network (CDN)
Video encoding and transcoding services
Load balancers and streaming servers
Caching layers and edge servers
Design Patterns
Content Delivery Network (CDN) for caching and distribution
Adaptive Bitrate Streaming to adjust video quality dynamically
Sharding and partitioning of video data
Asynchronous processing for encoding and uploading
Load balancing for streaming servers
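Adaptive bitrate streaming can be illustrated with a minimal client-side sketch: pick the highest rendition that fits within the measured throughput, with some headroom. The bitrate ladder, the 0.8 safety factor, and the function name are hypothetical; real players also take buffer level and other signals into account.

```python
# Hypothetical bitrate ladder in kbit/s, sorted ascending.
RENDITIONS_KBPS = [400, 1200, 2500, 5000]


def pick_bitrate(throughput_kbps: float, safety: float = 0.8) -> int:
    """Return the highest rendition that fits within a fraction of the throughput.

    Falls back to the lowest rendition when nothing fits, so playback
    continues at reduced quality instead of stalling.
    """
    budget = throughput_kbps * safety
    candidates = [r for r in RENDITIONS_KBPS if r <= budget]
    return max(candidates) if candidates else RENDITIONS_KBPS[0]


if __name__ == "__main__":
    for measured in (300, 4000, 10_000):
        print(measured, "->", pick_bitrate(measured))
```

The player would re-run this choice before requesting each segment, so quality follows the network conditions.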
Reference Architecture
User Devices
   |
   v
Load Balancer
   |
Streaming Servers <--> Video Encoding Service
   |
CDN Edge Servers
   |
Distributed Video Storage
Components
Load Balancer (Nginx or AWS ELB): Distributes user requests evenly across streaming servers
Streaming Servers (custom streaming server or Wowza): Handle video streaming sessions and user controls
Video Encoding Service (FFmpeg or cloud encoding services): Transcodes videos into multiple formats and bitrates
Content Delivery Network (Cloudflare, Akamai, AWS CloudFront): Caches video content close to users to reduce latency
Distributed Video Storage (Amazon S3, Google Cloud Storage, or HDFS): Stores original and transcoded video files reliably
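As a sketch of what the Video Encoding Service might run for each rendition, the helper below builds an FFmpeg command line that produces one HLS variant. The choice of HLS, H.264/AAC, 6-second segments, the 128k audio bitrate, and all file paths are assumptions for illustration; the function only builds the argument list and does not run FFmpeg.

```python
def hls_transcode_cmd(src: str, out_dir: str, height: int, video_kbps: int) -> list[str]:
    """Build an ffmpeg argument list for a single HLS rendition.

    Paths and encoding settings are illustrative assumptions.
    """
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",             # resize, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        "-c:a", "aac", "-b:a", "128k",
        "-f", "hls",
        "-hls_time", "6",                        # target segment length in seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{height}p_%03d.ts",
        f"{out_dir}/{height}p.m3u8",             # playlist for this rendition
    ]


if __name__ == "__main__":
    # Hypothetical ladder: (height, video bitrate in kbit/s).
    for height, kbps in [(360, 800), (720, 2500), (1080, 5000)]:
        print(" ".join(hls_transcode_cmd("input.mp4", "out", height, kbps)))
```

Each command could be passed to `subprocess.run` on a worker, with one job per rendition so that renditions are encoded in parallel.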
Request Flow
1. The user requests a video from their device.
2. The request hits the Load Balancer, which routes it to a Streaming Server.
3. The Streaming Server checks the CDN cache for the requested video segment.
4. On a cache miss, the Streaming Server fetches the segment from Distributed Video Storage.
5. The segment is delivered to the CDN edge server and then to the user's device.
6. Independently of playback, the Video Encoding Service transcodes uploaded videos into multiple qualities asynchronously.
7. The CDN caches popular video segments so that future requests are served quickly.
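The cache check, the fall-back to storage on a miss, and the caching of fetched segments can be sketched as a small read-through cache. The class name, the in-memory `origin` dictionary standing in for Distributed Video Storage, and the LRU eviction policy are assumptions made for this example.

```python
from collections import OrderedDict


class EdgeCache:
    """Read-through LRU cache standing in for a CDN edge server."""

    def __init__(self, origin: dict[str, bytes], capacity: int = 2):
        self.origin = origin          # stand-in for distributed video storage
        self.capacity = capacity      # maximum number of cached segments
        self.cache: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_key: str) -> bytes:
        if segment_key in self.cache:
            self.cache.move_to_end(segment_key)   # mark as most recently used
            self.hits += 1
            return self.cache[segment_key]
        # Cache miss: fetch from origin, then keep a copy at the edge.
        self.misses += 1
        data = self.origin[segment_key]
        self.cache[segment_key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)        # evict least recently used
        return data


if __name__ == "__main__":
    origin = {"v1/seg0": b"a", "v1/seg1": b"b", "v1/seg2": b"c"}
    edge = EdgeCache(origin, capacity=2)
    for key in ["v1/seg0", "v1/seg0", "v1/seg1", "v1/seg2", "v1/seg0"]:
        edge.get(key)
    print("hits:", edge.hits, "misses:", edge.misses)
```

Every miss costs a trip to origin storage, which is why the cache hit ratio appears later as a scaling concern.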
Database Schema
Entities:
- Video: id, title, description, upload_date, duration
- VideoSegment: id, video_id (FK), quality, format, storage_path
- User: id, username, subscription_type
- StreamingSession: id, user_id (FK), video_id (FK), start_time, end_time
Relationships:
- One Video has many VideoSegments
- One User can have many StreamingSessions
- StreamingSessions link Users to Videos watched
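One possible way to express this schema in code is with plain dataclasses. The field types, the use of seconds for `duration`, the nullable `end_time` for sessions still in progress, and the `segment_paths` helper are assumptions; the schema itself only names the fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Video:
    id: int
    title: str
    description: str
    upload_date: datetime
    duration: int                 # assumed to be in seconds


@dataclass
class VideoSegment:
    id: int
    video_id: int                 # FK to Video.id
    quality: str                  # e.g. "720p"
    format: str                   # e.g. "ts"
    storage_path: str


@dataclass
class User:
    id: int
    username: str
    subscription_type: str


@dataclass
class StreamingSession:
    id: int
    user_id: int                  # FK to User.id
    video_id: int                 # FK to Video.id
    start_time: datetime
    end_time: Optional[datetime] = None   # assumed None while still watching


def segment_paths(segments: list[VideoSegment], video_id: int, quality: str) -> list[str]:
    """Hypothetical helper: storage paths of one video's segments at one quality."""
    return [s.storage_path for s in segments
            if s.video_id == video_id and s.quality == quality]
```

The one-to-many relationship between Video and VideoSegment shows up here as the `video_id` field that the helper filters on.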
Scaling Discussion
Bottlenecks
Storage capacity and throughput for large video files
Network bandwidth for streaming to many users
Encoding service processing time for new videos
Load on streaming servers during peak usage
Cache misses causing higher latency
Solutions
Use scalable object storage with automatic replication and partitioning
Deploy CDN with global edge locations to reduce bandwidth load on origin
Use distributed and parallel encoding pipelines
Auto-scale streaming servers based on traffic
Pre-warm CDN caches and use predictive caching algorithms
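Cache pre-warming can be sketched with a simple frequency heuristic: push the most requested segments to the edge before peak traffic. This is a deliberately simple stand-in for a predictive algorithm, and the log format and function name are hypothetical.

```python
from collections import Counter


def segments_to_prewarm(request_log: list[str], top_n: int) -> list[str]:
    """Return the top_n most requested segment keys, most popular first."""
    counts = Counter(request_log)
    return [segment for segment, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical log of recently requested segment keys.
    log = ["v1/s0", "v1/s0", "v1/s0", "v2/s0", "v2/s0", "v3/s0"]
    print(segments_to_prewarm(log, top_n=2))
```

A production system would likely weight recent requests more heavily and account for region, but the idea of ranking by expected demand stays the same.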
Interview Tips
Time: 10 minutes to clarify requirements and constraints, 15 minutes to design architecture and data flow, 10 minutes to discuss scaling and bottlenecks, 10 minutes for Q&A
Explain why video data is massive: large files multiplied by many concurrent viewers
Discuss importance of CDN to reduce latency and bandwidth
Highlight adaptive bitrate streaming for user experience
Mention asynchronous encoding to handle large uploads
Describe scaling strategies for storage, network, and compute