
Video Recommendation System in HLD: Scalability & System Analysis

Scalability Analysis - Video recommendation system
Growth Table: Video Recommendation System
| Scale | Users | Data Volume | Traffic | System Changes |
| --- | --- | --- | --- | --- |
| Small | 100 users | Few thousand videos, user profiles | Low QPS (~100 requests/sec) | Single app server, single DB instance, simple batch recommendations |
| Medium | 10K users | Millions of videos, user interactions | Moderate QPS (~10K requests/sec) | Multiple app servers, DB read replicas, caching layer, offline model training |
| Large | 1M users | Hundreds of millions of videos, rich user data | High QPS (~100K requests/sec) | Distributed databases, sharded user data, real-time streaming data pipelines, CDN for video delivery |
| Very Large | 100M users | Billions of videos and interactions | Very high QPS (~10M requests/sec) | Multi-region deployment, advanced sharding, multi-level caching, AI model serving clusters, global CDN, data archiving |
First Bottleneck

At small scale, the database is the first bottleneck because it absorbs every read and write for user interactions and video metadata. As the user base grows, recommendation model training and serving become the bottleneck due to heavy computation and data volume. At large scale, network bandwidth and data storage also become critical constraints.

Scaling Solutions
  • Database: Use read replicas to handle read traffic, connection pooling, and eventually shard user and video data by user ID or video category.
  • Caching: Cache popular recommendations and video metadata using Redis or Memcached to reduce DB load.
  • Application Servers: Horizontally scale app servers behind load balancers to handle increased request volume.
  • Model Training and Serving: Use distributed computing frameworks for offline training and deploy models on dedicated serving clusters with GPU acceleration.
  • Data Pipelines: Implement real-time streaming pipelines (e.g., Kafka) for user activity ingestion and feature updates.
  • Content Delivery: Use a global CDN to serve video content efficiently and reduce latency.
  • Storage: Use distributed object storage for videos and archive old data to cheaper storage tiers.
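The caching point above can be sketched as a read-through cache: serve popular recommendations from memory and fall through to the database only on a miss. This is a minimal in-process sketch, assuming a hypothetical `fetch_recommendations_from_db` loader; a production system would use Redis or Memcached instead of a Python dict.

```python
import time

class ReadThroughCache:
    """Minimal read-through cache with TTL expiry (stand-in for Redis)."""

    def __init__(self, ttl_seconds=300):
        self._store = {}          # key -> (value, expires_at)
        self._ttl = ttl_seconds

    def get(self, key, loader):
        entry = self._store.get(key)
        now = time.time()
        if entry and entry[1] > now:
            return entry[0]       # cache hit: database is never touched
        value = loader(key)       # cache miss: load from the backing store
        self._store[key] = (value, now + self._ttl)
        return value

def fetch_recommendations_from_db(user_id):
    # Hypothetical placeholder for an expensive DB/model call.
    return [f"video_{user_id}_{i}" for i in range(3)]

cache = ReadThroughCache(ttl_seconds=300)
recs = cache.get("user_42", fetch_recommendations_from_db)
```

Repeated calls for the same key within the TTL return the cached list without hitting the loader, which is exactly the DB-load reduction the bullet describes.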
Back-of-Envelope Cost Analysis
  • Requests per second: At 1M users, expect ~100K QPS for recommendations and video views.
  • Storage: Videos require petabytes of storage; metadata and user data require terabytes to petabytes.
  • Bandwidth: Video streaming consumes the most bandwidth; a 1 Gbps link can serve ~125 MB/s, so multiple CDN edge servers are needed.
  • Compute: Model training requires GPU clusters; serving models requires CPU/GPU servers scaled horizontally.
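The estimates above can be turned into a quick calculation. The video count comes from the growth table; the average video size, per-viewer bitrate, and concurrency ratio are assumptions chosen for illustration, not measured values.

```python
# Back-of-envelope estimates for the 1M-user scale row.

# Storage: ~100M videos (growth table), assuming ~50 MB per encoded video.
videos = 100_000_000
avg_video_mb = 50
video_storage_pb = videos * avg_video_mb / 1e9   # MB -> PB

# Bandwidth: a 1 Gbps link moves ~125 MB/s (from the text).
link_mb_per_sec = 125
stream_mb_per_sec = 0.5           # assumed ~4 Mbps per concurrent viewer
concurrent_viewers = 100_000      # assumed ~10% of 1M users at peak
links_needed = concurrent_viewers * stream_mb_per_sec / link_mb_per_sec

print(f"Video storage: ~{video_storage_pb:.0f} PB")
print(f"1 Gbps links needed at peak: ~{links_needed:.0f}")
```

Under these assumptions the answer is roughly 5 PB of video storage and ~400 one-gigabit links at peak, which is why video delivery goes to a CDN rather than a handful of origin servers.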
Interview Tip

Start by clarifying scale and requirements. Discuss data volume, traffic patterns, and latency needs. Identify the first bottleneck and propose targeted solutions. Explain trade-offs between consistency, latency, and cost. Use real numbers to justify scaling steps. Show understanding of caching, sharding, and distributed systems.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas and implement caching to reduce database load before considering sharding or more complex solutions.
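The first step in that answer, adding read replicas, can be sketched as read/write splitting: route SELECTs to a randomly chosen replica and everything else to the primary. The connection names here are hypothetical placeholders; a real deployment would use actual DB connections or a proxy such as ProxySQL.

```python
import random

class ReplicatedDB:
    """Routes reads to replicas and writes to the primary (sketch only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            target = random.choice(self.replicas)  # spread read load
        else:
            target = self.primary                  # writes stay on primary
        return target, sql   # a real client would run the query here

db = ReplicatedDB(primary="primary-1", replicas=["replica-1", "replica-2"])
read_node, _ = db.execute("SELECT * FROM videos WHERE id = 7")
write_node, _ = db.execute("INSERT INTO views (user_id) VALUES (42)")
```

Note the trade-off this introduces: replicas lag the primary slightly, so reads may be briefly stale, which is usually acceptable for recommendation and view data.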

Key Result
The database is the first bottleneck at small to medium scale; scaling it requires caching, read replicas, and eventually sharding. At large scale, distributed model serving and a global CDN become the critical components.