
Video Recommendation System in HLD: Scalability & System Analysis

Scalability Analysis - Video recommendation system
Growth Table: Video Recommendation System
| Scale | Users | Data Volume | Traffic | System Changes |
| --- | --- | --- | --- | --- |
| Small | 100 users | Few thousand videos, user profiles | Low QPS (~100 requests/sec) | Single app server, single DB instance, simple batch recommendations |
| Medium | 10K users | Millions of videos, user interactions | Moderate QPS (~10K requests/sec) | Multiple app servers, DB read replicas, caching layer, offline model training |
| Large | 1M users | Hundreds of millions of videos, rich user data | High QPS (~100K requests/sec) | Distributed databases, sharded user data, real-time streaming data pipelines, CDN for video delivery |
| Very Large | 100M users | Billions of videos and interactions | Very high QPS (~10M requests/sec) | Multi-region deployment, advanced sharding, multi-level caching, AI model serving clusters, global CDN, data archiving |
First Bottleneck

At small scale, the database is the first bottleneck because it absorbs every read and write for user interactions and video metadata. As the user base grows, recommendation model training and serving become the bottleneck due to heavy computation and data volume. At large scale, network bandwidth and data storage also become critical constraints.

Scaling Solutions
  • Database: Use read replicas to handle read traffic, connection pooling, and eventually shard user and video data by user ID or video category.
  • Caching: Cache popular recommendations and video metadata using Redis or Memcached to reduce DB load.
  • Application Servers: Horizontally scale app servers behind load balancers to handle increased request volume.
  • Model Training and Serving: Use distributed computing frameworks for offline training and deploy models on dedicated serving clusters with GPU acceleration.
  • Data Pipelines: Implement real-time streaming pipelines (e.g., Kafka) for user activity ingestion and feature updates.
  • Content Delivery: Use a global CDN to serve video content efficiently and reduce latency.
  • Storage: Use distributed object storage for videos and archive old data to cheaper storage tiers.
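The caching point above can be sketched as a read-through cache: serve popular recommendations from memory and fall through to the database only on a miss. This is a minimal in-process sketch, assuming a hypothetical `fetch_recommendations_from_db` loader; a production system would use Redis or Memcached instead of a Python dict.

```python
import time

class ReadThroughCache:
    """Minimal read-through cache with TTL expiry (stand-in for Redis)."""

    def __init__(self, ttl_seconds=300):
        self._store = {}          # key -> (value, expires_at)
        self._ttl = ttl_seconds

    def get(self, key, loader):
        entry = self._store.get(key)
        now = time.time()
        if entry and entry[1] > now:
            return entry[0]       # cache hit: database is never touched
        value = loader(key)       # cache miss: load from the backing store
        self._store[key] = (value, now + self._ttl)
        return value

def fetch_recommendations_from_db(user_id):
    # Hypothetical placeholder for an expensive DB/model call.
    return [f"video_{user_id}_{i}" for i in range(3)]

cache = ReadThroughCache(ttl_seconds=300)
recs = cache.get("user_42", fetch_recommendations_from_db)
```

Repeated calls for the same key within the TTL return the cached list without hitting the loader, which is exactly the DB-load reduction the bullet describes.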
Back-of-Envelope Cost Analysis
  • Requests per second: At 1M users, expect ~100K QPS for recommendations and video views.
  • Storage: Videos require petabytes of storage; metadata and user data require terabytes to petabytes.
  • Bandwidth: Video streaming consumes the most bandwidth; a 1 Gbps link can serve ~125 MB/s, so multiple CDN edge servers are needed.
  • Compute: Model training requires GPU clusters; serving models requires CPU/GPU servers scaled horizontally.
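The estimates above can be turned into a quick calculation. The video count comes from the growth table; the average video size, per-viewer bitrate, and concurrency ratio are assumptions chosen for illustration, not measured values.

```python
# Back-of-envelope estimates for the 1M-user scale row.

# Storage: ~100M videos (growth table), assuming ~50 MB per encoded video.
videos = 100_000_000
avg_video_mb = 50
video_storage_pb = videos * avg_video_mb / 1e9   # MB -> PB

# Bandwidth: a 1 Gbps link moves ~125 MB/s (from the text).
link_mb_per_sec = 125
stream_mb_per_sec = 0.5           # assumed ~4 Mbps per concurrent viewer
concurrent_viewers = 100_000      # assumed ~10% of 1M users at peak
links_needed = concurrent_viewers * stream_mb_per_sec / link_mb_per_sec

print(f"Video storage: ~{video_storage_pb:.0f} PB")
print(f"1 Gbps links needed at peak: ~{links_needed:.0f}")
```

Under these assumptions the answer is roughly 5 PB of video storage and ~400 one-gigabit links at peak, which is why video delivery goes to a CDN rather than a handful of origin servers.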
Interview Tip

Start by clarifying scale and requirements. Discuss data volume, traffic patterns, and latency needs. Identify the first bottleneck and propose targeted solutions. Explain trade-offs between consistency, latency, and cost. Use real numbers to justify scaling steps. Show understanding of caching, sharding, and distributed systems.

Self Check

Your database handles 1000 QPS. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add read replicas and implement caching to reduce database load before considering sharding or more complex solutions.
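The first step in that answer, adding read replicas, can be sketched as read/write splitting: route SELECTs to a randomly chosen replica and everything else to the primary. The connection names here are hypothetical placeholders; a real deployment would use actual DB connections or a proxy such as ProxySQL.

```python
import random

class ReplicatedDB:
    """Routes reads to replicas and writes to the primary (sketch only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            target = random.choice(self.replicas)  # spread read load
        else:
            target = self.primary                  # writes stay on primary
        return target, sql   # a real client would run the query here

db = ReplicatedDB(primary="primary-1", replicas=["replica-1", "replica-2"])
read_node, _ = db.execute("SELECT * FROM videos WHERE id = 7")
write_node, _ = db.execute("INSERT INTO views (user_id) VALUES (42)")
```

Note the trade-off this introduces: replicas lag the primary slightly, so reads may be briefly stale, which is usually acceptable for recommendation and view data.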

Key Result
The database is the first bottleneck at small to medium scale; scaling it requires caching, read replicas, and eventually sharding. At large scale, distributed model serving and a global CDN become the critical components.