
Spotify architecture overview in Microservices - Scalability & System Analysis

Scalability Analysis - Spotify architecture overview
Growth Table: Spotify Architecture Scaling
Scale | Users | Key Changes
--- | --- | ---
Small | 100 | Single microservice instances, simple DB, minimal caching, direct client-server communication
Medium | 10,000 | Multiple microservice instances, load balancers, caching layers (Redis), read replicas for the DB, CDN for static content
Large | 1 million | Horizontal scaling of microservices, sharded databases, distributed caches, advanced CDN usage, message queues for async tasks
Very Large | 100 million | Global data centers, geo-distributed microservices, multi-region DB clusters with sharding, heavy use of CDNs, event-driven architecture, autoscaling
First Bottleneck

Somewhere between 10,000 and 100,000 concurrent users, the database becomes the first bottleneck. Metadata and user-data queries grow with the user base, raising latency and capping throughput, and a single database instance can no longer keep up with the combined read/write load, especially from personalized playlists and recommendations.

Scaling Solutions
  • Database Scaling: Use read replicas to offload read queries, and shard user data by region or user ID to distribute load.
  • Caching: Implement Redis or Memcached to cache frequently accessed data like playlists and song metadata.
  • Microservices: Horizontally scale microservices behind load balancers to handle increased API requests.
  • CDN: Use Content Delivery Networks to serve static content like album art and audio files closer to users, reducing latency and bandwidth usage.
  • Message Queues: Use Kafka or RabbitMQ for asynchronous processing like recommendations and analytics to smooth peak loads.
  • Global Distribution: Deploy services and databases in multiple regions to reduce latency and improve fault tolerance.
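Two of the techniques above, sharding user data by user ID and caching hot reads, combine naturally. The sketch below is a minimal in-memory illustration under stated assumptions: plain dicts stand in for the database shards and for Redis/Memcached, the shard count of 4 is hypothetical, and `shard_for` and `PlaylistStore` are illustrative names, not a real client API. Reads follow the cache-aside pattern (check cache, fall back to the owning shard, fill the cache) and writes invalidate the cached copy.

```python
import hashlib
from typing import Optional

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index with a stable hash (MD5 here)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class PlaylistStore:
    """Cache-aside reads over user-ID-sharded storage, all held in memory.

    Each dict in `shards` stands in for one database shard and `cache`
    stands in for Redis or Memcached; none of this is a real client API.
    """

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: list[dict[str, list[str]]] = [{} for _ in range(num_shards)]
        self.cache: dict[str, list[str]] = {}
        self.db_reads = 0  # counts reads that missed the cache

    def _shard(self, user_id: str) -> dict[str, list[str]]:
        return self.shards[shard_for(user_id, len(self.shards))]

    def save_playlist(self, user_id: str, tracks: list[str]) -> None:
        # Write to the owning shard, then invalidate the cached copy.
        self._shard(user_id)[user_id] = list(tracks)
        self.cache.pop(user_id, None)

    def get_playlist(self, user_id: str) -> Optional[list[str]]:
        # Cache hit: no database work at all.
        if user_id in self.cache:
            return self.cache[user_id]
        # Cache miss: read from the owning shard and populate the cache.
        self.db_reads += 1
        tracks = self._shard(user_id).get(user_id)
        if tracks is not None:
            self.cache[user_id] = tracks
        return tracks


store = PlaylistStore()
store.save_playlist("user-42", ["track-1", "track-2"])
store.get_playlist("user-42")  # miss: reads the shard, fills the cache
store.get_playlist("user-42")  # hit: served from the cache
print(store.db_reads)  # prints 1
```

One design caveat: plain modulo hashing remaps most users whenever the shard count changes, so production systems usually prefer consistent hashing or a shard-lookup directory when shards are added over time.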
Back-of-Envelope Cost Analysis

At 1 million users, assuming 10% active concurrently, about 100,000 concurrent connections need handling.

  • API requests: ~500,000 QPS (assuming 5 requests/user/second peak)
  • Database: ~50,000 QPS (reads + writes), i.e. about 10% of API traffic reaching the database, which calls for sharding and read replicas
  • Cache: Must support ~200,000 ops/sec for hot data
  • Bandwidth: 100,000 listeners × 160 kbps ≈ 16 Gbps of total egress, assuming every concurrent user is streaming
  • Storage: Petabytes of audio files stored across distributed object storage
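The arithmetic above can be reproduced in a few lines, which makes it easy to rerun with different assumptions. The inputs are the text's own (1 million users, 10% concurrent, 5 requests/second, 160 kbps); the 10% and 40% database and cache ratios are simply read off the text's 50,000 and 200,000 figures relative to 500,000 API QPS, not derived independently. Integer percentages are used to keep the results exact.

```python
# Inputs taken from the text above.
TOTAL_USERS = 1_000_000
CONCURRENT_PCT = 10              # share of users active at peak
REQUESTS_PER_USER_PER_SEC = 5    # peak API calls per active user
STREAM_KBPS = 160                # audio bitrate per listener

# Ratios read off the text's own figures (50,000 / 500,000 and 200,000 / 500,000).
DB_PCT_OF_API = 10
CACHE_PCT_OF_API = 40

concurrent_users = TOTAL_USERS * CONCURRENT_PCT // 100
api_qps = concurrent_users * REQUESTS_PER_USER_PER_SEC
db_qps = api_qps * DB_PCT_OF_API // 100
cache_ops = api_qps * CACHE_PCT_OF_API // 100
# Assumes every concurrent user is streaming; 1 Gbps = 1,000,000 kbps.
bandwidth_gbps = concurrent_users * STREAM_KBPS / 1_000_000

print(f"concurrent users: {concurrent_users:,}")   # 100,000
print(f"API QPS:          {api_qps:,}")            # 500,000
print(f"DB QPS:           {db_qps:,}")             # 50,000
print(f"cache ops/sec:    {cache_ops:,}")          # 200,000
print(f"egress bandwidth: {bandwidth_gbps:.0f} Gbps")  # 16 Gbps
```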
Interview Tip

Start by outlining Spotify's core components: user service, music catalog, streaming service, recommendation engine. Discuss scaling each component separately. Identify bottlenecks like DB and bandwidth early. Propose solutions like caching, sharding, and CDNs. Always justify why a solution fits the bottleneck. Use real numbers to show understanding.

Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

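Answer: Assuming a read-heavy workload (typical for playlists and song metadata), add read replicas to spread read queries and take load off the primary database, and cache frequent queries to cut database hits. If writes alone outgrow the primary, shard next.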
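A quick sizing calculation shows why replicas and caching pair well here. This is a sketch under hypothetical assumptions not given in the question: 90% of queries are reads, each node (primary or replica) handles the same 1,000 QPS the current database does, all reads go to replicas, and the cache hit ratio is either 0% or 80%. The function names are illustrative only.

```python
def read_replicas_needed(total_qps: int, read_pct: int, cache_hit_pct: int,
                         node_capacity_qps: int) -> int:
    """Replicas needed so cache-missing reads fit within per-node capacity.

    Assumes every read that misses the cache is served by a replica.
    """
    reads = total_qps * read_pct // 100
    reads_reaching_db = reads * (100 - cache_hit_pct) // 100
    # Integer ceiling division: a partly used replica still counts as one.
    return (reads_reaching_db + node_capacity_qps - 1) // node_capacity_qps


def primary_write_qps(total_qps: int, read_pct: int) -> int:
    """Writes still land on the single primary; replicas do not absorb them."""
    return total_qps - total_qps * read_pct // 100


NEW_QPS = 10 * 1_000   # 10x growth from 1,000 QPS
NODE_CAPACITY = 1_000  # hypothetical: each node handles what the current DB handles

print(read_replicas_needed(NEW_QPS, 90, 0, NODE_CAPACITY))   # prints 9 (no cache)
print(read_replicas_needed(NEW_QPS, 90, 80, NODE_CAPACITY))  # prints 2 (80% cache hits)
print(primary_write_qps(NEW_QPS, 90))                        # prints 1000
```

Under these assumptions, caching cuts the replica count from 9 to 2, while the 1,000 write QPS would run the primary at its full assumed capacity, which is the point where sharding becomes the next step.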

Key Result
Spotify's database is the first bottleneck as users grow; scaling requires sharding, caching, and distributed microservices with CDNs to handle massive concurrent streaming and metadata requests.