| Scale | Users | Key Changes |
|---|---|---|
| Small | 100 users | Single microservice instances, simple DB, minimal caching, direct client-server communication |
| Medium | 10,000 users | Multiple microservice instances, load balancers, caching layers (Redis), read replicas for DB, CDN for static content |
| Large | 1 million users | Horizontal scaling of microservices, sharded databases, distributed caches, advanced CDN usage, message queues for async tasks |
| Very Large | 100 million users | Global data centers, geo-distributed microservices, multi-region DB clusters with sharding, heavy use of CDNs, event-driven architecture, autoscaling |

Spotify architecture overview in Microservices - Scalability & System Analysis

At around 10,000 to 100,000 concurrent users, the database becomes the first bottleneck. As queries for metadata and user data grow, a single database instance can no longer keep up with the combined read/write load, so latency rises and throughput plateaus, especially for personalized playlists and recommendations.
- Database Scaling: Use read replicas to offload read queries, and shard user data by region or user ID to distribute load.
- Caching: Implement Redis or Memcached to cache frequently accessed data like playlists and song metadata.
- Microservices: Horizontally scale microservices behind load balancers to handle increased API requests.
- CDN: Use Content Delivery Networks to serve static content like album art and audio files from locations closer to users, reducing latency and the bandwidth load on origin servers.
- Message Queues: Use Kafka or RabbitMQ for asynchronous processing like recommendations and analytics to smooth peak loads.
- Global Distribution: Deploy services and databases in multiple regions to reduce latency and improve fault tolerance.
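
As an illustration of the "shard user data by user ID" idea above, here is a minimal sketch of hash-based shard routing. The shard count (`NUM_SHARDS = 8`) and the function names are hypothetical, chosen only for the example; nothing here reflects how Spotify actually routes data.

```python
import hashlib

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    # hashlib gives the same result in every process, unlike built-in hash(),
    # which is randomized per interpreter run for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(user_ids, num_shards: int = NUM_SHARDS) -> dict:
    """Group user IDs by shard so a batch becomes one query per shard."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for_user(uid, num_shards), []).append(uid)
    return groups


if __name__ == "__main__":
    print(group_by_shard(["alice", "bob", "carol", "dave"]))
```

One design caveat: with plain modulo hashing, changing the shard count remaps most keys. Consistent hashing is a common alternative when shards are added or removed often.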
At 1 million users, assuming 10% are active concurrently, roughly 100,000 concurrent connections must be handled.
- API requests: ~500,000 QPS (100,000 concurrent users × 5 requests/user/second at peak)
- Database: Needs to handle ~50,000 QPS (writes + reads), requiring sharding and replicas
- Cache: Must support ~200,000 ops/sec for hot data
- Bandwidth: Audio streaming at 160 kbps per user × 100,000 concurrent streams -> ~16 Gbps total bandwidth
- Storage: Petabytes of audio files stored across distributed object storage
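
The arithmetic behind these estimates can be reproduced with a short back-of-the-envelope script. This is a sketch under the assumptions stated above (10% concurrency, 5 requests/user/second at peak, 160 kbps per stream), plus one further assumption: every concurrent user is streaming at the same time. The function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    concurrent_users: int
    api_qps: int
    bandwidth_gbps: float


def estimate(total_users: int,
             concurrent_fraction: float = 0.10,
             requests_per_user_per_sec: int = 5,
             stream_kbps: int = 160) -> Estimate:
    """Back-of-the-envelope capacity numbers for a streaming service."""
    concurrent = round(total_users * concurrent_fraction)
    api_qps = concurrent * requests_per_user_per_sec
    # kbps -> Gbps: divide by 1,000,000 (decimal units).
    bandwidth_gbps = concurrent * stream_kbps / 1_000_000
    return Estimate(concurrent, api_qps, bandwidth_gbps)


if __name__ == "__main__":
    print(estimate(1_000_000))
```

Changing `total_users` or any of the assumed ratios shows how sensitive the totals are to each input, which helps when justifying numbers in a design discussion.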
Start by outlining Spotify's core components: user service, music catalog, streaming service, recommendation engine. Discuss scaling each component separately. Identify bottlenecks like DB and bandwidth early. Propose solutions like caching, sharding, and CDNs. Always justify why a solution fits the bottleneck. Use real numbers to show understanding.
Your database handles 1,000 QPS. Traffic grows 10x. What do you do first?
Answer: Add read replicas to distribute read queries and reduce load on the primary database. Also, implement caching for frequent queries to reduce DB hits.
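
To make the read-replica part of this answer concrete, here is a minimal sketch of read/write splitting. The class name, the host labels, and the "starts with SELECT" check are simplifying assumptions for illustration; real database drivers and proxies usually provide their own routing.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        # Simplified classification: treat only SELECT statements as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


if __name__ == "__main__":
    router = ReplicaRouter("primary", ["replica-1", "replica-2"])
    for query in ["SELECT * FROM playlists",
                  "UPDATE users SET name = 'x'",
                  "SELECT * FROM songs"]:
        print(router.target_for(query), "<-", query)
```

Replicas typically lag slightly behind the primary, so a read issued right after a write may return stale data. Reads that must see the latest write are usually sent to the primary.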