
Spotify architecture overview in Microservices - Scalability & System Analysis

Scalability Analysis - Spotify architecture overview
Growth Table: Spotify Architecture Scaling
Scale | Users | Key Changes
Small | 100 users | Single microservice instances, simple DB, minimal caching, direct client-server communication
Medium | 10,000 users | Multiple microservice instances, load balancers, caching layers (Redis), read replicas for DB, CDN for static content
Large | 1 million users | Horizontal scaling of microservices, sharded databases, distributed caches, advanced CDN usage, message queues for async tasks
Very Large | 100 million users | Global data centers, geo-distributed microservices, multi-region DB clusters with sharding, heavy use of CDNs, event-driven architecture, autoscaling
First Bottleneck

At around 10,000 to 100,000 concurrent users, the database becomes the first bottleneck. Queries for metadata and user data grow with the user base, and a single database instance can no longer keep up with the combined read and write load. Latency rises and throughput flattens, especially for personalized playlists and recommendations.

Scaling Solutions
  • Database Scaling: Use read replicas to offload read queries, and shard user data by region or user ID to distribute load.
  • Caching: Implement Redis or Memcached to cache frequently accessed data like playlists and song metadata.
  • Microservices: Horizontally scale microservices behind load balancers to handle increased API requests.
  • CDN: Use Content Delivery Networks to serve static content like album art and audio files closer to users, reducing latency and bandwidth usage.
  • Message Queues: Use Kafka or RabbitMQ for asynchronous processing like recommendations and analytics to smooth peak loads.
  • Global Distribution: Deploy services and databases in multiple regions to reduce latency and improve fault tolerance.
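
As a minimal sketch of the "shard user data by user ID" idea above, the snippet below maps a user ID to a shard with a stable hash. The shard count and the `shard_for_user` helper are hypothetical names chosen for illustration, not Spotify's actual scheme.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, chosen only for this example


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Return the shard index for a user, using a stable hash of the ID.

    hashlib is used instead of the built-in hash() because hash() is
    randomized per process for strings, which would break routing.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Count how 10,000 synthetic user IDs spread across the shards.
    counts = [0] * NUM_SHARDS
    for i in range(10_000):
        counts[shard_for_user(f"user-{i}")] += 1
    print(counts)
```

Plain modulo routing like this remaps most users whenever the shard count changes, which is why larger systems often prefer consistent hashing or a lookup directory.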
Back-of-Envelope Cost Analysis

At 1 million users, assuming 10% active concurrently, about 100,000 concurrent connections need handling.

  • API requests: ~500,000 QPS at peak (100,000 concurrent users x 5 requests/user/second)
  • Database: Needs to handle ~50,000 QPS (reads and writes combined), requiring sharding and replicas
  • Cache: Must support ~200,000 ops/sec for hot data
  • Bandwidth: Audio streaming at 160 kbps per user -> ~16 Gbps total (100,000 concurrent streams x 160 kbps)
  • Storage: Petabytes of audio files stored across distributed object storage
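
The arithmetic behind these estimates can be reproduced in a few lines. This is a sketch under the same assumptions as the text (10% of users concurrent, 5 requests per user per second at peak, every concurrent user streaming at 160 kbps); the `estimate_load` function name is made up for this example.

```python
def estimate_load(total_users: int,
                  concurrent_fraction: float,
                  requests_per_user_per_sec: int,
                  bitrate_kbps: int) -> dict:
    """Back-of-envelope load estimate from a handful of assumptions."""
    concurrent_users = round(total_users * concurrent_fraction)
    api_qps = concurrent_users * requests_per_user_per_sec
    # kbps -> Gbps: divide by 1,000,000 (decimal units).
    bandwidth_gbps = concurrent_users * bitrate_kbps / 1_000_000
    return {
        "concurrent_users": concurrent_users,
        "api_qps": api_qps,
        "bandwidth_gbps": bandwidth_gbps,
    }


if __name__ == "__main__":
    load = estimate_load(1_000_000, 0.10, 5, 160)
    print(load["concurrent_users"])  # 100000
    print(load["api_qps"])           # 500000
    print(load["bandwidth_gbps"])    # 16.0
```

The database and cache figures in the list are not derived here; they correspond to roughly 10% and 40% of the API request rate, which depends on how many requests are served without touching either tier.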
Interview Tip

Start by outlining Spotify's core components: user service, music catalog, streaming service, recommendation engine. Discuss scaling each component separately. Identify bottlenecks like DB and bandwidth early. Propose solutions like caching, sharding, and CDNs. Always justify why a solution fits the bottleneck. Use real numbers to show understanding.

Self Check

Your database handles 1000 QPS. Traffic grows 10x. What do you do first?

Answer: Assuming the workload is read-heavy, add read replicas first to spread read queries and take load off the primary database. Also cache frequent queries so fewer requests reach the database at all.
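
The answer above combines two techniques: routing reads to replicas and caching frequent reads. The sketch below illustrates both with in-memory dictionaries standing in for the primary, the replicas, and the cache. The `ReadRouter` class is hypothetical, and replication is simulated synchronously here, whereas real replicas usually lag behind the primary.

```python
import itertools


class ReadRouter:
    """Send writes to the primary, reads to replicas, with a cache in front."""

    def __init__(self, primary, replicas):
        self.primary = primary            # dict standing in for the primary DB
        self.replicas = replicas          # list of dicts standing in for replicas
        self._next = itertools.cycle(range(len(replicas)))
        self.cache = {}                   # dict standing in for Redis/Memcached
        self.replica_reads = 0            # counts reads that reached a replica

    def read(self, key):
        # Cache-aside: try the cache first, fall back to a replica on a miss.
        if key in self.cache:
            return self.cache[key]
        replica = self.replicas[next(self._next)]  # round-robin selection
        self.replica_reads += 1
        value = replica.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:     # simulated (synchronous) replication
            replica[key] = value
        self.cache.pop(key, None)         # invalidate the stale cache entry


if __name__ == "__main__":
    router = ReadRouter({}, [{}, {}])
    router.write("playlist:1", ["song-a"])
    router.read("playlist:1")   # cache miss, served by a replica
    router.read("playlist:1")   # cache hit, no database access
    print(router.replica_reads)  # 1
```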

Key Result
Spotify's database is the first bottleneck as users grow; scaling requires sharding, caching, and distributed microservices with CDNs to handle massive concurrent streaming and metadata requests.

Practice

1. What is the main reason Spotify uses microservices in its architecture?
easy
A. To avoid using APIs between components
B. To separate different tasks for better scalability and maintenance
C. To make the app use less memory on devices
D. To reduce the number of servers needed

Solution

  1. Step 1: Understand microservices purpose

    Microservices split an app into small parts, each handling a specific task.
  2. Step 2: Connect to Spotify's needs

    Spotify uses this to make the app scalable and easier to maintain by isolating tasks.
  3. Final Answer:

    To separate different tasks for better scalability and maintenance -> Option B
  4. Quick Check:

    Microservices = Separate tasks for scalability [OK]
Hint: Microservices split tasks for easier scaling and updates [OK]
Common Mistakes:
  • Thinking microservices reduce memory usage directly
  • Believing microservices avoid APIs
  • Assuming microservices reduce server count
2. Which communication method is commonly used between Spotify's microservices?
easy
A. APIs and message queues
B. FTP file transfers
C. Shared memory
D. Direct database access

Solution

  1. Step 1: Identify common microservice communication

    Microservices usually communicate via APIs or message queues for loose coupling.
  2. Step 2: Match with Spotify's design

    Spotify uses APIs and message queues to keep services independent and responsive.
  3. Final Answer:

    APIs and message queues -> Option A
  4. Quick Check:

    Microservices communicate via APIs/message queues [OK]
Hint: Microservices talk via APIs or message queues, not direct DB [OK]
Common Mistakes:
  • Choosing direct database access which breaks service independence
  • Selecting shared memory which is uncommon in distributed systems
  • Picking FTP which is unrelated to microservice communication
3. Consider a microservice that handles user playlists. If it receives a request to add a song, what is the likely flow in Spotify's architecture?
medium
A. The playlist service waits for the user to refresh the app manually
B. The playlist service directly modifies the recommendation service's database
C. The playlist service sends the request to the user interface to update
D. The playlist service updates its database and sends a message to the recommendation service

Solution

  1. Step 1: Understand service responsibilities

    The playlist service manages playlists and updates its own data store.
  2. Step 2: Recognize inter-service communication

    After updating, it informs other services like recommendations via messages.
  3. Final Answer:

    The playlist service updates its database and sends a message to the recommendation service -> Option D
  4. Quick Check:

    Playlist service updates DB + notifies others [OK]
Hint: Services update own data, notify others via messages [OK]
Common Mistakes:
  • Assuming direct DB access across services
  • Thinking UI triggers backend updates
  • Believing manual refresh is needed for updates
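
The flow in question 3 can be sketched as follows. This is an illustration only: `queue.Queue` stands in for a broker such as Kafka or RabbitMQ, and the class names, method names, and event fields are hypothetical.

```python
import queue


class PlaylistService:
    """Owns playlist data and publishes an event after each change."""

    def __init__(self, events):
        self._playlists = {}   # this service's own data store
        self._events = events  # outgoing message queue

    def add_song(self, playlist_id, song_id):
        # 1. Update the service's own store.
        self._playlists.setdefault(playlist_id, []).append(song_id)
        # 2. Notify other services with a message, not a direct DB write.
        self._events.put({"type": "song_added",
                          "playlist_id": playlist_id,
                          "song_id": song_id})

    def get(self, playlist_id):
        return list(self._playlists.get(playlist_id, []))


class RecommendationService:
    """Keeps its own view of user activity, built only from events."""

    def __init__(self):
        self.songs_by_playlist = {}

    def handle(self, event):
        if event["type"] == "song_added":
            self.songs_by_playlist.setdefault(
                event["playlist_id"], set()).add(event["song_id"])


if __name__ == "__main__":
    events = queue.Queue()
    playlists = PlaylistService(events)
    recommendations = RecommendationService()

    playlists.add_song("p1", "song-42")
    while not events.empty():          # drain the queue in this single thread
        recommendations.handle(events.get())

    print(recommendations.songs_by_playlist)  # {'p1': {'song-42'}}
```

Neither service reads or writes the other's data store; the event is the only thing they share.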
4. A developer notices that Spotify's microservices sometimes fail to update user data consistently. What is a likely cause in the architecture?
medium
A. APIs are synchronous, causing delays
B. Services are directly sharing the same database without coordination
C. Message queues are not used, causing lost updates
D. Microservices are deployed on the same server

Solution

  1. Step 1: Identify cause of inconsistent updates

    Without message queues, an update sent while the receiving service is slow or unavailable can be dropped instead of being stored and retried.
  2. Step 2: Understand Spotify's architecture best practices

    Spotify uses message queues so that updates are delivered reliably and services converge on consistent data.
  3. Final Answer:

    Message queues are not used, causing lost updates -> Option C
  4. Quick Check:

    Missing message queues = lost updates [OK]
Hint: Lost updates often mean missing message queues [OK]
Common Mistakes:
  • Blaming shared database without evidence
  • Confusing synchronous APIs with update loss
  • Assuming deployment location causes data inconsistency
5. Spotify wants to add a new feature that recommends songs based on live user activity. Which architectural change fits best with their microservices approach?
hard
A. Create a new recommendation microservice that consumes live activity events via message queues
B. Add the recommendation logic directly inside the user interface code
C. Store all live activity data in a single monolithic database accessed by all services
D. Use FTP to transfer live activity logs to the recommendation service hourly

Solution

  1. Step 1: Identify best practice for new feature in microservices

    Adding a new microservice keeps responsibilities separate and scalable.
  2. Step 2: Use message queues for live data

    Consuming live events via message queues fits asynchronous, decoupled design.
  3. Final Answer:

    Create a new recommendation microservice that consumes live activity events via message queues -> Option A
  4. Quick Check:

    New microservice + message queues = best fit [OK]
Hint: New features get own microservice, use message queues for live data [OK]
Common Mistakes:
  • Embedding logic in UI breaks separation
  • Using monolithic DB reduces scalability
  • FTP is outdated and slow for live data
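
Option A in question 5 can be sketched as a separate service with its own consumer loop. The example below is a simplified stand-in, not Spotify's implementation: `queue.Queue` replaces a real message broker, a play counter replaces a real recommendation model, and the `LiveRecommendationService` class and the `track_id` event field are hypothetical.

```python
import queue
import threading
from collections import Counter

STOP = object()  # sentinel telling the consumer loop to shut down


class LiveRecommendationService:
    """Consumes live activity events and tracks the most-played tracks."""

    def __init__(self, events):
        self._events = events
        self._play_counts = Counter()
        self._lock = threading.Lock()

    def run(self):
        # Block on the queue and process events as they arrive.
        while True:
            event = self._events.get()
            if event is STOP:
                break
            with self._lock:
                self._play_counts[event["track_id"]] += 1

    def trending(self, n=3):
        with self._lock:
            return [track for track, _ in self._play_counts.most_common(n)]


if __name__ == "__main__":
    events = queue.Queue()
    service = LiveRecommendationService(events)
    worker = threading.Thread(target=service.run)
    worker.start()

    # Other services publish activity without knowing who consumes it.
    for track in ["a", "b", "a", "c", "a", "b"]:
        events.put({"track_id": track})
    events.put(STOP)
    worker.join()

    print(service.trending(2))  # ['a', 'b']
```

Because producers only write to the queue, the recommendation service can be scaled, redeployed, or replaced without changing the services that emit activity events.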