| Scale | Users | Requests per Second | Data Volume | Key Changes |
|---|---|---|---|---|
| Small | 100 | ~50-100 | Few GBs | Monolithic or few microservices, single DB instance, simple load balancer |
| Medium | 10,000 | ~5,000 | TBs | Multiple microservices, DB read replicas, caching layers, API gateway |
| Large | 1,000,000 | ~500,000 | Petabytes | Service partitioning by region, sharded databases, distributed caches, message queues |
| Very Large | 100,000,000 | ~50,000,000 | Exabytes | Global multi-region deployment, advanced sharding, CDN for static content, autoscaling, event-driven architecture |
Uber architecture overview: Scalability & System Analysis
At small to medium scale, the database is usually the first bottleneck. Uber's system must handle a high volume of writes and reads for rides, driver locations, and user data, but a single database instance can sustain only a limited query rate (roughly 5,000-10,000 QPS). As the user count grows past that ceiling, the database becomes slow and unresponsive, delaying the matching of riders and drivers.
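A quick back-of-envelope check makes the bottleneck concrete. The numbers below are illustrative assumptions (a ~10K QPS ceiling per instance and an assumed average request rate per active user), not Uber's actual figures:

```python
# Back-of-envelope: when does a single database instance saturate?
# Both constants are assumptions for illustration, not measured values.

SINGLE_DB_CAPACITY_QPS = 10_000   # optimistic ceiling for one DB instance
REQS_PER_USER_PER_SEC = 0.5       # assumed average across riders and drivers

def max_users_per_instance(db_qps: int = SINGLE_DB_CAPACITY_QPS,
                           reqs_per_user: float = REQS_PER_USER_PER_SEC) -> int:
    """Rough number of active users one DB instance can serve."""
    return int(db_qps / reqs_per_user)

print(max_users_per_instance())  # 20000
```

Under these assumptions a single instance tops out around 20,000 active users, which is why the scaling techniques below kick in well before the "Large" tier in the table.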
- Database scaling: Use read replicas to spread read load, and shard data by geography or user ID to distribute writes.
- Microservices: Break the system into smaller services (e.g., ride matching, payments, notifications) to scale independently.
- Caching: Use Redis or Memcached to cache frequent queries like driver locations and surge pricing.
- Message queues: Use Kafka or RabbitMQ for asynchronous processing (e.g., trip events, notifications) to smooth spikes.
- Load balancing: Distribute incoming requests across multiple app servers to avoid CPU/memory bottlenecks.
- CDN: For static content like app assets and map tiles, use CDN to reduce latency and bandwidth load.
- Autoscaling: Automatically add or remove servers based on traffic to optimize cost and performance.
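The sharding idea from the first bullet can be sketched in a few lines. This is a minimal hash-based router with a hypothetical four-shard layout; real deployments typically use consistent hashing or a shard lookup service so that shards can be added without remapping most keys:

```python
import hashlib

# Minimal sketch: route a user's data to a shard by hashing the user ID.
# NUM_SHARDS and the hashing scheme are illustrative assumptions.

NUM_SHARDS = 4  # e.g., one primary per shard, each with its own read replicas

def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard deterministically via a stable hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same user always lands on the same shard, keeping their data co-located.
assert shard_for_user("rider-42") == shard_for_user("rider-42")
```

Geographic sharding works the same way, except the shard key is a city or region code instead of a hashed user ID.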
Assuming 1 million active users generating 500,000 requests per second:
- Database: Needs sharding and replicas to handle 500K QPS (each DB node ~10K QPS -> ~50 nodes minimum)
- Storage: Trip data and logs can reach petabytes annually; use distributed storage with tiering
- Bandwidth: 500K requests/sec x 1 KB/request ≈ 500 MB/s (~4 Gbps network capacity needed)
- Cache: Redis clusters handling hundreds of thousands of ops/sec to reduce DB load
- Servers: Hundreds to thousands of app servers behind load balancers for concurrency
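The estimates above follow directly from the stated assumptions (500K QPS total, ~10K QPS per DB node, ~1 KB per request), and can be reproduced in a few lines:

```python
# Reproducing the capacity estimates, using the same assumptions as the text.

total_qps = 500_000          # 1M active users at peak
qps_per_db_node = 10_000     # assumed per-node ceiling
bytes_per_request = 1_000    # ~1 KB average request size

db_nodes = -(-total_qps // qps_per_db_node)            # ceiling division
bandwidth_mb_s = total_qps * bytes_per_request / 1e6   # megabytes per second
bandwidth_gbps = bandwidth_mb_s * 8 / 1000             # gigabits per second

print(db_nodes, bandwidth_mb_s, bandwidth_gbps)  # 50 500.0 4.0
```

Note that 50 nodes is a floor: real deployments add headroom for traffic spikes, replica lag, and node failures.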
When discussing Uber's architecture scalability, start by outlining the main components (users, drivers, ride matching, payments). Then identify the bottleneck (usually database). Next, explain how microservices and data partitioning help scale. Mention caching and asynchronous processing to handle load spikes. Finally, discuss global deployment and autoscaling for very large scale. Keep your explanation clear and structured.
Question: Your database handles 1000 QPS. Traffic grows 10x. What do you do first?
Answer: Add read replicas to distribute read queries and reduce load on the primary database. Then consider sharding data to scale writes. Also, introduce caching to reduce database hits.
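The read-replica answer above amounts to read/write splitting. Here is a minimal application-layer sketch with hypothetical endpoint names; in practice this is often delegated to a proxy such as PgBouncer or ProxySQL rather than done in application code:

```python
import random

# Sketch of read/write splitting: writes go to the primary,
# reads are spread across replicas. Endpoint names are made up.

PRIMARY = "db-primary:5432"
REPLICAS = ["db-replica-1:5432", "db-replica-2:5432", "db-replica-3:5432"]

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

def route(query: str) -> str:
    """Pick a database endpoint based on the query's leading SQL verb."""
    verb = query.lstrip().split()[0].upper()
    return PRIMARY if verb in WRITE_VERBS else random.choice(REPLICAS)

assert route("INSERT INTO trips VALUES (...)") == PRIMARY
assert route("SELECT * FROM trips WHERE rider_id = 42") in REPLICAS
```

One caveat worth mentioning in an interview: replicas lag the primary, so reads that must see the latest write (e.g., a rider checking a trip they just booked) may still need to hit the primary.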