
Feature flags in Microservices - Scalability & System Analysis

Scalability Analysis - Feature flags
Growth Table: Feature Flags at Different Scales
| Users | Requests per Second | Flag Evaluations per Second | Storage Size | Latency Impact | Complexity |
|---|---|---|---|---|---|
| 100 | ~10-50 RPS | ~10-50 | KBs (few flags) | Negligible | Simple flag storage in memory or DB |
| 10,000 | ~1,000-5,000 RPS | ~1,000-5,000 | MBs (hundreds of flags) | Low latency needed | Use caching, distributed config store |
| 1,000,000 | ~100,000 RPS | ~100,000 | GBs (thousands of flags, segments) | Must be <10ms per eval | Use CDN, caching, distributed flag evaluation service |
| 100,000,000 | ~10,000,000 RPS | ~10,000,000 | TBs (complex targeting, analytics) | Highly optimized, near real-time | Global distributed system, sharding, edge caching |
First Bottleneck

The first bottleneck is the feature flag evaluation service and its data store. As user count and request volume grow, the system must evaluate flags quickly on every request. The database or config store can be overwhelmed by read traffic or by complex targeting rules, causing latency spikes.
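To make per-request evaluation concrete, here is a minimal sketch of one common technique, deterministic percentage rollout: the user is hashed into a stable bucket so the same user always gets the same answer without any stored per-user state. The function and flag names are hypothetical, not a specific product's API.

```python
import hashlib

def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) with a stable hash,
    so repeated evaluations of the same flag for the same user agree."""
    key = f"{flag_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < rollout_percent

# Example: roll "new-ui" out to 25% of users.
enabled_for_alice = is_enabled("new-ui", "alice", 25)
```

Because the bucketing is pure computation over the inputs, it is cheap per request; the expensive part at scale is fetching the flag's configuration (rollout percentage, targeting rules), which is where the data store becomes the bottleneck.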

Scaling Solutions
  • Caching: Use in-memory caches (e.g., Redis, local caches) to store flag data and evaluation results to reduce DB load.
  • Horizontal Scaling: Add more instances of the flag evaluation service behind a load balancer to handle more concurrent requests.
  • Read Replicas: Use database read replicas to distribute read traffic for flag configurations.
  • Sharding: Partition flag data by user segments or regions to reduce data size per node.
  • CDN and Edge Caching: Cache flag data closer to users to reduce latency and network load.
  • Asynchronous Updates: Push flag changes asynchronously to services to avoid blocking requests.
  • Feature Flag Evaluation SDKs: Use client-side evaluation where possible to reduce server load.
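The first solution above, caching, can be sketched as a small in-process TTL cache sitting in front of a slower flag store. This is an illustrative cache-aside pattern, not a specific library; `fetch_fn` stands in for the database or config-store read.

```python
import time

class FlagCache:
    """In-process TTL cache in front of a slower flag store (cache-aside).
    fetch_fn is a stand-in for the DB/config-store read (hypothetical)."""

    def __init__(self, fetch_fn, ttl_seconds: float = 30.0):
        self._fetch = fetch_fn
        self._ttl = ttl_seconds
        self._cache = {}  # flag_name -> (value, expires_at)

    def get(self, flag_name: str):
        entry = self._cache.get(flag_name)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]              # fresh hit: no store read
        value = self._fetch(flag_name)   # miss or stale: read through
        self._cache[flag_name] = (value, now + self._ttl)
        return value
```

The TTL is the trade-off knob mentioned in the interview tip below: a longer TTL cuts store reads further but means flag changes take up to that long to propagate (consistency vs. latency).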
Back-of-Envelope Cost Analysis
  • At 1M users with 100K RPS, assuming each flag evaluation reads ~1KB of data, total bandwidth is ~100MB/s.
  • Storage for flags and targeting rules grows with complexity; expect GBs at million-user scale.
  • CPU usage depends on evaluation complexity; caching reduces CPU by avoiding repeated computations.
  • Network bandwidth and latency critical; edge caching reduces cross-region traffic.
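The bandwidth estimate above is simple arithmetic, shown here so the numbers are easy to check (the 1KB-per-evaluation figure is the stated assumption):

```python
rps = 100_000            # requests per second at ~1M users
bytes_per_eval = 1_000   # assumed ~1KB read per flag evaluation
bandwidth_mb_s = rps * bytes_per_eval / 1_000_000
print(bandwidth_mb_s)    # -> 100.0 MB/s, matching the estimate above
```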
Interview Tip

Start by explaining what feature flags are and why they matter. Then discuss how load grows with users and requests. Identify the bottleneck clearly (flag evaluation and data store). Propose concrete scaling solutions like caching, horizontal scaling, and sharding. Mention trade-offs like consistency vs latency. Use real numbers to show understanding.

Self Check Question

Your database handles 1000 QPS for flag data reads. Traffic grows 10x to 10,000 QPS. What do you do first?

Answer: Add caching layer (e.g., Redis or in-memory cache) to reduce direct DB reads and improve response time before scaling the database or services.
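The arithmetic behind this answer: even a modest cache hit rate brings the database load back within its original capacity. The 90% hit rate is an assumption, typical for flag data that changes rarely and is cached with a short TTL.

```python
incoming_qps = 10_000       # traffic after the 10x growth
cache_hit_rate = 0.90       # assumed achievable with a short-TTL cache
db_qps = incoming_qps * (1 - cache_hit_rate)
print(db_qps)               # ~1,000 reads/s, back within the DB's capacity
```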

Key Result
The feature flag evaluation service and its data store become the first bottleneck as user requests grow; caching and horizontal scaling are key to maintaining low latency and high throughput.