Feature flags in Microservices - System Design Exercise

Design: Feature Flag Management System
In scope: the feature flag service, its API, and the approach for integrating it with microservices. Out of scope: detailed UI design and specific microservice implementations.
Functional Requirements
FR1: Allow enabling or disabling features dynamically without redeploying services
FR2: Support targeting feature flags to specific user groups or environments
FR3: Provide a dashboard for product managers to control flags
FR4: Ensure low-latency flag evaluation inside microservices
FR5: Support gradual rollouts (percentage-based) of features
FR6: Audit changes to feature flags for compliance
FR7: Integrate with multiple microservices in different languages
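FR2 and FR5 are often combined in a single evaluation step: first check whether the user matches a targeting rule, then place the user in a stable rollout bucket. The sketch below is one possible approach, not a prescribed one. It assumes SHA-256 bucketing over a "flag:user" key, and the function names and the "eq" and "in" operators are hypothetical.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest[:8], 16) % 100


def matches(rule: dict, user: dict) -> bool:
    """Check one targeting rule; only two illustrative operators are handled."""
    actual = user.get(rule["attribute"])
    if rule["operator"] == "eq":
        return actual == rule["value"]
    if rule["operator"] == "in":
        return actual in rule["value"]
    return False  # unknown operators never match


def evaluate(flag_name: str, rule: dict, user: dict) -> bool:
    """Enable the flag only for targeted users inside the rollout percentage."""
    if not matches(rule, user):
        return False
    return rollout_bucket(flag_name, user["id"]) < rule["rollout_percentage"]
```

Because the bucket depends only on the flag name and user id, a user who sees the feature at 10% keeps seeing it when the rollout is raised to 50%, and every service computes the same answer without coordination.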
Non-Functional Requirements
NFR1: Handle 100,000 concurrent users evaluating flags
NFR2: p99 API response latency for flag evaluation under 10 ms
NFR3: 99.9% uptime for the feature flag service
NFR4: Flag updates reach all services within 1 minute (eventual consistency)
NFR5: Secure access to flag management dashboard
Think Before You Design
Key Components
Feature flag storage database
Flag evaluation service or SDKs in microservices
Management dashboard backend and frontend
API gateway or proxy for flag evaluation requests
Audit logging system
Cache layer for fast flag retrieval
Design Patterns
Cache-aside pattern for flag caching
Publish-subscribe for flag update notifications
Circuit breaker for fallback if flag service is down
Role-based access control for dashboard
Canary releases and gradual rollout patterns
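To make the circuit breaker pattern from the list above concrete, here is a minimal sketch under stated assumptions. The class name, the thresholds, and the injectable clock are all hypothetical. The breaker returns a default flag value while the flag service is failing and lets a trial call through after a cool-down.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Skip calls to a failing flag service and return a default instead."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fetch: Callable[[], bool], default: bool) -> bool:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return default  # open: do not touch the remote service
            # Cool-down elapsed (half-open): fall through and try one call.
        try:
            value = fetch()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return default
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return value
```

A caller would wrap each remote lookup, for example breaker.call(lambda: remote_lookup("new-checkout"), default=False), where remote_lookup is a placeholder for whatever client the service uses.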
Reference Architecture
  +-------------------+       +-------------------+       +-------------------+
  |                   |       |                   |       |                   |
  |  Management       |       |  Feature Flag     |       |  Microservices    |
  |  Dashboard        | <---> |  Service API      | <---> |  (with SDKs)      |
  |  (Web UI + API)   |       |                   |       |                   |
  +-------------------+       +-------------------+       +-------------------+
            |                           |                           |
            |                           |                           |
            |                           v                           |
            |                 +-------------------+                 |
            |                 |  Flag Storage DB  |                 |
            |                 +-------------------+                 |
            |                           |                           |
            |                           v                           |
            |                 +-------------------+                 |
            |                 |  Cache Layer      | <---------------+
            |                 +-------------------+                 |
            |                           |                           |
            |                           v                           |
            |                 +-------------------+                 |
            |                 |  Audit Logging    |                 |
            |                 +-------------------+                 |
            +-------------------------------------------------------+
Components
Management Dashboard
React frontend with a Node.js backend
Allows product managers to create, update, and target feature flags securely
Feature Flag Service API
RESTful API implemented in Node.js or Go
Handles flag CRUD operations and serves flag data to microservices
Flag Storage Database
PostgreSQL
Stores feature flag definitions, targeting rules, and metadata
Cache Layer
Redis
Caches flag data for low latency retrieval by microservices
Audit Logging System
Elasticsearch + Kibana
Records all changes to feature flags for compliance and troubleshooting
Microservice SDKs
Language-specific SDKs (e.g., Java, Python, Node.js)
Embedded in microservices to evaluate flags locally with cache and fallback
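The SDK behaviour described above (local evaluation with a cache and a fallback) could look roughly like the following sketch. The class name, the fetch_all callback, and the 30-second TTL are assumptions for illustration only. The client keeps the last known good snapshot when a refresh fails and falls back to caller-supplied defaults for flags it has never seen.

```python
import time
from typing import Callable, Dict, Optional


class LocalFlagClient:
    """Evaluate flags from an in-process snapshot refreshed after a TTL."""

    def __init__(self, fetch_all: Callable[[], Dict[str, bool]],
                 defaults: Dict[str, bool], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all          # hypothetical call to Redis or the API
        self._defaults = dict(defaults)
        self._ttl = ttl_seconds
        self._clock = clock
        self._snapshot: Dict[str, bool] = {}
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return  # snapshot is still fresh
        try:
            self._snapshot = dict(self._fetch_all())
            self._fetched_at = now
        except Exception:
            pass  # keep the last known good snapshot; retry on the next call

    def is_enabled(self, name: str) -> bool:
        self._refresh_if_stale()
        if name in self._snapshot:
            return self._snapshot[name]
        return self._defaults.get(name, False)  # unknown flags default to off
```

This version refreshes inline on the request path to stay short. A real SDK would more likely refresh in a background thread so that a slow fetch never adds to request latency.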
Request Flow
1. A product manager logs into the Management Dashboard and creates or updates a feature flag with targeting rules.
2. The Dashboard backend validates the flag data and sends it to the Feature Flag Service API.
3. The Feature Flag Service stores the flag data in the PostgreSQL database and updates the Redis cache.
4. The Audit Logging System records the change event.
5. Microservices periodically fetch updated flags from the Feature Flag Service API or subscribe to update notifications.
6. The microservice SDK caches flags locally and evaluates them against the targeting rules on each incoming user request.
7. If the local cache is stale or missing, the SDK fetches fresh flag data from the cache layer or the API.
8. The microservice uses the evaluation result to enable or disable features dynamically.
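The write path in steps 3 to 5 can be sketched with in-memory stand-ins. Everything here is hypothetical: the dictionaries stand in for PostgreSQL and Redis, the list stands in for the audit system, and the subscriber callbacks stand in for a real notification channel.

```python
import json
import time
from typing import Callable, Dict, List, Optional


class FlagService:
    """In-memory stand-in for the flag update path; all stores are placeholders."""

    def __init__(self) -> None:
        self.db: Dict[str, dict] = {}        # stands in for PostgreSQL
        self.cache: Dict[str, str] = {}      # stands in for Redis
        self.audit_log: List[dict] = []      # stands in for the audit system
        self.subscribers: List[Callable[[str, dict], None]] = []

    def subscribe(self, callback: Callable[[str, dict], None]) -> None:
        """Register a listener, e.g. an SDK that wants update notifications."""
        self.subscribers.append(callback)

    def update_flag(self, name: str, new_value: dict, user_id: str) -> None:
        old_value: Optional[dict] = self.db.get(name)
        self.db[name] = new_value                     # step 3: persist
        self.cache[name] = json.dumps(new_value)      # step 3: refresh the cache
        self.audit_log.append({                       # step 4: record the change
            "flag": name,
            "user_id": user_id,
            "action": "create" if old_value is None else "update",
            "timestamp": time.time(),
            "old_value": old_value,
            "new_value": new_value,
        })
        for notify in self.subscribers:               # step 5: push to listeners
            notify(name, new_value)
```

In a real deployment the database write, cache update, and audit record would not be atomic as they appear here, so the ordering and failure handling between them is a design decision worth discussing.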
Database Schema
Entities:
- FeatureFlag(id PK, name, description, type [boolean, multivariate], created_at, updated_at)
- TargetingRule(id PK, feature_flag_id FK, attribute, operator, value, rollout_percentage)
- AuditLog(id PK, feature_flag_id FK, user_id, action, timestamp, old_value, new_value)
Relationships:
- One FeatureFlag has many TargetingRules
- AuditLog references FeatureFlag and the user who made the change
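As a rough illustration, the entities above can be written as table definitions. The sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The snake_case table names, the column types, the UNIQUE constraint on name, and the CHECK constraints are assumptions that the schema above does not state.

```python
import sqlite3

# Column names follow the entity list; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE feature_flag (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    description TEXT,
    type        TEXT NOT NULL CHECK (type IN ('boolean', 'multivariate')),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE targeting_rule (
    id                 INTEGER PRIMARY KEY,
    feature_flag_id    INTEGER NOT NULL REFERENCES feature_flag(id),
    attribute          TEXT NOT NULL,
    operator           TEXT NOT NULL,
    value              TEXT NOT NULL,
    rollout_percentage INTEGER CHECK (rollout_percentage BETWEEN 0 AND 100)
);
CREATE TABLE audit_log (
    id              INTEGER PRIMARY KEY,
    feature_flag_id INTEGER NOT NULL REFERENCES feature_flag(id),
    user_id         TEXT NOT NULL,
    action          TEXT NOT NULL,
    timestamp       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    old_value       TEXT,
    new_value       TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables with foreign key enforcement switched on."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
```

Calling create_schema(sqlite3.connect(":memory:")) gives a throwaway database for experimenting with the one-to-many relationship between flags and targeting rules.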
Scaling Discussion
Bottlenecks
High read load on Feature Flag Service API causing latency spikes
Cache invalidation delays leading to stale flag evaluations
Audit logging volume growing rapidly with frequent flag changes
SDKs fetching flags too frequently increasing network overhead
Solutions
Implement aggressive caching with TTL and cache-aside pattern to reduce API calls
Use publish-subscribe messaging (e.g., Kafka) to push flag updates to SDKs for near real-time cache invalidation
Archive old audit logs and use scalable storage like Elasticsearch clusters
Cache flags locally in the SDK, retry failed fetches with exponential backoff, and fall back to default flag values if the service is unreachable
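The backoff-and-fallback idea in the last solution can be sketched as follows. The function name, the retry count, the base delay, and the 60-second cap are assumptions. The sketch uses capped exponential backoff with full jitter so that many SDK instances do not retry in lockstep.

```python
import random
import time
from typing import Callable, Dict


def fetch_with_backoff(fetch: Callable[[], Dict[str, bool]],
                       defaults: Dict[str, bool],
                       attempts: int = 4,
                       base: float = 0.5,
                       cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: Callable[[], float] = random.random) -> Dict[str, bool]:
    """Try to fetch flags, backing off between failures, else return defaults."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                break  # out of retries: fall through to the defaults
            # Delay ceiling doubles each attempt, capped; jitter spreads retries.
            ceiling = min(cap, base * (2 ** attempt))
            sleep(rng() * ceiling)
    return dict(defaults)
```

The sleep and rng parameters exist only so the timing can be replaced in tests. With the values shown, the ceilings before each retry are 0.5, 1.0, and 2.0 seconds.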
Interview Tips
Time: Spend 10 minutes understanding requirements and clarifying scope, 20 minutes designing components and data flow, 10 minutes discussing scaling and trade-offs, 5 minutes summarizing.
Explain why dynamic feature toggling is important for continuous delivery
Discuss trade-offs between consistency and latency in flag evaluation
Highlight caching strategies to meet low latency requirements
Mention security and auditability for compliance
Describe how gradual rollouts reduce risk
Show awareness of SDK design for multiple languages and fallback mechanisms