Request aggregation in Microservices - Scalability & System Analysis

Scalability Analysis - Request aggregation
Growth Table: Request Aggregation at Different Scales
Users | Request Volume | Aggregation Load | Latency Impact | Infrastructure Changes
--- | --- | --- | --- | ---
100 users | ~200 requests/sec | Single aggregator instance handles requests | Low latency, simple aggregation | Basic load balancer, 1 aggregator service
10,000 users | ~20,000 requests/sec | Aggregator CPU/memory starts to strain | Latency may increase due to queuing | Horizontal scaling of aggregator, caching introduced
1,000,000 users | ~2,000,000 requests/sec | Aggregator becomes bottleneck; network saturation possible | Higher latency, possible timeouts | Sharded aggregation, distributed caches, async processing
100,000,000 users | ~200,000,000 requests/sec | Multiple aggregator clusters needed; data partitioning essential | Latency sensitive; requires advanced load balancing | Global load balancing, CDN for static data, event-driven aggregation
First Bottleneck

The aggregator service's CPU and memory become the first bottleneck because the aggregator must combine multiple microservice responses for every user request. At around 10,000 users (~20,000 requests/sec), a single instance saturates its CPU and requests start to queue, which raises latency.

Scaling Solutions
  • Horizontal Scaling: Add more aggregator instances behind a load balancer to distribute request load.
  • Caching: Cache aggregated responses for repeated queries to reduce load on microservices and aggregator.
  • Sharding: Partition aggregation by user segments or request types to parallelize processing.
  • Asynchronous Processing: Use event-driven processing or message queues to decouple aggregation from the request path and reduce latency spikes.
  • CDN: For static or semi-static aggregated data, use a CDN to offload traffic from the aggregator.
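As a rough illustration of the caching bullet above, the sketch below wraps an aggregation function with a small in-process TTL cache. The names `TtlCache` and `withCache` are hypothetical, and the injectable `now` clock exists only to make expiry easy to test. With several aggregator instances, a shared in-memory store would usually take the place of the per-process `Map`.

```javascript
// Minimal in-process TTL cache for aggregated responses (illustrative only).
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, an assumption made for testability
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired: drop the entry so the caller re-aggregates.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps a hypothetical aggregateFn(key) so repeated queries inside the TTL
// are served from the cache instead of calling the microservices again.
function withCache(aggregateFn, cache) {
  return async function cachedAggregate(key) {
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const value = await aggregateFn(key);
    cache.set(key, value);
    return value;
  };
}
```

The TTL is the freshness trade-off in concrete form: a longer TTL removes more load but serves older data.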
Back-of-Envelope Cost Analysis
  • At 10,000 users: ~20,000 requests/sec (assuming 2 requests per user per second)
  • Aggregator CPU: Assuming each instance sustains ~5,000 requests/sec, at least ~4 instances are needed (20,000 / 5,000), before adding headroom for spikes and failover
  • Memory: Aggregation buffers and response assembly require sufficient RAM (e.g., 4-8GB per instance)
  • Network Bandwidth: 1 Gbps (~125 MB/s) per server; ensure aggregator instances have enough bandwidth for combined microservice responses
  • Storage: Minimal for aggregation itself; caching layer may require fast in-memory stores like Redis
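The instance count above can be reproduced with a small helper. This is only a sketch of the arithmetic: it treats the ~5,000 figure as a sustained per-instance request rate, and the function name, parameter names, and the optional `headroom` factor are assumptions introduced for illustration.

```javascript
// Back-of-envelope sizing for the aggregator tier.
// Inputs are the assumed figures from the list above, not measurements.
function estimateAggregatorCapacity({ users, requestsPerUserPerSec, perInstanceRps, headroom = 0 }) {
  const totalRps = users * requestsPerUserPerSec;
  // Round up: a fractional instance still means one more machine.
  const instances = Math.ceil((totalRps * (1 + headroom)) / perInstanceRps);
  return { totalRps, instances };
}

const baseline = estimateAggregatorCapacity({
  users: 10000,
  requestsPerUserPerSec: 2,
  perInstanceRps: 5000,
});
console.log(baseline); // { totalRps: 20000, instances: 4 }
```

Passing a non-zero `headroom` (for example 0.3 for 30% spare capacity) shows how quickly the minimum grows once spikes and instance failures are accounted for.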
Interview Tip

Start by explaining the aggregation flow and identifying the component that combines multiple microservice responses. Discuss how load increases with users and which resource (CPU, memory, or network) will saturate first. Then propose scaling strategies step by step, justifying each one by the bottleneck it removes. Finally, mention trade-offs such as latency versus consistency and cache freshness.

Self Check Question

Your aggregator service handles 1,000 queries per second (QPS). Traffic grows 10x to 10,000 QPS. What is your first action and why?

Key Result
The aggregator service CPU and memory become the first bottleneck as request volume grows; horizontal scaling and caching are the primary solutions to maintain low latency and handle increased load.

Practice

1. What is the main purpose of request aggregation in microservices?
easy
A. To cache responses from a single microservice
B. To split a large service into smaller microservices
C. To handle database transactions across services
D. To combine data from multiple microservices into a single response

Solution

  1. Step 1: Understand request aggregation concept

    Request aggregation means collecting data from multiple microservices to form one combined response.
  2. Step 2: Identify the main goal

    The goal is to reduce multiple client calls into one, improving efficiency and user experience.
  3. Final Answer:

    To combine data from multiple microservices into a single response -> Option D
  4. Quick Check:

    Request aggregation = combine multiple responses [OK]
Hint: Aggregation means combining multiple service responses [OK]
Common Mistakes:
  • Confusing aggregation with service splitting
  • Thinking it only caches data
  • Mixing aggregation with transaction management
2. Which of the following is the correct way to implement a request aggregator in a microservices architecture?
easy
A. Make parallel calls to all required microservices and aggregate responses asynchronously
B. Make sequential calls to each microservice and combine results synchronously
C. Call only one microservice and ignore others
D. Use a database trigger to combine data from microservices

Solution

  1. Step 1: Review aggregator call patterns

    Efficient aggregators call multiple services in parallel to reduce total wait time.
  2. Step 2: Identify correct implementation

    Parallel asynchronous calls improve performance and user experience compared to sequential calls.
  3. Final Answer:

    Make parallel calls to all required microservices and aggregate responses asynchronously -> Option A
  4. Quick Check:

    Parallel async calls = best aggregator practice [OK]
Hint: Use parallel async calls for faster aggregation [OK]
Common Mistakes:
  • Using sequential calls causing slow responses
  • Ignoring some microservices in aggregation
  • Trying to use database triggers for aggregation
3. Consider this pseudocode for a request aggregator:
async function aggregate() {
  const user = await getUser();
  const orders = await getOrders(user.id);
  const payments = await getPayments(user.id);
  return { user, orders, payments };
}
What is the main problem with this code?
medium
A. It does not handle errors from getUser
B. It calls getOrders and getPayments sequentially, increasing total response time
C. It returns data in the wrong format
D. It calls getUser multiple times unnecessarily

Solution

  1. Step 1: Analyze call sequence

    The code waits for getUser, then calls getOrders and waits, then calls getPayments and waits, all sequentially.
  2. Step 2: Identify inefficiency

    Calling getOrders and getPayments one after another increases total wait time unnecessarily.
  3. Final Answer:

    It calls getOrders and getPayments sequentially, increasing total response time -> Option B
  4. Quick Check:

    Sequential calls = slower aggregation [OK]
Hint: Parallelize independent calls to reduce wait time [OK]
Common Mistakes:
  • Assuming error handling is missing
  • Thinking return format is incorrect
  • Believing getUser is called multiple times
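One possible fix for the pseudocode in question 3 is sketched below. `getUser` still runs first because the other two calls need `user.id`, but `getOrders` and `getPayments` are started together with `Promise.all`. The three service functions are hypothetical stubs that simulate network latency, and the function is named `aggregateParallel` only to distinguish it from the original.

```javascript
// Stub services standing in for real network calls (hypothetical).
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));
async function getUser() { return delay(20, { id: 42, name: "Ada" }); }
async function getOrders(userId) { return delay(50, [{ orderId: 1, userId }]); }
async function getPayments(userId) { return delay(50, [{ paymentId: 7, userId }]); }

async function aggregateParallel() {
  // getUser must finish first because the other calls depend on user.id.
  const user = await getUser();
  // orders and payments are independent, so both are started before either is awaited.
  const [orders, payments] = await Promise.all([
    getOrders(user.id),
    getPayments(user.id),
  ]);
  return { user, orders, payments };
}
```

With these stub delays the sequential version waits roughly 20 + 50 + 50 ms, while this version waits roughly 20 + 50 ms. Note that `Promise.all` rejects as soon as either call fails, so this sketch does not yet address partial failures.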
4. You have a request aggregator that calls three microservices in parallel. Sometimes, one service fails and causes the whole aggregation to fail. How can you fix this?
medium
A. Cache the failed service response permanently
B. Retry the failed service indefinitely until it succeeds
C. Ignore errors and return partial data with error info for failed services
D. Stop calling other services if one fails

Solution

  1. Step 1: Understand error impact in aggregation

    If one service fails, the aggregator should still return available data to avoid full failure.
  2. Step 2: Choose error handling strategy

    Returning partial data with error info improves user experience and system resilience.
  3. Final Answer:

    Ignore errors and return partial data with error info for failed services -> Option C
  4. Quick Check:

    Partial data + error info = robust aggregation [OK]
Hint: Return partial results with errors, don't fail whole aggregation [OK]
Common Mistakes:
  • Retrying endlessly causing delays
  • Stopping all calls on one failure
  • Caching errors permanently causing stale data
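The partial-data strategy from question 4 could look roughly like the sketch below, which uses `Promise.allSettled` so that one rejected call does not reject the whole aggregation. The function name `aggregatePartial` and the `{ data, errors }` response shape are assumptions for illustration; a real API would define its own error format.

```javascript
// Calls every service in parallel and keeps whatever succeeds.
// `services` maps a field name to a zero-argument function returning a value or promise
// (a hypothetical shape chosen for this sketch).
async function aggregatePartial(services) {
  const names = Object.keys(services);
  // The async wrapper turns synchronous throws into rejections as well.
  const settled = await Promise.allSettled(names.map(async (name) => services[name]()));

  const data = {};
  const errors = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      data[names[i]] = outcome.value;
    } else {
      // Record the failure instead of failing the whole response.
      const reason = outcome.reason;
      errors[names[i]] = reason instanceof Error ? reason.message : String(reason);
    }
  });
  return { data, errors };
}
```

A caller can then render the sections present in `data` and show a placeholder for anything listed in `errors`.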
5. You design a request aggregator for a shopping app that calls user, orders, and payment microservices. To improve scalability, which design choice is best?
hard
A. Use asynchronous parallel calls with timeout and fallback data for each microservice
B. Call microservices sequentially and cache all responses for 24 hours
C. Aggregate data in a single monolithic service instead of microservices
D. Make synchronous calls and block until all microservices respond

Solution

  1. Step 1: Consider scalability needs

    Parallel async calls reduce latency and improve throughput under load.
  2. Step 2: Add timeout and fallback

    Timeouts prevent long waits; fallback data keeps user experience smooth if a service is slow or down.
  3. Step 3: Evaluate other options

    Sequential calls and long caching reduce freshness and responsiveness; monolith loses microservices benefits; synchronous blocking hurts scalability.
  4. Final Answer:

    Use asynchronous parallel calls with timeout and fallback data for each microservice -> Option A
  5. Quick Check:

    Async parallel + timeout + fallback = scalable aggregator [OK]
Hint: Combine async calls with timeout and fallback for best scalability [OK]
Common Mistakes:
  • Using sequential calls causing slow response
  • Relying on stale cached data too long
  • Ignoring microservices benefits by monolith design
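A minimal sketch of the design in option A of question 5 follows, assuming each service function returns a promise. The helper `withTimeoutAndFallback`, the `services` object, the fallback values (`null` and empty arrays), and the 200 ms default are all hypothetical choices for illustration.

```javascript
// Resolves with `fallback` if `promise` rejects or takes longer than `ms`.
function withTimeoutAndFallback(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise.catch(() => fallback), timeout])
    .finally(() => clearTimeout(timer)); // avoid leaving the timer pending
}

// Calls the three services in parallel, each guarded by its own timeout and fallback.
async function aggregateShoppingView(userId, services, timeoutMs = 200) {
  const [user, orders, payments] = await Promise.all([
    withTimeoutAndFallback(services.getUser(userId), timeoutMs, null),
    withTimeoutAndFallback(services.getOrders(userId), timeoutMs, []),
    withTimeoutAndFallback(services.getPayments(userId), timeoutMs, []),
  ]);
  return { user, orders, payments };
}
```

The timeout here only stops the aggregator from waiting; it does not cancel the underlying request. A fuller implementation would also abort the outgoing call and would likely tell the client which fields are fallback values.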