YARN scheduling policies in Hadoop - Time & Space Complexity
We want to understand how the time YARN takes to schedule tasks grows as more jobs and resources are involved: how does the scheduler handle more work, and what affects its speed?
Analyze the time complexity of the YARN Capacity Scheduler's allocation loop.
```
for each queue in queues:
    for each application in queue.applications:
        for each request in application.requests:
            allocate a container if resources are available
```
This code tries to allocate containers to applications in each queue based on resource availability.
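The loop above can be sketched as runnable code. This is an illustrative simplification, not YARN's actual `CapacityScheduler` implementation: the `queues` structure and the field names are hypothetical, and real schedulers track resources per node and per queue rather than from a single pool.

```python
# Simplified sketch of the allocation loop (hypothetical data model,
# not YARN's real API). Resources are modeled as a single integer.

def schedule(queues, available_resources):
    """Visit every queue, every app, every request: O(q * a * r)."""
    allocations = []
    for queue in queues:                      # q iterations
        for app in queue["applications"]:     # a iterations per queue
            for request in app["requests"]:   # r iterations per app
                if request["resources"] <= available_resources:
                    available_resources -= request["resources"]
                    allocations.append((app["id"], request["resources"]))
    return allocations

# Example: two queues; app-1 asks for two containers of size 2, app-2 for one.
queues = [
    {"applications": [{"id": "app-1",
                       "requests": [{"resources": 2}, {"resources": 2}]}]},
    {"applications": [{"id": "app-2",
                       "requests": [{"resources": 2}]}]},
]
print(schedule(queues, available_resources=5))
```

With 5 units available, both of app-1's requests fit (leaving 1 unit), so app-2's request is skipped.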
Look at the loops that repeat work:
- Primary operation: Nested loops over queues, applications, and container requests.
- How many times: Number of queues (q), times number of applications per queue (a), times container requests per application (r).
As the number of queues, applications, and requests grow, the scheduler does more work.
| Input Size (q, a, r) | Approx. Operations |
|---|---|
| 10 queues, 5 apps, 3 requests | 10 * 5 * 3 = 150 |
| 100 queues, 10 apps, 5 requests | 100 * 10 * 5 = 5,000 |
| 1000 queues, 20 apps, 10 requests | 1000 * 20 * 10 = 200,000 |
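The table's counts follow directly from multiplying the three loop bounds, which a short script can confirm:

```python
# Total operations for the nested loops = q * a * r,
# using the (q, a, r) combinations from the table above.
for q, a, r in [(10, 5, 3), (100, 10, 5), (1000, 20, 10)]:
    print(f"{q} queues x {a} apps x {r} requests = {q * a * r} operations")
```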
Pattern observation: The work grows quickly as more queues, applications, or requests are added.
Time Complexity: O(q * a * r)
This means the scheduling time grows in proportion to the product of the number of queues, the applications per queue, and the container requests per application.
[X] Wrong: "The scheduler time only depends on the number of applications."
[OK] Correct: The scheduler also loops over queues and container requests, so all three affect the time.
Understanding how scheduling time grows helps you design better resource managers and explain system behavior clearly.
"What if the scheduler used a priority queue to pick applications instead of looping over all? How would the time complexity change?"