Why advanced stages matter in MongoDB - Performance Analysis
When using MongoDB's aggregation, each stage processes data and passes results to the next. Understanding how time grows with each stage helps us see why advanced stages matter.
We want to know how adding more stages affects the total work done.
Analyze the time complexity of this aggregation pipeline with multiple stages.
```javascript
db.collection.aggregate([
  { $match: { status: "active" } },
  { $group: { _id: "$category", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 5 }
])
```
This pipeline filters documents, groups them by category summing amounts, sorts by total, and limits output to top 5.
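To make the flow between stages concrete, here is a minimal sketch of the same four steps in plain JavaScript over an in-memory array. The sample documents and the `runPipeline` helper are hypothetical and only mirror the field names used in the pipeline; this is not how MongoDB executes the pipeline internally.

```javascript
// Hypothetical sample data; field names follow the pipeline above.
const docs = [
  { status: "active", category: "books", amount: 30 },
  { status: "active", category: "games", amount: 50 },
  { status: "inactive", category: "books", amount: 99 },
  { status: "active", category: "books", amount: 25 },
  { status: "active", category: "music", amount: 10 },
];

// Hypothetical helper that imitates the four stages on a plain array.
function runPipeline(input, limit = 5) {
  // $match: one pass over every input document.
  const matched = input.filter((d) => d.status === "active");

  // $group: one pass over the matched documents, one running total per category.
  const totals = new Map();
  for (const d of matched) {
    totals.set(d.category, (totals.get(d.category) || 0) + d.amount);
  }
  const groups = [...totals].map(([_id, total]) => ({ _id, total }));

  // $sort: orders the groups (not the original documents) by total, descending.
  groups.sort((a, b) => b.total - a.total);

  // $limit: keeps only the first `limit` groups.
  return groups.slice(0, limit);
}

const result = runPipeline(docs);
console.log(result);
```

Note that the sort step only ever sees one entry per category, which is usually far fewer items than the original input.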
Look at what repeats as data flows through stages.
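- Primary operation: Scanning documents in $match and accumulating totals in $group.
- How many times: $match examines each input document once (assuming no index on status is used), $group processes each matching document once, and $sort and $limit work on one document per category rather than on the original documents.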
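As the number of documents grows, the early stages process more data, while the later stages depend on how many documents pass the filter and how many distinct categories exist. The figures below assume $match scans the whole collection.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 documents examined by $match; at most 10 reach $group |
| 100 | About 100 documents examined by $match; at most 100 reach $group |
| 1000 | About 1000 documents examined by $match; at most 1000 reach $group |
Pattern observation: Work in $match and $group grows roughly in direct proportion to input size; work in $sort grows with the number of groups rather than with n directly.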
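The growth pattern can be checked with a small counting sketch in plain JavaScript. The `makeDocs` generator and `stageInputSizes` helper are hypothetical, and the data shape (half the documents active, four categories) is an assumption chosen only to make the numbers easy to read.

```javascript
// Hypothetical generator: n documents, every second one active, four categories.
function makeDocs(n) {
  const categories = ["a", "b", "c", "d"];
  return Array.from({ length: n }, (_, i) => ({
    status: i % 2 === 0 ? "active" : "inactive",
    category: categories[Math.floor(i / 2) % categories.length],
    amount: 1,
  }));
}

// Reports how many documents arrive at each stage of the pipeline.
function stageInputSizes(docs) {
  const matched = docs.filter((d) => d.status === "active");
  const groupKeys = new Set(matched.map((d) => d.category));
  return {
    match: docs.length,    // $match sees every input document
    group: matched.length, // $group sees only the matching documents
    sort: groupKeys.size,  // $sort sees one document per category
  };
}

for (const n of [10, 100, 1000]) {
  console.log(n, stageInputSizes(makeDocs(n)));
}
```

With this assumed data, the inputs to $match and $group grow with n while the input to $sort stays at four groups, which is why the sort cost does not have to grow with the collection size.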
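Time Complexity: O(n + g log g), which is O(n log n) in the worst case
Here n is the number of input documents and g is the number of distinct categories among the matching documents. $match and $group contribute work proportional to n, and $sort adds roughly g log g. The total approaches n log n only when nearly every matching document has its own category, so that g is close to n.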
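The sort cost can be lower still when $sort is immediately followed by $limit: MongoDB's documentation describes an optimization that coalesces the two, so the sort only keeps the top results as it progresses. The sketch below illustrates that idea in plain JavaScript with a hypothetical `topK` helper; it is an illustration of the technique, not MongoDB's actual implementation. It uses a small sorted buffer for readability, whereas a heap would give the usual g log k bound.

```javascript
// Keep only the k largest totals while scanning the groups once.
function topK(groups, k) {
  const best = []; // kept sorted by total, descending; never longer than k
  for (const g of groups) {
    // Walk back from the end to find where g belongs in the buffer.
    let i = best.length;
    while (i > 0 && best[i - 1].total < g.total) i--;
    if (i < k) {
      best.splice(i, 0, g);            // insert at its sorted position
      if (best.length > k) best.pop(); // drop the smallest if over capacity
    }
  }
  return best;
}

// Hypothetical group totals, as $group might emit them.
const sampleGroups = [5, 40, 12, 99, 7, 63].map((total, i) => ({ _id: "c" + i, total }));
const top3 = topK(sampleGroups, 3);
console.log(top3.map((g) => g.total));
```

Because the buffer never holds more than k entries, memory use depends on the limit rather than on the number of groups.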
[X] Wrong: "Adding more stages doesn't affect performance much because each stage only does a small task."
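[OK] Correct: Every stage does work on each document that reaches it, and blocking stages such as $group and $sort must consume all of their input before emitting results, so each added stage increases total work.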
Understanding how each stage adds to total work helps you explain real MongoDB queries clearly and shows you think about efficiency in data processing.
"What if we added an extra $project stage before $group? How would that change the time complexity?"