$lookup with pipeline (advanced join) in MongoDB - Time & Space Complexity
When using $lookup with a pipeline in MongoDB, we want to understand how the time it takes grows as the data gets bigger.
We ask: How does the number of operations change when joining collections with a pipeline?
Analyze the time complexity of the following code snippet.
```javascript
db.orders.aggregate([
  {
    $lookup: {
      from: "products",
      let: { order_item: "$item" },
      pipeline: [
        { $match: { $expr: { $eq: ["$name", "$$order_item"] } } },
        { $project: { price: 1, _id: 0 } }
      ],
      as: "productDetails"
    }
  }
])
```
This code joins the orders collection with products using a pipeline to match and project product details for each order item.
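Because this sub-pipeline is a plain equality match, the same join could also be written with the shorthand `localField`/`foreignField` form (a sketch for comparison; the pipeline form becomes necessary once you add extra stages such as the `$project` above):

```javascript
db.orders.aggregate([
  {
    $lookup: {
      from: "products",
      localField: "item",
      foreignField: "name",
      as: "productDetails"
    }
  }
  // Note: trimming fields as the sub-pipeline's $project did
  // would require a separate stage after the $lookup here.
])
```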
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: For each document in `orders`, the pipeline runs a `$match` on `products`.
- How many times: Once per `orders` document, so the total work is the number of `orders` documents times the cost of matching in `products`.
As the number of orders grows, the pipeline runs more times. Each run searches products for matches.
| Input Size (orders) | Approx. Operations |
|---|---|
| 10 | 10 x cost of matching in products |
| 100 | 100 x cost of matching in products |
| 1000 | 1000 x cost of matching in products |
Pattern observation: The total work grows roughly in direct proportion to the number of orders documents.
Time Complexity: O(n x m)
This means the time grows with the number of orders (n) times the number of products (m) scanned per lookup.
[X] Wrong: "The $lookup with pipeline runs just once regardless of input size."
[OK] Correct: Actually, the pipeline runs once for each orders document, so the work multiplies with input size.
Understanding how $lookup with pipelines scales helps you explain data joining costs clearly, a useful skill when discussing database performance.
What if we added an index on the products.name field used in $match? How would the time complexity change?
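As a rough analogy (my illustration, not MongoDB internals), an index on `products.name` acts like a prebuilt map from name to matching products, so each order's `$match` becomes a single keyed lookup instead of a full scan:

```javascript
// Build the "index" once: O(m) over the products collection.
function buildNameIndex(products) {
  const index = new Map();
  for (const p of products) {
    if (!index.has(p.name)) index.set(p.name, []);
    index.get(p.name).push({ price: p.price }); // keep only projected fields
  }
  return index;
}

// With the index in place, each order does one keyed lookup
// instead of scanning all m products.
function lookupWithIndex(orders, index) {
  return orders.map((order) => ({
    ...order,
    productDetails: index.get(order.item) ?? [],
  }));
}

const index = buildNameIndex([
  { name: "pen", price: 2 },
  { name: "book", price: 10 },
]);
const joined = lookupWithIndex([{ item: "pen" }, { item: "mug" }], index);
console.log(joined[0].productDetails); // [ { price: 2 } ]
console.log(joined[1].productDetails); // []
```

A real MongoDB index is a B-tree rather than a hash map, so each lookup costs closer to O(log m) than O(1), bringing the join down from O(n x m) toward roughly O(n log m).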