Joins vs embedding decision in MongoDB - Performance Comparison
When working with MongoDB, choosing between joins and embedding affects how fast queries run.
We want to understand how the time to get data changes as the data grows.
Analyze the time complexity of these two ways to get related data.
```javascript
// Using embedding
db.orders.find({ _id: orderId })

// Using join (lookup)
db.orders.aggregate([
  { $match: { _id: orderId } },
  { $lookup: {
      from: 'products',
      localField: 'productIds',
      foreignField: '_id',
      as: 'products'
  }}
])
```
The first query returns the order with its products already embedded in it. The second joins the order with the matching documents from the products collection.
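The two queries assume different document shapes. A minimal sketch of those shapes as plain JavaScript objects follows; only `_id`, `productIds`, and `products` come from the queries, while `name`, `price`, the sample values, and the `joinOrder` helper are hypothetical and chosen for illustration.

```javascript
// Hypothetical sample data; runs in Node.js without a MongoDB server.

// Embedded design: product details live inside the order document.
const embeddedOrder = {
  _id: 1,
  products: [
    { _id: 101, name: "Keyboard", price: 40 },
    { _id: 102, name: "Mouse", price: 15 },
  ],
};

// Referenced design: the order stores only product IDs...
const referencedOrder = { _id: 1, productIds: [101, 102] };

// ...and the details live in a separate products collection.
const productsCollection = [
  { _id: 101, name: "Keyboard", price: 40 },
  { _id: 102, name: "Mouse", price: 15 },
  { _id: 103, name: "Monitor", price: 120 },
];

// Conceptual model of $lookup: return the order plus a `products` array
// holding every product whose _id appears in `productIds`.
function joinOrder(order, products) {
  const matches = products.filter((p) => order.productIds.includes(p._id));
  return { ...order, products: matches };
}

const joined = joinOrder(referencedOrder, productsCollection);
console.log(joined.products.map((p) => p.name)); // [ 'Keyboard', 'Mouse' ]
```

Both designs end up with the same `products` array for the reader; the difference is whether that array is stored with the order or assembled at query time.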
Look at what repeats when running these queries.
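- Primary operation: For embedding, a single document fetch; for the join, a match on the order followed by one lookup in products for each ID in productIds.
- How many times: Embedding fetches one document; the join performs one lookup per related product ID. Because foreignField is _id, which MongoDB always indexes, each lookup here is an index lookup rather than a collection scan.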
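How expensive each of those lookups is depends on whether the joined field is indexed. The sketch below is a toy model, not MongoDB itself: it counts comparisons for a join on a hypothetical `sku` field, using an array scan to stand in for "no index" and a `Map` to stand in for an index. In mongosh, an index on such a field would be created with `db.products.createIndex({ sku: 1 })`.

```javascript
// Toy model: an array scan models "no index"; a Map models an index.
// A real MongoDB index is a B-tree, so each probe is logarithmic, not constant.
// The `sku` field and both helpers are hypothetical.

function makeProducts(m) {
  return Array.from({ length: m }, (_, i) => ({ sku: `sku-${i}` }));
}

// Without an index, every wanted key is compared against every product.
function lookupWithoutIndex(skus, products) {
  let comparisons = 0;
  const found = [];
  for (const sku of skus) {
    for (const product of products) {
      comparisons += 1;
      if (product.sku === sku) found.push(product);
    }
  }
  return { comparisons, found };
}

// With an index, every wanted key costs one probe.
function lookupWithIndex(skus, index) {
  let probes = 0;
  const found = [];
  for (const sku of skus) {
    probes += 1;
    const product = index.get(sku);
    if (product) found.push(product);
  }
  return { probes, found };
}

const catalog = makeProducts(1000);
const skuIndex = new Map(catalog.map((p) => [p.sku, p]));
const wanted = ["sku-1", "sku-500", "sku-999"];

console.log(lookupWithoutIndex(wanted, catalog).comparisons); // 3000
console.log(lookupWithIndex(wanted, skuIndex).probes); // 3
```

With n wanted keys and m products, the unindexed version does n × m comparisons while the indexed version does n probes, which is why the choice of foreignField matters as the joined collection grows.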
As the number of related products grows, the work changes differently.
| Input Size (number of related products) | Approx. Operations |
|---|---|
| 10 | Embedding: 1 fetch; Join: 10 lookups |
| 100 | Embedding: 1 fetch; Join: 100 lookups |
| 1000 | Embedding: 1 fetch; Join: 1000 lookups |
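Pattern observation: The number of fetches stays constant for embedding; join work grows with the number of related items.
Time Complexity: O(1) document fetches for embedding; O(n) lookups for the join, where n is the number of related documents.
This means an embedded read remains a single fetch however many products the order holds (although the document itself grows with n), while the join takes longer as the related data grows.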
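The counts in the table can be reproduced with a small simulation. This is a sketch that counts document fetches rather than measuring real query time; `buildProducts`, `fetchEmbedded`, and `fetchWithJoin` are hypothetical helpers, and a `Map` stands in for the products collection.

```javascript
// Toy model of the table: count document fetches, not wall-clock time.
// None of these helpers are MongoDB APIs.

function buildProducts(n) {
  // Map stands in for the products collection keyed by _id.
  const products = new Map();
  for (let id = 0; id < n; id++) products.set(id, { _id: id });
  return products;
}

function fetchEmbedded(order) {
  // One fetch returns the order together with its embedded products.
  return { fetches: 1, products: order.products };
}

function fetchWithJoin(order, products) {
  let fetches = 1; // the order itself
  const joined = [];
  for (const id of order.productIds) {
    fetches += 1; // one lookup per referenced product
    joined.push(products.get(id));
  }
  return { fetches, products: joined };
}

for (const n of [10, 100, 1000]) {
  const products = buildProducts(n);
  const embedded = fetchEmbedded({ _id: 1, products: [...products.values()] });
  const joined = fetchWithJoin({ _id: 1, productIds: [...products.keys()] }, products);
  // Columns: n, embedded fetches, join lookups (excluding the order fetch)
  console.log(n, embedded.fetches, joined.fetches - 1);
}
// 10 1 10
// 100 1 100
// 1000 1 1000
```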
[X] Wrong: "Joins are always slow and embedding is always better."
[OK] Correct: Embedding can cause large documents that slow writes and use more memory; joins can be efficient if related data is large or changes often.
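The write-side cost of embedding shows up most clearly when the same data is copied into many documents. The following sketch assumes a hypothetical product whose name is embedded in many orders, and compares how many documents a rename has to touch under each design; all names and values are illustrative.

```javascript
// Hypothetical data: the same product is embedded in many orders.
function makeEmbeddedOrders(count) {
  return Array.from({ length: count }, (_, i) => ({
    _id: i,
    products: [{ _id: 101, name: "Keyboard" }],
  }));
}

// Embedded: every order carrying a copy of the product must be rewritten.
function renameEmbedded(orders, productId, newName) {
  let documentsWritten = 0;
  for (const order of orders) {
    let touched = false;
    for (const product of order.products) {
      if (product._id === productId) {
        product.name = newName;
        touched = true;
      }
    }
    if (touched) documentsWritten += 1;
  }
  return documentsWritten;
}

// Referenced: the name lives in exactly one product document.
function renameReferenced(productsById, productId, newName) {
  const product = productsById.get(productId);
  if (!product) return 0;
  product.name = newName;
  return 1;
}

const manyOrders = makeEmbeddedOrders(500);
const productsById = new Map([[101, { _id: 101, name: "Keyboard" }]]);

console.log(renameEmbedded(manyOrders, 101, "Mechanical Keyboard")); // 500
console.log(renameReferenced(productsById, 101, "Mechanical Keyboard")); // 1
```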
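Understanding how data structure affects query speed helps you design databases that keep working well as data grows.
"What if we indexed the foreignField in the join? How would the time complexity change?" (In the example above, foreignField is _id, which MongoDB indexes automatically, so consider a join on a different field.)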
Practice
Solution
Step 1: Understand embedding use case
Embedding stores related data inside one document for fast access and atomic updates.
Step 2: Match scenario to embedding benefits
If data is accessed together and rarely changes, embedding avoids extra lookups and is efficient.
Final Answer:
When related data is frequently accessed together and rarely changes -> Option B
Quick Check:
Embedding = fast access, rare changes [OK]
- Embedding large, frequently changing data
- Embedding data shared across many documents
- Confusing embedding with referencing
Solution
Step 1: Identify referencing syntax
Referencing stores the ObjectId of another document to link collections.
Step 2: Match correct reference format
Storing the ObjectId directly (e.g., user_id: ObjectId('abc123')) is the standard referencing method.
Final Answer:
{ user_id: ObjectId('abc123') } inside the document -> Option C
Quick Check:
Reference = store ObjectId [OK]
- Embedding full document instead of referencing
- Using deprecated $ref and $id fields
- Storing plain strings instead of ObjectId
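A reference is just an ID that a second read resolves. The sketch below models that in plain JavaScript so it runs without a MongoDB driver; for that reason it uses the 24-character hexadecimal string form of an ObjectId as a stand-in, whereas a real document should store an actual ObjectId value. The `users` data, the `post` document, and both helpers are hypothetical.

```javascript
// ObjectId values are 12 bytes, usually written as 24 hexadecimal characters,
// so 'abc123' in the answer above is a shortened placeholder.
const OBJECT_ID_HEX = /^[0-9a-f]{24}$/i;

function isObjectIdHex(value) {
  return typeof value === "string" && OBJECT_ID_HEX.test(value);
}

// Hypothetical users collection keyed by the hex form of _id.
const users = new Map([
  ["507f1f77bcf86cd799439011", { name: "Ada" }],
]);

// A post references its author by ID instead of embedding the user document.
const post = { title: "Hello", user_id: "507f1f77bcf86cd799439011" };

// Resolving the reference is a second lookup in the users collection.
function resolveAuthor(doc, usersById) {
  if (!isObjectIdHex(doc.user_id)) {
    throw new Error(`not a valid ObjectId hex string: ${doc.user_id}`);
  }
  return usersById.get(doc.user_id) || null;
}

console.log(isObjectIdHex("abc123")); // false
console.log(resolveAuthor(post, users).name); // Ada
```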
orders with embedded items array, what is the main benefit of embedding items inside orders?
Solution
Step 1: Understand embedding effect on queries
Embedding items inside orders means all item data is in one document.
Step 2: Identify benefit of embedding items
This allows fetching an order and its items in a single query, improving speed.
Final Answer:
Faster retrieval of all items for an order without extra queries -> Option A
Quick Check:
Embedding = single query fetch [OK]
- Thinking embedding reduces document size
- Assuming embedded data can be reused easily
- Expecting automatic foreign key enforcement
Solution
Step 1: Identify problem with embedding frequent updates
Embedding addresses means the user document is rewritten on every address change, which gets slower as the document grows.
Step 2: Choose solution for frequent changing data
Referencing addresses separately allows updating addresses independently without rewriting user documents.
Final Answer:
Switch to referencing addresses in a separate collection -> Option A
Quick Check:
Frequent updates = use referencing [OK]
- Adding indexes without fixing schema design
- Embedding more fields increases document size
- Increasing document size limit doesn't improve update speed
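A rough sketch of the referenced design from this answer: addresses sit in their own collection and point back at the user, so an address edit leaves the user document untouched. The collections are modeled as `Map`s, and the field names (`user_id`, `city`) and helpers are hypothetical.

```javascript
// Hypothetical collections modeled as Maps keyed by _id.
const usersById = new Map([[1, { _id: 1, name: "Ada" }]]);
const addressesById = new Map([
  [10, { _id: 10, user_id: 1, city: "Paris" }],
  [11, { _id: 11, user_id: 1, city: "Lyon" }],
]);

// Updating an address touches only that address document.
function updateCity(addresses, addressId, city) {
  const address = addresses.get(addressId);
  if (!address) return false;
  address.city = city;
  return true;
}

// Reading a user's addresses is a second query filtered by user_id.
function addressesForUser(addresses, userId) {
  return [...addresses.values()].filter((a) => a.user_id === userId);
}

const userBefore = JSON.stringify(usersById.get(1));
updateCity(addressesById, 10, "Berlin");

console.log(JSON.stringify(usersById.get(1)) === userBefore); // true
console.log(addressesForUser(addressesById, 1).map((a) => a.city)); // [ 'Berlin', 'Lyon' ]
```

The trade-off is the extra read in `addressesForUser`; that cost is accepted here because writes are frequent.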
Solution
Step 1: Analyze comment characteristics
Comments can be many and need independent editing, so they change often and grow large.
Step 2: Choose schema design for many, editable comments
Referencing comments in a separate collection allows independent updates and avoids large post documents.
Final Answer:
Store comments in a separate collection and reference post ID -> Option D
Quick Check:
Many editable items = referencing best [OK]
- Embedding many comments causes large documents
- Embedding only latest comment complicates queries
- Storing comments as plain text fields loses structure
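The separate-collection design for comments can be sketched as follows. This is an in-memory illustration with hypothetical field names (`post_id`, `text`, `createdAt`) and helpers; in mongosh the paged read would correspond roughly to `db.comments.find({ post_id: postId }).sort({ createdAt: -1 }).limit(2)`.

```javascript
// Hypothetical comments collection; each comment references its post via post_id.
const comments = [
  { _id: 1, post_id: "p1", text: "First!", createdAt: 1 },
  { _id: 2, post_id: "p2", text: "Nice post", createdAt: 2 },
  { _id: 3, post_id: "p1", text: "Thanks", createdAt: 3 },
  { _id: 4, post_id: "p1", text: "Agreed", createdAt: 4 },
];

// Fetch one page of a post's comments, newest first.
function commentsForPost(all, postId, { skip = 0, limit = 2 } = {}) {
  return all
    .filter((c) => c.post_id === postId) // filter returns a new array, so sort is safe
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(skip, skip + limit);
}

// Edit a single comment without touching the post or the other comments.
function editComment(all, commentId, text) {
  const comment = all.find((c) => c._id === commentId);
  if (comment) comment.text = text;
  return Boolean(comment);
}

editComment(comments, 3, "Thanks a lot");
console.log(commentsForPost(comments, "p1").map((c) => c.text));
// [ 'Agreed', 'Thanks a lot' ]
```

Because each comment is its own document, the post never grows with the comment count, and paging reads only the slice that is displayed.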
