Bird
Raised Fist0
MongoDBquery~10 mins

Joins vs embedding decision in MongoDB - Visual Side-by-Side Comparison

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Concept Flow - Joins vs embedding decision
Start: Need related data
Decide data relation type
One-to-few
Embed data
Single doc
Query data accordingly
Decide if related data fits inside one document (embed) or needs separate documents linked by references (join). Then query accordingly.
Execution Sample
MongoDB
db.orders.aggregate([
  { $lookup: {
      from: 'products',
      localField: 'product_id',
      foreignField: '_id',
      as: 'product_info'
  }}
])
This query joins orders with products using $lookup to get product details inside each order.
Execution Table
StepActionInput DocumentLookup ResultOutput Document
1Read order document{_id: 1, product_id: 101, qty: 2}-{_id: 1, product_id: 101, qty: 2}
2Perform $lookup to find productproduct_id=101{_id: 101, name: 'Pen', price: 1.5}-
3Add product info to order--{_id: 1, product_id: 101, qty: 2, product_info: [{_id: 101, name: 'Pen', price: 1.5}]}
4Return enriched order document--{_id: 1, product_id: 101, qty: 2, product_info: [{_id: 101, name: 'Pen', price: 1.5}]}
5End of aggregation---
💡 All orders processed with product info joined using $lookup.
Variable Tracker
VariableStartAfter Step 1After Step 2After Step 3Final
orderDocnull{_id:1, product_id:101, qty:2}{_id:1, product_id:101, qty:2}{_id:1, product_id:101, qty:2, product_info:[{_id:101, name:'Pen', price:1.5}]}{_id:1, product_id:101, qty:2, product_info:[{_id:101, name:'Pen', price:1.5}]}
productInfonullnull{_id:101, name:'Pen', price:1.5}{_id:101, name:'Pen', price:1.5}{_id:101, name:'Pen', price:1.5}
Key Moments - 2 Insights
Why do we use $lookup instead of embedding product info directly?
Because product info can change or be large, embedding duplicates data and makes updates hard. $lookup joins data at query time, keeping data normalized. See execution_table step 2 where $lookup fetches product info.
When should we embed data instead of using $lookup?
Embed when related data is small and tightly connected, like one-to-few relationships, so reading one document gets all info without extra queries. This avoids $lookup overhead. The concept_flow shows embedding for one-to-few.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 3, what does the output document contain?
AOnly order fields without product info
BOnly product details without order fields
COrder fields plus product_info array with product details
DEmpty document
💡 Hint
Check the Output Document column at step 3 in execution_table.
At which step does the $lookup operation fetch product data?
AStep 1
BStep 2
CStep 4
DStep 5
💡 Hint
Look at the Action column in execution_table where $lookup is mentioned.
If product info was embedded in orders, how would the execution_table change?
ANo $lookup step needed, product info present from start
BMore $lookup steps needed
COutput documents would be empty
DExecution would stop earlier
💡 Hint
Refer to concept_flow where embedding avoids joins.
Concept Snapshot
Joins vs Embedding Decision in MongoDB:
- Embed data when related info is small and tightly linked (one-to-few).
- Use references and $lookup for large or many related items (one-to-many).
- Embedding improves read speed; joins keep data normalized.
- $lookup joins data at query time.
- Choose based on data size, update frequency, and query needs.
Full Transcript
This visual execution shows how MongoDB decides between embedding related data inside one document or using references with joins via $lookup. The flow starts by identifying the relation type. For small, tightly connected data, embedding is best. For large or many related items, references and $lookup joins are used. The sample aggregation query uses $lookup to join orders with product details. The execution table traces reading an order, performing $lookup to fetch product info, and adding it to the order document. Variables track the order document and product info as they change. Key moments clarify why $lookup is used instead of embedding and when embedding is preferred. The quiz tests understanding of the join steps and embedding impact. The snapshot summarizes the decision rules and trade-offs between embedding and joins in MongoDB.

Practice

(1/5)
1. Which scenario is best suited for embedding related data in MongoDB?
easy
A. When related data is large and changes frequently
B. When related data is frequently accessed together and rarely changes
C. When data needs to be shared across many documents
D. When you want to enforce strict relational constraints

Solution

  1. Step 1: Understand embedding use case

    Embedding stores related data inside one document for fast access and atomic updates.
  2. Step 2: Match scenario to embedding benefits

    If data is accessed together and rarely changes, embedding avoids extra lookups and is efficient.
  3. Final Answer:

    When related data is frequently accessed together and rarely changes -> Option B
  4. Quick Check:

    Embedding = fast access, rare changes [OK]
Hint: Embed when data is read together and changes rarely [OK]
Common Mistakes:
  • Embedding large, frequently changing data
  • Embedding data shared across many documents
  • Confusing embedding with referencing
2. Which of the following is the correct way to reference another document in MongoDB?
easy
A. { user: { $ref: 'users', $id: ObjectId('abc123') } }
B. { embedded_user: { name: 'Alice' } } inside the document
C. { user_id: ObjectId('abc123') } inside the document
D. { user: 'Alice' } as a string

Solution

  1. Step 1: Identify referencing syntax

    Referencing stores the ObjectId of another document to link collections.
  2. Step 2: Match correct reference format

    Storing the ObjectId directly (e.g., user_id: ObjectId('abc123')) is the standard referencing method.
  3. Final Answer:

    { user_id: ObjectId('abc123') } inside the document -> Option C
  4. Quick Check:

    Reference = store ObjectId [OK]
Hint: Reference by storing ObjectId, not embedding full data [OK]
Common Mistakes:
  • Embedding full document instead of referencing
  • Using deprecated $ref and $id fields
  • Storing plain strings instead of ObjectId
3. Given two collections: orders with embedded items array, what is the main benefit of embedding items inside orders?
medium
A. Faster retrieval of all items for an order without extra queries
B. Ability to reuse items across multiple orders easily
C. Smaller document size for orders collection
D. Enforcing foreign key constraints automatically

Solution

  1. Step 1: Understand embedding effect on queries

    Embedding items inside orders means all item data is in one document.
  2. Step 2: Identify benefit of embedding items

    This allows fetching an order and its items in a single query, improving speed.
  3. Final Answer:

    Faster retrieval of all items for an order without extra queries -> Option A
  4. Quick Check:

    Embedding = single query fetch [OK]
Hint: Embedding avoids extra queries for related data [OK]
Common Mistakes:
  • Thinking embedding reduces document size
  • Assuming embedded data can be reused easily
  • Expecting automatic foreign key enforcement
4. You have a MongoDB schema where user profiles embed their addresses. You notice address updates are frequent and slow. What is the best fix?
medium
A. Switch to referencing addresses in a separate collection
B. Embed more fields inside the address document
C. Increase the document size limit
D. Add indexes on embedded address fields

Solution

  1. Step 1: Identify problem with embedding frequent updates

    Embedding addresses means updating user documents often, which can be slow and large.
  2. Step 2: Choose solution for frequent changing data

    Referencing addresses separately allows updating addresses independently without rewriting user documents.
  3. Final Answer:

    Switch to referencing addresses in a separate collection -> Option A
  4. Quick Check:

    Frequent updates = use referencing [OK]
Hint: Use referencing for frequently updated data [OK]
Common Mistakes:
  • Adding indexes without fixing schema design
  • Embedding more fields increases document size
  • Increasing document size limit doesn't improve update speed
5. You design a blogging platform where posts have comments. Comments can be many and users want to edit them independently. Which design is best?
hard
A. Embed all comments inside each post document
B. Store comments as plain text fields inside post
C. Embed only the latest comment inside post, others referenced
D. Store comments in a separate collection and reference post ID

Solution

  1. Step 1: Analyze comment characteristics

    Comments can be many and need independent editing, so they change often and grow large.
  2. Step 2: Choose schema design for many, editable comments

    Referencing comments in a separate collection allows independent updates and avoids large post documents.
  3. Final Answer:

    Store comments in a separate collection and reference post ID -> Option D
  4. Quick Check:

    Many editable items = referencing best [OK]
Hint: Many changing items = use referencing, not embedding [OK]
Common Mistakes:
  • Embedding many comments causes large documents
  • Embedding only latest comment complicates queries
  • Storing comments as plain text fields loses structure