Joins vs embedding decision in MongoDB - Hands-On Comparison

📖 Scenario: You are building a simple online bookstore database using MongoDB. You need to decide how to organize your data for books and their authors. You want to practice creating collections and deciding when to embed data or reference it (similar to joins).
🎯 Goal: Build two collections: authors and books. Practice embedding author details inside books for quick access, and also practice referencing authors by ID to simulate a join.
📋 What You'll Learn
Create an authors collection with exactly two authors with specified fields.
Create a books collection with three books, embedding author info in one book and referencing author IDs in others.
Use MongoDB insert statements with exact field names and values.
Demonstrate a query that uses $lookup to join books with authors by reference.
💡 Why This Matters
🌍 Real World
Online bookstores and many other applications need to decide between embedding related data or referencing it to balance performance and data consistency.
💼 Career
Understanding when to embed or reference data in MongoDB is a key skill for backend developers and database administrators working with NoSQL databases.
Step 1: Create the authors collection
Create a collection called authors and insert exactly two documents with these fields and values: { _id: 1, name: "Jane Austen", country: "UK" } and { _id: 2, name: "Mark Twain", country: "USA" }.
Hint: Use insertMany on db.authors with an array of two author objects.
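One possible solution for this step, sketched as mongosh-compatible JavaScript. The `typeof db` guard is an addition for illustration only: it lets the snippet run outside mongosh without a database connection. Inside the shell, the `insertMany` call is the part that matters.

```javascript
// The two author documents, exactly as specified in this step.
const authors = [
  { _id: 1, name: "Jane Austen", country: "UK" },
  { _id: 2, name: "Mark Twain", country: "USA" }
];

// `db` is only defined inside mongosh; the guard is an assumption added so
// the snippet can also be run elsewhere without a connection.
if (typeof db !== "undefined") {
  // The authors collection is created implicitly on the first insert.
  db.authors.insertMany(authors);
}
```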

Step 2: Create the books collection with embedded author
Create a collection called books and insert one document for the book "Pride and Prejudice" with these fields: { title: "Pride and Prejudice", year: 1813, author: { name: "Jane Austen", country: "UK" } }. Embed the author details inside the book document.
Hint: Use insertOne on db.books with the book document embedding the author object.
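A minimal sketch of this step in mongosh-compatible JavaScript. The variable name `embeddedBook` and the `typeof db` guard are illustrative additions, not part of the exercise text.

```javascript
// The author sub-document lives inside the book, so one read returns both.
const embeddedBook = {
  title: "Pride and Prejudice",
  year: 1813,
  author: { name: "Jane Austen", country: "UK" }
};

// Guard added for illustration: `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  db.books.insertOne(embeddedBook);
}
```

Once inserted, embedded fields can be queried with dot notation, for example `db.books.findOne({ "author.name": "Jane Austen" })`.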

Step 3: Insert books referencing authors by ID
Insert two more documents into the books collection for the books "Adventures of Huckleberry Finn" (year 1884) and "Emma" (year 1815). Instead of embedding, reference the authors by their _id using the field author_id with values 2 and 1 respectively.
Hint: Use insertMany on db.books with two book documents referencing authors by author_id.
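One way to write this step, again as mongosh-compatible JavaScript. The name `referencedBooks` and the `typeof db` guard are assumptions made for the example.

```javascript
// Each book stores only the author's _id in author_id instead of a copy
// of the author document.
const referencedBooks = [
  { title: "Adventures of Huckleberry Finn", year: 1884, author_id: 2 },
  { title: "Emma", year: 1815, author_id: 1 }
];

// Guard added for illustration: `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  db.books.insertMany(referencedBooks);
}
```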

Step 4: Query books with author details using $lookup
Write an aggregation query on books that uses $lookup to join the authors collection, matching author_id in books against _id in authors. The query should add a field author_info, which $lookup fills with an array of the matching author documents.
Hint: Use db.books.aggregate with a $lookup stage specifying from, localField, foreignField, and as.
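A sketch of the join described in this step, assuming the authors and books collections from the earlier steps already exist. The pipeline is kept in a variable (`pipeline`, a name chosen for this example) so it can be inspected before it is run; the `typeof db` guard is likewise an illustrative addition.

```javascript
// A single $lookup stage: for each book, find authors whose _id equals the
// book's author_id and store the matches in author_info.
const pipeline = [
  {
    $lookup: {
      from: "authors",          // collection to join with
      localField: "author_id",  // field on the books side
      foreignField: "_id",      // field on the authors side
      as: "author_info"         // output array field added to each book
    }
  }
];

// Guard added for illustration: `db` and `printjson` exist only in mongosh.
if (typeof db !== "undefined") {
  db.books.aggregate(pipeline).forEach((doc) => printjson(doc));
}
```

Note that `author_info` is always an array. The embedded book from step 2 has no `author_id`, so its `author_info` should come back empty, while the two referencing books should each carry one matching author.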

Practice

1. Which scenario is best suited for embedding related data in MongoDB?
easy
A. When related data is large and changes frequently
B. When related data is frequently accessed together and rarely changes
C. When data needs to be shared across many documents
D. When you want to enforce strict relational constraints

Solution

  1. Step 1: Understand embedding use case

    Embedding stores related data inside one document for fast access and atomic updates.
  2. Step 2: Match scenario to embedding benefits

    If data is accessed together and rarely changes, embedding avoids extra lookups and is efficient.
  3. Final Answer:

    When related data is frequently accessed together and rarely changes -> Option B
  4. Quick Check:

    Embedding = fast access, rare changes [OK]
Hint: Embed when data is read together and changes rarely [OK]
Common Mistakes:
  • Embedding large, frequently changing data
  • Embedding data shared across many documents
  • Confusing embedding with referencing
2. Which of the following is the standard (manual) way to reference another document in MongoDB?
easy
A. { user: { $ref: 'users', $id: ObjectId('abc123') } }
B. { embedded_user: { name: 'Alice' } } inside the document
C. { user_id: ObjectId('abc123') } inside the document
D. { user: 'Alice' } as a string

Solution

  1. Step 1: Identify referencing syntax

    Referencing stores the _id of another document (typically an ObjectId) to link collections.
  2. Step 2: Match correct reference format

    Storing the ObjectId directly (e.g., user_id: ObjectId('abc123')) is the standard referencing method.
  3. Final Answer:

    { user_id: ObjectId('abc123') } inside the document -> Option C
  4. Quick Check:

    Reference = store ObjectId [OK]
Hint: Reference by storing ObjectId, not embedding full data [OK]
Common Mistakes:
  • Embedding full document instead of referencing
  • Reaching for DBRef ($ref and $id) when a plain manual reference is enough
  • Storing plain strings instead of ObjectId
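To make the manual-reference pattern from question 2 concrete, here is a small sketch in mongosh-compatible JavaScript. The collection names, the sample documents, and the string ids (which stand in for ObjectId values) are all hypothetical, and the `typeof db` guard is added only so the snippet runs outside the shell.

```javascript
// Hypothetical sample data; "u1" and "o1" stand in for ObjectId values.
const user = { _id: "u1", name: "Alice" };
const order = { _id: "o1", total: 25, user_id: user._id };

// Guard added for illustration: `db` and `printjson` exist only in mongosh.
if (typeof db !== "undefined") {
  db.users.insertOne(user);
  db.orders.insertOne(order);

  // A manual reference is resolved with a second query on the stored id.
  const storedOrder = db.orders.findOne({ _id: "o1" });
  const owner = db.users.findOne({ _id: storedOrder.user_id });
  printjson(owner);
}
```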
3. Given an orders collection in which each order embeds an items array, what is the main benefit of embedding items inside orders?
medium
A. Faster retrieval of all items for an order without extra queries
B. Ability to reuse items across multiple orders easily
C. Smaller document size for orders collection
D. Enforcing foreign key constraints automatically

Solution

  1. Step 1: Understand embedding effect on queries

    Embedding items inside orders means all item data is in one document.
  2. Step 2: Identify benefit of embedding items

    This allows fetching an order and its items in a single query, improving speed.
  3. Final Answer:

    Faster retrieval of all items for an order without extra queries -> Option A
  4. Quick Check:

    Embedding = single query fetch [OK]
Hint: Embedding avoids extra queries for related data [OK]
Common Mistakes:
  • Thinking embedding reduces document size
  • Assuming embedded data can be reused easily
  • Expecting automatic foreign key enforcement
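The single-query benefit from question 3 can be sketched as follows. The order document, its field names, and the `orderTotal` helper value are hypothetical, chosen only for illustration; the `typeof db` guard lets the snippet run outside mongosh.

```javascript
// Hypothetical order with its line items embedded as an array.
const orderWithItems = {
  _id: 1001,
  customer: "Alice",
  items: [
    { sku: "BK-1", title: "Emma", qty: 1, price: 9.5 },
    { sku: "BK-2", title: "Pride and Prejudice", qty: 2, price: 8 }
  ]
};

// Because the items travel with the order, the total can be computed from
// the one document without a second query.
const orderTotal = orderWithItems.items.reduce(
  (sum, item) => sum + item.qty * item.price,
  0
);

// Guard added for illustration: `db` and `printjson` exist only in mongosh.
if (typeof db !== "undefined") {
  db.orders.insertOne(orderWithItems);
  // One findOne returns the order together with all of its items.
  printjson(db.orders.findOne({ _id: 1001 }));
}
```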
4. You have a MongoDB schema where user profiles embed their addresses. You notice address updates are frequent and slow. What is the best fix?
medium
A. Switch to referencing addresses in a separate collection
B. Embed more fields inside the address document
C. Increase the document size limit
D. Add indexes on embedded address fields

Solution

  1. Step 1: Identify problem with embedding frequent updates

    Because addresses are embedded, every address change is an update to the (possibly large) user document, which can make frequent updates slow.
  2. Step 2: Choose solution for frequent changing data

    Referencing addresses separately allows updating addresses independently without rewriting user documents.
  3. Final Answer:

    Switch to referencing addresses in a separate collection -> Option A
  4. Quick Check:

    Frequent updates = use referencing [OK]
Hint: Use referencing for frequently updated data [OK]
Common Mistakes:
  • Adding indexes without fixing schema design
  • Embedding more fields, which only increases document size
  • Trying to raise the document size limit, which is fixed at 16 MB and has no effect on update speed
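The fix from question 4 might look like the sketch below, in which addresses live in their own collection and point back to the user. The collection names, sample documents, and field names are hypothetical; the `typeof db` guard is an illustrative addition.

```javascript
// Hypothetical data: the profile no longer embeds the address.
const profile = { _id: 1, name: "Alice" };
const address = { _id: 10, user_id: 1, city: "Leeds" };

// The update touches only the small address document.
const addressFilter = { _id: 10 };
const addressUpdate = { $set: { city: "York" } };

// Guard added for illustration: `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  db.users.insertOne(profile);
  db.addresses.insertOne(address);
  // The user document is not rewritten by this address change.
  db.addresses.updateOne(addressFilter, addressUpdate);
}
```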
5. You design a blogging platform where posts have comments. Comments can be many and users want to edit them independently. Which design is best?
hard
A. Embed all comments inside each post document
B. Store comments as plain text fields inside post
C. Embed only the latest comment inside post, others referenced
D. Store comments in a separate collection and reference post ID

Solution

  1. Step 1: Analyze comment characteristics

    Comments can be numerous and must be editable independently, so the set of comments changes often and can grow large.
  2. Step 2: Choose schema design for many, editable comments

    Referencing comments in a separate collection allows independent updates and avoids large post documents.
  3. Final Answer:

    Store comments in a separate collection and reference post ID -> Option D
  4. Quick Check:

    Many editable items = referencing best [OK]
Hint: Many changing items = use referencing, not embedding [OK]
Common Mistakes:
  • Embedding many comments causes large documents
  • Embedding only latest comment complicates queries
  • Storing comments as plain text fields loses structure
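A sketch of the design chosen in question 5: comments in their own collection, each carrying the id of its post. The sample documents and the `post_id` field name are hypothetical, and the index on `post_id` is a suggestion added here, not something the question requires. The `typeof db` guard lets the snippet run outside mongosh.

```javascript
// Hypothetical post and comments; each comment references its post.
const post = { _id: 1, title: "Embedding vs referencing" };
const comments = [
  { _id: 101, post_id: 1, author: "Ann", text: "Nice post" },
  { _id: 102, post_id: 1, author: "Bob", text: "Thanks" }
];

// Guard added for illustration: `db` and `printjson` exist only in mongosh.
if (typeof db !== "undefined") {
  db.posts.insertOne(post);
  db.comments.insertMany(comments);

  // Assumed addition: an index so that fetching a post's comments stays
  // fast as the collection grows.
  db.comments.createIndex({ post_id: 1 });

  // Editing one comment leaves the post and other comments untouched.
  db.comments.updateOne({ _id: 102 }, { $set: { text: "Thanks, very clear" } });

  // All comments for the post are fetched by the referenced id.
  printjson(db.comments.find({ post_id: 1 }).toArray());
}
```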