
Joins vs embedding decision in MongoDB - Performance Comparison

Understanding Time Complexity

When working with MongoDB, choosing between joins and embedding affects how fast queries run.

We want to understand how the time to get data changes as the data grows.

Scenario Under Consideration

Analyze the time complexity of these two ways to get related data.


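// Embedding: the order document already holds its products,
// so one lookup on the _id index returns everything
// (orderId is the _id of the order being read)
db.orders.find({ _id: orderId })

// Join ($lookup): the order holds only productIds, and each ID
// is matched against _id in the products collection
db.orders.aggregate([
  { $match: { _id: orderId } },
  { $lookup: {
      from: 'products',
      localField: 'productIds',
      foreignField: '_id',
      as: 'products'
  }}
])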

The first query returns the order with its products already embedded in it. The second matches the order and then joins it with the products collection, using the IDs stored in productIds.
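To make the two data models concrete, here is a minimal sketch in plain JavaScript, with no MongoDB connection needed. The sample orders and products are hypothetical, and `lookupProducts` is an illustrative helper that only imitates what the $lookup stage returns (every product whose `_id` matches a value in `productIds`), not how MongoDB executes it.

```javascript
// Hypothetical sample data; field names follow the queries above.
// Embedded model: products live inside the order document.
const embeddedOrder = {
  _id: 1,
  products: [
    { _id: 101, name: "Pen" },
    { _id: 102, name: "Notebook" },
  ],
};

// Referenced model: the order stores only product IDs...
const referencedOrder = { _id: 1, productIds: [101, 102] };

// ...and the product documents live in their own collection.
const productsCollection = [
  { _id: 101, name: "Pen" },
  { _id: 102, name: "Notebook" },
  { _id: 103, name: "Stapler" },
];

// Rough in-memory imitation of $lookup: attach every product whose _id
// (foreignField) equals any value in productIds (localField).
// This naive filter scans the whole array; it shows the result, not the cost.
function lookupProducts(order, products) {
  const matches = products.filter((p) => order.productIds.includes(p._id));
  return { ...order, products: matches };
}

const joined = lookupProducts(referencedOrder, productsCollection);
console.log(joined.products.map((p) => p.name)); // [ 'Pen', 'Notebook' ]
```

After the join, the referenced order carries the same `products` array the embedded order stored directly; the difference is where the work happens.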

Identify Repeating Operations

Look at what repeats when running these queries.

  • Primary operation: for embedding, a single fetch of the order document; for the join, matching the order and then looking up each related product document.
  • How many times: embedding fetches one document; the join performs one lookup in products for each ID in productIds (using the _id index on products).

How Execution Grows With Input

As the number of related products grows, the work changes differently.

| Related products (n) | Embedding (approx. operations) | Join (approx. operations) |
|---|---|---|
| 10 | 1 fetch | 10 lookups |
| 100 | 1 fetch | 100 lookups |
| 1000 | 1 fetch | 1000 lookups |

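Pattern observation: embedding stays at one fetch no matter how many products the order holds (although the document itself gets bigger); the join's work grows linearly with the number of related items.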
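The table can be reproduced with a small counting sketch. It assumes a simple cost model in which one fetch or one index probe counts as one operation, and it uses a JavaScript Map as a stand-in for the default _id index on products. The helper names are hypothetical.

```javascript
// Stand-in for the products collection plus its _id index (an assumption:
// a Map probe counts as one operation, whereas a real B-tree probe is O(log N)).
function buildProductIndex(count) {
  const index = new Map();
  for (let id = 1; id <= count; id++) {
    index.set(id, { _id: id, name: `product-${id}` });
  }
  return index;
}

// Embedded model: one fetch of the order brings every product with it.
function countEmbeddedFetches(order) {
  const ops = 1;
  return { ops, products: order.products };
}

// Join model: one index probe per ID in productIds (the initial $match
// on the order is left out, as in the table).
function countJoinLookups(order, productIndex) {
  let ops = 0;
  const products = [];
  for (const id of order.productIds) {
    ops += 1;
    const doc = productIndex.get(id);
    if (doc !== undefined) products.push(doc);
  }
  return { ops, products };
}

const productIndex = buildProductIndex(1000);
const results = [10, 100, 1000].map((n) => {
  const ids = Array.from({ length: n }, (_, i) => i + 1);
  const embedded = { _id: 1, products: ids.map((id) => productIndex.get(id)) };
  const referenced = { _id: 1, productIds: ids };
  return {
    n,
    embedding: countEmbeddedFetches(embedded).ops,
    join: countJoinLookups(referenced, productIndex).ops,
  };
});
console.table(results);
```

Each row shows 1 operation for embedding and n for the join, matching the pattern in the table.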

Final Time Complexity

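Time Complexity: O(n) for the join, where n is the number of related product documents (treating each indexed lookup as one step); the embedded read stays at a single fetch.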

This means reading embedded data takes one fetch however many products are embedded, while the join does more work as the related data grows.

Common Mistake

[X] Wrong: "Joins are always slow and embedding is always better."

[OK] Correct: Embedding can produce large documents (MongoDB caps a single document at 16 MB) that slow writes and use more memory; referencing with $lookup can be the better choice when related data is large, shared, or changes often.
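To see why large embedded arrays become a problem, the sketch below estimates how a post's size grows as it embeds more comments and compares it with MongoDB's 16 MB document limit. The size estimate is an assumption: it measures UTF-8 JSON bytes, which only approximates the real BSON size, and `makePost` builds hypothetical data.

```javascript
// MongoDB's maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough estimate only: UTF-8 bytes of the JSON text. Real BSON adds type tags,
// length prefixes and name terminators, and encodes numbers differently.
function approxSizeBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical post that embeds `count` comments of `textLength` characters each.
function makePost(count, textLength) {
  const comments = [];
  for (let i = 0; i < count; i++) {
    comments.push({ author: `user${i}`, text: "x".repeat(textLength) });
  }
  return { _id: 1, title: "Hello", comments };
}

for (const count of [100, 10000, 100000]) {
  const size = approxSizeBytes(makePost(count, 200));
  console.log(count, size, size < MAX_BSON_BYTES ? "fits" : "exceeds the limit");
}
```

With roughly 230 bytes per comment, a few thousand comments are harmless, but an unbounded array eventually crosses the limit, at which point writes to that document fail.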

Interview Connect

Understanding how data structure affects query speed shows you can design databases that work well as data grows.

Self-Check

"In this example foreignField is _id, which MongoDB always indexes. What if the join used a foreignField with no index? How would the time complexity change?"
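A rough way to reason about the self-check is to compare a lookup that can probe an index on foreignField with one that must scan the whole foreign collection for every key. The sketch below assumes a hypothetical non-indexed `sku` field and counts steps in memory; a real index probe is O(log N) rather than one step, and newer MongoDB versions may choose other join strategies (such as a hash join), so read it as a cost model only. With n keys and N foreign documents, the scan does about n × N comparisons while the indexed version does n probes.

```javascript
// Unindexed foreignField: every key scans the whole foreign collection.
function scanLookup(keys, collection, field) {
  let comparisons = 0;
  const matches = [];
  for (const key of keys) {
    for (const doc of collection) {
      comparisons += 1;
      if (doc[field] === key) matches.push(doc);
    }
  }
  return { comparisons, matches };
}

// Indexed foreignField: one probe per key (Map assumes sku values are unique).
function indexedLookup(keys, index) {
  let probes = 0;
  const matches = [];
  for (const key of keys) {
    probes += 1;
    const doc = index.get(key);
    if (doc !== undefined) matches.push(doc);
  }
  return { probes, matches };
}

// Hypothetical products with a `sku` field used as the join key.
const N = 5000;
const products = Array.from({ length: N }, (_, i) => ({ _id: i, sku: `SKU-${i}` }));
const skuIndex = new Map(products.map((p) => [p.sku, p]));
const wanted = ["SKU-1", "SKU-42", "SKU-4999"];

const scanned = scanLookup(wanted, products, "sku");
const indexed = indexedLookup(wanted, skuIndex);
console.log(scanned.comparisons, indexed.probes); // 15000 3
```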

Practice

1. Which scenario is best suited for embedding related data in MongoDB?
easy
A. When related data is large and changes frequently
B. When related data is frequently accessed together and rarely changes
C. When data needs to be shared across many documents
D. When you want to enforce strict relational constraints

Solution

  1. Step 1: Understand embedding use case

    Embedding stores related data inside one document for fast access and atomic updates.
  2. Step 2: Match scenario to embedding benefits

    If data is accessed together and rarely changes, embedding avoids extra lookups and is efficient.
  3. Final Answer:

    When related data is frequently accessed together and rarely changes -> Option B
  4. Quick Check:

    Embedding = fast access, rare changes [OK]
Hint: Embed when data is read together and changes rarely [OK]
Common Mistakes:
  • Embedding large, frequently changing data
  • Embedding data shared across many documents
  • Confusing embedding with referencing
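The rules of thumb in this question can be written down as a tiny decision helper. `chooseModel` and its flag names are hypothetical and deliberately simplified; treating "not read together" as a reason to reference is an assumption, and real schema design also weighs query patterns and write volume.

```javascript
// Hypothetical heuristic distilled from the hints in this section.
function chooseModel({ readTogether, changesOften, sharedAcrossDocs, unboundedGrowth }) {
  // Large, frequently changing, shared, or unbounded data: keep it separate.
  if (changesOften || sharedAcrossDocs || unboundedGrowth) return "reference";
  // Small, stable data that is always read with its parent: embed it.
  if (readTogether) return "embed";
  // Assumption: data read on its own is easier to manage separately.
  return "reference";
}

console.log(chooseModel({ readTogether: true, changesOften: false, sharedAcrossDocs: false, unboundedGrowth: false })); // embed
console.log(chooseModel({ readTogether: true, changesOften: true, sharedAcrossDocs: false, unboundedGrowth: false })); // reference
```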
2. Which of the following is the recommended way to reference another document in MongoDB?
easy
A. { user: { $ref: 'users', $id: ObjectId('abc123') } }
B. { embedded_user: { name: 'Alice' } } inside the document
C. { user_id: ObjectId('abc123') } inside the document
D. { user: 'Alice' } as a string

Solution

  1. Step 1: Identify referencing syntax

    Referencing stores the ObjectId of another document to link collections.
  2. Step 2: Match correct reference format

    Storing the ObjectId directly (e.g., user_id: ObjectId('abc123')) is a manual reference, the approach MongoDB recommends over DBRefs in most cases.
  3. Final Answer:

    { user_id: ObjectId('abc123') } inside the document -> Option C
  4. Quick Check:

    Reference = store ObjectId [OK]
Hint: Reference by storing ObjectId, not embedding full data [OK]
Common Mistakes:
  • Embedding full document instead of referencing
  • Using DBRefs ($ref/$id) when a plain ObjectId reference is enough
  • Storing plain strings instead of ObjectId
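A manual reference costs a second read whenever the referenced data is needed. The sketch below models this in plain JavaScript with hypothetical in-memory data; plain string IDs stand in for ObjectId values only so it runs without a database, and the comment shows the roughly equivalent mongosh call.

```javascript
// Hypothetical in-memory stand-in for the users collection. Plain string IDs
// replace ObjectId values only so the sketch runs without a database.
const users = new Map([
  ["u1", { _id: "u1", name: "Alice" }],
  ["u2", { _id: "u2", name: "Bob" }],
]);

// The order stores a manual reference (user_id) rather than the user itself.
const order = { _id: "o1", user_id: "u1", total: 25 };

// Following the reference is a second read. In mongosh this would be roughly:
//   db.users.findOne({ _id: order.user_id })
function resolveUser(doc, userCollection) {
  return userCollection.get(doc.user_id) ?? null;
}

console.log(resolveUser(order, users).name); // Alice
```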
3. Given an orders collection whose documents embed an items array, what is the main benefit of embedding items inside orders?
medium
A. Faster retrieval of all items for an order without extra queries
B. Ability to reuse items across multiple orders easily
C. Smaller document size for orders collection
D. Enforcing foreign key constraints automatically

Solution

  1. Step 1: Understand embedding effect on queries

    Embedding items inside orders means all item data is in one document.
  2. Step 2: Identify benefit of embedding items

    This allows fetching an order and its items in a single query, improving speed.
  3. Final Answer:

    Faster retrieval of all items for an order without extra queries -> Option A
  4. Quick Check:

    Embedding = single query fetch [OK]
Hint: Embedding avoids extra queries for related data [OK]
Common Mistakes:
  • Thinking embedding reduces document size
  • Assuming embedded data can be reused easily
  • Expecting automatic foreign key enforcement
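Here is a minimal sketch of that single-query benefit, using a hypothetical in-memory orders collection: one read returns the order with its embedded items, so the order total can be computed without further lookups. The `findOrder` helper and its read counter are illustrative assumptions.

```javascript
// Hypothetical in-memory orders collection; each order embeds its items.
const orders = new Map([
  ["o1", {
    _id: "o1",
    items: [
      { sku: "PEN", qty: 3, price: 2 },
      { sku: "PAD", qty: 1, price: 5 },
    ],
  }],
]);

let reads = 0;

// One read returns the order together with every embedded item.
// In mongosh this would be roughly: db.orders.findOne({ _id: "o1" })
function findOrder(id) {
  reads += 1;
  return orders.get(id);
}

const order = findOrder("o1");
const total = order.items.reduce((sum, item) => sum + item.qty * item.price, 0);
console.log(total, reads); // 11 1
```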
4. You have a MongoDB schema where user profiles embed their addresses. You notice address updates are frequent and slow. What is the best fix?
medium
A. Switch to referencing addresses in a separate collection
B. Embed more fields inside the address document
C. Increase the document size limit
D. Add indexes on embedded address fields

Solution

  1. Step 1: Identify problem with embedding frequent updates

    Embedding addresses means every address change is an update to the user document, which can be slow when profiles are large.
  2. Step 2: Choose solution for frequent changing data

    Referencing addresses separately allows updating addresses independently without rewriting user documents.
  3. Final Answer:

    Switch to referencing addresses in a separate collection -> Option A
  4. Quick Check:

    Frequent updates = use referencing [OK]
Hint: Use referencing for frequently updated data [OK]
Common Mistakes:
  • Adding indexes without fixing schema design
  • Embedding more fields increases document size
  • Trying to raise the document size limit: it is fixed at 16 MB, and a larger limit would not speed up updates anyway
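A minimal sketch of the referenced design: addresses live in their own hypothetical, in-memory collection keyed by address ID and point back to the user, so an address update writes only to that collection. The write counter exists purely to make that visible.

```javascript
// Hypothetical in-memory collections; each address references its user by userId.
const users = new Map([["u1", { _id: "u1", name: "Alice", email: "alice@example.com" }]]);
const addresses = new Map([["a1", { _id: "a1", userId: "u1", city: "Lyon", zip: "69001" }]]);

// Counts writes per collection so we can see which one an update touches.
const writes = { users: 0, addresses: 0 };

// In mongosh the equivalent would be roughly:
//   db.addresses.updateOne({ _id: addressId }, { $set: changes })
function updateAddress(addressId, changes) {
  const current = addresses.get(addressId);
  if (!current) return false;
  addresses.set(addressId, { ...current, ...changes });
  writes.addresses += 1;
  return true;
}

updateAddress("a1", { city: "Paris", zip: "75001" });
console.log(addresses.get("a1").city, writes); // Paris { users: 0, addresses: 1 }
```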
5. You design a blogging platform where posts have comments. Comments can be many and users want to edit them independently. Which design is best?
hard
A. Embed all comments inside each post document
B. Store comments as plain text fields inside post
C. Embed only the latest comment inside post, others referenced
D. Store comments in a separate collection and reference post ID

Solution

  1. Step 1: Analyze comment characteristics

    Comments can be numerous and are edited independently, so they change often and can grow without bound.
  2. Step 2: Choose schema design for many, editable comments

    Referencing comments in a separate collection allows independent updates and avoids large post documents.
  3. Final Answer:

    Store comments in a separate collection and reference post ID -> Option D
  4. Quick Check:

    Many editable items = referencing best [OK]
Hint: Many changing items = use referencing, not embedding [OK]
Common Mistakes:
  • Embedding many comments causes large documents
  • Embedding only the latest comment complicates queries
  • Storing comments as plain text fields loses structure
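The referenced comment design can be sketched with a hypothetical in-memory comments collection in which each comment stores its post's `_id` as `postId`. Editing one comment changes only that comment, and reading a post's comments is a filter on `postId`; in MongoDB an index on `postId` would keep that read from scanning every comment.

```javascript
// Hypothetical comments collection: each comment stores the _id of its post.
// In mongosh, db.comments.createIndex({ postId: 1 }) would support
// db.comments.find({ postId: "p1" }) without scanning every comment.
const comments = [
  { _id: "c1", postId: "p1", text: "Great post" },
  { _id: "c2", postId: "p1", text: "Thanks!" },
  { _id: "c3", postId: "p2", text: "First" },
];

function commentsForPost(postId) {
  return comments.filter((c) => c.postId === postId);
}

// Editing one comment touches only that comment, never the post document.
function editComment(commentId, newText) {
  const comment = comments.find((c) => c._id === commentId);
  if (comment === undefined) return false;
  comment.text = newText;
  return true;
}

editComment("c2", "Thanks, very helpful!");
console.log(commentsForPost("p1").map((c) => c.text)); // [ 'Great post', 'Thanks, very helpful!' ]
```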