
When to Embed vs Reference in MongoDB: Key Differences and Usage

Use embedding in MongoDB when related data is frequently accessed together and the data size is small, ensuring faster reads. Use referencing when data is large, shared across documents, or updated independently to keep data normalized and avoid duplication.

Quick Comparison

Here is a quick comparison of embedding vs referencing in MongoDB based on key factors.

| Factor | Embedding | Referencing |
| --- | --- | --- |
| Data Size | Small, fits within document size limit | Can be large or grow over time |
| Data Access | Accessed together frequently | Accessed separately or less often |
| Data Duplication | Duplicates data if repeated | No duplication, single source of truth |
| Update Frequency | Updated together | Updated independently |
| Query Performance | Faster reads, fewer queries | Requires joins/lookups, more queries |
| Data Consistency | Easier to maintain consistency | Requires manual consistency management |

Key Differences

Embedding stores related data inside the same document. This is great when you want to retrieve all related information in one go, like a blog post with its comments. It reduces the number of queries and speeds up reads, but it increases document size (MongoDB caps a single BSON document at 16 MB) and causes duplication when the same embedded data is repeated across documents.

Referencing stores related data in separate documents and links them using IDs. This keeps data normalized and avoids duplication, which is useful when the related data is large or shared across many documents, like users referenced by many posts. However, it requires extra queries or $lookup operations to join data, which can slow down reads.

Choosing between embedding and referencing depends on your application's data access patterns, update frequency, and size constraints. Embedding favors read speed and simplicity, while referencing favors data integrity and flexibility.
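The duplication trade-off described above can be illustrated without a database. The sketch below is plain JavaScript that models documents as in-memory objects. The sample data and the `renameAuthorEmbedded` and `renameAuthorReferenced` helpers are hypothetical; they only count how many documents a rename would have to touch under each model.

```javascript
// Hypothetical in-memory data; not a MongoDB API, just plain objects.
// Embedded model: each post carries its own copy of the author's name.
const embeddedPosts = [
  { title: "Post A", author: { id: 1, name: "Alice" } },
  { title: "Post B", author: { id: 1, name: "Alice" } },
  { title: "Post C", author: { id: 2, name: "Bob" } },
];

// Referenced model: posts store only the author's id.
const users = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "Bob" },
];
const referencedPosts = [
  { title: "Post A", authorId: 1 },
  { title: "Post B", authorId: 1 },
  { title: "Post C", authorId: 2 },
];

// Rename an author in the embedded model: every copy must be rewritten.
function renameAuthorEmbedded(posts, authorId, newName) {
  let touched = 0;
  for (const post of posts) {
    if (post.author.id === authorId) {
      post.author.name = newName;
      touched += 1;
    }
  }
  return touched;
}

// Rename an author in the referenced model: only the user document changes.
function renameAuthorReferenced(userDocs, authorId, newName) {
  const user = userDocs.find((u) => u._id === authorId);
  if (!user) return 0;
  user.name = newName;
  return 1;
}

const embeddedWrites = renameAuthorEmbedded(embeddedPosts, 1, "Alicia");
const referencedWrites = renameAuthorReferenced(users, 1, "Alicia");
console.log({ embeddedWrites, referencedWrites });
```

The embedded model needs one write per copy of the author, while the referenced model needs a single write. The cost moves to the read side instead: displaying a referenced post with its author's name takes a second lookup.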


Code Comparison

Example of embedding comments inside a blog post document.

mongodb
db.posts.insertOne({
  title: "My First Post",
  content: "Hello world!",
  comments: [
    { user: "Alice", message: "Great post!" },
    { user: "Bob", message: "Thanks for sharing." }
  ]
})

// Query to get post with comments
const post = db.posts.findOne({ title: "My First Post" })
printjson(post)
Output
{ "_id": ObjectId("..."), "title": "My First Post", "content": "Hello world!", "comments": [ { "user": "Alice", "message": "Great post!" }, { "user": "Bob", "message": "Thanks for sharing." } ] }

Referencing Equivalent

Example of referencing comments in a separate collection linked by post ID.

mongodb
// Generate a valid ObjectId up front; a string like "post1" is not a valid ObjectId value
const postId = new ObjectId()

db.posts.insertOne({
  _id: postId,
  title: "My First Post",
  content: "Hello world!"
})

db.comments.insertMany([
  { postId: postId, user: "Alice", message: "Great post!" },
  { postId: postId, user: "Bob", message: "Thanks for sharing." }
])

// Query to get post
const post = db.posts.findOne({ _id: postId })
// Query to get comments for the post
const comments = db.comments.find({ postId: postId }).toArray()
printjson(post)
printjson(comments)
Output
{ "_id": ObjectId("..."), "title": "My First Post", "content": "Hello world!" }
[ { "_id": ObjectId("..."), "postId": ObjectId("..."), "user": "Alice", "message": "Great post!" }, { "_id": ObjectId("..."), "postId": ObjectId("..."), "user": "Bob", "message": "Thanks for sharing." } ]
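The two queries above can also be combined on the server with the `$lookup` aggregation stage. The sketch below assumes the `posts` and `comments` collections and the `postId` field from the example. The `typeof db` guard is an addition for illustration only, so the snippet can also be loaded outside `mongosh`, where it just builds the pipeline.

```javascript
// Pipeline that joins each matching post with its comments.
// Assumes the collection and field names used in the example above.
const postWithCommentsPipeline = [
  { $match: { title: "My First Post" } },
  {
    $lookup: {
      from: "comments",        // collection to join
      localField: "_id",       // field on posts
      foreignField: "postId",  // field on comments
      as: "comments",          // name of the output array field
    },
  },
];

// Run the pipeline only where a `db` handle exists (for example in mongosh).
if (typeof db !== "undefined") {
  const results = db.posts.aggregate(postWithCommentsPipeline).toArray();
  printjson(results);
}
```

Each result has the same overall shape as the embedded version, with a `comments` array on the post, but the join work is repeated on every read instead of being paid once at write time.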

When to Use Which

Choose embedding when related data is small, bounded in number, accessed together, and updated at the same time, like user profile details or a handful of product reviews inside a product document. Embedding improves read performance and simplifies queries.

Choose referencing when related data is large, shared among many documents, or updated independently, such as users referenced by many posts or orders referencing products. Referencing keeps data normalized and avoids duplication but requires additional queries.

Always consider your application's read/write patterns and data size limits before deciding.
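As a rough way to make these rules of thumb explicit, the sketch below encodes them in a small plain-JavaScript helper. The `chooseModel` function, its parameter names, and the 1 MB soft threshold are illustrative assumptions, not MongoDB features; only the 16 MB per-document BSON limit comes from MongoDB itself.

```javascript
// MongoDB's maximum BSON document size (16 MB).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: returns "embed" or "reference" from coarse estimates.
// softLimitBytes is an assumed comfort threshold, not a MongoDB setting.
function chooseModel({
  parentBytes,           // estimated size of the parent document alone
  childBytes,            // estimated size of one related item
  maxChildren,           // expected upper bound on related items; Infinity if unbounded
  sharedAcrossParents,   // true if the same item belongs to many parents
  updatedIndependently,  // true if items change on their own schedule
  softLimitBytes = 1024 * 1024,
}) {
  // Shared or independently updated data is better kept in one place.
  if (sharedAcrossParents || updatedIndependently) return "reference";
  // An unbounded array would eventually outgrow any document.
  if (!Number.isFinite(maxChildren)) return "reference";

  const projectedBytes = parentBytes + childBytes * maxChildren;
  if (projectedBytes >= MAX_BSON_BYTES) return "reference";
  return projectedBytes <= softLimitBytes ? "embed" : "reference";
}

// A post with at most 50 short comments that are always read with it.
const postChoice = chooseModel({
  parentBytes: 2000, childBytes: 200, maxChildren: 50,
  sharedAcrossParents: false, updatedIndependently: false,
});

// A user profile that many posts point to.
const userChoice = chooseModel({
  parentBytes: 2000, childBytes: 500, maxChildren: 1,
  sharedAcrossParents: true, updatedIndependently: true,
});

console.log(postChoice, userChoice);
```

The thresholds are deliberately crude. In practice the estimates would come from measuring real documents and query patterns, and the helper is only meant to show the order in which the questions are usually asked.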

Key Takeaways

Embed related data when it is small and accessed together for faster reads.
Reference data when it is large, shared, or updated independently to avoid duplication.
Embedding simplifies queries but can increase document size and duplication.
Referencing keeps data normalized but requires extra queries or lookups.
Choose based on your application's data access and update patterns.