When to Embed vs Reference in MongoDB: Key Differences and Usage
Use embedding in MongoDB when related data is frequently accessed together and is small, which keeps reads fast. Use referencing when data is large, shared across documents, or updated independently, which keeps data normalized and avoids duplication.
Quick Comparison
Here is a quick comparison of embedding vs referencing in MongoDB based on key factors.
| Factor | Embedding | Referencing |
|---|---|---|
| Data Size | Small, fits well within the 16 MB document size limit | Can be large or grow over time |
| Data Access | Accessed together frequently | Accessed separately or less often |
| Data Duplication | Duplicates data if repeated | No duplication, single source of truth |
| Update Frequency | Updated together | Updated independently |
| Query Performance | Faster reads, fewer queries | Requires joins/lookups, more queries |
| Data Consistency | Easier to maintain, since a single-document write is atomic | Changes spanning documents need transactions or application logic |
Key Differences
Embedding stores related data inside the same document. This is great when you want to retrieve all related information in one go, like a blog post with its comments. It reduces the number of queries and speeds up reads but can increase document size and cause duplication if the embedded data repeats.
Referencing stores related data in separate documents and links them using IDs. This keeps data normalized and avoids duplication, which is useful when the related data is large or shared across many documents, like users referenced by many posts. However, it requires extra queries or $lookup operations to join data, which can slow down reads.
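As a rough illustration of the `$lookup` join mentioned above, the sketch below builds an aggregation pipeline that attaches comments to a post. The collection name `comments`, the field name `postId`, and the helper `buildPostWithCommentsPipeline` are assumptions chosen to match a blog post and comments scenario, not names required by MongoDB.

```javascript
// Sketch: build an aggregation pipeline that joins one post with its comments.
// Collection and field names here are illustrative assumptions.
function buildPostWithCommentsPipeline(postId) {
  return [
    // Select the single post we care about.
    { $match: { _id: postId } },
    // Pull matching documents from the comments collection into an array field.
    {
      $lookup: {
        from: "comments",       // collection to join
        localField: "_id",      // field on the posts side
        foreignField: "postId", // field on the comments side
        as: "comments"          // name of the output array field
      }
    }
  ];
}

// Placeholder id so the sketch runs outside the shell; use a real ObjectId in practice.
const pipeline = buildPostWithCommentsPipeline("example-post-id");
console.log(JSON.stringify(pipeline, null, 2));
```

In mongosh, the pipeline would typically be passed to `db.posts.aggregate(...)`. The join runs on the server, so it saves a round trip compared with two separate queries, but it still does more work than reading one embedded document.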
Choosing between embedding and referencing depends on your application's data access patterns, update frequency, and size constraints. Embedding favors read speed and simplicity, while referencing favors data integrity and flexibility.
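The rules of thumb above can be written down as a small decision helper. This is only a sketch: the function `suggestModel` and its four boolean inputs are hypothetical names introduced here, and real schema decisions usually weigh more factors than these.

```javascript
// Hypothetical helper: encodes this article's rules of thumb, nothing more.
function suggestModel({ accessedTogether, smallAndBounded, sharedAcrossDocs, updatedIndependently }) {
  // Large, shared, or independently updated data points towards separate documents.
  if (!smallAndBounded || sharedAcrossDocs || updatedIndependently) {
    return "reference";
  }
  // Small, unshared data that is read with its parent fits embedding.
  if (accessedTogether) {
    return "embed";
  }
  // Small data that is read separately: the article's table places this under referencing.
  return "reference";
}

// A handful of comments that are always shown with their post.
console.log(suggestModel({
  accessedTogether: true,
  smallAndBounded: true,
  sharedAcrossDocs: false,
  updatedIndependently: false
})); // prints "embed"

// A user profile that many posts point to.
console.log(suggestModel({
  accessedTogether: true,
  smallAndBounded: true,
  sharedAcrossDocs: true,
  updatedIndependently: true
})); // prints "reference"
```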
Code Comparison
Example of embedding comments inside a blog post document.
db.posts.insertOne({
  title: "My First Post",
  content: "Hello world!",
  comments: [
    { user: "Alice", message: "Great post!" },
    { user: "Bob", message: "Thanks for sharing." }
  ]
})
// Query to get post with comments
const post = db.posts.findOne({ title: "My First Post" })
printjson(post)
Referencing Equivalent
Example of referencing comments in a separate collection linked by post ID.
// An ObjectId must be valid (for example, a 24-character hex string), so generate one and reuse it
const postId = new ObjectId()
db.posts.insertOne({
  _id: postId,
  title: "My First Post",
  content: "Hello world!"
})
db.comments.insertMany([
  { postId: postId, user: "Alice", message: "Great post!" },
  { postId: postId, user: "Bob", message: "Thanks for sharing." }
])
// Query to get post
const post = db.posts.findOne({ _id: postId })
// Query to get comments for the post
const comments = db.comments.find({ postId: postId }).toArray()
printjson(post)
printjson(comments)
When to Use Which
Choose embedding when related data is small, accessed together, and updated at the same time, like user profile details or product reviews inside a product document. Embedding improves read performance and simplifies queries.
Choose referencing when related data is large, shared among many documents, or updated independently, such as users referenced by many posts or orders referencing products. Referencing keeps data normalized and avoids duplication but requires additional queries.
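To make the duplication cost concrete, here is a small in-memory simulation in plain JavaScript, with no database involved. The sample data and the helpers `renameEmbedded` and `renameReferenced` are made up for illustration. It counts how many documents must be rewritten when a user's name changes under each model.

```javascript
// Embedded model: each post carries its own copy of the author's name.
const embeddedPosts = [
  { title: "Post A", author: { id: 1, name: "Alice" } },
  { title: "Post B", author: { id: 1, name: "Alice" } },
  { title: "Post C", author: { id: 2, name: "Bob" } }
];

// Referenced model: posts store only the author's ID.
const users = [{ _id: 1, name: "Alice" }, { _id: 2, name: "Bob" }];
const referencedPosts = [
  { title: "Post A", authorId: 1 },
  { title: "Post B", authorId: 1 },
  { title: "Post C", authorId: 2 }
];

// Embedded: every post holding a copy of the name has to be rewritten.
function renameEmbedded(posts, userId, newName) {
  let touched = 0;
  for (const post of posts) {
    if (post.author.id === userId) {
      post.author.name = newName;
      touched += 1;
    }
  }
  return touched;
}

// Referenced: only the user document changes; posts keep pointing at the same ID.
function renameReferenced(userList, userId, newName) {
  let touched = 0;
  for (const user of userList) {
    if (user._id === userId) {
      user.name = newName;
      touched += 1;
    }
  }
  return touched;
}

const embeddedWrites = renameEmbedded(embeddedPosts, 1, "Alicia");
const referencedWrites = renameReferenced(users, 1, "Alicia");
console.log({ embeddedWrites, referencedWrites }); // { embeddedWrites: 2, referencedWrites: 1 }
```

With only three posts the difference is small, but the number of embedded copies grows with every post the user writes, while the referenced model always updates a single document.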
Always consider your application's read/write patterns and data size limits before deciding.
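One way to keep an eye on the size limit is a rough check like the sketch below. It uses the length of the JSON text as a stand-in for the stored size, which is only an approximation because MongoDB stores BSON, not JSON. The helper names and the 50% threshold are assumptions for illustration.

```javascript
// MongoDB caps a single document at 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough estimate only: JSON text length is not the same as BSON size.
function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical threshold: flag documents using more than a chosen share of the limit.
function isGettingLarge(doc, fraction = 0.5) {
  return approxDocBytes(doc) > MAX_BSON_BYTES * fraction;
}

// A post with 1000 short embedded comments is still far below the limit.
const bigPost = {
  title: "My First Post",
  comments: Array.from({ length: 1000 }, (_, i) => ({ user: "user" + i, message: "Great post!" }))
};
console.log(approxDocBytes(bigPost), isGettingLarge(bigPost));
```

If a check like this starts returning true for an array that keeps growing, that is usually a sign the embedded data should move to its own collection.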