Embedding vs referencing decision in MongoDB - Performance Comparison
When choosing between embedding and referencing in MongoDB, it helps to understand how read time changes as your data grows.
We want to know how the storage layout affects query time as the amount of related data increases.
Analyze the time complexity of fetching related data using embedding vs referencing.
// Embedding example: one read returns the user document,
// which already contains the embedded posts array
const userWithPosts = db.users.findOne({ _id: userId });
// Referencing example: posts live in their own collection, linked by userId
const user = db.users.findOne({ _id: userId });
const posts = db.posts.find({ userId: user._id }).toArray();
This code shows two ways to get a user's posts: either embedded inside the user document or stored separately and linked by userId.
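To make the two shapes concrete, here is a minimal in-memory sketch that runs without a database. The arrays `usersEmbedded`, `usersReferenced`, and `postsCollection`, the `posts` field name, and both helper functions are hypothetical stand-ins for illustration, not MongoDB APIs.

```javascript
// Hypothetical in-memory stand-ins for the collections (not a MongoDB API).
const usersEmbedded = [
  { _id: 1, name: "Ada", posts: [{ title: "Hello" }, { title: "Indexes" }] },
];

const usersReferenced = [{ _id: 1, name: "Ada" }];
const postsCollection = [
  { _id: 101, userId: 1, title: "Hello" },
  { _id: 102, userId: 1, title: "Indexes" },
  { _id: 103, userId: 2, title: "Someone else's post" },
];

// Embedding: one lookup returns the user together with the posts array.
function fetchEmbedded(userId) {
  const user = usersEmbedded.find((u) => u._id === userId);
  return user ? user.posts : [];
}

// Referencing: one lookup for the user, then a second lookup in posts.
function fetchReferenced(userId) {
  const user = usersReferenced.find((u) => u._id === userId);
  if (!user) return [];
  return postsCollection.filter((p) => p.userId === user._id);
}

console.log(fetchEmbedded(1).length);   // 2
console.log(fetchReferenced(1).length); // 2
```

Both functions return the same posts. The difference is that the embedded version gets them with a single lookup, while the referenced version needs a second one.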
Look at what repeats when fetching posts for a user.
- Primary operation: Reading posts data, either from the embedded array or from a separate collection.
- How many times: For embedding, posts are read once as part of the user document. For referencing, one additional query fetches all posts for the user.
Consider how the number of posts (n) affects query time.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 posts | Embedding: 1 read including 10 posts; Referencing: 1 user read + 1 query returning 10 posts |
| 100 posts | Embedding: 1 read including 100 posts; Referencing: 1 user read + 1 query returning 100 posts |
| 1000 posts | Embedding: 1 read including 1000 posts; Referencing: 1 user read + 1 query returning 1000 posts |
Pattern observation: Both methods read all n posts. Embedding reads them in a single document fetch, while referencing needs a second query whose result set grows with the number of posts.
Time Complexity: O(n)
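This means the time to fetch a user's posts grows linearly with the number of posts returned, whether embedded or referenced. For referencing, this assumes MongoDB can locate the user's posts directly. If it has to scan the posts collection instead, the cost grows with the total number of posts stored, not only with n.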
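The linear estimate counts the posts returned. How many documents the referencing query has to examine is a separate question that depends on whether `userId` is indexed. The sketch below models both cases in plain JavaScript. `buildPosts`, `scanForUser`, `buildUserIndex`, and `findWithIndex` are hypothetical helpers, and the `Map` is only a stand-in for a real index. In mongosh, the index would be created with `db.posts.createIndex({ userId: 1 })`, and `explain("executionStats")` reports the actual count as `totalDocsExamined`.

```javascript
// Hypothetical posts collection: `mine` posts for user 1, the rest for other users.
function buildPosts(total, mine) {
  const posts = [];
  for (let i = 0; i < total; i++) {
    posts.push({ _id: i, userId: i < mine ? 1 : 2 + (i % 50) });
  }
  return posts;
}

// Model of an unindexed query: every document in the collection is examined.
function scanForUser(posts, userId) {
  let examined = 0;
  let returned = 0;
  for (const post of posts) {
    examined++;
    if (post.userId === userId) returned++;
  }
  return { examined, returned };
}

// Stand-in for an index on userId.
// A Map gives O(1) lookup; a real B-tree index is O(log N).
function buildUserIndex(posts) {
  const index = new Map();
  for (const post of posts) {
    if (!index.has(post.userId)) index.set(post.userId, []);
    index.get(post.userId).push(post);
  }
  return index;
}

// Model of an indexed query: only the matching documents are examined.
function findWithIndex(index, userId) {
  const matches = index.get(userId) || [];
  return { examined: matches.length, returned: matches.length };
}

const small = buildPosts(1000, 10);
const large = buildPosts(100000, 10);
console.log(scanForUser(small, 1)); // { examined: 1000, returned: 10 }
console.log(scanForUser(large, 1)); // { examined: 100000, returned: 10 }
console.log(findWithIndex(buildUserIndex(large), 1)); // { examined: 10, returned: 10 }
```

The user has 10 posts in every run, yet the unindexed scan does 100 times more work on the larger collection. With the index stand-in, the work tracks n again.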
[X] Wrong: "Embedding always makes queries faster because all data is in one place."
[OK] Correct: If the embedded data grows very large, reading the whole document can be slow and use more memory, and a single document cannot exceed MongoDB's 16 MiB BSON size limit. Referencing can be better for very large or frequently changing related data.
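As a rough illustration of how quickly an embedded array can grow, the sketch below estimates document size from the length of its JSON text. This is only a proxy, since BSON encoding differs from JSON. The user shape, the 2,000-character post body, and the `estimateUserDocBytes` helper are all assumptions made for the example.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Rough estimate: UTF-8 byte length of the JSON text (a proxy, not the real BSON size).
function estimateUserDocBytes(postCount, bodyLength) {
  const user = {
    _id: 1,
    name: "Ada",
    posts: Array.from({ length: postCount }, (_, i) => ({
      title: `Post ${i}`,
      body: "x".repeat(bodyLength),
    })),
  };
  return new TextEncoder().encode(JSON.stringify(user)).length;
}

for (const n of [10, 1000, 10000]) {
  const bytes = estimateUserDocBytes(n, 2000);
  const status = bytes > MAX_BSON_BYTES ? "over the limit" : "within the limit";
  console.log(`${n} posts: about ${bytes} bytes, ${status}`);
}
```

Under these assumptions, 10,000 embedded posts of about 2 KB each push the document past the limit, while every read of the user also has to load all of them.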
Understanding how embedding and referencing affect query time helps you explain design choices clearly and shows you think about how data size impacts performance.
"What if we added an index on the referencing field? How would that change the time complexity of fetching posts?"