Normalization vs denormalization default in MongoDB - Performance Comparison
When working with databases, how data is organized affects how fast queries run.
We want to see how the time to fetch data changes when the data is normalized versus denormalized in MongoDB.
Analyze the time complexity of fetching user orders in two ways: normalized and denormalized.
// Normalized: separate collections
const user = db.users.findOne({ _id: userId });
const orders = db.orders.find({ userId: user._id }).toArray();
// Denormalized: orders embedded in the user document
const userWithOrders = db.users.findOne({ _id: userId });
const embeddedOrders = userWithOrders.orders; // distinct name so both snippets can run in one session
This code shows fetching orders separately (normalized) versus embedded inside user (denormalized).
- Normalized primary operation: Querying orders collection for matching userId.
- Normalized how many times: One query per user. Without an index on userId, that query scans the orders collection to find the orders that belong to the user.
- Denormalized primary operation: Single query to users collection, then access embedded orders array.
- Denormalized how many times: One query, no extra scans.
As the number of orders grows, how does query time change?
| Input Size (orders per user) | Normalized Approx. Operations | Denormalized Approx. Operations |
|---|---|---|
| 10 | Scan 10 orders in orders collection | Access 10 embedded orders |
| 100 | Scan 100 orders in orders collection | Access 100 embedded orders |
| 1000 | Scan 1000 orders in orders collection | Access 1000 embedded orders |
Pattern observation: Both grow roughly linearly with the number of orders, but the normalized design needs a separate query against the orders collection.
Time Complexity: O(n)
This means the time to fetch orders grows linearly with how many orders a user has, whether normalized or denormalized.
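To make the linear growth concrete, here is a minimal in-memory sketch in plain JavaScript (it does not talk to MongoDB). It assumes there is no index on userId, so the normalized lookup has to examine every document in a hypothetical orders array, while the denormalized read walks only the embedded array. The function names and sample data are illustrative assumptions, not part of the original example.

```javascript
// Hypothetical in-memory stand-ins for the two layouts (not real MongoDB calls).
function buildOrders(userCount, ordersPerUser) {
  const orders = [];
  for (let u = 1; u <= userCount; u++) {
    for (let i = 0; i < ordersPerUser; i++) {
      orders.push({ _id: `${u}-${i}`, userId: u, item: `item-${i}` });
    }
  }
  return orders;
}

// Normalized, no index: every order document is examined once.
function findOrdersByScan(orders, userId) {
  let examined = 0;
  const matches = [];
  for (const order of orders) {
    examined++;
    if (order.userId === userId) matches.push(order);
  }
  return { matches, examined };
}

// Denormalized: the orders already live inside the user document.
function readEmbeddedOrders(userDoc) {
  return { matches: userDoc.orders, examined: userDoc.orders.length };
}

const allOrders = buildOrders(5, 100); // 5 users, 100 orders each
const scanResult = findOrdersByScan(allOrders, 3);
const embeddedUser = { _id: 3, orders: scanResult.matches };
const embeddedResult = readEmbeddedOrders(embeddedUser);

console.log(scanResult.matches.length, scanResult.examined);         // 100 500
console.log(embeddedResult.matches.length, embeddedResult.examined); // 100 100
```

In both layouts the number of orders returned grows with n. The difference in this model is how many other documents are touched along the way.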
[X] Wrong: "Denormalized data always makes queries faster regardless of data size."
[OK] Correct: Large embedded arrays can slow down queries and updates, so time can still grow with data size.
Understanding how data layout affects query time helps you design better databases and answer real questions about performance.
"What if we added an index on userId in the orders collection? How would that change the time complexity for normalized queries?"
Practice
Solution
Step 1: Understand normalization concept
Normalization means splitting data into separate collections and linking them by references.
Step 2: Identify the main benefit
This separation makes updating data easier because changes happen in one place without duplication.
Final Answer:
It separates data into collections linked by references for easy updates. -> Option A
Quick Check:
Normalization = separate collections + easy updates [OK]
- Confusing normalization with denormalization
- Thinking normalization duplicates data
- Assuming normalization speeds up reads
Solution
Step 1: Identify denormalized structure
Denormalization stores related data together inside one document, like embedding orders inside user.
Step 2: Check options for embedded data
{ _id: 1, name: 'Alice', orders: [ { orderId: 101, item: 'Book' } ] } embeds orders array inside the user document, showing denormalization.
Final Answer:
{ _id: 1, name: 'Alice', orders: [ { orderId: 101, item: 'Book' } ] } -> Option B
Quick Check:
Denormalization = embedded related data [OK]
- Choosing separate collections as denormalized
- Ignoring embedded arrays as denormalization
- Confusing null fields with embedded data
users: { _id: 1, name: 'Bob' }
orders: { _id: 101, userId: 1, item: 'Pen' }
What is the main drawback of this normalized design when reading user orders?
Solution
Step 1: Understand normalized design
Users and orders are in separate collections linked by userId reference.
Step 2: Identify drawback when reading
To get all orders for a user, you must query orders collection filtering by userId, requiring multiple queries or aggregation.
Final Answer:
It requires multiple queries or a join-like operation to get all orders for a user. -> Option A
Quick Check:
Normalized read = multiple queries [OK]
- Thinking normalized data duplicates info
- Assuming all data is embedded in one document
- Believing updates are harder in normalized data
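The "join-like operation" mentioned in the solution above can be sketched with MongoDB's $lookup aggregation stage. The pipeline below is plain data, so it can be inspected without a database connection. It assumes the collection and field names from the question (users, orders, userId); the output field name `orders` is an assumption chosen for illustration.

```javascript
// Aggregation pipeline for: "fetch user 1 together with all of their orders".
// Collection and field names follow the example documents in the question.
const userWithOrdersPipeline = [
  { $match: { _id: 1 } },
  {
    $lookup: {
      from: 'orders',          // collection to join against
      localField: '_id',       // field on the users document
      foreignField: 'userId',  // reference field on each order document
      as: 'orders',            // assumed name for the array of matched orders
    },
  },
];

// In mongosh this pipeline would be run as:
//   db.users.aggregate(userWithOrdersPipeline)
console.log(JSON.stringify(userWithOrdersPipeline, null, 2));
```

This keeps the read to a single round trip, although the server still has to match orders by userId, so the cost of the normalized layout does not disappear.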
{ _id: 1, name: 'Carol', orders: [ { orderId: 201, item: 'Notebook' } ] }
Which problem can occur if you update the item name in one order but forget to update it elsewhere?
Solution
Step 1: Recognize denormalization risk
Denormalization duplicates related data inside documents, so the same order info may appear in many places.
Step 2: Understand update problem
If you update one copy but not others, data becomes inconsistent and unreliable.
Final Answer:
Data inconsistency due to duplicated order info in multiple documents. -> Option D
Quick Check:
Denormalization risk = data inconsistency [OK]
- Thinking denormalization slows queries
- Believing schema changes automatically
- Confusing index loss with denormalization
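The inconsistency risk described in the solution above can be reproduced with plain JavaScript objects. This sketch assumes a hypothetical second copy of order 201 kept in a separate orders array (the question shows only the embedded copy), updates one copy, and then checks whether the two still agree. The helper name is illustrative.

```javascript
// Embedded copy, as shown in the question.
const carol = { _id: 1, name: 'Carol', orders: [{ orderId: 201, item: 'Notebook' }] };

// Hypothetical duplicate of the same order stored elsewhere (an assumption for illustration).
const ordersCollection = [{ orderId: 201, userId: 1, item: 'Notebook' }];

// Returns the orderIds whose item differs between the embedded and standalone copies.
function findInconsistentOrders(userDoc, orders) {
  const mismatched = [];
  for (const embedded of userDoc.orders) {
    const standalone = orders.find((o) => o.orderId === embedded.orderId);
    if (standalone && standalone.item !== embedded.item) {
      mismatched.push(embedded.orderId);
    }
  }
  return mismatched;
}

const before = findInconsistentOrders(carol, ordersCollection);

// Update only the embedded copy and forget the standalone one.
carol.orders[0].item = 'Sketchbook';
const after = findInconsistentOrders(carol, ordersCollection);

console.log(before, after); // [] [ 201 ]
```

Every duplicated field needs an update path like this one to stay in sync, which is the maintenance cost that denormalization trades for faster reads.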
Users have many posts, and posts rarely change after creation.
Which design is best for fast reading and why?
Options:
A: Store users and posts in separate collections (normalized).
B: Embed all posts inside each user document (denormalized).
C: Duplicate posts in both users and posts collections.
D: Store posts only, with user info duplicated in each post.
Solution
Step 1: Analyze data change frequency
Posts rarely change, so embedding them inside users won't cause frequent update problems.
Step 2: Choose design for fast reads
Embedding posts inside user documents allows fetching user and posts in one read, improving read speed.
Step 3: Compare options
Option B (embedding posts inside user documents) fits best for fast reads with rare updates. Separate collections require joins, duplicating posts in both collections risks inconsistency, and storing only posts duplicates user info unnecessarily.
Final Answer:
Embed posts inside user documents for fast reads since posts rarely change. -> Option B
Quick Check:
Denormalization + rare updates = embed for fast reads [OK]
- Choosing normalization for fast reads
- Duplicating data causing inconsistency
- Ignoring update frequency in design
