Denormalization and when to use it in SQL - Time & Space Complexity
When we denormalize a database, we add extra copies of data to reduce the number of joins needed in queries.
We want to understand how this affects the time it takes to run queries as data grows.
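To make the idea concrete, here is a minimal sketch using SQLite in memory. The `orders_denorm` table and its `customer_name` column are hypothetical names chosen for illustration; the point is that the denormalized table carries a redundant copy of a customer field so a read can skip the join.

```python
import sqlite3

# Hypothetical schema sketch: a normalized design plus a denormalized
# variant of the orders table that carries a copy of the customer's name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: customer details live only in customers.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        date TEXT,
        customer_id INTEGER REFERENCES customers(id)
    );

    -- Denormalized: orders also stores a copy of the customer's name,
    -- so reads can skip the join at the cost of redundant data.
    CREATE TABLE orders_denorm (
        id INTEGER PRIMARY KEY,
        date TEXT,
        customer_id INTEGER,
        customer_name TEXT  -- duplicated from customers.name
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (1, '2024-03-15', 1)")
conn.execute("INSERT INTO orders_denorm VALUES (1, '2024-03-15', 1, 'Ada')")

# The denormalized read needs no join at all.
row = conn.execute(
    "SELECT id, date, customer_name FROM orders_denorm WHERE date > '2024-01-01'"
).fetchone()
print(row)  # (1, '2024-03-15', 'Ada')
```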
Analyze the time complexity of this query.
```sql
SELECT orders.id, orders.date, customers.name, customers.email
FROM orders
JOIN customers ON orders.customer_id = customers.id
WHERE orders.date > '2024-01-01';
```
This query fetches all orders placed after a given date, along with each customer's name and email, by joining the orders and customers tables.
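The query above can be run end to end with a small in-memory SQLite harness (the sample rows below are made up for illustration):

```python
import sqlite3

# Minimal runnable harness for the join query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, customer_id INTEGER);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
# Three sample orders; only two fall after the date filter.
conn.executemany("INSERT INTO orders VALUES (?, ?, 1)",
                 [(1, '2023-12-30'), (2, '2024-02-01'), (3, '2024-03-15')])

rows = conn.execute("""
    SELECT orders.id, orders.date, customers.name, customers.email
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
    WHERE orders.date > '2024-01-01'
""").fetchall()
print(rows)  # two matching orders, each joined to Ada's details
```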
Look at what repeats as data grows.
- Primary operation: Scanning each order row and, for rows that pass the date filter, looking up the matching customer.
- How many times: Once per order for the scan, plus one customer lookup per order that passes the filter.
As the number of orders grows, the query checks more rows and joins more customers.
| Input Size (orders) | Approx. Operations |
|---|---|
| 10 | About 10 order checks and joins |
| 100 | About 100 order checks and joins |
| 1000 | About 1000 order checks and joins |
Pattern observation: The work grows roughly in direct proportion to the number of orders.
Time Complexity: O(n), where n is the number of orders.
This means the time to run the query grows linearly with the number of orders, assuming each customer lookup in the join is effectively constant time (for example, via the primary-key index on customers.id). Without such an index, a nested-loop join could degrade toward O(n × m) for m customers.
[X] Wrong: "Denormalization always makes queries faster with no downsides."
[OK] Correct: Denormalization can speed up reads but may slow down updates and increase storage, so it's a trade-off.
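The write-side cost of that trade-off can be demonstrated directly. In this sketch (again using the hypothetical `orders_denorm` table), renaming one customer touches a single row in the normalized design but every copied row in the denormalized one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders_denorm (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        customer_name TEXT  -- redundant copy of customers.name
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
# One customer with 100 orders, each carrying a copy of her name.
conn.executemany("INSERT INTO orders_denorm VALUES (?, 1, 'Ada')",
                 [(i,) for i in range(1, 101)])

# Normalized write: exactly one row changes.
cur = conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
normalized_rows_touched = cur.rowcount

# Denormalized write: every copied value must change too.
cur = conn.execute(
    "UPDATE orders_denorm SET customer_name = 'Ada L.' WHERE customer_id = 1"
)
denormalized_rows_touched = cur.rowcount

print(normalized_rows_touched, denormalized_rows_touched)  # 1 100
```

So a write that was O(1) in the normalized schema becomes O(k) in the denormalized one, where k is the number of orders holding the copied value.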
Understanding when denormalization helps shows that you can balance read speed against write cost and storage, a key skill in real projects.
"What if we added more denormalized columns to avoid the join? How would the time complexity change for reads and writes?"