# Denormalization Tradeoffs in DBMS Theory: Time & Space Complexity
Denormalization stores redundant copies of related data in a single table so that common read queries can avoid joins. We want to understand how this choice affects the time and space costs of database operations.
Let's analyze the time complexity of a typical lookup query on a denormalized table:
```sql
-- Example: a denormalized table with repeated data
SELECT customer_name, order_id, product_name, product_price
FROM denormalized_orders
WHERE customer_id = 123;
```
This query fetches order details, including product info, directly from one table without any joins.
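To make the example concrete, here is a minimal sketch in Python using an in-memory SQLite database. The schema and sample rows are assumptions invented for illustration; the point is that customer and product attributes are repeated on every order row, so the SELECT from the text needs no join.

```python
import sqlite3

# Hypothetical denormalized schema: customer and product attributes
# are repeated on every order row instead of living in their own tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE denormalized_orders (
        customer_id   INTEGER,
        customer_name TEXT,
        order_id      INTEGER,
        product_name  TEXT,
        product_price REAL
    )
""")
rows = [
    (123, "Alice", 1, "Keyboard", 49.99),
    (123, "Alice", 2, "Mouse",    19.99),  # "Alice" stored again
    (456, "Bob",   3, "Keyboard", 49.99),  # "Keyboard"/49.99 stored again
]
conn.executemany(
    "INSERT INTO denormalized_orders VALUES (?, ?, ?, ?, ?)", rows
)

# The query from the text: no JOIN, everything is in one table.
result = conn.execute("""
    SELECT customer_name, order_id, product_name, product_price
    FROM denormalized_orders
    WHERE customer_id = 123
""").fetchall()
print(result)
# → [('Alice', 1, 'Keyboard', 49.99), ('Alice', 2, 'Mouse', 19.99)]
```

Note the redundancy: Alice's name and the Keyboard's price each appear in multiple rows, which is exactly the storage cost denormalization accepts in exchange for join-free reads.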
Look for repeated work done by the database engine.
- Primary operation: Scanning rows matching the customer_id filter.
- How many times: Once per matching row in the denormalized table.
As the number of orders grows, the query scans more rows for that customer.
| Matching orders for the customer (n) | Approx. rows returned |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: the work grows linearly with the number of rows the engine must read.

Time Complexity: O(n)

Be careful about what n is. Without an index on customer_id, the engine must scan every row in the table to evaluate the filter, so n is the table's total row count. Even in the best case, the query reads every matching row, so the matching-row count is a lower bound on the work.
[X] Wrong: "Denormalization always makes queries faster with no downsides."
[OK] Correct: Denormalization can speed up reads but may slow down updates and increase storage, affecting overall performance.
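The update-cost side of that tradeoff is easy to demonstrate. The sketch below (hypothetical schema and data, continuing the sqlite3 example) shows write amplification: one logical price change must touch every denormalized row that repeats the value, whereas a normalized design would update a single row in a products table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE denormalized_orders (
        customer_id INTEGER, order_id INTEGER,
        product_name TEXT, product_price REAL
    )
""")
# The Keyboard's price is repeated across 1000 order rows.
conn.executemany(
    "INSERT INTO denormalized_orders VALUES (?, ?, ?, ?)",
    [(123, i, "Keyboard", 49.99) for i in range(1000)]
)

# One logical change ("the Keyboard now costs 54.99") becomes
# 1000 physical row updates in the denormalized table.
cur = conn.execute(
    "UPDATE denormalized_orders SET product_price = 54.99 "
    "WHERE product_name = 'Keyboard'"
)
print(cur.rowcount)  # → 1000 rows touched by a single price change
```

In a normalized schema the same change is O(1) row writes; here it is O(m) in the number of rows repeating the value, and a crash mid-update can leave the copies inconsistent.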
Understanding denormalization tradeoffs shows you can balance read speed against update cost and data consistency, a key skill in database design.
What if we added indexes on customer_id in the denormalized table? How would the time complexity change?
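One way to explore that question is to ask the database itself. The sketch below uses SQLite's EXPLAIN QUERY PLAN to show the access path changing from a full table scan to an index lookup (roughly O(log n + k) for k matching rows in a B-tree index). The table and index names are illustrative, and the exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE denormalized_orders (customer_id INTEGER, order_id INTEGER)"
)

def access_path(conn):
    # Ask SQLite how it would execute the filtered query.
    row = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM denormalized_orders WHERE customer_id = 123"
    ).fetchone()
    return row[-1]  # human-readable plan detail

# Without an index: a full table scan, O(n) in total rows.
before = access_path(conn)
print(before)  # e.g. "SCAN denormalized_orders"

# With an index on the filter column: an index search instead.
conn.execute(
    "CREATE INDEX idx_customer ON denormalized_orders (customer_id)"
)
after = access_path(conn)
print(after)  # e.g. "SEARCH denormalized_orders USING INDEX idx_customer ..."
```

So the index changes the lookup cost, but it does not undo the other tradeoffs: the index itself takes space, and every insert or update to the denormalized table must now maintain it as well.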