Division operation in DBMS Theory - Time & Space Complexity
Relational division is one of the more expensive operations to express in SQL, so it helps to know how its cost scales as the input tables grow.
Analyze the time complexity of the following SQL query, which implements relational division.
    SELECT A.x
    FROM A
    WHERE NOT EXISTS (
        SELECT B.y
        FROM B
        WHERE NOT EXISTS (
            SELECT *
            FROM R
            WHERE R.x = A.x AND R.y = B.y
        )
    );
This query returns every x in table A for which table R contains a pair (x, y) for every y in table B. The double NOT EXISTS pattern reads as "there is no y in B that this x is not paired with in R", which is how SQL expresses division.
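To make the query's behavior concrete, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The sample rows are hypothetical and chosen only for illustration; an `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

# Hypothetical sample data (not from the original text).
# R(x, y) records which x-values are related to which y-values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER);
    CREATE TABLE B (y INTEGER);
    CREATE TABLE R (x INTEGER, y INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (10), (20);
    INSERT INTO R VALUES (1, 10), (1, 20), (2, 10), (3, 10), (3, 20);
""")

# The division query from the text, with ORDER BY added for stable output.
DIVISION_SQL = """
    SELECT A.x
    FROM A
    WHERE NOT EXISTS (
        SELECT B.y
        FROM B
        WHERE NOT EXISTS (
            SELECT *
            FROM R
            WHERE R.x = A.x AND R.y = B.y
        )
    )
    ORDER BY A.x
"""

division_result = [row[0] for row in conn.execute(DIVISION_SQL)]
print(division_result)  # [1, 3]
```

Here x = 2 is excluded because the pair (2, 20) is missing from R, while x = 1 and x = 3 are paired with every y in B.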
Look at the nested queries and repeated checks.
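- Primary operation: For each row in A, check the rows in B, and for each (A, B) pair, check whether a matching row exists in R.
- How many times: In the worst case the inner check runs once for every combination of A and B rows. NOT EXISTS can stop early for a given A row as soon as one missing pair is found, so the actual count may be lower.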
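The repeated checks listed above can be sketched as plain nested loops. This is only a model of a naive evaluation, not what any particular database engine does internally; the function name `divide_nested_loops` and the sample rows are hypothetical.

```python
def divide_nested_loops(a_rows, b_rows, r_rows):
    """Naive model of the double NOT EXISTS query.

    Returns (result, pair_checks, r_rows_examined).
    """
    result = []
    pair_checks = 0        # how many (x, y) pairs were tested
    r_rows_examined = 0    # how many R rows were read while testing them
    for x in a_rows:
        covers_all = True
        for y in b_rows:
            pair_checks += 1
            found = False
            for rx, ry in r_rows:  # unindexed scan of R
                r_rows_examined += 1
                if rx == x and ry == y:
                    found = True
                    break
            if not found:
                # One missing pair is enough to reject x, so stop early.
                covers_all = False
                break
        if covers_all:
            result.append(x)
    return result, pair_checks, r_rows_examined


# Hypothetical sample data for illustration.
a_rows = [1, 2, 3]
b_rows = [10, 20]
r_rows = [(1, 10), (1, 20), (2, 10), (3, 10), (3, 20)]

result, pair_checks, r_rows_examined = divide_nested_loops(a_rows, b_rows, r_rows)
print(result, pair_checks, r_rows_examined)
```

Note that the model tracks two counters: the number of pair checks depends on the sizes of A and B, while the number of R rows read also depends on the size of R.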
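As the number of rows in A and B grows, the number of pair checks grows with their product. The table assumes A and B each contain n rows and that no check stops early.
| Rows in A and in B (n each) | Approx. Operations |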
|---|---|
| 10 | About 100 checks |
| 100 | About 10,000 checks |
| 1000 | About 1,000,000 checks |
Pattern observation: The number of operations grows roughly with the product of the sizes of A and B.
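The figures in the table can be reproduced with a small counting sketch. It assumes the worst case, where no early exit happens, and counts only pair checks, not the cost of each lookup in R. The helper name `count_pair_checks` is hypothetical.

```python
def count_pair_checks(n_a, n_b):
    """Count worst-case (A row, B row) pair checks for the given table sizes."""
    checks = 0
    for _ in range(n_a):
        for _ in range(n_b):
            checks += 1
    return checks


# Equal sizes, as in the table.
equal_sizes = {n: count_pair_checks(n, n) for n in (10, 100, 1000)}
print(equal_sizes)  # {10: 100, 100: 10000, 1000: 1000000}

# Unequal sizes: the count is the product, not a square.
unequal_sizes = count_pair_checks(10, 1000)
print(unequal_sizes)  # 10000
```

The unequal case shows why the growth is described as a product: 10 rows in A against 1,000 rows in B costs the same number of pair checks as 100 against 100.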
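Time Complexity: O(n * m) existence checks, where n is the number of rows in A and m is the number of rows in B
This means the number of checks grows proportionally to the number of rows in A times the number of rows in B. Each check also has its own cost: if a check has to scan R, the worst-case total becomes O(n * m * r), where r is the number of rows in R. This describes the naive nested evaluation; a real query optimizer may choose a different plan.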
[X] Wrong: "Division queries run in constant time regardless of input size."
[OK] Correct: Because the query checks combinations of rows from two tables, its running time grows with the sizes of those tables rather than staying fixed.
Understanding how division queries scale helps you explain query performance and optimization in real database tasks.
What if we added an index on the columns used in the WHERE clause? How would the time complexity change?
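As a starting point for exploring this question, the sketch below models an index on R(x, y) with a Python `set`. This is a simplification: a set is hash-based, whereas most database indexes are B-trees with logarithmic lookups, and the function name `divide_with_index` and sample rows are hypothetical.

```python
def divide_with_index(a_rows, b_rows, r_rows):
    """Model of the division query when R(x, y) lookups use an index.

    Returns (result, lookups).
    """
    r_index = set(r_rows)  # stand-in for an index on R(x, y)
    result = []
    lookups = 0
    for x in a_rows:
        covers_all = True
        for y in b_rows:
            lookups += 1
            if (x, y) not in r_index:  # one index probe instead of a scan of R
                covers_all = False
                break
        if covers_all:
            result.append(x)
    return result, lookups


# Hypothetical sample data for illustration.
indexed_result, indexed_lookups = divide_with_index(
    [1, 2, 3],
    [10, 20],
    [(1, 10), (1, 20), (2, 10), (3, 10), (3, 20)],
)
print(indexed_result, indexed_lookups)  # [1, 3] 6
```

In this model the number of pair checks is unchanged, but each check becomes a single probe rather than a pass over R, so the size of R matters far less.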