Finding duplicates efficiently in SQL - Time & Space Complexity
When we look for duplicates in a database table, we want to know how the work grows as the table gets bigger.
We ask: How much longer does it take to find duplicates if we add more rows?
Let's analyze the time complexity of the following query:
```sql
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
```
This query finds all values in column_name that appear more than once in table_name.
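You can try the query with Python's built-in `sqlite3` module. The table name, column name, and sample values below are placeholders chosen for illustration:

```python
import sqlite3

# In-memory database with a small sample table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name (column_name) VALUES (?)",
    [("apple",), ("banana",), ("apple",), ("cherry",), ("banana",), ("apple",)],
)

# Values that appear more than once, together with their counts.
rows = conn.execute(
    """
    SELECT column_name, COUNT(*)
    FROM table_name
    GROUP BY column_name
    HAVING COUNT(*) > 1
    """
).fetchall()

print(rows)  # "apple" appears 3 times, "banana" twice; "cherry" is filtered out
conn.close()
```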
Identify the repeated work: loops, recursion, or full-table scans.
- Primary operation: Scanning all rows to group by the column.
- How many times: Each row is processed once to count duplicates.
As the number of rows grows, the database must check each row once to group and count.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the time to find duplicates grows linearly with the number of rows, assuming the database groups with a hash table; if it sorts the rows to form groups instead, the cost is O(n log n).
[X] Wrong: "Finding duplicates takes much longer than just reading the table once."
[OK] Correct: The database groups rows in one pass, so it doesn't repeatedly scan the table.
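That one-pass grouping can be sketched with a hash table, which is roughly what a hash-aggregation plan does inside the engine (a simplification, not the database's actual implementation):

```python
from collections import Counter

def find_duplicates(values):
    """Mirror GROUP BY ... HAVING COUNT(*) > 1 with a hash table.

    One scan over all rows to count, then one pass over the (much
    smaller) set of groups to filter, so the work is O(n).
    """
    counts = Counter(values)  # single scan over all rows
    return {value: count for value, count in counts.items() if count > 1}

print(find_duplicates(["a", "b", "a", "c", "b", "a"]))  # {'a': 3, 'b': 2}
```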
Understanding how grouping works helps you explain efficient data checks clearly and confidently.
"What if we added an index on column_name? How would the time complexity change?"