Multi-version concurrency control (MVCC) in DBMS Theory - Time & Space Complexity
We want to understand how the time cost of managing multiple versions of data grows as more transactions happen.
How does MVCC handle many reads and writes efficiently as data changes?
Analyze the time complexity of the following MVCC operations.
-- When a transaction reads data:
SELECT * FROM table WHERE id = X AND version <= current_version ORDER BY version DESC LIMIT 1;
-- When a transaction writes data:
INSERT INTO table (id, data, version) VALUES (X, new_data, new_version);
-- Cleanup old versions:
DELETE FROM table WHERE version < oldest_active_version;
This snippet shows how MVCC reads the latest visible version, writes a new version, and cleans old versions.
Look at what repeats as data and transactions grow.
- Primary operation: Searching for the correct version of a row during reads.
- How many times: Once per read, but depends on how many versions exist for that row.
As more versions accumulate, finding the right version can take longer if not indexed well.
| Input Size (versions per row) | Approx. Operations |
|---|---|
| 10 | About 10 checks to find the right version |
| 100 | About 100 checks if no index |
| 1000 | About 1000 checks if no index |
Pattern observation: Without indexing, the search grows linearly with the number of versions per row.
Time Complexity: O(n)
This means the time to find the correct version grows linearly with the number of versions stored for a row.
[X] Wrong: "MVCC always gives constant time reads regardless of versions."
[OK] Correct: If many versions exist and no indexing is used, the system may scan many versions, making reads slower.
Understanding how MVCC scales helps you explain database performance and concurrency control clearly, a valuable skill in many roles.
"What if we added an index on the version column? How would the time complexity change?"