Column-store vs row-store in DBMS Theory - Performance Comparison
We want to understand how the way data is stored affects the speed of reading and writing data.
How does storing data by columns or by rows change the work done as data grows?
Analyze the time complexity of scanning data stored in row-store and column-store formats.
-- Row-store scan
SELECT * FROM table WHERE columnA = 'value';
-- Column-store scan
SELECT columnA FROM table WHERE columnA = 'value';
This code shows simple queries scanning data stored by rows or by columns.
Look at how many data units are checked repeatedly.
- Primary operation: Scanning data units (rows or columns) to find matching values.
- How many times: Once for each row in row-store; once for each value in the column in column-store.
As the number of rows (n) grows, the amount of data scanned grows too.
| Input Size (n) | Approx. Operations (Row-store) | Approx. Operations (Column-store) |
|---|---|---|
| 10 | Scan 10 rows fully | Scan 10 values in one column |
| 100 | Scan 100 rows fully | Scan 100 values in one column |
| 1000 | Scan 1000 rows fully | Scan 1000 values in one column |
Pattern observation: Both grow linearly with the number of rows, but column-store scans less data if only some columns are needed.
Time Complexity: O(n)
This means the time to scan grows directly with the number of rows, but column-store can be faster when fewer columns are accessed.
[X] Wrong: "Column-store always scans less data and is always faster than row-store."
[OK] Correct: If you need many or all columns, column-store may scan as much or more data than row-store, making it not always faster.
Understanding how data layout affects scanning helps you explain performance trade-offs clearly, a useful skill in database design and optimization discussions.
What if we changed the query to select all columns instead of one? How would the time complexity change for column-store?