0
0
DBMS Theoryknowledge~5 mins

Column-store vs row-store in DBMS Theory - Performance Comparison

Choose your learning style9 modes available
Time Complexity: Column-store vs row-store
O(n)
Understanding Time Complexity

We want to understand how the way data is stored affects the speed of reading and writing data.

How does storing data by columns or by rows change the work done as data grows?

Scenario Under Consideration

Analyze the time complexity of scanning data stored in row-store and column-store formats.

-- Row-store scan
SELECT * FROM table WHERE columnA = 'value';

-- Column-store scan
SELECT columnA FROM table WHERE columnA = 'value';

This code shows simple queries scanning data stored by rows or by columns.

Identify Repeating Operations

Look at how many data units are checked repeatedly.

  • Primary operation: Scanning data units (rows or columns) to find matching values.
  • How many times: Once for each row in row-store; once for each value in the column in column-store.
How Execution Grows With Input

As the number of rows (n) grows, the amount of data scanned grows too.

Input Size (n)Approx. Operations (Row-store)Approx. Operations (Column-store)
10Scan 10 rows fullyScan 10 values in one column
100Scan 100 rows fullyScan 100 values in one column
1000Scan 1000 rows fullyScan 1000 values in one column

Pattern observation: Both grow linearly with the number of rows, but column-store scans less data if only some columns are needed.

Final Time Complexity

Time Complexity: O(n)

This means the time to scan grows directly with the number of rows, but column-store can be faster when fewer columns are accessed.

Common Mistake

[X] Wrong: "Column-store always scans less data and is always faster than row-store."

[OK] Correct: If you need many or all columns, column-store may scan as much or more data than row-store, making it not always faster.

Interview Connect

Understanding how data layout affects scanning helps you explain performance trade-offs clearly, a useful skill in database design and optimization discussions.

Self-Check

What if we changed the query to select all columns instead of one? How would the time complexity change for column-store?