Filtering rows in R Programming - Time & Space Complexity
When we filter rows in a data frame, we want to know how the time required changes as the table gets bigger.
We ask: how does the work grow when there are more rows to check?
Analyze the time complexity of the following code snippet.
```r
# Example data frame with a numeric 'value' column
data <- data.frame(value = c(5, 12, 8, 20))

# Filter rows where value is greater than 10
filtered_data <- data[data$value > 10, ]
```
This code checks each row in the data frame and keeps only those with a value greater than 10.
- Primary operation: Checking each row's value to see if it is greater than 10.
- How many times: Once for every row in the data frame.
As the number of rows grows, the number of checks grows the same way.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The work grows directly with the number of rows; doubling rows doubles the work.
Time Complexity: O(n)
This means the time to filter grows linearly with the number of rows.
[X] Wrong: "Filtering only a few rows means the operation is very fast no matter the data size."
[OK] Correct: Even if few rows match, the code still checks every row once, so time depends on total rows.
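A quick way to see this in code (a sketch; the data here is made up for illustration): the vectorized comparison `data$value > 10` produces one logical value per row, so exactly n comparisons happen no matter how few rows end up matching.

```r
# Data where almost nothing matches: 1000 rows, all values below 10
data <- data.frame(value = runif(1000, min = 0, max = 5))

# The comparison still runs once per row: the logical mask has length n
mask <- data$value > 10
filtered_data <- data[mask, ]

stopifnot(length(mask) == nrow(data))  # n checks were performed
stopifnot(nrow(filtered_data) == 0)    # even though no rows matched
```

The small result does not mean small work: the cost came from scanning all 1000 rows, not from the 0 rows kept.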
Understanding how filtering scales helps you write efficient data code and explain your choices clearly.
"What if we used a pre-sorted data frame and stopped checking once values were too small? How would the time complexity change?"