Selecting columns in Data Analysis with Python: Time & Space Complexity
When we select columns from a dataset, we want to know how the time to do this changes as the dataset grows.
We ask: How does the work increase when the number of rows or columns grows?
Analyze the time complexity of the following code snippet.
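```python
import pandas as pd

def select_columns(df, cols):
    # Return a new DataFrame containing only the columns named in cols
    return df[cols]

# Example usage:
# df is a DataFrame (tiny here; in practice, many rows and columns)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
# cols is a list of column names to select
cols = ["a", "c"]
result = select_columns(df, cols)  # result has columns a and c
```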
This code returns a new DataFrame with only the columns listed in cols.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Processing each selected column.
- How many times: Once for each column in cols.
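To make the repeated step concrete, here is a simplified pure-Python model of column selection. It assumes a table stored as a dict that maps column names to lists of values; the function name `select_columns_model` and this dict-of-lists layout are hypothetical illustrations, not how pandas stores or selects data internally. The loop body runs once per selected column, and each iteration copies that column's values.

```python
def select_columns_model(table, cols):
    """Simplified model of column selection on a dict-of-lists table.

    Hypothetical illustration only: pandas does not store data this way.
    """
    result = {}
    for name in cols:                      # runs once per selected column (m times)
        result[name] = list(table[name])   # copies that column's values (one per row)
    return result


table = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
print(select_columns_model(table, ["a", "c"]))  # {'a': [1, 2, 3], 'c': [7, 8, 9]}
```

Doubling the length of `cols` doubles the number of loop iterations, which is the linear growth described next.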
As the number of selected columns grows, the work grows linearly because each selected column must be incorporated into the new DataFrame.
| Selected columns (m) | Approx. Operations |
|---|---|
| 10 | 10 times the work per column |
| 100 | 100 times the work per column |
| 1000 | 1000 times the work per column |
Pattern observation: The work grows directly with the number of columns.
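One way to check this pattern empirically is to time `df[cols]` for a growing number of selected columns while keeping the row count fixed. This is a rough sketch: the sizes (1,000 rows, 1,000 columns of random floats) and the repeat count are arbitrary assumptions, and absolute timings depend on the machine and pandas version. For small m, fixed overhead such as label lookup and object creation tends to dominate, so growth is only roughly linear for larger m.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed example sizes: 1,000 rows and 1,000 columns of random floats
n_rows, n_cols = 1_000, 1_000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((n_rows, n_cols)))  # column labels are 0..999


def time_selection(m, repeats=20):
    """Average seconds to select the first m columns with df[cols]."""
    cols = list(range(m))
    total = timeit.timeit(lambda: df[cols], number=repeats)
    return total / repeats


timings = {m: time_selection(m) for m in (10, 100, 1000)}
for m, seconds in timings.items():
    print(f"{m:>5} columns: {seconds * 1e3:.3f} ms")
```

Comparing the 100-column and 1,000-column timings is usually more informative than the 10-column one, because the per-call overhead is a larger share of the small case.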
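Time Complexity: O(m), where m is the number of selected columns.
This means the time to select columns grows linearly with the number of columns selected. This count treats the work for one column as fixed. Each column holds one value per row, and selecting with a list typically copies those values into the new DataFrame, so the work also grows with the number of rows n. Counting both gives O(n × m).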
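The size of the selected result shows why the row count matters as well. The sketch below assumes float64 data (8 bytes per value) and arbitrary example sizes. It checks that the memory held by the selected columns equals 8 × n × m bytes, so it grows with both the number of rows and the number of selected columns. When `df[cols]` copies those values, which is typical for list-based selection, the copying cost scales the same way. Exact copy behavior can differ between pandas versions, for example when Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 10_000, 50  # assumed example sizes
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((n_rows, n_cols)))  # float64 values, 8 bytes each

for m in (5, 10, 20):
    subset = df[list(range(m))]
    # Bytes held by the selected columns' values, excluding the index
    nbytes = subset.memory_usage(index=False).sum()
    print(m, nbytes, nbytes == 8 * n_rows * m)
```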
[X] Wrong: "Selecting columns is instant and does not depend on the number of columns."
[OK] Correct: Even though we only pick columns, pandas still processes each selected column to build the new DataFrame, so the time grows with the number of selected columns.
Understanding how data selection scales helps you explain your code choices clearly and shows you know what happens behind the scenes.
"What if we select only one column instead of many? How would the time complexity change?"
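A starting point for exploring this question: with a single name in the list, the per-column loop from the analysis runs once (m = 1), but the result is still a DataFrame whose one column holds a value for every row. Passing a bare label instead of a list returns a Series rather than a DataFrame. The small DataFrame below is made-up example data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

one_as_frame = df[["a"]]  # list with one label (m = 1): returns a DataFrame
one_as_series = df["a"]   # single label: returns a Series

print(type(one_as_frame).__name__, one_as_frame.shape)    # DataFrame (3, 1)
print(type(one_as_series).__name__, one_as_series.shape)  # Series (3,)
```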