Adding and Removing Columns in Python Data Analysis: Time & Space Complexity
When we add or remove columns in a data table, the time it takes depends on how many rows we have.
We want to know how the work grows as the table gets bigger.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # Example value for n
data = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n)
})

# Adding a new column
data['C'] = data['A'] + data['B']

# Removing a column
data.drop('B', axis=1, inplace=True)
```
This code creates a table with n rows, adds a new column C by summing columns A and B, then removes column B.
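To watch this growth on your own machine, the snippet can be wrapped in a helper and timed for several values of n with the standard-library `timeit` module. This is a minimal sketch: the helper name `add_and_remove` is hypothetical, it uses `drop(columns=...)` instead of `inplace=True` for clarity, and the timing also includes building the frame (itself proportional to n). Absolute times depend on your hardware and pandas version; only the trend is meaningful.

```python
import timeit

import pandas as pd


def add_and_remove(n: int) -> pd.DataFrame:
    """Build an n-row table, add C = A + B, then drop B (hypothetical helper)."""
    data = pd.DataFrame({"A": range(n), "B": range(n, 2 * n)})
    data["C"] = data["A"] + data["B"]  # vectorized sum: one pass over the n rows
    data = data.drop(columns="B")      # returns a new frame without column B
    return data


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        # Average over 5 runs; frame construction is included in each run.
        seconds = timeit.timeit(lambda: add_and_remove(n), number=5) / 5
        print(f"n={n:>9,}: {seconds * 1000:.2f} ms per run")
```

If the cost is linear, each tenfold increase in n should make the runs roughly ten times slower once n is large enough that fixed overheads stop dominating.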
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Adding a column involves going through each row once to compute values.
- How many times: Each row is visited once to compute the new column. Removing a column can also touch every row, because pandas may copy the remaining data into new internal storage.
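The per-row work in the two bullets above can be made explicit with a plain Python loop that counts one operation per row visited. This is a teaching model under stated assumptions, not how pandas runs internally (pandas hands the arithmetic to NumPy's compiled loops); the function name and the idea of modeling removal as a row-by-row copy of the remaining columns are hypothetical.

```python
def count_row_operations(n: int) -> int:
    """Count row visits for adding C = A + B, then dropping B (hypothetical model)."""
    a = list(range(n))
    b = list(range(n, 2 * n))
    operations = 0

    # Adding a column: compute one new value per row.
    c = []
    for i in range(n):
        c.append(a[i] + b[i])
        operations += 1

    # Removing column B: assumed to copy the remaining columns (A, C) once per row.
    remaining = []
    for i in range(n):
        remaining.append((a[i], c[i]))
        operations += 1

    return operations


print(count_row_operations(10))  # 20: n visits to add plus n visits to remove
```

Both loops run n times, so the total is 2n: the constant factor changes, but the growth stays linear.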
As the number of rows grows, the time to add or remove a column grows roughly in proportion: ten times as many rows means roughly ten times as much work.
| Input Size (n) | Approx. Row Operations (add + remove) |
|---|---|
| 10 | About 10 + 10 = 20 |
| 100 | About 100 + 100 = 200 |
| 1000 | About 1000 + 1000 = 2000 |
Pattern observation: The work grows in a straight line with the number of rows.
Time Complexity: O(n)
This means the time to add or remove a column grows in direct proportion to the number of rows.
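The same argument written as a short derivation, assuming adding costs about $c_1$ per row and removing costs about $c_2$ per row (the constants $c_1$ and $c_2$ are illustrative and depend on hardware and pandas internals):

```latex
T(n) \approx c_1 n + c_2 n = (c_1 + c_2)\,n \in O(n)

\frac{T(10n)}{T(n)} \approx \frac{(c_1 + c_2)\cdot 10n}{(c_1 + c_2)\cdot n} = 10
```

The second line is why each row of the table is ten times the previous one: the constants cancel, leaving only the growth in n.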
[X] Wrong: "Adding or removing a column is instant and does not depend on data size."
[OK] Correct: Even though it looks simple, the computer must update every row, so it takes more time as the table grows.
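The "every row" point also shows up on the space side: the new column stores one value per row, so its memory grows with n. A minimal sketch, assuming the int64 dtype pandas normally gives columns built from `range` (8 bytes per value); `Series.memory_usage(index=False)` reports the bytes of the values only, excluding the index. The helper name `new_column_bytes` is hypothetical.

```python
import pandas as pd


def new_column_bytes(n: int) -> int:
    """Return the bytes used by the values of the added column C (index excluded)."""
    data = pd.DataFrame({"A": range(n), "B": range(n, 2 * n)})
    data["C"] = data["A"] + data["B"]
    return int(data["C"].memory_usage(index=False))


for n in (10, 100, 1000):
    # With int64 values (8 bytes each), expect 8 * n bytes.
    print(n, new_column_bytes(n))
```

Ten times the rows gives ten times the bytes, so the extra space for one new column is O(n) as well.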
Understanding how data operations scale helps you explain your code choices clearly and shows you know how data size affects performance.
"What if we add multiple columns at once instead of one? How would the time complexity change?"
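One way to explore this question, offered as a sketch rather than a definitive answer: `DataFrame.assign` accepts several new columns in one call, but each one is still a separate length-n computation, so adding k columns costs roughly k × n work, O(k·n), which stays linear in n for a fixed k. The helper `add_k_columns` and the column names `C0`, `C1`, ... are hypothetical.

```python
import pandas as pd


def add_k_columns(n: int, k: int) -> pd.DataFrame:
    """Add k derived columns to an n-row table in one assign() call (hypothetical helper)."""
    data = pd.DataFrame({"A": range(n), "B": range(n, 2 * n)})
    # Each new column is its own length-n computation: k columns -> about k * n work.
    new_columns = {f"C{j}": data["A"] + j * data["B"] for j in range(k)}
    return data.assign(**new_columns)


result = add_k_columns(n=5, k=3)
print(result.columns.tolist())  # ['A', 'B', 'C0', 'C1', 'C2']
```

Doing it "at once" mainly saves per-call overhead; the row-level work still scales with both the number of new columns and the number of rows.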