How Engineered Features Affect Analysis Time in Python: A Performance Analysis
We want to see how adding engineered features affects the time it takes to analyze data.
How does the work grow when we add more features to our data?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def add_features(df):
    df['feature_sum'] = df['A'] + df['B']
    df['feature_ratio'] = df['A'] / (df['B'] + 1)
    return df

# Example usage:
# df = pd.DataFrame({'A': range(n), 'B': range(n)})
# df = add_features(df)
```
This code adds two new features by combining existing columns in the data.
- Primary operation: Adding and dividing values for each row in the data.
- How many times: Once for every row in the dataset.
As the number of rows grows, the amount of work grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 operations (2 per row) |
| 100 | About 200 operations |
| 1000 | About 2000 operations |
Pattern observation: The work grows directly with the number of rows; doubling the rows doubles the work.
Time Complexity: O(n)
This means the time to add features grows in a straight line with the number of rows.
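You can check this linear relationship empirically. The sketch below (an illustration, not part of the original snippet) times `add_features` at two input sizes; the helper name `time_add_features` is introduced here for convenience. On most machines, ten times as many rows should take on the order of ten times as long, though constant factors and overhead blur this at small sizes.

```python
import time
import pandas as pd

def add_features(df):
    df['feature_sum'] = df['A'] + df['B']
    df['feature_ratio'] = df['A'] / (df['B'] + 1)
    return df

def time_add_features(n):
    # Build an n-row frame, then measure only the feature-creation step.
    df = pd.DataFrame({'A': range(n), 'B': range(n)})
    start = time.perf_counter()
    add_features(df)
    return time.perf_counter() - start

# Expect roughly linear scaling: ~10x the rows, ~10x the time.
for n in (100_000, 1_000_000):
    print(f"n={n}: {time_add_features(n):.4f}s")
```

Exact timings vary by hardware, so treat the printed numbers as a trend check rather than a benchmark.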
[X] Wrong: "Adding more features always makes analysis much slower in a complex way."
[OK] Correct: Adding simple features usually means doing a fixed number of operations per row, so time grows steadily, not wildly.
Understanding how feature creation affects time helps you explain your data preparation steps clearly and confidently.
"What if we added features that compare every row to every other row? How would the time complexity change?"
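As a sketch of that scenario, consider a hypothetical feature (the name `mean_abs_diff` is invented for this example) that compares each row's `'A'` value against every other row's. The nested loop touches every pair of rows, so the work grows as O(n²): doubling the rows roughly quadruples the time.

```python
import pandas as pd

def add_pairwise_feature(df):
    # For each row, compute the mean absolute difference between its 'A'
    # value and every other row's 'A' value. The inner loop runs n - 1
    # times for each of the n rows: O(n^2) operations in total.
    values = df['A'].tolist()
    n = len(values)
    diffs = []
    for i in range(n):
        total = sum(abs(values[i] - values[j]) for j in range(n) if j != i)
        diffs.append(total / (n - 1) if n > 1 else 0.0)
    df['mean_abs_diff'] = diffs
    return df
```

Unlike the per-row features above, this kind of pairwise feature becomes expensive quickly: at 1,000 rows it already performs about a million comparisons.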