The apply() Function for Custom Logic in Python Data Analysis - Time & Space Complexity
We want to understand how the time it takes to run the apply() function changes as the data grows.
Specifically, how does applying a custom function to each row or column affect the total work done?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Create a DataFrame with columns A and B (sample values)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def custom_logic(row):
    return row['A'] * 2 + row['B']

# Apply custom_logic to each row (axis=1 means row-wise)
result = df.apply(custom_logic, axis=1)
```
This code applies a custom function to each row of a DataFrame to create a new series.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: The apply() function runs the custom function once for each row.
- How many times: It runs exactly as many times as there are rows in the DataFrame.
As the number of rows grows, the total work grows in the same way.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 function calls |
| 100 | 100 function calls |
| 1000 | 1000 function calls |
Pattern observation: Doubling the rows doubles the number of function calls.
Time Complexity: O(n)
This means the time grows linearly with the number of rows in the DataFrame.
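To see the linear relationship directly, a minimal sketch (the column values here are illustrative) applies the same custom function to DataFrames of growing size; apply() produces exactly one output value per input row:

```python
import pandas as pd

def custom_logic(row):
    return row['A'] * 2 + row['B']

# One output value per row: n rows in, n results out
for n in (10, 100, 1000):
    df = pd.DataFrame({'A': range(n), 'B': range(n)})
    result = df.apply(custom_logic, axis=1)
    print(f"{n} rows -> {len(result)} results")
```

Doubling n doubles the number of rows processed, matching the table above.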
[X] Wrong: "The apply() function runs faster than looping manually because it's built-in."
[OK] Correct: apply() still calls the custom function once per row, so it does similar work to a manual loop. It is not asymptotically faster in terms of time complexity.
Understanding how apply() scales helps you explain your data processing choices clearly and shows you know what happens behind the scenes.
"What if we changed the custom function to use vectorized operations instead of apply()? How would the time complexity change?"