Why Custom Functions Matter in Pandas: A Performance Analysis
When working with pandas, custom functions let us perform specialized transformations on our data, but how fast those functions run matters a great deal. We want to know how the running time changes as the data grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def custom_func(x):
    return x ** 2 + 1

df = pd.DataFrame({'A': range(1000)})
df['B'] = df['A'].apply(custom_func)
```
This code creates a DataFrame with 1000 numbers and applies a custom function to each value to produce a new column.
- Primary operation: Applying the custom function to each row value.
- How many times: Once for every row in the DataFrame (n times).
Each additional row means one more call to the custom function, so the total work grows in direct proportion to the data size.
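One way to check the once-per-row claim empirically is to count invocations. This is a minimal sketch; the counter variable `call_count` and the function name `counted_func` are our own illustrative names, not part of pandas:

```python
import pandas as pd

call_count = 0

def counted_func(x):
    # track how many times apply invokes the function
    global call_count
    call_count += 1
    return x ** 2 + 1

df = pd.DataFrame({'A': range(1000)})
df['B'] = df['A'].apply(counted_func)
print(call_count)  # one call per row: 1000
```

Doubling the number of rows doubles `call_count`, which is exactly the linear pattern tabulated below.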
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 function calls |
| 100 | 100 function calls |
| 1000 | 1000 function calls |
Pattern observation: The running time grows in direct proportion to the number of rows.
Time Complexity: O(n)
This means the running time grows linearly as the data gets bigger.
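Linear growth does not mean `apply` is the fastest option, though: a vectorized expression also does O(n) work, but the per-element loop runs in compiled NumPy code rather than as n Python-level function calls. A small sketch comparing the two (column names are our own):

```python
import pandas as pd

df = pd.DataFrame({'A': range(1000)})

# apply: one Python-level function call per row, O(n)
df['B_apply'] = df['A'].apply(lambda x: x ** 2 + 1)

# vectorized: same O(n) growth, but with a much smaller constant factor
df['B_vec'] = df['A'] ** 2 + 1

assert df['B_apply'].equals(df['B_vec'])
```

Both columns are identical; the difference is the constant factor hidden inside the O(n), which big-O notation deliberately ignores.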
[X] Wrong: "Using a custom function inside apply is always slow and complex, like nested loops."
[OK] Correct: If the function does a constant amount of work per row (no loops over the data inside), the total time grows linearly with data size, not worse.
Understanding how custom functions affect performance helps you write clear, efficient data code, a skill that carries over to many real projects.
"What if the custom function itself had a loop inside? How would the time complexity change?"