eval() for expression evaluation in Pandas - Time & Space Complexity
We want to understand how the time needed to evaluate expressions with pandas eval() changes as the data size grows.
How does the work done by eval() scale when we use it on bigger data?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # example size
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2 * n)
})
result = pd.eval('df.A + df.B')
```
This code creates a DataFrame with two columns and uses pd.eval() to add these columns element-wise.
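To see concretely what the expression computes, here is a minimal sketch comparing pd.eval() to the plain pandas addition it replaces (the variable names are just illustrative):

```python
import pandas as pd

n = 10
df = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})

# pd.eval() parses the string and performs the same element-wise
# addition as the ordinary pandas expression below.
via_eval = pd.eval('df.A + df.B')
via_plain = df['A'] + df['B']

print(via_eval.equals(via_plain))  # True: both produce the same Series
```

The string form lets pandas hand the whole expression to an optimized backend, but the result is identical to writing the addition directly.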
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: element-wise addition of two columns using eval().
- How many times: once for each row in the DataFrame, so n times.
As the number of rows n increases, the number of additions grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 additions |
| 100 | 100 additions |
| 1000 | 1000 additions |
Pattern observation: Doubling the input size roughly doubles the work done.
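One quick way to check the linear pattern is to confirm that the output contains exactly one result per input row, whatever n is (a sketch; the sizes below are arbitrary):

```python
import pandas as pd

for n in (10, 100, 1000):
    df = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})
    result = pd.eval('df.A + df.B')
    # one addition per row, so the work grows in step with n
    print(n, len(result))
```

Each run produces exactly n results, matching the table above.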
Time Complexity: O(n)
This means the time to evaluate the expression grows in direct proportion to the number of rows.
[X] Wrong: "Using eval() makes the operation constant time because it's optimized internally."
[OK] Correct: Even though eval() can be faster than normal Python loops, it still processes each row once, so the time grows with data size.
Understanding how eval() scales helps you explain performance choices clearly and shows you know how data size affects computation time.
What if we changed the expression to multiply columns instead of adding? How would the time complexity change?
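One way to check (a sketch): swapping + for * changes the per-row operation but not the number of operations, so the complexity stays O(n).

```python
import pandas as pd

n = 10
df = pd.DataFrame({'A': range(n), 'B': range(n, 2 * n)})

# still exactly one operation per row, so still O(n)
product = pd.eval('df.A * df.B')
print(len(product))  # n results, just like addition
```

The cost of a single multiplication versus a single addition may differ by a constant factor, but constant factors do not change the big-O classification.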