Vectorized operations vs loops in Pandas - Performance Comparison
We want to compare how fast pandas code runs when using vectorized operations versus explicit Python loops, and how the time required changes as the data size grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # Define n before using it
df = pd.DataFrame({"value": range(n)})

# Vectorized operation
result_vec = df["value"] * 2

# Loop operation
result_loop = []
for v in df["value"]:
    result_loop.append(v * 2)
```
This code creates a DataFrame with n rows and doubles the values using two methods: vectorized and loop.
- Primary operation: Multiplying each value by 2.
- How many times: Once per element, n times.
- Vectorized method does this internally in optimized C code without explicit Python loops.
- Loop method explicitly repeats the multiply operation n times in Python.
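To make the difference concrete, here is a minimal timing sketch of the two approaches on a larger frame. The size `n = 100_000` and the use of `time.perf_counter` are illustrative choices, not part of the original snippet; exact timings will vary by machine.

```python
import time

import pandas as pd

n = 100_000  # illustrative size, large enough for the gap to show
df = pd.DataFrame({"value": range(n)})

# Time the vectorized version (one call into optimized C code)
start = time.perf_counter()
result_vec = df["value"] * 2
vec_seconds = time.perf_counter() - start

# Time the explicit Python loop (n interpreter-level multiplications)
start = time.perf_counter()
result_loop = []
for v in df["value"]:
    result_loop.append(v * 2)
loop_seconds = time.perf_counter() - start

print(f"vectorized: {vec_seconds:.4f}s  loop: {loop_seconds:.4f}s")
```

Both versions produce the same values; only the constant factor per element differs, which is why the vectorized version is typically much faster despite the same O(n) operation count.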
As n grows, the number of multiply operations grows linearly.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 multiplications |
| 100 | 100 multiplications |
| 1000 | 1000 multiplications |
Pattern observation: The work doubles when the input doubles, i.e. the operation count grows linearly with n.
Time Complexity: O(n)
This means the time needed grows directly in proportion to the number of rows.
[X] Wrong: "Vectorized operations always run in constant time regardless of data size."
[OK] Correct: Vectorized operations still process each element, so time grows with data size, but they run faster because they avoid slow Python loops.
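A quick way to check this yourself is to time the vectorized operation at several sizes and watch the time grow with n. The helper below is a rough single-run sketch (not a rigorous benchmark); the chosen sizes are arbitrary.

```python
import time

import pandas as pd

def time_vectorized(n: int) -> float:
    """Roughly time the vectorized doubling of an n-row column (single run)."""
    df = pd.DataFrame({"value": range(n)})
    start = time.perf_counter()
    _ = df["value"] * 2
    return time.perf_counter() - start

# Times should trend upward with n, showing vectorized work is not constant-time
for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: {time_vectorized(n):.5f}s")
```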
Understanding how vectorized operations scale helps you write faster pandas code and shows you know how to handle data efficiently in real projects.
"What if we replaced the loop with a list comprehension? How would the time complexity change?"