NumPy with Pandas integration - Time & Space Complexity
We want to see how fast operations run when using NumPy arrays inside Pandas. How does processing time change as the data grows? Analyze the time complexity of the following code snippet.
```python
import numpy as np
import pandas as pd

arr = np.random.rand(1000)            # 1,000 random floats
df = pd.DataFrame({'values': arr})    # wrap the array in a DataFrame column
result = df['values'].apply(np.sqrt)  # call np.sqrt once per element
```
This code creates a NumPy array, puts it in a Pandas DataFrame, and applies the square root function to each value.
Identify the loops, recursion, or array traversals that repeat as the input grows.
- Primary operation: Applying the square root function to each element in the DataFrame column.
- How many times: Once for each element, so 1000 times in this example.
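We can check this "once per element" claim directly. The sketch below wraps `np.sqrt` in a hypothetical counting helper (`counted_sqrt` is our own name, not a library function) and confirms that `Series.apply` invokes it exactly once per row:

```python
import numpy as np
import pandas as pd

calls = 0

def counted_sqrt(x):
    """Wrapper around np.sqrt that counts how often apply invokes it."""
    global calls
    calls += 1
    return np.sqrt(x)

df = pd.DataFrame({'values': np.random.rand(1000)})
df['values'].apply(counted_sqrt)
print(calls)  # one call per element: 1000
```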
As the number of elements grows, the time to apply the function grows proportionally: doubling the elements roughly doubles the work.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The operations increase directly with the number of elements.
Time Complexity: O(n)
This means the running time grows linearly with the size of the data.
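A rough benchmark sketch can make this concrete. Exact timings depend on your machine, but increasing `n` tenfold should increase the `apply` time by roughly tenfold as well:

```python
import time
import numpy as np
import pandas as pd

timings = {}
for n in (1_000, 10_000, 100_000):
    df = pd.DataFrame({'values': np.random.rand(n)})
    start = time.perf_counter()
    df['values'].apply(np.sqrt)  # the O(n) operation being measured
    timings[n] = time.perf_counter() - start
    print(f"n={n:>7}: {timings[n]:.4f} s")
```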
[X] Wrong: "Using NumPy inside Pandas makes operations run instantly, no matter the size."
[OK] Correct: Even with NumPy's speed, applying a function to each element still takes time proportional to the number of elements.
Understanding how NumPy and Pandas work together helps you explain data processing speed clearly and confidently.
"What if we replaced the apply method with a vectorized NumPy operation? How would the time complexity change?"