Line plots with plot() in pandas - Time & Space Complexity
When we create a line plot with pandas' plot(), pandas hands the data to its default plotting backend, matplotlib, which draws a line through one vertex per data value.
We want to know how the time to draw grows as we add more data points.
Analyze the time complexity of the following code snippet.
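```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 1000 evenly spaced x values paired with 1000 random y values
data = pd.DataFrame({
    'x': np.arange(1000),
    'y': np.random.randn(1000)
})
# Build a line through all 1000 (x, y) points, then display it
data.plot(x='x', y='y', kind='line')
plt.show()
```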
This code creates a line plot of 1000 points from the DataFrame columns.
Identify the loops, recursion, and array traversals that repeat. Our code has no explicit loop; the repetition happens inside pandas and matplotlib.
- Primary operation: Adding each (x, y) point to the line and drawing the segment that connects it to the next point.
- How many times: Once for each data point (n vertices joined by n - 1 segments).
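Our code contains no explicit loop, but we can make the repeated work visible. The sketch below reads the vertices back from the line that pandas created and counts the connecting segments with an ordinary Python loop. The count_segments helper is hypothetical: it models one unit of work per segment and is not how matplotlib actually draws (its Agg renderer processes the whole path in compiled code). The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def count_segments(xy):
    """Hypothetical model of the repeated work: one segment per pair of consecutive vertices."""
    segments = 0
    for start, end in zip(xy[:-1], xy[1:]):
        # A real renderer would rasterize the segment from start to end here
        segments += 1
    return segments


data = pd.DataFrame({'x': np.arange(1000), 'y': np.random.randn(1000)})
ax = data.plot(x='x', y='y', kind='line')

# pandas returns a matplotlib Axes; the plotted line is a Line2D holding every vertex
line = ax.get_lines()[0]
xy = line.get_xydata()  # shape (n, 2): one row per data point

print(len(xy))             # number of vertices (1000)
print(count_segments(xy))  # number of connecting segments (999)
plt.close(ax.figure)
```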
As the number of points increases, the time to draw grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 points drawn |
| 100 | About 100 points drawn |
| 1000 | About 1000 points drawn |
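The table can be reproduced by counting how many vertices end up in the plotted line for each input size. This is a minimal sketch assuming the default matplotlib backend; it counts stored points rather than measuring time. Because the line keeps all n x values and n y values, the same count also hints that memory use grows with n.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

vertex_counts = {}
for n in (10, 100, 1000):
    data = pd.DataFrame({'x': np.arange(n), 'y': np.random.randn(n)})
    ax = data.plot(x='x', y='y', kind='line')
    line = ax.get_lines()[0]
    # Each line stores its n x values and n y values
    vertex_counts[n] = len(line.get_xdata())
    plt.close(ax.figure)  # free the figure before the next size

for n, count in vertex_counts.items():
    print(f"n={n:>5}: {count} vertices in the line")
```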
Pattern observation: Doubling the data roughly doubles the work to draw the plot.
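To check the doubling pattern on real timings, we can build and render plots of increasing size and compare wall-clock times. This is a rough benchmark sketch: the time_line_plot helper and the sizes are illustrative choices, and the numbers depend on your machine and backend. Expect the ratios to be only approximately 2, since the fixed cost of creating a figure and matplotlib's path simplification (which can skip nearly redundant segments when rendering) both affect the measurements.

```python
import time

import matplotlib
matplotlib.use("Agg")  # headless backend; timings depend on backend and machine
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def time_line_plot(n, repeats=3):
    """Return the best wall-clock time (seconds) to build and render an n-point line plot."""
    data = pd.DataFrame({'x': np.arange(n), 'y': np.random.randn(n)})
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        ax = data.plot(x='x', y='y', kind='line')
        ax.figure.canvas.draw()  # force a full render so drawing time is included
        best = min(best, time.perf_counter() - start)
        plt.close(ax.figure)
    return best


timings = {n: time_line_plot(n) for n in (100_000, 200_000, 400_000)}
for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds:.3f} s")
```

Large sizes are used so that the per-point work outweighs the fixed setup cost; taking the best of a few repeats reduces noise from other activity on the machine.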
Time Complexity: O(n)
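This means the time to create the line plot grows linearly with the number of data points. Creating the figure, axes, and tick labels adds a fixed cost that does not depend on n, so for small datasets that overhead can dominate.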
[X] Wrong: "Plotting a line is instant no matter how many points there are."
[OK] Correct: Every point must be processed and every connecting segment drawn, so more points mean more work and more time.
Understanding how plotting time grows helps you explain performance when working with large datasets and visualizations.
What if we changed the plot to show multiple lines instead of one? How would the time complexity change?
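One way to check your answer is to plot several columns at once and count the total number of vertices the Axes has to draw. In this sketch the values of n and k and the column names y0, y1, ... are illustrative assumptions; passing a list of column names as y asks pandas to draw one line per column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n, k = 1000, 5  # n points per line, k lines (illustrative values)
data = pd.DataFrame({f'y{i}': np.random.randn(n) for i in range(k)})
data['x'] = np.arange(n)

# A list of column names for y produces one Line2D per column
ax = data.plot(x='x', y=[f'y{i}' for i in range(k)], kind='line')
lines = ax.get_lines()
total_vertices = sum(len(line.get_xdata()) for line in lines)
print(len(lines), total_vertices)
plt.close(ax.figure)
```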