Alternatives for big data (Datashader, HoloViews) in Matplotlib - Time & Space Complexity
When working with very large datasets, plotting can become a serious bottleneck. We want to understand how the time to create a plot grows as the data size grows.
How do tools like Datashader and HoloViews help with this?
Analyze the time complexity of this simple matplotlib plotting code.
```python
import matplotlib.pyplot as plt
import numpy as np

# One million random points in [0, 1)
x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)

plt.scatter(x, y, s=1)  # s=1: tiny marker for each individual point
plt.show()
```
This code plots one million points using matplotlib's scatter plot.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Drawing each point on the plot.
- How many times: Once for each of the 1,000,000 points.
As the number of points increases, the time to draw grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 drawing operations |
| 100 | 100 drawing operations |
| 1000 | 1000 drawing operations |
Pattern observation: Doubling the points roughly doubles the work.
Time Complexity: O(n)
This means the time to plot grows linearly with the number of points.
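If you want to observe this linear growth directly, a rough timing sketch (assuming matplotlib and NumPy are installed; the off-screen Agg backend is used so no window is needed) might look like the following. Absolute numbers vary by machine, but doubling n should roughly double the render time:

```python
import time
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sizes = (10_000, 20_000, 40_000)
timings = []
for n in sizes:
    x, y = rng.random(n), rng.random(n)
    fig, ax = plt.subplots()
    start = time.perf_counter()
    ax.scatter(x, y, s=1)
    fig.canvas.draw()  # force the actual rendering work
    timings.append(time.perf_counter() - start)
    plt.close(fig)

for n, t in zip(sizes, timings):
    print(f"n={n:>6}: {t:.4f} s")
```

For small n the fixed setup cost dominates, so the proportionality only becomes clear at larger sizes.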
[X] Wrong: "Plotting a million points is always fast enough with matplotlib."
[OK] Correct: Matplotlib draws each point individually, so plotting millions of points can be very slow and use lots of memory.
Understanding how plotting time grows helps you choose the right tools for big data. This skill shows you can think about performance, not just code.
What if we used Datashader to aggregate points before plotting? How would the time complexity change?
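One way to reason about it: Datashader first aggregates the points onto a fixed pixel grid, then renders that grid. The aggregation pass is still O(n), but it is a single cheap counting pass, and the rendering step costs O(width × height), independent of n. The idea can be sketched with plain NumPy (a hypothetical stand-in; real Datashader code would use `ds.Canvas(...).points(...)` on a DataFrame):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.random(n)
y = rng.random(n)

width, height = 400, 400  # output "canvas" in pixels

# One O(n) pass: bin every point into a pixel-sized cell
counts, _, _ = np.histogram2d(x, y, bins=(width, height),
                              range=[[0, 1], [0, 1]])

# counts is a 400x400 array of per-pixel point counts; shading it
# (e.g. with plt.imshow, or Datashader's tf.shade) costs
# O(width * height), no matter how many points were aggregated.
print(counts.shape, int(counts.sum()))
```

So the overall complexity is still O(n) for the aggregation, but the expensive per-point drawing work disappears: memory stays bounded by the image size, and re-rendering (zooming, re-coloring) only touches the small grid, not the million original points.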