With very large datasets, standard plotting tools can become slow, and overlapping points can hide the structure of the data. Datashader and HoloViews help you render big data quickly and legibly.
Alternatives for big data (Datashader, HoloViews) in Matplotlib
Introduction
You want to visualize millions of points without waiting a long time.
You need to explore large datasets interactively.
You want to create clear images from dense data where points overlap.
You want to combine easy plotting with powerful data handling.
You want to avoid slow or cluttered plots with big data.
Syntax
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import holoviews as hv

hv.extension('bokeh')
Datashader creates images from big data by aggregating points.
HoloViews works with Datashader to make interactive plots easily.
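To make "aggregating points" concrete, the idea can be sketched with plain NumPy. This is only a conceptual stand-in, not Datashader's actual implementation: the sample data, the canvas size, and the use of `np.histogram2d` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sample data: 100,000 random points in the unit square.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 1.0, size=100_000)
y = rng.uniform(0.0, 1.0, size=100_000)

# A 400x400 "canvas": one bin per output pixel.
width, height = 400, 400

# Count how many points land in each pixel-sized bin.
# histogram2d bins its first argument along axis 0, so y goes first
# to get an array laid out as (rows, columns) = (height, width).
counts, _, _ = np.histogram2d(
    y, x, bins=(height, width), range=[[0.0, 1.0], [0.0, 1.0]]
)

print(counts.shape)       # (400, 400)
print(int(counts.sum()))  # 100000
```

However many points go in, the result is always a 400x400 grid of counts, and that grid is what gets colored to produce the final image.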
Examples
This code uses Datashader to plot one million points quickly as an image.
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd

# Create sample data
points = pd.DataFrame({'x': range(1000000), 'y': range(1000000)})

# Create canvas
canvas = ds.Canvas(plot_width=400, plot_height=400)

# Aggregate points
agg = canvas.points(points, 'x', 'y')

# Create image
img = tf.shade(agg)

# In a notebook, the last expression displays the image
img.to_pil()
This example uses HoloViews with Datashader to plot one million random points interactively.
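import holoviews as hv
import numpy as np
# The datashader operations live in a submodule that must be imported explicitly
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

# Create random data
points = hv.Points(np.random.randn(1000000, 2))

# Use datashade to plot big data interactively
plot = datashade(points)

# In a notebook, the last expression displays the plot
plot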
Sample Program
This program generates one million random points and plots them with both tools: Datashader renders a static image, and HoloViews wraps the same data in an interactive plot. It then prints the image size and the HoloViews plot object.
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import numpy as np
import holoviews as hv
# The datashader operations live in a submodule that must be imported explicitly
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

# Generate 1 million random points
n = 1000000
points = pd.DataFrame({
    'x': np.random.normal(size=n),
    'y': np.random.normal(size=n)
})

# Datashader: create canvas and aggregate points
canvas = ds.Canvas(plot_width=400, plot_height=400)
agg = canvas.points(points, 'x', 'y')

# Create image with shading
img = tf.shade(agg, cmap=['lightblue', 'darkblue'], how='log')

# Convert to a PIL image (for Matplotlib, convert it to an array)
img_pil = img.to_pil()

# Using HoloViews to plot interactively
hv_points = hv.Points(points)
hv_plot = datashade(hv_points, cmap=['lightblue', 'darkblue'])

print('Datashader image size:', img_pil.size)
print('HoloViews plot object:', hv_plot)
Important Notes
Datashader aggregates points into a fixed grid of pixel-sized bins, so the size of the rendered image depends on the canvas, not on the number of points.
HoloViews makes it easy to add interactivity and combine with Datashader.
These tools are good alternatives when Matplotlib becomes too slow or its plots too cluttered with big data.
Summary
Datashader and HoloViews help visualize very large datasets quickly and clearly.
Datashader creates images by aggregating data points into pixels.
HoloViews adds easy interactivity and works well with Datashader.