
Alternatives for big data (Datashader, HoloViews) in Matplotlib

Introduction

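When a dataset contains millions of points, standard plotting tools such as Matplotlib can become slow, and overlapping markers hide the structure of the data. Datashader and HoloViews are designed to render large datasets quickly and legibly.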

You want to visualize millions of points without waiting a long time.
You need to explore large datasets interactively.
You want to create clear images from dense data where points overlap.
You want to combine easy plotting with powerful data handling.
You want to avoid slow or cluttered plots with big data.
Syntax
Matplotlib
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import holoviews as hv
hv.extension('bokeh')

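Datashader renders big data by aggregating points into a fixed-size grid of pixels and then coloring each pixel by its aggregate value.

HoloViews works with Datashader to make interactive plots with little code. The hv.extension('bokeh') call selects Bokeh as the plotting backend.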
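
The aggregation step described above can be illustrated without Datashader at all. The following is a minimal sketch in plain NumPy, assuming a 400 by 400 pixel grid like the canvas used in the examples; it is a stand-in for the idea, not Datashader's actual implementation, and the variable names are chosen only for illustration.

```python
import numpy as np

# Illustrative stand-in for Datashader's aggregation step, not its real implementation.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = rng.normal(size=n)

# Hypothetical grid size, mirroring plot_width=400, plot_height=400.
width, height = 400, 400

# Bin every point into a pixel grid; each cell holds the number of points that fell in it.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(width, height))

print(counts.shape)       # the grid size is fixed regardless of n
print(int(counts.sum()))  # every point is counted exactly once
print(int(counts.max()))  # number of points overlapping in the densest pixel
```

However many points go in, the result is always a width by height array of counts, which is why dense regions remain readable instead of collapsing into a solid blob of markers.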

Examples
This code uses Datashader to plot one million points quickly as an image.
Matplotlib
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd

# Create sample data
points = pd.DataFrame({'x': range(1000000), 'y': range(1000000)})

# Create canvas
canvas = ds.Canvas(plot_width=400, plot_height=400)

# Aggregate points
agg = canvas.points(points, 'x', 'y')

# Create image
img = tf.shade(agg)

img.to_pil()
This example uses HoloViews with Datashader to plot one million random points interactively.
Matplotlib
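import holoviews as hv
import numpy as np
# Import datashade explicitly; the datashader submodule is not
# guaranteed to be loaded by a plain "import holoviews"
from holoviews.operation.datashader import datashade
hv.extension('bokeh')

# Create random data: one million (x, y) pairs
points = hv.Points(np.random.randn(1000000, 2))

# Use datashade to plot big data interactively;
# the image is re-aggregated when you zoom or pan
plot = datashade(points)

# In a notebook, leaving the object as the last expression displays it
plot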
Sample Program

This program creates one million random points and plots them using Datashader and HoloViews. Datashader renders a static image, and HoloViews wraps the same data in an interactive plot.

Matplotlib
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import numpy as np
import holoviews as hv
# Import datashade explicitly; the datashader submodule is not
# guaranteed to be loaded by a plain "import holoviews"
from holoviews.operation.datashader import datashade
hv.extension('bokeh')

# Generate 1 million random points
n = 1000000
points = pd.DataFrame({
    'x': np.random.normal(size=n),
    'y': np.random.normal(size=n)
})

# Datashader: create canvas and aggregate points into per-pixel counts
canvas = ds.Canvas(plot_width=400, plot_height=400)
agg = canvas.points(points, 'x', 'y')

# Create image with shading; how='log' keeps sparse regions visible
img = tf.shade(agg, cmap=['lightblue', 'darkblue'], how='log')

# Convert to a PIL image (np.asarray(img_pil) gives an array for Matplotlib's imshow)
img_pil = img.to_pil()

# Using HoloViews to plot interactively
hv_points = hv.Points(points, kdims=['x', 'y'])
hv_plot = datashade(hv_points, cmap=['lightblue', 'darkblue'])

print('Datashader image size:', img_pil.size)
print('HoloViews plot object:', hv_plot)
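
The sample program stops at a PIL image and a HoloViews object. Because this topic sits in a Matplotlib context, the sketch below shows one possible way to place an aggregated density grid on Matplotlib axes. It uses np.histogram2d as a stand-in for the Datashader aggregate so that it runs with only NumPy and Matplotlib; the variable names, the Blues colormap, and the log1p scaling are assumptions made for illustration and are not part of the program above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt

# Stand-in for a Datashader aggregate: a 2-D array of per-pixel counts.
# Assumption: with Datashader, agg.values would supply a comparable array.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = rng.normal(size=1_000_000)
counts, x_edges, y_edges = np.histogram2d(x, y, bins=400)

# log1p compresses the range so sparse regions stay visible next to dense
# ones, similar in spirit to how='log' in tf.shade.
fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(
    np.log1p(counts.T),  # transpose: histogram2d puts x on the first axis
    origin="lower",
    extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]],
    cmap="Blues",
    aspect="auto",
)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.colorbar(image, ax=ax, label="log(1 + points per pixel)")
```

Calling fig.savefig('density.png') or plt.show() afterwards displays the result. Matplotlib only has to draw a single 400 by 400 image here, rather than one million individual markers.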
Important Notes

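Datashader works by reducing many points to a fixed-size grid of pixels, so the final image stays the same size no matter how many points go in. This is why it handles big data well.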

HoloViews makes it easy to add interactivity and combine with Datashader.

These tools are good alternatives when Matplotlib becomes too slow or its plots become too cluttered with big data.

Summary

Datashader and HoloViews help visualize very large datasets quickly and clearly.

Datashader creates images by aggregating data points into pixels.

HoloViews adds easy interactivity and works well with Datashader.