
Categorical scatter with jitter in Matplotlib

Introduction

A categorical scatter plot shows every individual data point for each category. Jitter adds a small random offset to each point's position so that points with equal or similar values do not hide each other.

When you want to see individual data points for categories like fruits or colors.
When many points overlap in a category and you want to spread them out to see density.
When comparing groups and you want to show all data points, not just summaries.
When you want a simple way to visualize distribution within categories.
Syntax
Matplotlib
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
values = [5, 7, 6, 8, 5, 7, 6, 9, 5]
category_labels = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

# Convert categories to numbers
x = [categories.index(cat) for cat in category_labels]

# Add jitter
jitter = np.random.uniform(-0.1, 0.1, size=len(x))
x_jittered = x + jitter

plt.scatter(x_jittered, values)
plt.xticks(range(len(categories)), categories)
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Categorical scatter with jitter')
plt.show()

Jitter is a small random number added to category positions to avoid overlap.
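To make the idea concrete, the sketch below adds uniform jitter to a few integer category positions and then rounds them back. Every offset stays inside the chosen range, so each point remains closest to its own category. It uses NumPy's `default_rng` Generator with an arbitrary seed of 0; the positions are made-up sample data.

```python
import numpy as np

# Integer positions for six points: three in category 0, three in category 1
x = np.array([0, 0, 0, 1, 1, 1])

# Uniform jitter in [-0.1, 0.1): every offset is at most 0.1 in magnitude
rng = np.random.default_rng(0)
jitter = rng.uniform(-0.1, 0.1, size=x.size)
x_jittered = x + jitter

# Rounding removes the jitter again, so each point still belongs to its category
recovered = np.rint(x_jittered).astype(int)
print(x_jittered)
print(recovered)  # → [0 0 0 1 1 1]
```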

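Category names are mapped to integer positions (0, 1, 2, ...) because jitter is a number. Matplotlib can plot string categories directly, but you cannot add a random offset to a string.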
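The examples here use `categories.index(cat)`, which scans the list once for every point. A common alternative, shown as a sketch below, is to build a name-to-position dictionary once and look each label up in it. The variable name `position` is illustrative only.

```python
import numpy as np

categories = ['A', 'B', 'C']
category_labels = ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C']

# Build the name -> position lookup once instead of calling list.index per point
position = {cat: i for i, cat in enumerate(categories)}
x = np.array([position[cat] for cat in category_labels])
print(x)  # → [0 0 1 1 1 2 2 2 2]
```

The result is the same as with `list.index`. With many points, the dictionary avoids repeated linear searches. Storing `x` as a NumPy array also keeps the later `x + jitter` step an explicit element-wise addition.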

Examples
This example plots two categories with a small jitter (±0.05) to spread the points horizontally.
Matplotlib
import matplotlib.pyplot as plt
import numpy as np

categories = ['X', 'Y']
values = [1, 2, 3, 4, 5, 6]
category_labels = ['X', 'X', 'Y', 'Y', 'Y', 'X']

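# Map each label to its category position (X -> 0, Y -> 1)
x = [categories.index(cat) for cat in category_labels]
# Small jitter: random offsets in [-0.05, 0.05)
jitter = np.random.uniform(-0.05, 0.05, len(x))
x_jittered = x + jitter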

plt.scatter(x_jittered, values)
plt.xticks(range(len(categories)), categories)
plt.show()
This example uses a larger jitter range (±0.2), which spreads the points further apart and makes them easier to see.
Matplotlib
import matplotlib.pyplot as plt
import numpy as np

cats = ['Dog', 'Cat', 'Bird']
vals = [3, 5, 2, 4, 6, 7, 3, 5]
labels = ['Dog', 'Dog', 'Cat', 'Cat', 'Bird', 'Bird', 'Bird', 'Dog']

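# Map each label to its category position (Dog -> 0, Cat -> 1, Bird -> 2)
x = [cats.index(c) for c in labels]
# Larger jitter: random offsets in [-0.2, 0.2)
jitter = np.random.uniform(-0.2, 0.2, len(x))
x_jittered = x + jitter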

plt.scatter(x_jittered, vals, color='green')
plt.xticks(range(len(cats)), cats)
plt.title('Pets values with jitter')
plt.show()
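Both examples repeat the same two steps: map labels to positions, then add jitter. One way to package these steps is a small helper function, sketched below. The name `jittered_positions` and its parameters are hypothetical and are not part of Matplotlib or NumPy. The helper uses NumPy's `default_rng` Generator instead of the global `np.random` functions.

```python
import numpy as np

def jittered_positions(labels, categories, width=0.1, seed=None):
    """Hypothetical helper: map labels to category indices, add jitter in [-width, width)."""
    # Category name -> integer position on the x-axis
    position = {cat: i for i, cat in enumerate(categories)}
    x = np.array([position[label] for label in labels], dtype=float)
    # A local generator; passing the same seed reproduces the same jitter
    rng = np.random.default_rng(seed)
    return x + rng.uniform(-width, width, size=x.size)

cats = ['Dog', 'Cat', 'Bird']
labels = ['Dog', 'Dog', 'Cat', 'Cat', 'Bird', 'Bird', 'Bird', 'Dog']
x_jittered = jittered_positions(labels, cats, width=0.2, seed=1)
print(x_jittered.round(2))
```

The returned array can be passed straight to the plotting call, for example `plt.scatter(x_jittered, vals, color='green')` followed by `plt.xticks(range(len(cats)), cats)`.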
Sample Program

This program plots values for three color categories with jitter to avoid overlapping points. The jitter is random, but setting a seed makes it identical on every run.

Matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Define categories and values
categories = ['Red', 'Blue', 'Green']
values = [10, 15, 10, 20, 25, 15, 10, 30, 20]
category_labels = ['Red', 'Red', 'Blue', 'Blue', 'Blue', 'Green', 'Green', 'Green', 'Green']

# Convert categories to numeric positions
x = [categories.index(cat) for cat in category_labels]

# Add jitter to x positions
np.random.seed(42)  # For reproducible jitter
jitter = np.random.uniform(-0.15, 0.15, size=len(x))
x_jittered = x + jitter

# Create scatter plot
plt.scatter(x_jittered, values, color='purple')
plt.xticks(range(len(categories)), categories)
plt.xlabel('Color')
plt.ylabel('Value')
plt.title('Categorical scatter plot with jitter')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()
Important Notes

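Keep the jitter range small so points stay near their category. Neighboring categories are one unit apart on the x-axis, so a jitter half-width below 0.5 guarantees that points from different categories never overlap.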
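The sketch below checks this limit numerically. It places two categories at x = 0 and x = 1, jitters 1000 points in each with half-width `width`, and tests whether the two point clouds overlap. The worst-case gap between them is 1 - 2 × width, which becomes negative once width exceeds 0.5. The seed (7) and point count (1000) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
results = {}
for width in (0.15, 0.45, 0.6):
    # Categories sit at x = 0 and x = 1, one unit apart
    left = 0 + rng.uniform(-width, width, size=1000)
    right = 1 + rng.uniform(-width, width, size=1000)
    # Clouds overlap if the rightmost point of category 0 passes the leftmost of category 1
    results[width] = bool(left.max() > right.min())
    print(f"width={width}: worst-case gap={1 - 2 * width:.2f}, overlap={results[width]}")
```

For half-widths below 0.5, no overlap is possible. At 0.6, some points from the two categories almost certainly share the same horizontal band.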

Setting a random seed helps get the same jitter every time you run the code.
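The sample program seeds NumPy's global generator with `np.random.seed(42)`. An alternative in newer NumPy versions is a local `np.random.default_rng(seed)` Generator, which does not change the global random state that other code may rely on. The sketch below shows that the same seed reproduces the same jitter. `make_jitter` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def make_jitter(seed, n=5, width=0.15):
    # A fresh Generator per call: the same seed always yields the same offsets
    rng = np.random.default_rng(seed)
    return rng.uniform(-width, width, size=n)

first = make_jitter(42)
second = make_jitter(42)
other = make_jitter(43)
print(np.array_equal(first, second))  # → True
print(np.array_equal(first, other))
```

The Generator API produces a different random stream than the legacy `np.random.seed`/`np.random.uniform` functions, even with the same seed. Switching APIs therefore changes the exact jitter, although each version remains reproducible on its own.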

In these examples, jitter changes only the horizontal (category) position. The y values are plotted exactly as given, so the data themselves are not distorted.
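You can confirm this by reading the marker positions back from the plotted points. The sketch below redraws the sample data and uses the `get_offsets()` method of the collection returned by `Axes.scatter` to fetch each marker's (x, y) position. It then checks that the y values match the input and that x moved by at most the jitter half-width. It assumes Matplotlib's non-interactive `Agg` backend so it runs without a display, and it uses `default_rng(42)` rather than the global seed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

categories = ['Red', 'Blue', 'Green']
values = np.array([10, 15, 10, 20, 25, 15, 10, 30, 20])
category_labels = ['Red', 'Red', 'Blue', 'Blue', 'Blue', 'Green', 'Green', 'Green', 'Green']

# Integer category positions plus jitter in [-0.15, 0.15)
x = np.array([categories.index(cat) for cat in category_labels])
rng = np.random.default_rng(42)
x_jittered = x + rng.uniform(-0.15, 0.15, size=x.size)

fig, ax = plt.subplots()
points = ax.scatter(x_jittered, values)

# get_offsets() returns one (x, y) row per plotted marker, in data coordinates
offsets = np.asarray(points.get_offsets())
print(np.array_equal(offsets[:, 1], values))       # y values are untouched
print(np.abs(offsets[:, 0] - x).max() <= 0.15)     # x moved by at most 0.15
plt.close(fig)
```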

Summary

Categorical scatter plots show individual points for categories.

Jitter adds small random shifts to avoid overlapping points.

This helps visualize data distribution within categories clearly.