Overview - Marker size variation

What is it?

Marker size variation means giving the points in a scatter plot, or any other marker-based plot, different sizes. Instead of every point having the same size, each point can be sized according to a data value or some other rule. This shows more information visually, such as how important or large a data point is.

Why it matters

A basic scatter plot encodes only two variables, through position on the x and y axes. Varying marker size adds a third variable, making it easier to spot patterns or outliers. This makes the visualization richer and helps people understand complex data faster.

Where it fits

Before learning marker size variation, you should know how to create basic plots with matplotlib, especially scatter plots. After this, you can learn about other marker customizations like color variation, shapes, and advanced plotting techniques like interactive plots or 3D plots.

Mental Model

Core Idea

Marker size variation lets you use the size of each point in a plot to represent extra data, adding a new layer of meaning beyond just position.

Think of it like...

It's like drawing dots on a map where bigger dots mean bigger cities and smaller dots mean smaller towns, so you can tell size differences at a glance.

Scatter plot with varied marker sizes:

  y-axis
   ↑
   │       ●       ●●●
   │   ●●       ●●
   │ ●       ●
   └────────────────→ x-axis

Each dot's size shows a different value.

Build-Up - 7 Steps

1

Foundation: Basic scatter plot creation

Concept: Learn how to make a simple scatter plot with matplotlib.

Use matplotlib's scatter function to plot points with the default marker size. Example:

    import matplotlib.pyplot as plt

    x = [1, 2, 3, 4]
    y = [10, 20, 25, 30]
    plt.scatter(x, y)
    plt.show()

Result

A scatter plot with four points all having the same default size.

Understanding how to plot points is the first step before customizing their appearance.

2

Foundation: Setting fixed marker size

3

Intermediate: Varying marker size by data
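As a minimal sketch of this step, passing a list to the 's' parameter gives each point its own size. The data values here are made up for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data (invented for this sketch): one size value per point.
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
sizes = [20, 80, 180, 320]  # marker areas in points squared

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes)  # the i-th size is applied to the i-th point
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call plt.show() afterwards to display the figure in an interactive session.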

4

Intermediate: Scaling sizes for better visualization
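One possible way to scale sizes is a linear min-max mapping onto a fixed range of marker areas. The helper name scale_sizes and the 20 to 400 range are assumptions chosen for this sketch, not part of matplotlib.

```python
def scale_sizes(values, min_size=20.0, max_size=400.0):
    """Linearly map raw values onto marker areas in [min_size, max_size]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: use the midpoint of the range for every marker.
        return [(min_size + max_size) / 2.0 for _ in values]
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


raw = [1, 1000, 5000, 10000]   # illustrative raw data values
sizes = scale_sizes(raw)       # pass to plt.scatter(x, y, s=sizes)
```

The smallest value always maps to min_size and the largest to max_size, so the plot stays readable whatever the units of the raw data.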

5

Intermediate: Using marker size with color for multi-dimensions
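A short sketch of combining both encodings: 's' carries one variable and 'c' with a colormap carries another. The variable names population and temperature, and their values, are hypothetical.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
population = [20, 80, 180, 320]          # hypothetical variable shown as marker area
temperature = [12.0, 18.5, 23.0, 30.5]   # hypothetical variable shown as color

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=population, c=temperature, cmap="viridis", alpha=0.7)
fig.colorbar(points, ax=ax, label="temperature")  # explains the color encoding
```

The colorbar documents the color variable only; the size variable still needs its own legend or annotation.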

6

Advanced: Handling marker size legends
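Matplotlib does not add a size legend on its own, but the collection returned by scatter offers legend_elements (Matplotlib 3.1 and later), which can generate legend entries for representative sizes. The following is a sketch with made-up data.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
sizes = [50, 100, 200, 400]  # illustrative marker areas in points squared

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes)

# legend_elements(prop="sizes") returns proxy handles and text labels for
# a handful of representative marker sizes.
handles, labels = points.legend_elements(prop="sizes", num=4, alpha=0.6)
legend = ax.legend(handles, labels, title="Marker area", loc="upper left")
```

If the sizes were rescaled before plotting, the labels show the rescaled values, so the legend title or labels may need adjusting to reflect the original data.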

7

Expert: Performance impact of large marker arrays

Under the Hood

Matplotlib's scatter function takes marker sizes through the 's' parameter, in units of points squared (one point is 1/72 inch). When given an array, it pairs each size with the corresponding point. At draw time, the renderer converts each size from points to pixels using the figure's DPI and draws the marker at that scale.
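The arithmetic can be sketched as follows. This is an approximation for illustration, not matplotlib's actual code: it assumes the marker's width in points is the square root of 's', and that pixels follow from the DPI at 72 points per inch. The function name marker_diameter_px is hypothetical.

```python
import math

POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def marker_diameter_px(s, dpi=100):
    """Approximate on-screen width, in pixels, of a marker with area s (points^2)."""
    return math.sqrt(s) * dpi / POINTS_PER_INCH


small = marker_diameter_px(36)    # sqrt(36) = 6 points wide
large = marker_diameter_px(144)   # four times the area, sqrt(144) = 12 points wide
```

Quadrupling 's' doubles the width, which is why a size meant to scale with diameter has to be squared before it is passed to scatter.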

Why designed this way?

The size parameter uses area (points squared) rather than radius or diameter to better reflect visual perception of size differences. Using arrays allows flexible, data-driven size variation. This design balances simplicity for fixed sizes and flexibility for complex data visualization.

Scatter plot rendering flow:

Input data (x, y, sizes) ──▶ Size mapping (points²) ──▶ Renderer converts sizes to pixels ──▶ Markers drawn on canvas

Each step transforms size info for accurate visual display.

Myth Busters - 4 Common Misconceptions

Quick: Does setting marker size to zero hide the marker completely? Commit yes or no.

Common Belief: Setting marker size to zero will hide the marker from the plot.

Quick: Does the 's' parameter in scatter represent radius or area? Commit your answer.

Common Belief: The 's' parameter controls the radius of the marker.

Reality: The 's' parameter sets the marker's area in points squared, so the radius grows with the square root of 's'.

Quick: Can marker sizes be negative values? Commit yes or no.

Common Belief: You can use negative values for marker sizes to indicate special points.

Quick: Does matplotlib automatically create legends for marker sizes? Commit yes or no.

Common Belief: Matplotlib automatically generates legends for marker sizes in scatter plots.

Reality: Matplotlib does not create a size legend automatically; you have to build one yourself.

Expert Zone

1

Marker size is specified in points squared, so doubling the 's' value doubles the marker's area but increases its radius by only a factor of about 1.41 (the square root of 2), which affects how size differences are perceived.

2

When combining marker size with transparency (alpha), large markers with low alpha can visually blend, affecting interpretation.

3

Using very large marker sizes can cause markers to overlap heavily, which may hide data points or distort visual patterns.

When NOT to use

Avoid mapping raw values directly to marker size when the data has extreme outliers that distort size scaling; instead, consider log scaling or clipping the values first. For very large datasets, use density plots or hexbin plots to avoid performance issues and overplotting.
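A sketch of the clipping and log-scaling idea, using NumPy. The helper name robust_sizes, the 95th-percentile cutoff, and the 20 to 400 size range are assumptions for illustration, and the input is assumed to be non-negative.

```python
import numpy as np


def robust_sizes(values, min_size=20.0, max_size=400.0, clip_percentile=95):
    """Clip extreme values, log-transform, then rescale to marker areas."""
    values = np.asarray(values, dtype=float)
    # Clip everything above the chosen percentile so one outlier cannot dominate.
    upper = np.percentile(values, clip_percentile)
    clipped = np.clip(values, None, upper)
    # log1p(v) = log(1 + v) handles zeros; assumes non-negative input values.
    logged = np.log1p(clipped)
    span = logged.max() - logged.min()
    if span == 0:
        # All values are equal after clipping: use the midpoint size everywhere.
        return np.full(values.shape, (min_size + max_size) / 2.0)
    return min_size + (logged - logged.min()) / span * (max_size - min_size)


raw = [3, 10, 25, 40, 1_000_000]   # illustrative data with one extreme outlier
sizes = robust_sizes(raw)          # pass to plt.scatter(x, y, s=sizes)
```

With plain linear scaling, the four small values here would all collapse to nearly the minimum size; the log transform keeps them visibly distinct.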

Production Patterns

Professionals use marker size variation to represent quantities like population, sales volume, or frequency in scatter plots. They often combine size with color and add custom legends for clarity. In dashboards, size variation is used interactively with tooltips to explore data layers.

Connections

Data encoding in visualization

Marker size variation is a form of encoding quantitative data visually, similar to color or shape encoding.

Understanding marker size as a data encoding method helps grasp how visual variables communicate information in charts.

Perception psychology

Marker size variation relies on human perception of area differences, which is nonlinear and can be misleading without proper scaling.

Knowing perception limits guides better size scaling to avoid misinterpretation of data magnitude.

Cartography

Marker size variation in plots is similar to proportional symbol maps in cartography, where symbol size represents data like city population.

Recognizing this connection shows how data visualization principles apply across fields to communicate complex data simply.

Common Pitfalls

#1 Using raw data values directly as marker sizes without scaling.

Wrong approach: plt.scatter(x, y, s=[1, 1000, 5000, 10000])

Correct approach:

    raw_values = [1, 1000, 5000, 10000]
    scaled_sizes = [v / 100 for v in raw_values]
    plt.scatter(x, y, s=scaled_sizes)

Root cause: Misunderstanding that marker sizes need to be in a reasonable range for visualization.

#2 Expecting matplotlib to create a legend for marker sizes automatically.

Wrong approach:

    plt.scatter(x, y, s=sizes)
    plt.legend()  # expecting a size legend

Correct approach:

    for size in [50, 100, 200]:
        plt.scatter([], [], s=size, label=f'Size {size}')
    plt.legend()

Root cause: Assuming all visual encodings have automatic legends in matplotlib.

#3 Setting marker size to zero to hide points.

Wrong approach: plt.scatter(x, y, s=0)

Correct approach: plt.scatter(x, y, alpha=0)  # or filter points before plotting

Root cause: Confusing marker size with visibility or transparency.
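Filtering before plotting can be sketched with a NumPy boolean mask. The data and the rule "hide points whose size is zero" are assumptions made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
sizes = np.array([0, 80, 0, 320])  # hypothetical: zero marks points to hide

keep = sizes > 0  # boolean mask selecting the points to draw
fig, ax = plt.subplots()
points = ax.scatter(x[keep], y[keep], s=sizes[keep])
```

Only the kept points reach the renderer, so hidden points cannot leave stray artifacts and do not affect the automatic axis limits.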

Key Takeaways

Marker size variation adds a third dimension of data to scatter plots by changing point sizes.

Sizes in matplotlib scatter plots represent area, not radius, so scaling must consider this for accurate perception.

Raw data values usually need scaling before use as marker sizes to keep plots readable and meaningful.

Matplotlib does not automatically create legends for marker sizes; custom legends improve plot clarity.

Large datasets with varied marker sizes can impact performance and readability, so alternative visualization methods may be needed.