Marker size variation in Matplotlib - Deep Dive

Overview - Marker size variation
What is it?
Marker size variation is a way to change the size of points in a scatter plot or other plots using markers. Instead of all points having the same size, each point can have a different size based on data values or other rules. This helps to show more information visually, like how important or large a data point is.
Why it matters
Without marker size variation, plots can only show two dimensions of data clearly, like position on x and y axes. By changing marker sizes, you add a third dimension of information, making it easier to spot patterns or outliers. This makes data visualization richer and more meaningful, helping people understand complex data faster.
Where it fits
Before learning marker size variation, you should know how to create basic plots with matplotlib, especially scatter plots. After this, you can learn about other marker customizations like color variation, shapes, and advanced plotting techniques like interactive plots or 3D plots.
Mental Model
Core Idea
Marker size variation lets you use the size of each point in a plot to represent extra data, adding a new layer of meaning beyond just position.
Think of it like...
It's like drawing dots on a map where bigger dots mean bigger cities and smaller dots mean smaller towns, so you can tell size differences at a glance.
Scatter plot with varied marker sizes:

  y-axis
   ↑
   │       ●       ●●●
   │   ●●       ●●
   │ ●       ●
   └────────────────→ x-axis

Each dot's size shows a different value.
Build-Up - 7 Steps
1
Foundation - Basic scatter plot creation
Concept: Learn how to make a simple scatter plot with matplotlib.
Use matplotlib's scatter function to plot points with the default marker size. Example:

    import matplotlib.pyplot as plt

    x = [1, 2, 3, 4]
    y = [10, 20, 25, 30]
    plt.scatter(x, y)
    plt.show()
Result
A scatter plot with four points all having the same default size.
Understanding how to plot points is the first step before customizing their appearance.
2
Foundation - Setting fixed marker size
Concept: Control the size of all markers by setting a fixed size value.
Add the 's' parameter to scatter to set one marker size for every point. Example (reusing x and y from step 1):

    plt.scatter(x, y, s=100)
    plt.show()
Result
All points appear larger and equally sized on the plot.
Knowing how to set marker size uniformly prepares you to vary sizes later.
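For reference, when 's' is omitted scatter uses rcParams['lines.markersize'] ** 2, which is 36 with the default style. Note that plot() takes markersize in points (roughly the marker's width), while scatter's s is an area in points squared, so plot(..., markersize=10) and scatter(..., s=100) draw markers of about the same size. A minimal sketch, assuming default rcParams and the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Default scatter size is the square of the default line-marker size
default_s = plt.rcParams["lines.markersize"] ** 2
print(default_s)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, y, "o", markersize=10)  # markersize: width in points
ax2.scatter(x, y, s=10 ** 2)        # s: area in points**2, so 10 points wide
```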
3
Intermediate - Varying marker size by data
Before reading on: Do you think marker sizes can be set using a list of values matching each point? Commit to your answer.
Concept: Use a list or array to assign different sizes to each marker based on data values.
Pass a list or array to the 's' parameter with one size per point. Example:

    sizes = [20, 50, 80, 200]
    plt.scatter(x, y, s=sizes)
    plt.show()
Result
Points appear with different sizes reflecting the values in the sizes list.
Understanding that marker size can be an array unlocks the ability to represent extra data dimensions visually.
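In practice the sizes usually come from a data column. The sketch below, with made-up values, derives one size per point from a NumPy array and reads the stored sizes back with get_sizes() on the collection that scatter returns. It also shows that a sizes array whose length matches neither 1 nor the number of points is rejected with a ValueError (the behavior in recent Matplotlib versions); the variable value is a hypothetical third data dimension.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
value = np.array([2, 5, 8, 20])  # hypothetical third variable, one value per point

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=value * 10)  # one marker area (points**2) per point
print(sc.get_sizes())                # the sizes stored on the returned PathCollection

# A sizes array whose length matches neither 1 nor len(x) is rejected
mismatch_rejected = False
try:
    ax.scatter(x, y, s=[20, 50])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```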
4
Intermediate - Scaling sizes for better visualization
Before reading on: Should marker sizes be used directly from raw data or scaled? Commit to your answer.
Concept: Raw data values often need scaling to suitable marker sizes for clear visualization.
Multiply or otherwise transform data values so they fall in a useful marker-size range. Example:

    raw_sizes = [1, 5, 10, 20]
    scaled_sizes = [v * 10 for v in raw_sizes]
    plt.scatter(x, y, s=scaled_sizes)
    plt.show()
Result
Markers have sizes proportional to scaled data, making differences visible but not overwhelming.
Knowing to scale sizes prevents markers from being too small or too large, improving plot readability.
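A common refinement is to map the data linearly onto a fixed range of marker areas, so the smallest value still gets a visible marker and the largest does not swamp the plot. Because 's' is an area, a linear mapping keeps marker area (not width) proportional to each value's position in its range. The helper name scale_sizes and the default range of 20 to 400 points squared are illustrative choices, not Matplotlib API.

```python
import numpy as np

def scale_sizes(values, s_min=20.0, s_max=400.0):
    """Linearly map values onto marker areas between s_min and s_max (points**2)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # All values are equal: give every marker the midpoint size
        return np.full_like(v, (s_min + s_max) / 2)
    return s_min + (v - v.min()) / span * (s_max - s_min)

raw_sizes = [1, 5, 10, 20]
print(scale_sizes(raw_sizes))
```

The result can be passed straight to scatter, for example plt.scatter(x, y, s=scale_sizes(raw_sizes)).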
5
Intermediate - Using marker size with color for multiple dimensions
Before reading on: Can marker size and color be combined to show two extra data dimensions? Commit to your answer.
Concept: Combine marker size variation with color mapping to represent multiple data features simultaneously.
Use 's' for sizes and 'c' for colors in scatter. Example:

    sizes = [50, 100, 200, 300]
    colors = [10, 20, 30, 40]
    plt.scatter(x, y, s=sizes, c=colors, cmap='viridis')
    plt.colorbar()
    plt.show()
Result
Plot shows points varying in size and color, representing two different data aspects.
Combining size and color enriches data storytelling in a single plot.
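The same combination with the object-oriented API, which makes it easy to label the colorbar so readers know which variable color encodes. This is a sketch with made-up values; the label "temperature" and the population/temperature roles are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
sizes = [50, 100, 200, 300]  # encodes one variable, e.g. population (placeholder)
colors = [10, 20, 30, 40]    # encodes another, e.g. temperature (placeholder)

fig, ax = plt.subplots()
# Thin dark edges keep small, light-colored markers visible
sc = ax.scatter(x, y, s=sizes, c=colors, cmap="viridis",
                edgecolors="black", linewidths=0.5)
cbar = fig.colorbar(sc, ax=ax, label="temperature")
ax.set_xlabel("x")
ax.set_ylabel("y")
```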
6
Advanced - Handling marker size legends
Before reading on: Does matplotlib automatically create legends for marker sizes? Commit to your answer.
Concept: plt.legend() does not add entries for marker sizes on its own; you have to supply the size entries yourself.
Create the legend from empty scatter series (no data points), one per example size, each with a label. Example:

    for size in [50, 100, 200]:
        # Empty series: nothing is drawn, but a labeled legend entry is created
        plt.scatter([], [], s=size, color='gray', label=f'Size {size}')
    plt.legend(scatterpoints=1, frameon=False, labelspacing=1, title='Marker Size')
    plt.show()
Result
A legend appears explaining what different marker sizes mean.
Knowing how to add size legends helps viewers interpret marker size meaning correctly.
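Matplotlib 3.1 and later also offer a shortcut: the PathCollection returned by scatter has a legend_elements method that, with prop="sizes", builds legend handles and labels from the sizes actually plotted. A sketch assuming Matplotlib 3.1+ and made-up data; the func argument converts marker areas back to data units for the labels, here by undoing the multiply-by-10 scaling used to build sizes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
values = [2, 5, 8, 20]            # data values to encode (made up)
sizes = [v * 10 for v in values]  # marker areas in points**2

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, alpha=0.6)

# One legend entry per distinct size; func maps areas back to data units for labels
handles, labels = sc.legend_elements(prop="sizes", func=lambda s: s / 10, alpha=0.6)
ax.legend(handles, labels, title="Value")
```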
7
Expert - Performance impact of large marker arrays
Before reading on: Do you think very large arrays of marker sizes affect plot rendering speed? Commit to your answer.
Concept: Plotting very many points with per-point sizes increases rendering time and memory use, because the cost grows with the number of markers Matplotlib has to draw.
Plotting many thousands of points with varied sizes may cause noticeable lag. Example:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.rand(10000)
    y = np.random.rand(10000)
    sizes = np.random.rand(10000) * 100
    plt.scatter(x, y, s=sizes)
    plt.show()
Result
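The plot renders, but drawing time grows with the number of points; with larger arrays, or on slower systems, panning and zooming in an interactive window can become sluggish.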
Understanding performance limits guides efficient plotting and when to use sampling or other visualization tools.
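One way to act on this is to plot a random subset of the points. A sketch with synthetic data; the 100,000-point dataset and the 5,000-point sample size are arbitrary choices, and for very dense data a hexbin plot (ax.hexbin) is often a better summary than any scatter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.random(n)
y = rng.random(n)
sizes = rng.random(n) * 100

# Sample indices without replacement so each point is drawn at most once
idx = rng.choice(n, size=5_000, replace=False)

fig, ax = plt.subplots()
sc = ax.scatter(x[idx], y[idx], s=sizes[idx], alpha=0.5)
```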
Under the Hood
Matplotlib's scatter function interprets the 's' parameter as marker area in points squared, where a typographic point is 1/72 inch. The sizes, one per point when you pass an array, are stored on the PathCollection that scatter returns. At draw time each marker is scaled by sqrt(s) * dpi / 72, so a marker with size s is roughly sqrt(s) points wide regardless of figure resolution. Because this scaling happens in display space rather than data space, markers keep the same on-screen size when you zoom or change the axis limits.
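A small helper that mirrors this conversion for standard markers such as circles and squares. The name marker_width_px is hypothetical and not part of Matplotlib; it only encodes the arithmetic.

```python
import math

def marker_width_px(s, dpi):
    """Approximate on-screen width, in pixels, of a scatter marker of size s.

    s is an area in points**2, so the width in points is sqrt(s);
    one point is 1/72 inch, so multiplying by dpi / 72 gives pixels.
    """
    return math.sqrt(s) * dpi / 72

print(marker_width_px(100, 72))   # 10 points wide at 72 dpi
print(marker_width_px(100, 144))  # same marker at twice the dpi
print(marker_width_px(400, 72))   # four times the area, twice the width
```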
Why designed this way?
The size parameter uses area (points squared) rather than radius or diameter to better reflect visual perception of size differences. Using arrays allows flexible, data-driven size variation. This design balances simplicity for fixed sizes and flexibility for complex data visualization.
Scatter plot rendering flow:

Input data (x, y, sizes) ──▶ Size mapping (points²) ──▶ Renderer converts sizes to pixels ──▶ Markers drawn on canvas

Each step transforms size info for accurate visual display.
Myth Busters - 4 Common Misconceptions
Quick: Does setting marker size to zero hide the marker completely? Commit yes or no.
Common Belief: Setting marker size to zero will hide the marker from the plot.
Reality: A size of zero shrinks the marker to a single point, but the marker is still part of the plot, and depending on the renderer and edge settings a tiny dot may remain visible. To hide points, filter them out before plotting or make them fully transparent with alpha=0.
Why it matters: Assuming zero size hides markers can lead to cluttered plots or misinterpretation of which data points are present.
Quick: Does the 's' parameter in scatter represent radius or area? Commit your answer.
Common Belief: The 's' parameter controls the radius of the marker.
Reality: The 's' parameter controls the area (points squared) of the marker, not the radius.
Why it matters: Misunderstanding this causes incorrect scaling and misleading visualizations.
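For a marker of size s (in points squared), the width w in points is roughly the square root of s, which gives the following scaling:

```latex
w = \sqrt{s}
\qquad\Longrightarrow\qquad
s \to 2s \;\Rightarrow\; w \to \sqrt{2}\,w \approx 1.41\,w,
\qquad
w \to 2w \;\Rightarrow\; s \to 4s .
```

So doubling s widens the marker by only about 41%, and doubling the width requires four times the s value.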
Quick: Can marker sizes be negative values? Commit yes or no.
Common Belief: You can use negative values for marker sizes to indicate special points.
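Reality: Negative sizes are not valid. Because 's' is an area, Matplotlib takes its square root when scaling markers; a negative value yields NaN, so the marker is typically not drawn at all (possibly with a RuntimeWarning) rather than drawn in some special way.
Why it matters: Points you meant to highlight silently disappear instead, which is easy to miss and confusing to debug.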
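A safer pattern, sketched with made-up data: validate sizes before plotting, and mark special points with a distinct marker and color in a second scatter call instead of a negative size. The check_sizes guard and the special flag are hypothetical illustrations, not Matplotlib features.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def check_sizes(s):
    """Hypothetical guard: marker sizes are areas, so reject negative values."""
    s = np.asarray(s, dtype=float)
    if np.any(s < 0):
        raise ValueError("marker sizes must be non-negative areas (points**2)")
    return s

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
sizes = np.array([20, 50, 80, 200])
special = np.array([False, True, False, False])  # hypothetical flag for points to highlight

fig, ax = plt.subplots()
# Ordinary points
ax.scatter(x[~special], y[~special], s=check_sizes(sizes[~special]))
# Highlighted points: a distinct marker and color instead of a negative size
ax.scatter(x[special], y[special], s=check_sizes(sizes[special]),
           marker="*", color="red", label="special")
ax.legend()
```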
Quick: Does matplotlib automatically create legends for marker sizes? Commit yes or no.
Common Belief: Matplotlib automatically generates legends for marker sizes in scatter plots.
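Reality: A plain plt.legend() call does not produce a size legend. You build one yourself, either from labeled proxy entries or, in Matplotlib 3.1 and later, with the scatter's legend_elements(prop="sizes") method.
Why it matters: Without such a legend, viewers may misinterpret what marker sizes represent.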
Expert Zone
1
Marker size is specified in points squared, so doubling the 's' value does not double the radius but increases area, affecting perception.
2
When combining marker size with transparency (alpha), large markers with low alpha can visually blend, affecting interpretation.
3
Using very large marker sizes can cause markers to overlap heavily, which may hide data points or distort visual patterns.
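One common mitigation for both issues, sketched with random data: draw markers from largest to smallest so small markers are not buried under big ones, and add a thin edge so overlapping translucent markers stay distinguishable. Within one scatter call, points later in the arrays are drawn on top; sorting by size is a plotting convention you apply yourself, not something scatter does for you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)
sizes = rng.random(50) * 1000  # deliberately large markers that overlap

# Later points are drawn on top, so order them by decreasing size
order = np.argsort(sizes)[::-1]

fig, ax = plt.subplots()
sc = ax.scatter(x[order], y[order], s=sizes[order],
                alpha=0.5, edgecolors="black", linewidths=0.5)
```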
When NOT to use
Avoid marker size variation when data values have extreme outliers that distort size scaling; instead, consider log scaling or clipping. For very large datasets, use density plots or hexbin plots to avoid performance issues and overplotting.
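Two ways to tame outliers before mapping values to sizes, sketched with made-up data containing one extreme value: compress the range with a logarithm (np.log1p here, which also handles zeros), or clip values at a percentile. The 80th percentile, chosen for this tiny sample, and the 20 to 400 points squared output range are arbitrary; to_area is a hypothetical helper.

```python
import numpy as np

values = np.array([3, 8, 12, 20, 15, 5000])  # one extreme outlier

def to_area(v, s_min=20.0, s_max=400.0):
    """Linearly map v onto marker areas between s_min and s_max (assumes v is not constant)."""
    v = np.asarray(v, dtype=float)
    return s_min + (v - v.min()) / (v.max() - v.min()) * (s_max - s_min)

linear = to_area(values)                 # the outlier squashes every other marker
logged = to_area(np.log1p(values))       # compress the range before mapping
cap = np.percentile(values, 80)          # arbitrary cap for this small sample
clipped = to_area(np.clip(values, None, cap))
```

Whichever transform you use, say so in the size legend so readers do not read the sizes as linear.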
Production Patterns
Professionals use marker size variation to represent quantities like population, sales volume, or frequency in scatter plots. They often combine size with color and add custom legends for clarity. In dashboards, size variation is used interactively with tooltips to explore data layers.
Connections
Data encoding in visualization
Marker size variation is a form of encoding quantitative data visually, similar to color or shape encoding.
Understanding marker size as a data encoding method helps grasp how visual variables communicate information in charts.
Perception psychology
Marker size variation relies on human perception of area differences, which is nonlinear and can be misleading without proper scaling.
Knowing perception limits guides better size scaling to avoid misinterpretation of data magnitude.
Cartography
Marker size variation in plots is similar to proportional symbol maps in cartography, where symbol size represents data like city population.
Recognizing this connection shows how data visualization principles apply across fields to communicate complex data simply.
Common Pitfalls
#1 Using raw data values directly as marker sizes without scaling.
Wrong approach:
    plt.scatter(x, y, s=[1, 1000, 5000, 10000])
Correct approach:
    scaled_sizes = [v / 100 for v in [1, 1000, 5000, 10000]]
    plt.scatter(x, y, s=scaled_sizes)
Root cause: Misunderstanding that marker sizes need to be in a reasonable range for visualization.
#2 Expecting matplotlib to create a legend for marker sizes automatically.
Wrong approach:
    plt.scatter(x, y, s=sizes)
    plt.legend()  # expecting a size legend
Correct approach:
    for size in [50, 100, 200]:
        plt.scatter([], [], s=size, label=f'Size {size}')
    plt.legend()
Root cause: Assuming all visual encodings have automatic legends in matplotlib.
#3 Setting marker size to zero to hide points.
Wrong approach:
    plt.scatter(x, y, s=0)
Correct approach:
    plt.scatter(x, y, alpha=0)  # or filter points before plotting
Root cause: Confusing marker size with visibility or transparency.
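Filtering is often the more robust of the two fixes: points with alpha=0 are invisible but still count toward the automatic axis limits, while filtered points are not plotted at all. A sketch with made-up data, where the rule "hide points whose size is zero" is only an example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
sizes = np.array([20, 0, 80, 200])

keep = sizes > 0  # hypothetical rule: drop points whose size would be zero

fig, ax = plt.subplots()
sc = ax.scatter(x[keep], y[keep], s=sizes[keep])
```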
Key Takeaways
Marker size variation adds a third dimension of data to scatter plots by changing point sizes.
Sizes in matplotlib scatter plots represent area, not radius, so scaling must consider this for accurate perception.
Raw data values usually need scaling before use as marker sizes to keep plots readable and meaningful.
Matplotlib does not automatically create legends for marker sizes; custom legends improve plot clarity.
Large datasets with varied marker sizes can impact performance and readability, so alternative visualization methods may be needed.