
How to Add a Trendline to a Scatter Plot in Matplotlib

To add a trendline to a scatter plot in matplotlib, first plot your scatter points using plt.scatter(). Then calculate the trendline using numpy.polyfit() to fit a line and plot it with plt.plot() over the scatter plot.

Syntax

Here is the basic syntax to add a trendline to a scatter plot:

  • plt.scatter(x, y): Plots the scatter points.
  • coefficients = np.polyfit(x, y, degree): Fits a polynomial (usually degree=1 for a line) to the data.
  • poly = np.poly1d(coefficients): Creates a polynomial function from the coefficients.
  • plt.plot(sorted_x, poly(sorted_x)): Plots the trendline over the scatter plot, where sorted_x = np.sort(x).
python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11])

plt.scatter(x, y)  # Scatter plot
coefficients = np.polyfit(x, y, 1)  # Fit line (degree=1)
poly = np.poly1d(coefficients)  # Polynomial function
sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='red')  # Trendline
plt.show()
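np.polyfit returns its coefficients with the highest power first, so a degree-1 fit gives the slope followed by the intercept. A minimal sketch of reading them back, using the same data as above (the variable name equation is just illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11])

# For degree 1, polyfit returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, 1)

# Build a readable equation string; "+.2f" always prints the sign of the intercept
equation = f"y = {slope:.2f}x {intercept:+.2f}"
print(equation)
```

A string like this can be passed as the label argument of plt.plot() so the fitted equation appears in the legend.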

Example

This example shows how to create a scatter plot of points and add a red trendline that fits the data linearly.

python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10])

# Scatter plot
plt.scatter(x, y, label='Data points')

# Fit a linear trendline
coefficients = np.polyfit(x, y, 1)  # degree 1 for linear
poly = np.poly1d(coefficients)

# Plot trendline
sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='red', label='Trendline')

# Labels and legend
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Scatter Plot with Trendline')
plt.legend()
plt.show()
Output
A scatter plot with blue dots representing data points and a red straight line showing the trendline fitted through the points.
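To judge how well the line describes the points, one common measure is the coefficient of determination (R²). The sketch below computes it by hand from the fitted polynomial for the example data. The names ss_res, ss_tot and r_squared are illustrative and not part of matplotlib or NumPy.

```python
import numpy as np

# Same sample data as the example above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10])

# Fit the linear trendline
poly = np.poly1d(np.polyfit(x, y, 1))

# R^2 = 1 - (residual sum of squares / total sum of squares)
residuals = y - poly(x)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"R^2 = {r_squared:.3f}")
```

Values close to 1 mean the line accounts for most of the variation in y. A low value suggests a straight line may not be a good summary of the data.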

Common Pitfalls

  • Plotting a curved trendline (degree 2 or higher) against unsorted x values makes the line zigzag, because plt.plot() connects the points in the order given. A degree-1 trendline still looks straight, since every fitted point lies on the same line, but plotting with sorted x values is a safe habit.
  • Using np.polyfit with a degree higher than 1 fits a curve, not a straight line.
  • For non-linear trends, a polynomial degree higher than 1 or other fitting methods may be needed.
  • Outliers can pull a least-squares fit such as np.polyfit toward them. If your data has outliers, consider robust fitting or smoothing.
python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 1, 3, 2, 4])
y = np.array([5, 2, 4, 3, 6])

# Wrong: plotting a curved (degree-2) trendline without sorting x.
# A degree-1 fit would still look straight, because all fitted points lie on one line.
plt.scatter(x, y)
coefficients = np.polyfit(x, y, 2)
poly = np.poly1d(coefficients)
plt.plot(x, poly(x), color='red')  # Zigzags back and forth across the curve
plt.title('Trendline without sorting x')
plt.show()

# Right: sort x and plot trendline
plt.scatter(x, y)
sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='green')  # Points connected left to right
plt.title('Trendline with sorted x')
plt.show()
Output
First plot shows a jagged red curve because the x values are unsorted; second plot shows a smooth green curve after sorting x.
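When it is unclear whether a straight line or a curve suits the data, one rough check is to compare the sum of squared errors (SSE) for each degree. This is only a sketch: sse_by_degree is an illustrative name, and a higher degree never increases the SSE on the data it was fitted to, so a small drop on its own is not evidence that the curve is better.

```python
import numpy as np

x = np.array([5, 1, 3, 2, 4])
y = np.array([5, 2, 4, 3, 6])

# Sum of squared errors for a linear (1) and a quadratic (2) fit
sse_by_degree = {}
for degree in (1, 2):
    coefficients = np.polyfit(x, y, degree)
    fitted = np.polyval(coefficients, x)  # evaluate the fit at the data points
    sse_by_degree[degree] = float(np.sum((y - fitted) ** 2))

for degree, sse in sse_by_degree.items():
    print(f"degree {degree}: SSE = {sse:.3f}")
```

If the quadratic fit reduces the error only slightly, the simpler linear trendline is usually the safer choice.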

Quick Reference

Summary tips for adding trendlines to scatter plots in matplotlib:

  • Use np.polyfit(x, y, 1) for a linear trendline.
  • Convert coefficients to a function with np.poly1d().
  • Plot trendline with sorted x values for smooth lines.
  • Label plots clearly for better understanding.
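The tips above can be wrapped in a small helper. The function add_trendline below is hypothetical (it is not part of matplotlib) and is only one way to package the steps. It assumes ax is a matplotlib Axes object and evaluates the fit on an evenly spaced grid from np.linspace, so the x values are already in order and curved fits are drawn smoothly.

```python
import numpy as np

def add_trendline(ax, x, y, degree=1, num_points=200, **plot_kwargs):
    """Fit a polynomial of the given degree and draw it on ax.

    Hypothetical helper: ax is assumed to be a matplotlib Axes.
    Returns the fitted np.poly1d so the coefficients can be inspected.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Fit the polynomial and wrap it in a callable
    poly = np.poly1d(np.polyfit(x, y, degree))

    # Evaluate on an evenly spaced, increasing grid across the data range
    x_line = np.linspace(x.min(), x.max(), num_points)
    ax.plot(x_line, poly(x_line), **plot_kwargs)
    return poly
```

A typical call, after plt.scatter(x, y), would be add_trendline(plt.gca(), x, y, color='red', label='Trendline'), followed by plt.legend() and plt.show().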

Key Takeaways

Use numpy.polyfit with degree 1 to calculate a linear trendline for scatter data.
Plot the trendline using sorted x values; with curved fits, unsorted x values produce a jagged line.
Combine plt.scatter and plt.plot to show data points and trendline together.
Label your plot axes and add a legend for clarity.
For non-linear trends, increase polynomial degree or use other fitting methods.