How to Add Trendline to Scatter Plot in Matplotlib
To add a trendline to a scatter plot in matplotlib, first plot your scatter points using plt.scatter(). Then fit a line to the data with numpy.polyfit() and draw it over the scatter plot with plt.plot().
Syntax
Here is the basic syntax to add a trendline to a scatter plot:
- plt.scatter(x, y): Plots the scatter points.
- coefficients = np.polyfit(x, y, degree): Fits a polynomial (usually degree=1 for a line) to the data.
- poly = np.poly1d(coefficients): Creates a polynomial function from the coefficients.
- plt.plot(x, poly(x)): Plots the trendline over the scatter plot.
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11])

plt.scatter(x, y)  # Scatter plot

coefficients = np.polyfit(x, y, 1)  # Fit line (degree=1)
poly = np.poly1d(coefficients)  # Polynomial function

sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='red')  # Trendline
plt.show()
```
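It can help to see what the fitted coefficients actually contain. The sketch below reuses the sample data from the syntax example and relies on np.polyfit returning coefficients from the highest power down, so a degree-1 fit gives the slope first and the intercept second. The names slope, intercept, and equation are illustrative and not part of the original snippet.

```python
import numpy as np

# Same sample data as in the syntax example.
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11])

# For degree 1, polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(x, y, 1)
poly = np.poly1d([slope, intercept])

# Build a readable equation string, handling the sign of the intercept.
sign = '+' if intercept >= 0 else '-'
equation = f"y = {slope:.2f}x {sign} {abs(intercept):.2f}"
print(equation)  # y = 2.20x - 1.00

# The poly1d object can be called like a function to predict new values.
print(poly(6))
```

A string like this can be passed as the label of the trendline so that the legend shows the fitted equation.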
Example
This example shows how to create a scatter plot of points and add a red trendline that fits the data linearly.
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10])

# Scatter plot
plt.scatter(x, y, label='Data points')

# Fit a linear trendline
coefficients = np.polyfit(x, y, 1)  # degree 1 for linear
poly = np.poly1d(coefficients)

# Plot trendline
sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='red', label='Trendline')

# Labels and legend
plt.xlabel('X values')
plt.ylabel('Y values')
plt.title('Scatter Plot with Trendline')
plt.legend()
plt.show()
```
Output
A scatter plot with blue dots representing data points and a red straight line showing the trendline fitted through the points.
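The plot shows where the trendline runs but not how closely it follows the points. One common summary, which the example above does not compute, is the coefficient of determination (R²). The following is a minimal sketch using plain numpy on the same sample data; the variable names are illustrative.

```python
import numpy as np

# Same sample data as in the example.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10])

coefficients = np.polyfit(x, y, 1)
poly = np.poly1d(coefficients)

# R² = 1 - (residual sum of squares / total sum of squares)
residuals = y - poly(x)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

label = f"Trendline (R² = {r_squared:.3f})"
print(label)  # Trendline (R² = 0.933)
```

Passing this string as the label argument of plt.plot() would put the value in the legend next to the trendline.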
Common Pitfalls
- Not sorting x before plotting a curved trendline (degree 2 or higher) makes the line look jagged, because plt.plot() connects the points in array order. A degree-1 line still looks straight with unsorted x, but it is drawn back over itself. Always plot the trendline with sorted x values.
- Using np.polyfit with a degree higher than 1 fits a curve, not a straight line.
- For non-linear trends, a polynomial degree higher than 1 or other fitting methods may be needed.
- np.polyfit performs a least-squares fit, so outliers can pull the trendline noticeably. For data with outliers, consider smoothing or robust fitting.
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 1, 3, 2, 4])
y = np.array([5, 2, 4, 3, 6])

# Degree 2 is used here because a curve makes the ordering problem visible;
# a degree-1 line would look straight even with unsorted x.
coefficients = np.polyfit(x, y, 2)
poly = np.poly1d(coefficients)

# Wrong: plotting trendline without sorting x
plt.scatter(x, y)
plt.plot(x, poly(x), color='red')  # Jagged line
plt.title('Trendline without sorting x')
plt.show()

# Right: sort x and plot trendline
plt.scatter(x, y)
sorted_x = np.sort(x)
plt.plot(sorted_x, poly(sorted_x), color='green')  # Smooth line
plt.title('Trendline with sorted x')
plt.show()
```
Output
First plot shows a jagged red trendline due to unsorted x values; second plot shows a smooth green trendline after sorting x.
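Sorting fixes the drawing order, but with only a few data points a curved trendline is still drawn as a handful of straight segments. One alternative, offered here as a sketch and not taken from the example above, is to evaluate the fitted polynomial on a dense, evenly spaced grid from np.linspace. The grid size of 200 and the degree of 2 are arbitrary illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 1, 3, 2, 4])
y = np.array([5, 2, 4, 3, 6])

# Degree 2 is an illustrative choice for a curved trendline.
poly = np.poly1d(np.polyfit(x, y, 2))

# 200 evenly spaced x values across the data range, already in order.
x_line = np.linspace(x.min(), x.max(), 200)

fig, ax = plt.subplots()
ax.scatter(x, y, label='Data points')
ax.plot(x_line, poly(x_line), color='green', label='Degree-2 trendline')
ax.legend()
# Call plt.show() here to display the figure in an interactive session.
```

Because the grid is generated in increasing order, no explicit sort is needed, and the curve looks smooth regardless of how many data points there are.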
Quick Reference
Summary tips for adding trendlines to scatter plots in matplotlib:
- Use np.polyfit(x, y, 1) for a linear trendline.
- Convert coefficients to a function with np.poly1d().
- Plot trendline with sorted x values for smooth lines.
- Label plots clearly for better understanding.
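The tips above can be bundled into one small reusable function. The sketch below is one possible way to do that; add_trendline is a hypothetical helper name invented for this illustration, not a matplotlib or numpy function.

```python
import matplotlib.pyplot as plt
import numpy as np


def add_trendline(ax, x, y, degree=1, **plot_kwargs):
    """Fit a polynomial of the given degree and draw it on ax.

    Hypothetical helper for illustration; returns the fitted np.poly1d.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    poly = np.poly1d(np.polyfit(x, y, degree))
    sorted_x = np.sort(x)  # plot in x order so the line does not double back
    ax.plot(sorted_x, poly(sorted_x), **plot_kwargs)
    return poly


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 4, 5, 4, 5, 7, 8, 9, 10]

fig, ax = plt.subplots()
ax.scatter(x, y, label='Data points')
poly = add_trendline(ax, x, y, degree=1, color='red', label='Trendline')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
ax.set_title('Scatter Plot with Trendline')
ax.legend()
# Call plt.show() here to display the figure in an interactive session.
```

Returning the poly1d object lets the caller reuse the fit, for example to predict a value with poly(10), without refitting.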
Key Takeaways
Use numpy.polyfit with degree 1 to calculate a linear trendline for scatter data.
Always plot the trendline using sorted x values to avoid jagged lines.
Combine plt.scatter and plt.plot to show data points and trendline together.
Label your plot axes and add a legend for clarity.
For non-linear trends, increase polynomial degree or use other fitting methods.