
How to Use Silhouette Score in Python with sklearn

Use silhouette_score from sklearn.metrics to measure how well separated the clusters in your data are. Call silhouette_score(X, labels) with your data and cluster labels to get the mean silhouette coefficient over all samples: a value between -1 and 1, where higher values mean denser, better-separated clusters.

Syntax

In its most common form, silhouette_score is called like this (optional arguments such as sample_size and random_state are omitted):

  • silhouette_score(X, labels, metric='euclidean')

Where:

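  • X is your feature array of shape (n_samples, n_features), or a pairwise distance matrix when metric='precomputed'.
  • labels holds the cluster label assigned to each data point, one per row of X.
  • metric is the distance metric used to compare points (default is 'euclidean').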
python
from sklearn.metrics import silhouette_score

score = silhouette_score(X, labels, metric='euclidean')
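
The metric argument can be illustrated with a minimal sketch. The six-point dataset and its labels below are made up for illustration; the sketch computes the score with the default Euclidean metric, with 'manhattan', and from a distance matrix built with pairwise_distances and passed with metric='precomputed'.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_score

# Hypothetical toy data: two tight groups that are far apart
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Default metric (Euclidean distance on the raw features)
score_euclid = silhouette_score(X, labels)

# Same data and labels, different distance metric
score_manhattan = silhouette_score(X, labels, metric='manhattan')

# Precomputed distances: pass the square distance matrix instead of X
D = pairwise_distances(X, metric='euclidean')
score_precomputed = silhouette_score(D, labels, metric='precomputed')

print(f'euclidean:   {score_euclid:.3f}')
print(f'manhattan:   {score_manhattan:.3f}')
print(f'precomputed: {score_precomputed:.3f}')
```

The Euclidean and precomputed results should agree, because the distance matrix was built with the same metric.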

Example

This example shows how to cluster data with KMeans and then calculate the silhouette score to check clustering quality.

python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Create sample data with 3 clusters
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)

# Fit KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=0)
labels = kmeans.fit_predict(X)

# Calculate silhouette score
score = silhouette_score(X, labels)
print(f'Silhouette Score: {score:.2f}')
Output
Silhouette Score: 0.59

Common Pitfalls

Common mistakes when using silhouette score include:

  • Passing cluster labels that do not match the data size.
  • Calling silhouette_score when every point has the same label: the score is undefined for a single cluster, and sklearn raises a ValueError.
  • Forgetting that the number of distinct labels must be between 2 and n_samples - 1.
  • Using a distance metric that does not suit your data type.

Always ensure your labels come from a clustering algorithm and match your data points.

python
from sklearn.metrics import silhouette_score

# Wrong: labels length does not match data
X = [[1, 2], [3, 4], [5, 6]]
labels_wrong = [0, 1]  # Only 2 labels for 3 points

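# This raises a ValueError (inconsistent numbers of samples)
# silhouette_score(X, labels_wrong)

# Right: labels length matches data
labels_right = [0, 1, 0]
score = silhouette_score(X, labels_right)
# The score is negative here because these hand-picked labels
# put the two most distant points in the same cluster
print(f'Correct Silhouette Score: {score:.3f}')
Output
Correct Silhouette Score: -0.333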
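
The single-cluster pitfall can be guarded against in a similar way. The following is a minimal sketch, not part of sklearn: safe_silhouette is a hypothetical helper name, and the four-point dataset is made up for illustration. It returns None when the number of distinct labels falls outside the range silhouette_score accepts (2 to n_samples - 1).

```python
from sklearn.metrics import silhouette_score

def safe_silhouette(X, labels):
    """Hypothetical helper: return the silhouette score, or None if undefined."""
    n_samples = len(labels)
    n_labels = len(set(labels))
    # silhouette_score needs 2 <= n_labels <= n_samples - 1
    if n_labels < 2 or n_labels > n_samples - 1:
        return None
    return silhouette_score(X, labels)

# Made-up data: two pairs of nearby points
X_demo = [[0, 0], [0, 1], [5, 5], [5, 6]]

# All labels identical: calling silhouette_score directly raises ValueError
try:
    silhouette_score(X_demo, [0, 0, 0, 0])
except ValueError as err:
    print(f'ValueError: {err}')

print(safe_silhouette(X_demo, [0, 0, 0, 0]))  # None: only one cluster
print(safe_silhouette(X_demo, [0, 0, 1, 1]))  # a valid score
```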

Quick Reference

Tips for using silhouette score effectively:

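  • Score ranges from -1 (points likely assigned to the wrong cluster) to +1 (dense, well-separated clusters); values near 0 indicate overlapping clusters.
  • A higher score means clusters are better separated.
  • Use it to compare different cluster counts on the same data.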
  • Works best with numeric data and Euclidean distance.
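
Comparing cluster counts, as suggested in the tips above, can be sketched as a simple loop. This is an illustrative example under assumptions: the blob centers, the range of k values, and n_init=10 are chosen here for demonstration and do not come from the example earlier in the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (centers picked for illustration)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Fit KMeans for each candidate k and record the silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest score
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f'k={k}: silhouette={s:.3f}')
print(f'Best k by silhouette: {best_k}')
```

On data this cleanly separated the loop should select k=3. On real data the peak is often less pronounced, so treat the score as one signal among several rather than a final answer.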

Key Takeaways

Use silhouette_score(X, labels) from sklearn.metrics to evaluate clustering quality.
Silhouette score values near 1 mean good separation; values near 0 mean overlapping clusters; values near -1 mean points are likely assigned to the wrong cluster.
Ensure the labels array has exactly one entry per data point in X.
Silhouette score helps choose the best number of clusters by comparing scores.
Use appropriate distance metrics matching your data type for accurate scores.