How to Use kmeans in R: Simple Guide with Example
Use the kmeans() function in R by providing a numeric dataset and the number of clusters you want. It returns cluster centers and assignments, which help group similar data points.
Syntax
The basic syntax of kmeans() is:
r
kmeans(x, centers, nstart = 1, iter.max = 10)
- x: numeric data matrix, or a data frame with all-numeric columns.
- centers: number of clusters, or a matrix of initial cluster centers.
- nstart: how many random sets of initial centers to try (optional; higher values usually improve results).
- iter.max: maximum number of iterations allowed (optional).
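The centers argument can also be a matrix of starting points rather than a count. The following is a minimal sketch of that usage, assuming random toy data; the names x, init, and fit, and the choice of rows 1, 8, and 15 as starting centers, are illustrative and not part of the original guide.

```r
# Sketch: supply explicit starting centers instead of a cluster count.
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)      # 20 points, 2 features

# Use three distinct rows of x as starting centers (an arbitrary choice).
init <- x[c(1, 8, 15), ]

# One row in init per cluster, so this requests 3 clusters.
fit <- kmeans(x, centers = init, iter.max = 20)

fit$centers   # final centers, one row per cluster
fit$size      # number of points in each cluster
```

Passing a count is usually simpler; explicit centers are mainly useful when you want a run to start from known positions.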
Example
This example shows how to cluster 2D points into 3 groups using kmeans(). It prints cluster centers and the cluster assignment for each point.
r
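set.seed(123)                                  # make the random data reproducible
data <- matrix(rnorm(30), ncol = 2)            # 15 points with 2 coordinates each
kmeans_result <- kmeans(data, centers = 3, nstart = 20)
print(kmeans_result$centers)                   # one row per cluster center
print(kmeans_result$cluster)                   # cluster label for each point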
Output
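Exact values depend on the random draw, and the cluster numbering is arbitrary. The centers print as a 3 x 2 matrix, one row per cluster:
        [,1]       [,2]
1  0.7150650  0.1216750
2 -0.6879627 -0.5757814
3 -0.3053884  0.0918203
The cluster vector holds one label (1, 2, or 3) per row of data, so it has 15 entries for this dataset.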
Common Pitfalls
Common mistakes include:
- Using non-numeric data, which causes errors.
- Not setting nstart, which can lead to poor clustering due to bad initial centers.
- Choosing too many or too few clusters without checking results.
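Always scale data if features have different units. k-means is based on distances between points, so a feature with a much larger numeric range would otherwise dominate the clustering.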
r
## Wrong: using character data
# kmeans(c('a', 'b', 'c'), centers = 2)

## Right: use numeric data and set nstart
set.seed(1)
data <- matrix(rnorm(20), ncol = 2)
kmeans(data, centers = 2, nstart = 10)
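One common way to check whether the number of clusters is reasonable is an "elbow" plot of the total within-cluster sum of squares against k. This is a technique added here for illustration, not something the guide itself prescribes; the sketch below assumes random toy data and an arbitrary range of k from 1 to 6, and the names ks and wss are hypothetical.

```r
# Sketch: compare total within-cluster sum of squares across several k.
set.seed(1)
data <- matrix(rnorm(100), ncol = 2)   # 50 points, 2 features

ks <- 1:6                              # candidate cluster counts (arbitrary range)
wss <- sapply(ks, function(k) {
  # tot.withinss is the sum of squared distances to the assigned centers
  kmeans(data, centers = k, nstart = 10)$tot.withinss
})

# Look for the k where the curve stops dropping sharply.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On purely random data like this there may be no clear elbow; the plot is more informative on data with real group structure.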
Quick Reference
Tips for using kmeans() effectively:
- Always set nstart to a value like 10 or 20 for better results.
- Scale your data if variables have different scales.
- Check cluster centers and sizes to understand results.
- Visualize clusters when possible.
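The tips above can be combined in one short workflow. This is a sketch under stated assumptions: it uses the built-in mtcars dataset and three of its columns (mpg, hp, wt) purely because they are measured in different units, and the choice of 3 clusters is arbitrary. The names features and fit are illustrative.

```r
# Sketch: scale, cluster, inspect, and plot.
# mtcars ships with R; mpg, hp, and wt are on very different scales.
features <- scale(mtcars[, c("mpg", "hp", "wt")])  # mean 0, sd 1 per column

set.seed(123)
fit <- kmeans(features, centers = 3, nstart = 20)

fit$size      # points per cluster; very small clusters deserve a closer look
fit$centers   # cluster centers, expressed in scaled units

# Plot two of the features, colored by cluster, with centers marked as stars.
plot(features[, c("hp", "wt")], col = fit$cluster, pch = 19)
points(fit$centers[, c("hp", "wt")], col = 1:3, pch = 8, cex = 2)
```

Because the data was scaled first, the centers are in standard-deviation units rather than the original units of each column.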
Key Takeaways
Use kmeans() with numeric data and specify the number of clusters.
Set nstart to try multiple initial centers for better clustering.
Avoid non-numeric data and scale features if they differ in units.
Check cluster centers and assignments to interpret results.
Visualize clusters to better understand grouping.