How to Use kmeans in R: Simple Guide with Example

Call the kmeans() function in R with a numeric dataset and the number of clusters you want. It returns the cluster centers and a cluster assignment for each row, so that similar data points end up in the same group.

Syntax

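The basic call to kmeans() takes these arguments:

  • x: a numeric matrix, or an object that can be coerced to one, such as a data frame with only numeric columns.
  • centers: either the number of clusters or a matrix of initial cluster centers (one center per row).
  • nstart: how many random sets of initial centers to try (optional, default 1). The best result is kept.
  • iter.max: maximum number of iterations allowed (optional, default 10).
r
kmeans(x, centers, iter.max = 10, nstart = 1)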
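
The two forms of the centers argument can be compared with a small sketch. The toy data (pts) and the starting centers (init) below are made up for illustration; they are not part of the original example.

```r
# Hypothetical toy data: two well-separated blobs of 10 points each
set.seed(42)
pts <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)

# centers as a single number: rows of pts are picked at random as starting centers
fit_k <- kmeans(pts, centers = 2, nstart = 10, iter.max = 20)

# centers as a matrix: each row is an explicit starting center
init <- rbind(c(0, 0), c(5, 5))
fit_init <- kmeans(pts, centers = init)

print(fit_k$size)     # points per cluster when starting from random centers
print(fit_init$size)  # points per cluster when starting from the given centers
```

Because the two blobs are far apart, both calls should recover the same split; on real data the choice of starting centers can matter much more.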

Example

This example clusters 15 random 2D points into 3 groups using kmeans(). It prints the cluster centers and the cluster assignment of each point.

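r
set.seed(123)                          # make the random data and random starts reproducible
data <- matrix(rnorm(30), ncol = 2)    # 15 points with 2 coordinates each
kmeans_result <- kmeans(data, centers = 3, nstart = 20)
print(kmeans_result$centers)           # one row per cluster, one column per coordinate
print(kmeans_result$cluster)           # cluster label (1, 2, or 3) for each point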
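Output
print(kmeans_result$centers) shows a 3 x 2 matrix: one row per cluster and one column per coordinate. print(kmeans_result$cluster) shows an integer vector of 15 labels, one for each row of data, with each label equal to 1, 2, or 3. The cluster numbers are arbitrary labels, so only which points share a label is meaningful.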
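
Besides centers and cluster, the object returned by kmeans() carries a few other components that help when checking a result. The sketch below repeats the example's data and prints some of them; the sanity checks at the end are an addition for illustration.

```r
set.seed(123)
data <- matrix(rnorm(30), ncol = 2)   # 15 rows, 2 columns
kmeans_result <- kmeans(data, centers = 3, nstart = 20)

print(kmeans_result$size)          # number of points in each cluster
print(kmeans_result$withinss)      # within-cluster sum of squares, one value per cluster
print(kmeans_result$tot.withinss)  # total within-cluster sum of squares
print(kmeans_result$iter)          # number of (outer) iterations

# Sanity checks: every point is assigned, and the sums of squares add up
stopifnot(sum(kmeans_result$size) == nrow(data))
stopifnot(isTRUE(all.equal(
  kmeans_result$totss,
  kmeans_result$tot.withinss + kmeans_result$betweenss
)))
```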

Common Pitfalls

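Common mistakes include:

  • Passing non-numeric data (character or factor columns), which causes an error.
  • Leaving nstart at its default of 1, so a single unlucky set of initial centers can produce a poor clustering.
  • Choosing the number of clusters without checking whether the result makes sense.

Scale your data if the features have different units or ranges. Otherwise the feature with the largest values dominates the distance calculation.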
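
As a minimal sketch of the scaling advice, the snippet below standardizes two features with very different ranges before clustering. The data frame people and its columns are hypothetical and only serve as an illustration.

```r
# Hypothetical data: height in cm and income in dollars (very different ranges)
set.seed(1)
people <- data.frame(
  height_cm = rnorm(50, mean = 170, sd = 10),
  income    = rnorm(50, mean = 50000, sd = 15000)
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1
people_scaled <- scale(people)

fit <- kmeans(people_scaled, centers = 3, nstart = 20)
print(fit$size)     # points per cluster
print(fit$centers)  # centers are expressed in scaled units, not cm or dollars
```

Without scale(), income would contribute almost all of the distance between points, and height would have little influence on the clusters.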

r
## Wrong: character data cannot be clustered, so this call fails
# kmeans(c('a','b','c'), centers=2)

## Right: use numeric data and set nstart
set.seed(1)
data <- matrix(rnorm(20), ncol = 2)   # 10 points with 2 coordinates each
kmeans(data, centers = 2, nstart = 10)
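
One common way to check the number of clusters, mentioned in the pitfalls above, is to compare the total within-cluster sum of squares for several values of k (often called the elbow method). This is a rough sketch on made-up data, and the range of k values is an arbitrary choice.

```r
set.seed(1)
data <- matrix(rnorm(200), ncol = 2)   # 100 points, hypothetical toy data

# Total within-cluster sum of squares for k = 2 to 6
ks <- 2:6
wss <- sapply(ks, function(k) {
  kmeans(data, centers = k, nstart = 10)$tot.withinss
})

print(data.frame(k = ks, tot.withinss = wss))
# Look for the k after which tot.withinss stops dropping sharply
```

The value always tends to shrink as k grows, so the goal is not the smallest number but the point where adding another cluster stops helping much.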

Quick Reference

Tips for using kmeans() effectively:

  • Set nstart to a value such as 10 or 20 so that the best of several random starts is kept.
  • Scale your data (for example with scale()) if variables are measured on different scales.
  • Check cluster centers and sizes to understand results.
  • Visualize clusters when possible.
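
For two-dimensional data, the last tip can be sketched with base R graphics. The colors, point symbols, and labels below are arbitrary choices for illustration.

```r
set.seed(123)
data <- matrix(rnorm(30), ncol = 2)   # 15 points with 2 coordinates each
fit <- kmeans(data, centers = 3, nstart = 20)

# Color each point by its assigned cluster
plot(data, col = fit$cluster, pch = 19,
     xlab = "x", ylab = "y", main = "k-means clusters (k = 3)")

# Mark the cluster centers with large crosses in matching colors
points(fit$centers, col = 1:3, pch = 4, cex = 2, lwd = 2)
```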

Key Takeaways

  • Use kmeans() with numeric data and specify the number of clusters.
  • Set nstart to try multiple initial centers for better clustering.
  • Avoid non-numeric data and scale features if they differ in units.
  • Check cluster centers and assignments to interpret results.
  • Visualize clusters to better understand grouping.