
How to Use kmeans in R: Simple Guide with Example

Use the kmeans() function in R by providing a numeric dataset and the number of clusters you want. It returns the cluster centers and a cluster assignment for each row, so that similar data points end up in the same group.

📐

Syntax

The basic syntax of kmeans() is:

kmeans(x, centers, iter.max = 10, nstart = 1)

x: numeric data matrix or data frame, one row per observation.
centers: number of clusters, or a matrix of initial cluster centers.
iter.max: maximum number of iterations allowed (optional, default 10).
nstart: how many random sets of initial centers to try when centers is a number (optional, default 1; higher values usually give better results).
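The centers argument accepts either of the two forms listed above. The following is a minimal sketch of both, using made-up random data; the variable names (x, init, fit_k, fit_init) are illustrative and not part of the original guide.

```r
set.seed(42)

# Hypothetical toy data: 20 points in 2 dimensions
x <- matrix(rnorm(40), ncol = 2)

# Form 1: centers is a single number, so kmeans() picks random rows as starts
fit_k <- kmeans(x, centers = 2, nstart = 5)

# Form 2: centers is a matrix with one row per initial center
# (here the first two data points, chosen arbitrarily for illustration)
init <- x[c(1, 2), , drop = FALSE]
fit_init <- kmeans(x, centers = init)

nrow(fit_k$centers)     # 2
nrow(fit_init$centers)  # 2
```

When centers is a matrix, its rows fix the starting positions, so the random restarts controlled by nstart only matter for the single-number form.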

💻

Example

This example shows how to cluster 15 random 2D points into 3 groups using kmeans(). It prints the cluster centers and the cluster assignment for each point.

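set.seed(123)                                        # make the random data and starts reproducible
data <- matrix(rnorm(30), ncol=2)                    # 15 points, 2 columns
kmeans_result <- kmeans(data, centers=3, nstart=20)  # 3 clusters, best of 20 random starts
print(kmeans_result$centers)                         # one row per cluster center
print(kmeans_result$cluster)                         # cluster label for each point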

Output

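The first print shows a 3 x 2 matrix of centers: one row per cluster (1, 2, 3) and one column per dimension. The second print shows an integer vector of 15 labels, one for each row of data, where each label is 1, 2, or 3. The exact values depend on the generated data, and the cluster numbering itself is arbitrary.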
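Beyond centers and cluster, the object returned by kmeans() carries a few other components that help when reading the result. The sketch below repeats the example's setup and looks at the size, withinss, tot.withinss, betweenss, and totss components.

```r
set.seed(123)
data <- matrix(rnorm(30), ncol = 2)
kmeans_result <- kmeans(data, centers = 3, nstart = 20)

kmeans_result$size          # number of points in each cluster
kmeans_result$withinss      # within-cluster sum of squares, one value per cluster
kmeans_result$tot.withinss  # total within-cluster sum of squares

# Share of the total variance that lies between clusters (closer to 1 is tighter)
kmeans_result$betweenss / kmeans_result$totss

# Cross-check: tabulating the assignments gives the same counts as $size
table(kmeans_result$cluster)
```

A very small cluster or a very large withinss value is often a hint that the chosen number of clusters deserves a second look.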

⚠️

Common Pitfalls

Common mistakes include:

Passing non-numeric data (character or factor columns), which causes an error.
Leaving nstart at its default of 1, which can lead to poor clustering when the single random start is a bad one.
Choosing too many or too few clusters without checking results.
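One common way to check the number of clusters is to compare tot.withinss across several values of k and look for the point where the improvement levels off (often called the elbow method). This is a sketch on made-up data with three loose groups; the data and the range 1:6 are assumptions chosen for illustration.

```r
set.seed(1)

# Hypothetical data: three groups of 2D points around different centers
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2),
  cbind(rnorm(20, mean = 0), rnorm(20, mean = 5))
)

# Fit kmeans() for k = 1..6 and record the total within-cluster sum of squares
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# Look for the k after which tot.withinss stops dropping sharply
data.frame(k = ks, tot.withinss = round(wss, 1))
```

The elbow is a heuristic, not a rule, so it is worth combining it with a look at the cluster sizes and a plot of the data.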

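Scale the data (for example with scale()) when features have different units or ranges. kmeans() relies on Euclidean distance, so features with large values would otherwise dominate the clustering.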
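As a sketch of the scaling step, the code below standardizes two columns with very different ranges before clustering. The data frame and its column names (height_cm, income) are hypothetical.

```r
set.seed(7)

# Hypothetical data: two features on very different scales
df <- data.frame(
  height_cm = rnorm(50, mean = 170, sd = 10),
  income    = rnorm(50, mean = 50000, sd = 15000)
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1
scaled <- scale(df)

fit_raw    <- kmeans(df, centers = 2, nstart = 20)      # income dominates the distances
fit_scaled <- kmeans(scaled, centers = 2, nstart = 20)  # both features contribute

# Compare how the two fits assign the same rows
table(raw = fit_raw$cluster, scaled = fit_scaled$cluster)
```

The centers of fit_scaled are in standardized units, so they need to be converted back if you want to report them in the original units.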

## Wrong: Using character data
# kmeans(c('a','b','c'), centers=2)

## Right: Use numeric data and set nstart
set.seed(1)
data <- matrix(rnorm(20), ncol=2)
kmeans(data, centers=2, nstart=10)
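Real data frames often mix numeric and non-numeric columns. The sketch below first confirms that character input fails, then keeps only the numeric columns of R's built-in iris dataset before clustering. The choice of iris and of 3 clusters is an assumption made for illustration.

```r
# Character input fails; tryCatch() turns the error into a message here
msg <- tryCatch(
  suppressWarnings(kmeans(c("a", "b", "c"), centers = 2)),
  error = function(e) "kmeans() failed on character input"
)
msg

# iris ships with R; its Species column is a factor, not numeric
numeric_cols <- sapply(iris, is.numeric)
iris_num <- iris[, numeric_cols]
names(iris_num)

set.seed(1)
fit <- kmeans(iris_num, centers = 3, nstart = 10)
fit$size
```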

📊

Quick Reference

Tips for using kmeans() effectively:

Always set nstart to a value like 10 or 20 for better results.
Scale your data if variables have different scales.
Check cluster centers and sizes to understand results.
Visualize clusters when possible.
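For two-dimensional data, a base R scatter plot colored by cluster is usually enough to see the grouping. This is a minimal sketch; it writes the plot to a temporary PDF file (plot_file is a name chosen here) so that it also runs in a non-interactive session.

```r
set.seed(123)
data <- matrix(rnorm(30), ncol = 2)
fit <- kmeans(data, centers = 3, nstart = 20)

# Write to a temporary file; in an interactive session you can skip pdf()/dev.off()
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Points colored by cluster label
plot(data, col = fit$cluster, pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "kmeans clusters")

# Cluster centers drawn as large crosses in the matching colors
points(fit$centers, col = 1:3, pch = 4, cex = 2, lwd = 2)

dev.off()
```

For data with more than two columns, one option is to plot pairs of columns, or the first two principal components, with the same coloring.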

✅

Key Takeaways

Use kmeans() with numeric data and specify the number of clusters.

Set nstart to try multiple initial centers for better clustering.

Avoid non-numeric data and scale features if they differ in units.

Check cluster centers and assignments to interpret results.

Visualize clusters to better understand grouping.