
UMAP for dimensionality reduction in ML Python

Introduction
UMAP (Uniform Manifold Approximation and Projection) reduces data with many features down to just a few, so patterns become easier to see. It is useful when:
You have data with many measurements and want to view it in 2D or 3D plots.
You want to speed up other machine learning tasks by reducing the size of the data.
You want to find groups or clusters in complex data.
You want to visualize how data points relate to each other in a simpler space.
Syntax
ML Python
import umap

reducer = umap.UMAP(n_neighbors=15, n_components=2, metric='euclidean')
embedding = reducer.fit_transform(data)
UMAP comes from the umap-learn package (pip install umap-learn), even though the import name is just umap.
n_neighbors controls how many nearby points UMAP considers when learning the data's structure.
n_components is the number of dimensions you want after reduction (usually 2 or 3).
metric is the distance measure used to compare points, e.g. 'euclidean', 'manhattan', or 'cosine'.
Examples
Shrink data to 2D using 10 neighbors to find local structure.
ML Python
import umap
reducer = umap.UMAP(n_neighbors=10, n_components=2)
embedding = reducer.fit_transform(data)
Shrink data to 3D using 30 neighbors and Manhattan distance.
ML Python
import umap
reducer = umap.UMAP(n_neighbors=30, n_components=3, metric='manhattan')
embedding = reducer.fit_transform(data)
Sample Model
This code loads the Iris flower data, reduces its 4 features to 2 using UMAP, and prints the new shape and first 5 points.
ML Python
import numpy as np
import umap
from sklearn.datasets import load_iris

# Load sample data
iris = load_iris()
data = iris.data

# Create UMAP reducer
reducer = umap.UMAP(n_neighbors=15, n_components=2, metric='euclidean', random_state=42)

# Fit and transform data
embedding = reducer.fit_transform(data)

# Print shape and first 5 points
print('Embedding shape:', embedding.shape)
print('First 5 points of embedding:')
print(embedding[:5])
Output
Embedding shape: (150, 2)
First 5 points of embedding:
[five rows of 2D coordinates; exact values vary by UMAP version]
Important Notes
UMAP works well with continuous data and handles large datasets efficiently.
n_neighbors controls how local or global the embedding looks: smaller values emphasize fine local detail, larger values preserve more of the global structure.
Set random_state for reproducible results (note that this can slow UMAP down, since it disables parallelism).
Summary
UMAP reduces many features into fewer to help visualize and understand data.
It uses neighbors to learn data shape and keeps similar points close.
You can choose how many dimensions to reduce to, usually 2 or 3.