
UMAP for dimensionality reduction in ML Python - Model Pipeline Trace

Model Pipeline - UMAP for dimensionality reduction

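This pipeline uses UMAP (Uniform Manifold Approximation and Projection) to reduce the number of features in a dataset while preserving its important structure, chiefly which samples are close neighbors of one another. Turning many features into just two or three makes complex data possible to plot and inspect.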

Data Flow - 3 Stages
1. Input Data
   Input: 1000 rows x 50 columns
   Operation: raw data with 50 features per sample
   Output: 1000 rows x 50 columns
   Example: [[0.5, 1.2, ..., 0.3], [1.1, 0.7, ..., 0.9], ...]
2. Preprocessing
   Input: 1000 rows x 50 columns
   Operation: standardize each feature to zero mean and unit variance
   Output: 1000 rows x 50 columns
   Example: [[-0.3, 0.8, ..., -1.1], [1.2, -0.5, ..., 0.4], ...]
3. UMAP Dimensionality Reduction
   Input: 1000 rows x 50 columns
   Operation: reduce the features from 50 to 2 with UMAP
   Output: 1000 rows x 2 columns
   Example: [[1.5, -0.7], [0.3, 1.2], ...]
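Stages 2 and 3 can be sketched in NumPy. This is a minimal sketch, not umap-learn's implementation: `standardize` z-scores each feature, and `fuzzy_knn_graph` builds the weighted k-nearest-neighbor graph that UMAP constructs before any optimization. It follows the construction in the original UMAP paper: a per-point distance offset rho_i, a per-point scale sigma_i found by binary search, and a fuzzy union to make the graph symmetric. The helper names and the random stand-in data are assumptions for illustration, and `n_neighbors=15` simply mirrors umap-learn's default. The brute-force distance matrix is only practical for small datasets; umap-learn uses approximate nearest-neighbor search instead.

```python
import numpy as np


def standardize(X):
    """Stage 2: scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std, mean, std


def fuzzy_knn_graph(X, n_neighbors=15, n_iter=64):
    """Stage 3, first phase: UMAP-style fuzzy k-NN graph (brute-force distances)."""
    n = X.shape[0]
    # Pairwise Euclidean distances; O(n^2) memory, fine for about 1000 rows.
    sq = np.sum(X**2, axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    knn_idx = np.argsort(d, axis=1)[:, :n_neighbors]
    knn_d = np.take_along_axis(d, knn_idx, axis=1)

    rho = knn_d[:, 0]  # distance to each point's nearest neighbor
    target = np.log2(n_neighbors)
    W = np.zeros((n, n))
    for i in range(n):
        gaps = np.maximum(knn_d[i] - rho[i], 0.0)
        # Binary search for sigma_i so that sum_j exp(-gap_j / sigma_i) = log2(k).
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            total = np.exp(-gaps / sigma).sum()
            if total > target:  # sigma too large: weights too close to 1
                hi = sigma
            else:
                lo = sigma
            sigma = sigma * 2.0 if hi == np.inf else (lo + hi) / 2.0
        W[i, knn_idx[i]] = np.exp(-gaps / sigma)
    # Symmetrize with the fuzzy union: w_ij = a + b - a * b.
    return W + W.T - W * W.T


# Hypothetical stand-in for the raw 1000 x 50 input (random, not real data).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(1000, 50))
X_std, mean, std = standardize(X_raw)
W = fuzzy_knn_graph(X_std, n_neighbors=15)
```

The matrix `W` holds the high-dimensional neighborhood structure; the optimization phase then searches for 2-D coordinates whose similarities match it.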
Training Trace - Epoch by Epoch
[Plot: loss (y-axis) against epoch (x-axis, epochs 1 to 5)]
Epoch  Loss ↓  Accuracy ↑  Observation
1      0.85    N/A         Initial embedding: high loss, structure not yet clear
2      0.60    N/A         Loss decreases; clusters start to form
3      0.45    N/A         Groups are more clearly separated
4      0.35    N/A         Embedding stabilizes; loss decreases more slowly
5      0.30    N/A         Final embedding with clear cluster structure
Accuracy is N/A because UMAP is unsupervised: there are no labels to score against.
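UMAP's training loss is a fuzzy-set cross-entropy between the high-dimensional graph weights w_ij and the low-dimensional similarities q_ij = 1 / (1 + a * d_ij^(2b)), where d_ij is the distance between points i and j in the embedding. The minimal NumPy sketch below only evaluates that loss on a tiny hand-made graph; it does not perform the optimization, which umap-learn carries out with stochastic gradient descent and negative sampling. The function name `umap_cross_entropy` and the values a = b = 1 are illustrative assumptions (umap-learn fits a and b from its `min_dist` and `spread` parameters), and the toy graph and coordinates are invented for the example.

```python
import numpy as np


def umap_cross_entropy(W, Y, a=1.0, b=1.0, eps=1e-12):
    """Fuzzy-set cross-entropy between high-dimensional weights W and embedding Y.

    W: (n, n) symmetric membership weights in [0, 1].
    Y: (n, 2) low-dimensional coordinates.
    a, b: curve parameters; a = b = 1 is an illustrative choice only.
    Sums over all ordered pairs i != j.
    """
    diff = Y[:, None, :] - Y[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    # Low-dimensional similarity q_ij = 1 / (1 + a * d_ij^(2b)).
    q = np.clip(1.0 / (1.0 + a * dist ** (2 * b)), eps, 1.0 - eps)
    w = np.clip(W, eps, 1.0 - eps)  # keep the logarithms finite
    attract = w * np.log(w / q)  # large when connected points sit far apart
    repel = (1.0 - w) * np.log((1.0 - w) / (1.0 - q))  # large when unconnected points sit close
    off_diag = ~np.eye(len(Y), dtype=bool)  # skip i == j pairs
    return float(np.sum((attract + repel)[off_diag]))


# Toy graph: points 0-1 and points 2-3 are neighbors in the original space.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # neighbor pairs kept together
Y_bad = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]])   # neighbor pairs split apart
loss_good = umap_cross_entropy(W, Y_good)
loss_bad = umap_cross_entropy(W, Y_bad)
```

An embedding that keeps neighbors together scores a much lower loss than one that splits them, which is the trend the epoch table describes.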
Prediction Trace - 3 Steps
Step 1: Input sample
Step 2: Standardization (using the mean and variance from the training data)
Step 3: UMAP projection into 2-D
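The prediction trace for a single new sample can be sketched as follows. `project_new_sample` is a hypothetical, simplified helper, not umap-learn's `transform`: it standardizes the sample with the training mean and standard deviation (the scaler should not be refit on new data), finds the sample's nearest training points, and places it at an inverse-distance-weighted average of their 2-D coordinates. umap-learn's `UMAP.transform` starts from a similar neighbor-weighted position and then refines it with further optimization, which this sketch omits. The toy training data and the stand-in embedding are invented for illustration.

```python
import numpy as np


def project_new_sample(x_new, X_train_std, Y_train, mean, std, n_neighbors=15):
    """Hypothetical helper: place one new sample into an existing 2-D embedding.

    x_new:       (n_features,) raw input sample
    X_train_std: (n, n_features) standardized training data
    Y_train:     (n, 2) embedding of the training data
    mean, std:   per-feature statistics computed on the training data only
    """
    # Standardization: reuse the training statistics; never refit on new data.
    z = (x_new - mean) / std
    # UMAP projection (simplified): find the nearest training points ...
    d = np.linalg.norm(X_train_std - z, axis=1)
    idx = np.argsort(d)[:n_neighbors]
    # ... and average their 2-D positions, weighting closer neighbors more.
    w = 1.0 / (d[idx] + 1e-8)
    return (w[:, None] * Y_train[idx]).sum(axis=0) / w.sum()


# Toy training set (hypothetical): two clusters in 5 features.
rng = np.random.default_rng(0)
raw_train = np.vstack([rng.normal(-3.0, 1.0, size=(50, 5)),
                       rng.normal(3.0, 1.0, size=(50, 5))])
mean, std = raw_train.mean(axis=0), raw_train.std(axis=0)
X_train_std = (raw_train - mean) / std
# Stand-in for a fitted embedding: cluster A near (-5, 0), cluster B near (5, 0).
Y_train = np.vstack([rng.normal([-5.0, 0.0], 0.5, size=(50, 2)),
                     rng.normal([5.0, 0.0], 0.5, size=(50, 2))])

x_new = np.full(5, 3.0)  # a raw sample at the center of cluster B
y_new = project_new_sample(x_new, X_train_std, Y_train, mean, std)
```

Because the new sample sits at the center of cluster B in the original space, it lands among cluster B's points in the 2-D embedding.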
Model Quiz - 3 Questions
Test your understanding
What does UMAP do in this pipeline?
A. Increases the number of features
B. Removes rows from the dataset
C. Reduces data from many features to fewer features
D. Changes data labels
Key Insight
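UMAP makes complex data easier to see by turning many features into just a few while keeping the important patterns, in particular which samples are neighbors. Standardizing the data first matters because UMAP finds neighbors by distance, so without scaling, features with large numeric ranges would dominate. The loss falling from epoch to epoch shows the optimization converging on an embedding that represents those neighborhoods more faithfully.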
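The benefit of standardizing can be checked numerically. This minimal sketch, on hypothetical two-column data where one column is on a scale about 1000 times larger, measures how much of the total pairwise squared Euclidean distance each column contributes. UMAP's first step is a distance-based neighbor search (Euclidean by default in umap-learn), so whichever column dominates these distances largely decides which points count as neighbors. The helper `distance_share` is illustrative only.

```python
import numpy as np


def distance_share(X):
    """Fraction of the total pairwise squared Euclidean distance contributed by each column."""
    diff = X[:, None, :] - X[None, :, :]   # (n, n, n_features) pairwise differences
    per_col = (diff**2).sum(axis=(0, 1))  # summed over all pairs, per column
    return per_col / per_col.sum()


rng = np.random.default_rng(0)
# Hypothetical data: column 1 is on a scale about 1000x larger than column 0.
X = np.column_stack([rng.normal(0.0, 1.0, 500), rng.normal(0.0, 1000.0, 500)])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

share_raw = distance_share(X)      # column 1 accounts for nearly all of the distance
share_std = distance_share(X_std)  # both columns contribute equally
```

After standardization every column has unit variance, so each contributes the same share of the pairwise distances and no single feature decides the neighborhoods on its own.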