
UMAP for dimensionality reduction in ML Python - Model Metrics & Evaluation

Which metrics matter for UMAP, and why

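UMAP projects data into fewer dimensions while trying to preserve its structure, so the natural question is how well each point's neighborhood survives the projection. Trustworthiness and continuity are the key metrics, and both are computed over each point's k nearest neighbors. Trustworthiness asks whether points that are close in the low-dimensional embedding were also close in the original space; it drops when the embedding invents neighbors. Continuity asks the reverse: whether points that were close in the original space stay close in the embedding; it drops when true neighbors are pulled apart. Together they tell us whether UMAP keeps the data's true local structure.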
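The two definitions above can be made concrete with a minimal pure-Python sketch of the usual rank-based formulas. It assumes Euclidean distance, breaks distance ties by index, and requires k < n / 2; the helper names are illustrative, not part of any library. In practice scikit-learn provides `sklearn.manifold.trustworthiness`, and continuity is commonly obtained by swapping its two data arguments.

```python
import math


def pairwise_ranks(points):
    """ranks[i][j] = position of j among i's neighbors (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        # Sort the other points by distance to i; ties are broken by index.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        row = [0] * n
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        ranks.append(row)
    return ranks


def trustworthiness(high, low, k):
    """Penalize points that are k-neighbors in `low` but not in `high`."""
    n = len(high)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_high = pairwise_ranks(high)
    r_low = pairwise_ranks(low)
    penalty = 0
    for i in range(n):
        for j in range(n):
            # j is shown as a neighbor in the embedding but was not one originally.
            if j != i and r_low[i][j] <= k and r_high[i][j] > k:
                penalty += r_high[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(high, low, k):
    """Same penalty with the roles swapped: original neighbors lost in `low`."""
    return trustworthiness(low, high, k)


# Toy data: two rows of points; the "embedding" keeps only the x coordinate,
# so the two rows collapse onto each other.
high = [(0, 0), (1, 0), (2, 0), (0, 10), (1, 10), (2, 10)]
low = [(x,) for x, _ in high]
print(trustworthiness(high, low, k=2), continuity(high, low, k=2))
```

Both scores fall below 1.0 on this toy data because collapsing the rows creates false neighbors and pushes true ones further down the ranking.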

Confusion matrix or equivalent visualization

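UMAP does not classify, so there is no confusion matrix. The closest equivalent is a neighbor-preservation tally: for each point, count how many of its original nearest neighbors are still nearest neighbors after reduction. For example, for a single point:

Original neighbors: 5
Neighbors after UMAP: 4
Preserved neighbors: 3
Trustworthiness ≈ preserved / neighbors after UMAP = 3 / 4 = 0.75
Continuity ≈ preserved / original neighbors = 3 / 5 = 0.6

Here 3 of the 4 neighbors shown after UMAP are genuine, and 3 of the 5 original neighbors survived. These overlap ratios are a simplification: the standard trustworthiness and continuity scores also weight each misplaced neighbor by how far outside the neighborhood it ranked in the other space.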
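The arithmetic in this example can be reproduced with plain set operations. This is only a sketch of the simplified overlap ratios for a single point; the neighbor labels and the function name are made up for illustration.

```python
def neighbor_overlap(original_neighbors, embedded_neighbors):
    """Return (preserved count, precision-like ratio, recall-like ratio)."""
    original = set(original_neighbors)
    embedded = set(embedded_neighbors)
    preserved = original & embedded
    # Share of the neighbors shown after reduction that are genuine.
    shown_correct = len(preserved) / len(embedded)
    # Share of the original neighbors that survived the reduction.
    kept = len(preserved) / len(original)
    return len(preserved), shown_correct, kept


# Hypothetical neighbor labels for one point: "x" is a false neighbor,
# while "d" and "e" are true neighbors that were lost.
original = {"a", "b", "c", "d", "e"}
embedded = {"a", "b", "c", "x"}
print(neighbor_overlap(original, embedded))  # (3, 0.75, 0.6)
```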

Precision vs Recall tradeoff (or equivalent)

UMAP trades off local against global structure, and the two metrics capture different failure modes. Trustworthiness is like precision: of the neighbors shown in low dimensions, how many are true neighbors? Continuity is like recall: of the original neighbors, how many appear in low dimensions? High trustworthiness with low continuity means the neighbors shown are genuine, but many true neighbors are missing. High continuity with low trustworthiness means most true neighbors are present, but false ones are mixed in. A good reduction scores high on both.

What "good" vs "bad" metric values look like

Good UMAP: As a rule of thumb, trustworthiness and continuity both above 0.9 mean neighbors are well preserved. The low-dimensional map shows clear groups that match the original data.

Bad UMAP: Trustworthiness or continuity below 0.5 means many neighbors are lost or wrongly placed. The map looks mixed or confusing.
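These thresholds can be written down as a small helper. This is a sketch only: the 0.9 and 0.5 cutoffs come from the text above, while the "borderline" label for everything in between and the function name are assumptions added here.

```python
def judge_embedding(trust, cont, good=0.9, bad=0.5):
    """Classify an embedding from its trustworthiness and continuity scores."""
    if trust < bad or cont < bad:
        # One weak score is enough: neighbors are lost or wrongly placed.
        return "bad"
    if trust > good and cont > good:
        return "good"
    # Assumed middle category; the thresholds above do not name it.
    return "borderline"


print(judge_embedding(0.95, 0.93))
print(judge_embedding(0.95, 0.40))
```

Checking the "bad" condition first means a single low score cannot be hidden by a high one.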

Common pitfalls with UMAP metrics
  • Ignoring global structure: UMAP focuses on local neighbors, so distances between far-apart groups may be distorted.
  • Too many neighbors: A very large n_neighbors setting makes UMAP favor broad structure and can force false connections between points, lowering trustworthiness.
  • Using only visual checks: A pretty plot may hide poor neighbor preservation.
  • Not comparing metrics: Trustworthiness or continuity alone can mislead; use both.
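One way to avoid these pitfalls is to score several n_neighbors settings on both metrics instead of judging a single plot. The sketch below assumes the umap-learn and scikit-learn packages are installed and imports them lazily inside the function. Computing continuity by swapping the arguments of `sklearn.manifold.trustworthiness` is a common trick rather than a dedicated API, and `most_balanced` is a hypothetical helper.

```python
def sweep_n_neighbors(X, n_neighbors_values, k=5, seed=42):
    """Embed X with several UMAP settings and score each embedding."""
    # Optional dependencies, imported here so the helpers below work without them.
    import umap  # provided by the umap-learn package
    from sklearn.manifold import trustworthiness

    scores = {}
    for n in n_neighbors_values:
        embedding = umap.UMAP(n_neighbors=n, random_state=seed).fit_transform(X)
        scores[n] = {
            "trustworthiness": trustworthiness(X, embedding, n_neighbors=k),
            # Assumption: swapping the arguments yields continuity.
            "continuity": trustworthiness(embedding, X, n_neighbors=k),
        }
    return scores


def most_balanced(scores):
    """Pick the setting whose weaker metric is highest, so neither is ignored."""
    return max(
        scores,
        key=lambda n: min(scores[n]["trustworthiness"], scores[n]["continuity"]),
    )


# Made-up scores, only to show how the selection behaves.
example = {
    5: {"trustworthiness": 0.95, "continuity": 0.40},
    15: {"trustworthiness": 0.90, "continuity": 0.88},
    50: {"trustworthiness": 0.70, "continuity": 0.92},
}
print(most_balanced(example))  # 15
```

With real data, `scores = sweep_n_neighbors(X, [5, 15, 50])` would fill the same structure, where X is a numeric array of shape (n_samples, n_features) with n_samples greater than 2 * k.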
Self-check question

Your UMAP reduction has trustworthiness 0.95 but continuity 0.4. Is it good? Why or why not?

Answer: No, it is not good. High trustworthiness means neighbors shown are mostly correct, but low continuity means many original neighbors are missing. The map misses many true neighbors, so it does not fully keep the data's structure.

Key Result
UMAP quality is best judged by trustworthiness and continuity, measuring how well neighbors are preserved in reduced space.