Recall & Review
beginner
What is t-SNE used for in machine learning?
t-SNE is used to reduce high-dimensional data to two or three dimensions to help visualize complex data patterns in a simple, easy-to-understand way.
intermediate
How does t-SNE preserve data structure during visualization?
t-SNE preserves local relationships by keeping similar points close together in the low-dimensional space, making clusters visible while allowing some distortion of global structure.
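One way to check how well local neighborhoods survive the mapping is a neighbor-preservation score. The sketch below assumes scikit-learn's `trustworthiness` helper and uses a 300-sample slice of the bundled digits dataset as stand-in data; neither comes from the original card.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

# Stand-in data (an assumption): 300 digit images, 64 features each.
X = load_digits().data[:300]

# Map the 64-dimensional points to 2D.
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X)

# Score in [0, 1]: higher means points that are neighbors in the 2D map
# were also neighbors in the original 64-dimensional space.
score = trustworthiness(X, X_embedded, n_neighbors=5)
print(f"trustworthiness: {score:.3f}")
```

The score only describes local neighborhoods; it says nothing about whether distances between far-apart clusters are meaningful.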
intermediate
What is the main difference between PCA and t-SNE?
PCA is a linear method that preserves global variance, while t-SNE is a nonlinear method focused on preserving local similarities for better cluster visualization.
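A minimal side-by-side sketch of the two methods, assuming scikit-learn and a small slice of its bundled digits dataset as illustrative data. It also shows one practical consequence of the difference: a fitted PCA can project new points with `transform`, while scikit-learn's `TSNE` only offers `fit_transform`.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in data (an assumption): 300 samples, 64 features.
X = load_digits().data[:300]

# Linear projection onto the two directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Nonlinear embedding that tries to keep similar points close together.
tsne = TSNE(n_components=2, random_state=0)
X_tsne = tsne.fit_transform(X)

print(X_pca.shape, X_tsne.shape)

# PCA learns a reusable mapping; scikit-learn's TSNE has no transform method.
print(hasattr(pca, "transform"), hasattr(tsne, "transform"))
```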
advanced
What role does the 'perplexity' parameter play in t-SNE?
Perplexity controls the balance between local and global aspects of the data; it can be read as a rough estimate of how many close neighbors each point has, and therefore how many neighbors influence its placement in the map.
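The effect is easiest to see by running the same data at several perplexity values and comparing the resulting maps. This is a small sketch assuming scikit-learn; the values 5, 30, and 50 and the 150-sample digits slice are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in data (an assumption): 150 samples, 64 features.
X = load_digits().data[:150]

# One embedding per perplexity value; each value must stay below n_samples.
embeddings = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

for perplexity, emb in embeddings.items():
    print(perplexity, emb.shape)
```

Plotting each entry of `embeddings` side by side typically shows tighter, more fragmented groups at low perplexity and smoother, more merged groups at high perplexity.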
advanced
Why should t-SNE results be interpreted carefully?
Because t-SNE can distort global distances and is sensitive to parameters, its plots show clusters well but should not be used to infer exact distances or relationships between clusters.
What is the primary goal of t-SNE?
A. Increase the number of features in data
B. Visualize high-dimensional data in 2D or 3D
C. Train a classification model
D. Generate synthetic data
t-SNE reduces data dimensions to 2 or 3 to help visualize complex data.
Which aspect does t-SNE focus on preserving?
A. Local similarities between nearby points
B. Global distances between all points
C. Exact numeric values of features
D. Data labels
t-SNE preserves local similarities to show clusters clearly.
What does the 'perplexity' parameter affect in t-SNE?
A. Number of output dimensions
B. Number of iterations
C. Learning rate speed
D. Balance between local and global data structure
Perplexity controls how many neighbors influence each point, balancing local/global structure.
Why is t-SNE not suitable for very large datasets without adjustments?
A. It is computationally expensive and slow
B. It increases data dimensionality
C. It only works with images
D. It requires labeled data
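Exact t-SNE compares every pair of points, so its cost grows roughly quadratically with the number of samples. Large datasets usually need an approximation such as Barnes-Hut or a prior dimensionality reduction step.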
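A common way to lighten the load is to reduce the feature count with PCA first and then run the Barnes-Hut approximation. The sketch below assumes scikit-learn and synthetic random data; the target of 50 PCA components is a widely used rule of thumb, not a requirement.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in data (an assumption): 300 samples, 200 features.
rng = np.random.RandomState(0)
X = rng.rand(300, 200)

# Step 1: cheap linear reduction to 50 features.
X_reduced = PCA(n_components=50, random_state=0).fit_transform(X)

# Step 2: t-SNE with the Barnes-Hut approximation (scikit-learn's default method).
tsne = TSNE(n_components=2, method="barnes_hut", random_state=0)
X_embedded = tsne.fit_transform(X_reduced)

print(X_reduced.shape, X_embedded.shape)
```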
Which method is a linear alternative to t-SNE?
A. Random Forest
B. K-means
C. PCA
D. Neural Networks
PCA is a linear dimensionality reduction method often compared to t-SNE.
Explain how t-SNE helps visualize complex data and what makes it different from other dimensionality reduction methods.
Think about how t-SNE shows clusters clearly by focusing on neighbors.
Describe the importance of the 'perplexity' parameter in t-SNE and how changing it might affect the visualization.
Consider how many neighbors each point 'sees' during mapping.
Practice
1. What is the main purpose of using t-SNE in machine learning?
easy
A. To increase the number of features in the dataset
B. To train a predictive model for classification
C. To visualize high-dimensional data in 2D or 3D to find patterns
D. To clean and preprocess data by removing missing values
Solution
Step 1: Understand t-SNE's function
t-SNE is a tool that reduces many features into 2 or 3 dimensions for easy visualization.
Step 2: Identify its main use
It helps us see groups or clusters in complex data; it is not used to train models or clean data.
Final Answer:
To visualize high-dimensional data in 2D or 3D to find patterns -> Option C
Quick Check:
t-SNE = visualization tool [OK]
Hint: t-SNE = visualize complex data simply [OK]
Common Mistakes:
Thinking t-SNE trains prediction models
Confusing t-SNE with data cleaning methods
Assuming t-SNE increases feature count
2. Which of the following is the correct way to import t-SNE from scikit-learn in Python?
easy
A. from sklearn.manifold import TSNE
B. import tsne from sklearn
C. from sklearn.decomposition import TSNE
D. import TSNE from sklearn.manifold
Solution
Step 1: Recall correct import syntax
scikit-learn's t-SNE is in the manifold module and imported as TSNE.
Step 2: Check each option
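Option A, from sklearn.manifold import TSNE, uses valid Python import syntax and the correct module. Options B and D are not valid Python import statements, and option C points to sklearn.decomposition, which does not contain TSNE.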
Final Answer:
from sklearn.manifold import TSNE -> Option A
Quick Check:
Correct import = from sklearn.manifold import TSNE [OK]
Hint: t-SNE is in sklearn.manifold, import as TSNE [OK]
Common Mistakes:
Using wrong module like sklearn.decomposition
Incorrect import syntax causing errors
Confusing lowercase and uppercase in TSNE
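The module mix-up can be checked directly. This short sketch, assuming scikit-learn is installed, shows the working import and catches the failure from the wrong module; the alias `WrongTSNE` and the flag `import_failed` are illustrative names only.

```python
# Correct: TSNE lives in sklearn.manifold.
from sklearn.manifold import TSNE

# Wrong module: sklearn.decomposition holds PCA and related methods, not TSNE.
try:
    from sklearn.decomposition import TSNE as WrongTSNE
    import_failed = False
except ImportError as err:
    import_failed = True
    print("ImportError:", err)

print(TSNE.__name__, import_failed)
```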
3. What will be the shape of the output from the following code snippet?
from sklearn.manifold import TSNE
import numpy as np
X = np.random.rand(100, 50)
tsne = TSNE(n_components=2, random_state=42)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)
medium
A. (50, 2)
B. (2, 100)
C. (100, 50)
D. (100, 2)
Solution
Step 1: Understand input and t-SNE output
Input X has 100 samples and 50 features. t-SNE reduces features to 2 dimensions.
Step 2: Determine output shape
Output shape is (number of samples, n_components) = (100, 2).
Final Answer:
(100, 2) -> Option D
Quick Check:
Output shape = (samples, components) [OK]
Hint: Output shape = (samples, n_components) [OK]
Common Mistakes:
Confusing features with samples in output shape
Swapping rows and columns in shape
Assuming output shape matches input shape
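The same rule holds when n_components changes. A quick sketch, assuming scikit-learn and the same kind of random input as in the question, compares the output shapes for 2 and 3 components.

```python
import numpy as np
from sklearn.manifold import TSNE

# Same input layout as the question: 100 samples, 50 features.
X = np.random.RandomState(42).rand(100, 50)

# The row count always matches the number of samples;
# the column count matches n_components.
shapes = {}
for n_components in (2, 3):
    emb = TSNE(n_components=n_components, random_state=42).fit_transform(X)
    shapes[n_components] = emb.shape

print(shapes)
```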
4. You run t-SNE on your dataset but get a ValueError: 'perplexity must be less than n_samples'. What is the likely cause and fix?
medium
A. Input data is not scaled; apply normalization
B. Perplexity is set too high; reduce it below number of samples
C. Random state is not set; set random_state parameter
D. Data contains missing values; remove or fill them
Solution
Step 1: Understand the error message
The error says perplexity must be less than number of samples, so perplexity is too large.
Step 2: Fix by adjusting perplexity
Reduce perplexity parameter to a value smaller than the number of samples in your data.
Final Answer:
Perplexity is set too high; reduce it below number of samples -> Option B
Quick Check:
Perplexity < samples [OK]
Hint: Keep perplexity less than sample count [OK]
Common Mistakes:
Ignoring perplexity limits and increasing it
Trying to fix by scaling data instead
Changing unrelated parameters like random_state
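The error and its fix can be reproduced on a tiny dataset. This is a sketch under assumptions: recent scikit-learn versions raise the ValueError, while older ones may not, so the code does not rely on it. The rule used to pick `safe_perplexity` is an illustrative heuristic, not a library recommendation.

```python
import numpy as np
from sklearn.manifold import TSNE

# Tiny stand-in dataset (an assumption): 20 samples, 5 features.
X_small = np.random.RandomState(0).rand(20, 5)

# A perplexity of 30 is not less than 20 samples, so recent versions reject it.
try:
    TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_small)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Fix: pick a perplexity well below the sample count (hypothetical heuristic).
safe_perplexity = min(30, max(1, (X_small.shape[0] - 1) // 3))
tsne = TSNE(n_components=2, perplexity=safe_perplexity, random_state=0)
X_embedded = tsne.fit_transform(X_small)

print(safe_perplexity, X_embedded.shape)
```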
5. You have a dataset with 1000 samples and 100 features. You want to visualize it with t-SNE but also keep track of clusters found by KMeans. Which approach is best?
hard
A. Run KMeans first, then apply t-SNE on original data, color points by cluster
B. Apply t-SNE first, then run KMeans on the 2D t-SNE output
C. Use t-SNE only, no clustering needed for visualization
D. Run KMeans on original data and use PCA instead of t-SNE
Solution
Step 1: Understand the goal
You want to visualize data and show meaningful clusters clearly on the 2D plot.
Step 2: Choose correct order
Running KMeans first on the high-dimensional data assigns clusters using distances in the original feature space. t-SNE then provides the 2D layout, and coloring points by cluster label shows how those clusters appear in the plot.
Step 3: Why not other options?
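Clustering on the t-SNE output (B) is less reliable because t-SNE distorts distances; it is meant for visualization, not as input to modeling. Option C gives no cluster labels to track, and option D drops t-SNE, which the question asks for.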
Final Answer:
Run KMeans first, then apply t-SNE on original data, color points by cluster -> Option A
Quick Check:
Cluster high-dim first, visualize after [OK]
Hint: Cluster original data first, then t-SNE visualize [OK]
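The order in option A can be sketched as follows, assuming scikit-learn and synthetic blob data in place of the real dataset. The sample count is reduced to 300 to keep the run short, and the choice of 4 clusters is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in data (an assumption): 300 samples, 100 features, 4 groups.
X, _ = make_blobs(n_samples=300, n_features=100, centers=4, random_state=0)

# Step 1: cluster in the original 100-dimensional space.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: embed the same original data in 2D for plotting.
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X)

# Step 3: color each 2D point by its KMeans label, for example with
# matplotlib: plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=labels)
print(X_embedded.shape, labels.shape)
```

Because `labels` and `X_embedded` are both computed from `X` in the same row order, row i of the embedding corresponds to label i.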