Gaussian Mixture Models in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Gaussian Mixture Model Components

What does each component in a Gaussian Mixture Model (GMM) represent?

A. A neural network layer used for feature extraction
B. A decision boundary separating classes in the dataset
C. A single Gaussian distribution representing a cluster in the data
D. A single data point used as a centroid
💡 Hint

Think about how GMM models data distribution using simpler parts.
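To make the idea of a "component" concrete, here is a minimal sketch that fits a two-component model and inspects what scikit-learn stores per component. The synthetic data (two groups centered near 0 and 10) is an assumption made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data with two well-separated groups (illustrative assumption).
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal(0.0, 1.0, 200),
    rng.normal(10.0, 1.0, 200),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Each component is one Gaussian: a mixing weight, a mean, and a covariance.
print(gmm.weights_)      # shape (2,), weights sum to 1
print(gmm.means_)        # shape (2, 1), one mean per component
print(gmm.covariances_)  # shape (2, 1, 1) for the default 'full' covariance type
```

The fitted means should land close to the two group centers, which is what "one Gaussian per cluster" amounts to in practice.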

Predict Output
intermediate
Output of GMM Prediction Probabilities

What is the output of the following Python code using sklearn's GaussianMixture?

from sklearn.mixture import GaussianMixture
import numpy as np

X = np.array([[0], [1], [2], [3]])
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)
probs = gmm.predict_proba([[1.5]])
print(probs)
A. [[0.5 0.5]]
B. [[0.3 0.7]]
C. [[0.7 0.3]]
D. Raises a ValueError due to input shape
💡 Hint

predict_proba returns, for each sample, the probability of belonging to each component; every row sums to 1.
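The shape and row-sum properties of predict_proba can be checked with a small sketch. The data and query points below are illustrative assumptions and differ from the question's data, so this does not reveal the answer above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two small, clearly separated 1-D groups (illustrative data).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# One row per query sample, one column per component.
probs = gmm.predict_proba(np.array([[1.0], [6.0], [11.0]]))
print(probs.shape)
print(probs.sum(axis=1))  # each row is a probability distribution over components
print(probs)
```

Which column corresponds to which group is not fixed in advance, so it is safer to read the columns together with gmm.means_.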

Model Choice
advanced
Choosing Number of Components in GMM

You want to cluster data with unknown groups using GMM. Which method helps select the best number of components?

A. Always use 2 components for simplicity
B. Pick the number of components equal to the number of data points
C. Use the mean squared error between data points and cluster centers
D. Use the Bayesian Information Criterion (BIC) to compare models with different components
💡 Hint

Think about a criterion that balances model fit and complexity.
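A rough sketch of model selection with an information criterion: fit one model per candidate component count and keep the count with the lowest BIC. The synthetic three-blob data and the candidate range 1 to 6 are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three separated 2-D blobs (illustrative data; real data is rarely this clean).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-5.0, 1.0, (150, 2)),
    rng.normal(0.0, 1.0, (150, 2)),
    rng.normal(5.0, 1.0, (150, 2)),
])

candidates = list(range(1, 7))
# bic() penalizes the log-likelihood by model complexity; lower is better.
bics = [
    GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
    for k in candidates
]
best_k = candidates[int(np.argmin(bics))]
print(dict(zip(candidates, np.round(bics, 1))))
print("lowest BIC at n_components =", best_k)
```

GaussianMixture also exposes aic(), which applies a lighter complexity penalty; comparing both is a reasonable sanity check.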

Hyperparameter
advanced
Effect of Covariance Type in GMM

What is the effect of setting the covariance_type parameter to 'diag' in a GaussianMixture model?

A. Each component has its own diagonal covariance matrix, allowing different variances per feature but no covariance
B. All components share the same full covariance matrix
C. Covariance matrices are fixed to identity matrices
D. Covariance matrices are scalar multiples of the identity matrix
💡 Hint

Diagonal covariance means no correlation between features within each component.
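One way to see what each covariance_type stores is to compare the shape of covariances_ after fitting. This sketch assumes random 3-feature data and 2 components purely for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # 200 samples, 3 features (illustrative data)

shapes = {}
for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(
        n_components=2, covariance_type=cov_type, random_state=0
    ).fit(X)
    shapes[cov_type] = gmm.covariances_.shape

# full:      (2, 3, 3)  one full matrix per component
# tied:      (3, 3)     one full matrix shared by all components
# diag:      (2, 3)     one variance per feature per component
# spherical: (2,)       one variance per component
print(shapes)
```

With 'diag', only per-feature variances are stored, so correlations between features inside a component cannot be represented.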

Metrics
expert
Evaluating GMM Clustering Quality

Which metric is most appropriate to evaluate the quality of clusters found by a Gaussian Mixture Model when true labels are unknown?

A. Silhouette score measuring how similar an object is to its own cluster compared to other clusters
B. Accuracy score comparing predicted labels to true labels
C. Mean squared error between data points and cluster centers
D. Log loss computed from predicted probabilities and true labels
💡 Hint

Think about a metric that works without knowing true labels.
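A minimal sketch of label-free evaluation: cluster with a GMM, then score the hard assignments with scikit-learn's silhouette_score. The two-blob data is an assumption chosen so the clusters are easy to separate.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Two well-separated 2-D blobs (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (100, 2)),
    rng.normal(10.0, 1.0, (100, 2)),
])

# fit_predict returns the most likely component for each sample.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# Silhouette needs only the data and the predicted labels, no ground truth.
score = silhouette_score(X, labels)
print(round(score, 3))  # ranges from -1 to 1; higher means tighter, better-separated clusters
```

Silhouette uses hard labels and distances, so it ignores the soft probabilities a GMM produces; it is a convenient check rather than a complete assessment of the fitted density.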

Practice

1. What is the main idea behind a Gaussian Mixture Model (GMM)?
easy
A. It assumes data is made of several bell-shaped groups mixed together.
B. It uses decision trees to split data into groups.
C. It finds the single best line to fit the data points.
D. It clusters data by measuring distances only.

Solution

  1. Step 1: Understand GMM concept

    GMM assumes data comes from multiple groups, each shaped like a bell curve (Gaussian).
  2. Step 2: Compare with other methods

    Unlike decision trees or distance-only methods, GMM models overlapping groups with probabilities.
  3. Final Answer:

    It assumes data is made of several bell-shaped groups mixed together. -> Option A
  4. Quick Check:

    GMM = mixture of Gaussians [OK]
Hint: Remember GMM = mix of bell curves for groups [OK]
Common Mistakes:
  • Confusing GMM with decision trees
  • Thinking GMM finds one line only
  • Assuming GMM uses only distances
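The "mix of bell curves" idea is usually written as a weighted sum of Gaussian densities. The symbols here (K components, mixing weights \pi_k, means \mu_k, covariances \Sigma_k) are notation introduced for this note, not taken from the text above.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Each term is one bell-shaped group, and the weight \pi_k is the share of the data that group is expected to account for.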
2. Which Python library provides a built-in Gaussian Mixture Model class?
easy
A. matplotlib
B. pandas
C. scikit-learn
D. tensorflow

Solution

  1. Step 1: Identify libraries for ML models

    scikit-learn is a popular library with many ML models including GMM.
  2. Step 2: Check other libraries' purpose

    matplotlib is for plotting, pandas for data handling, tensorflow for deep learning, not GMM specifically.
  3. Final Answer:

    scikit-learn -> Option C
  4. Quick Check:

    GMM in scikit-learn [OK]
Hint: GMM class is in scikit-learn, not plotting or deep learning libs [OK]
Common Mistakes:
  • Choosing matplotlib for modeling
  • Confusing pandas with ML models
  • Picking tensorflow for GMM
3. What will the following Python code output?
from sklearn.mixture import GaussianMixture
import numpy as np
X = np.array([[1], [2], [3], [10], [11], [12]])
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)
labels = gmm.predict(X)
print(labels.tolist())
medium
A. [1, 0, 1, 0, 1, 0]
B. [0, 0, 0, 1, 1, 1]
C. [0, 1, 0, 1, 0, 1]
D. [1, 1, 1, 0, 0, 0]

Solution

  1. Step 1: Understand data and model

    Data has two clear groups: near 1-3 and near 10-12. GMM with 2 components fits these groups.
  2. Step 2: Predict labels

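    GMM assigns the first three points to one component and the last three to the other. Component indices are arbitrary and depend on initialization; the answer key takes the low group as label 0 and the high group as label 1.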
  3. Final Answer:

    [0, 0, 0, 1, 1, 1] -> Option B
  4. Quick Check:

    Groups split as low and high values [OK]
Hint: GMM labels cluster points close together [OK]
Common Mistakes:
  • Mixing label order (0 vs 1)
  • Assuming alternating labels
  • Ignoring clear group separation
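Because component indices are arbitrary, code that depends on a particular 0/1 order can be fragile. One possible workaround, sketched here as an assumption rather than a standard recipe, is to relabel components by ascending mean so the low group is always 0.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1], [2], [3], [10], [11], [12]])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Rank components by their fitted mean, then map each raw label to its rank.
order = np.argsort(gmm.means_.ravel())
remap = np.empty_like(order)
remap[order] = np.arange(len(order))
canonical = remap[labels]

print(labels.tolist())     # raw labels: grouping is stable, index order may vary
print(canonical.tolist())  # relabeled so the lower-mean component is 0
```

The grouping itself (first three together, last three together) is what the model determines; the relabeling only fixes the naming.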
4. Identify the error in this GMM code snippet:
from sklearn.mixture import GaussianMixture
X = [[1, 2], [3, 4], [5, 6]]
gmm = GaussianMixture(n_components=2)
gmm.fit(X)
labels = gmm.predict(X)
print(labels)
medium
A. GaussianMixture requires a random_state parameter.
B. n_components must be 3 or more for this data.
C. fit() method should be called after predict().
D. X should be a NumPy array, not a list of lists.

Solution

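  1. Step 1: Check data format for GMM

    scikit-learn estimators accept array-like input, so a list of lists is converted internally and fit() does not raise. Passing a NumPy array is a convention, not a requirement.
  2. Step 2: Verify other parameters and method order

    n_components=2 is valid for three samples, random_state is optional, and fit() is correctly called before predict().
  3. Final Answer:

    X should be a NumPy array, not a list of lists. -> Option D (the closest option; the snippet runs as written, so this is a style recommendation rather than a true error)
  4. Quick Check:

    Options A, B, and C are false; only D reflects common practice [OK]
Hint: NumPy arrays are the conventional GMM input, but lists also work [OK]
Common Mistakes:
  • Believing a list of lists makes fit() fail
  • Wrong order of fit and predict
  • Thinking random_state is mandatory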
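The input-format point can be checked directly. This sketch fits the same model once on a list of lists and once on a NumPy array; random_state=0 is added here (it is not in the question's snippet) so the two runs are comparable.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X_list = [[1, 2], [3, 4], [5, 6]]
X_arr = np.array(X_list, dtype=float)

# scikit-learn validates and converts array-like input before fitting.
labels_list = GaussianMixture(n_components=2, random_state=0).fit(X_list).predict(X_list)
labels_arr = GaussianMixture(n_components=2, random_state=0).fit(X_arr).predict(X_arr)

print(labels_list.tolist())
print(labels_arr.tolist())
print(np.array_equal(labels_list, labels_arr))
```

With only three samples the fit is not statistically meaningful; the sketch only demonstrates that both input types are accepted.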
5. You have a dataset with overlapping groups of different sizes and shapes. Which advantage of Gaussian Mixture Models makes them suitable here?
hard
A. They can model overlapping groups with different shapes using probabilities.
B. They always create groups of equal size.
C. They only work for groups that are perfectly separated.
D. They require groups to be circular and same size.

Solution

  1. Step 1: Understand group overlap and shape

    Real data groups often overlap and differ in shape and size.
  2. Step 2: Match GMM strengths

    GMM uses probabilities to model overlapping groups with different shapes, unlike simpler methods.
  3. Final Answer:

    They can model overlapping groups with different shapes using probabilities. -> Option A
  4. Quick Check:

    GMM handles overlap and shape variation [OK]
Hint: GMM models overlap and shape differences well [OK]
Common Mistakes:
  • Thinking GMM needs equal group sizes
  • Assuming groups must be separate
  • Believing GMM only fits circular groups
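A small sketch of the overlap-and-shape point: two groups of different size and orientation are generated, then a full-covariance GMM is fitted. The group parameters and the query point are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# A large group stretched horizontally and a smaller one stretched vertically,
# placed close enough to overlap (illustrative data).
wide = rng.multivariate_normal([0.0, 0.0], [[4.0, 0.0], [0.0, 0.25]], size=300)
tall = rng.multivariate_normal([3.0, 0.0], [[0.25, 0.0], [0.0, 4.0]], size=100)
X = np.vstack([wide, tall])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)

print(gmm.weights_)      # unequal weights reflect unequal group sizes
print(gmm.covariances_)  # one 2x2 matrix per component, free to differ in shape
probs = gmm.predict_proba([[2.0, 0.0]])
print(probs)             # soft assignment for a point in the overlap region
```

A distance-only method would give each point a single hard label; here the posterior probabilities express how plausible each group is for points that fall where the groups overlap.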