What does each component in a Gaussian Mixture Model (GMM) represent?
Think about how GMM models data distribution using simpler parts.
Each component in a GMM is a single Gaussian distribution, with its own mean, covariance, and mixing weight, that models one cluster or subpopulation in the data. The overall density is a weighted sum of these Gaussians, with mixing weights that sum to 1.
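A minimal sketch of this idea, using illustrative synthetic data (two 1-D clusters; the dataset and parameters are assumptions for demonstration, not from the question):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 1-D clusters, centered near -3 and 3
X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.weights_)  # mixing weights of the components, sum to 1
print(gmm.means_)    # one mean per component, near each cluster center
```

Each fitted component recovers one cluster's mean and variance, and the weights reflect each cluster's share of the data.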
What is the output of the following Python code using sklearn's GaussianMixture?
from sklearn.mixture import GaussianMixture
import numpy as np

X = np.array([[0], [1], [2], [3]])
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(X)
probs = gmm.predict_proba([[1.5]])
print(probs)
predict_proba returns the posterior probability of the sample belonging to each component.
The output is a 1×2 array of probabilities that the input point belongs to each Gaussian component. With this data the two fitted means fall near 0.5 and 2.5, so the point 1.5 lies roughly midway between them and the two probabilities come out close to equal; the exact values, and which component gets index 0, depend on the fitted parameters and the initialization.
You want to cluster data with unknown groups using GMM. Which method helps select the best number of components?
Think about a criterion that balances model fit and complexity.
BIC helps select the number of components by penalizing model complexity while rewarding better fit. Lower BIC means a better model choice.
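A sketch of BIC-based model selection, assuming well-separated synthetic data (the cluster centers and the candidate range 1-4 are illustrative choices):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters
X = np.concatenate([rng.normal(0, 1, 150), rng.normal(6, 1, 150)]).reshape(-1, 1)

# Fit a GMM for each candidate component count and record its BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 5)}
best_k = min(bics, key=bics.get)  # lower BIC = better trade-off
print(bics)
print("best n_components:", best_k)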
What is the effect of setting the covariance_type parameter to 'diag' in a GaussianMixture model?
Diagonal covariance means no correlation between features within each component.
Setting covariance_type='diag' gives each Gaussian component its own diagonal covariance matrix: the model learns a separate variance per feature but assumes zero correlation between features within a component. This is cheaper to fit than 'full' covariances but cannot capture tilted, correlated clusters.
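The difference is visible in the shape of the fitted covariances_ attribute; a quick sketch on illustrative random data (the 3-feature dataset is an assumption for demonstration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features

full = GaussianMixture(n_components=2, covariance_type='full',
                       random_state=0).fit(X)
diag = GaussianMixture(n_components=2, covariance_type='diag',
                       random_state=0).fit(X)

print(full.covariances_.shape)  # (2, 3, 3): full matrix per component
print(diag.covariances_.shape)  # (2, 3): only per-feature variances
```

With 'diag', each component stores just n_features variances instead of an n_features x n_features matrix, which is why the off-diagonal correlations cannot be represented.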
Which metric is most appropriate to evaluate the quality of clusters found by a Gaussian Mixture Model when true labels are unknown?
Think about a metric that works without knowing true labels.
Silhouette score measures cluster cohesion and separation without needing true labels, making it suitable for unsupervised clustering evaluation.
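A sketch of this evaluation, converting the GMM's soft assignments to hard labels via fit_predict and scoring them (the synthetic two-cluster dataset is an illustrative assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters
X = np.concatenate([rng.normal(-4, 1, 100), rng.normal(4, 1, 100)]).reshape(-1, 1)

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
score = silhouette_score(X, labels)  # in [-1, 1]; higher is better
print(round(score, 3))
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 suggest overlapping clusters. No ground-truth labels are needed at any point.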