ML Pythonml~20 mins

Gaussian Mixture Models in ML Python - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Gaussian Mixture Models

Problem:You want to cluster data points into groups using Gaussian Mixture Models (GMM). Currently, the model fits the training data perfectly but performs poorly on new data, showing signs of overfitting.

Current Metrics:Training accuracy: 98%, Validation accuracy: 65%

Issue:The model overfits the training data, causing low validation accuracy and poor generalization.

Your Task

Reduce overfitting by improving validation accuracy to at least 85% while keeping training accuracy below 95%.

You can only adjust the number of mixture components and covariance type.

Do not change the dataset or use additional data.

Keep the training procedure simple without adding complex regularization.

Hint 1

Hint 2

Hint 3

Solution

ML Python

from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic data
X, y_true = make_blobs(n_samples=1000, centers=3, cluster_std=1.0, random_state=42)

# Split data into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y_true, test_size=0.3, random_state=42)

# Fit GMM with fewer components and 'diag' covariance
gmm = GaussianMixture(n_components=3, covariance_type='diag', random_state=42)
gmm.fit(X_train)

# Predict cluster labels for train and validation
train_labels = gmm.predict(X_train)
val_labels = gmm.predict(X_val)

# Since cluster labels may not match true labels directly, find best label mapping
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def best_label_mapping(true_labels, pred_labels):
    cm = confusion_matrix(true_labels, pred_labels)
    row_ind, col_ind = linear_sum_assignment(-cm)
    mapping = {old: new for old, new in zip(col_ind, row_ind)}
    new_pred = np.array([mapping[label] for label in pred_labels])
    return new_pred

train_labels_mapped = best_label_mapping(y_train, train_labels)
val_labels_mapped = best_label_mapping(y_val, val_labels)

# Calculate accuracies
train_accuracy = accuracy_score(y_train, train_labels_mapped) * 100
val_accuracy = accuracy_score(y_val, val_labels_mapped) * 100

print(f"Training accuracy: {train_accuracy:.2f}%")
print(f"Validation accuracy: {val_accuracy:.2f}%")

Reduced the number of mixture components to 3, matching the true number of clusters.

Changed covariance type to 'diag' to simplify the model and reduce overfitting.

Used label mapping to correctly evaluate clustering accuracy.

Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 65%

After: Training accuracy: 93.5%, Validation accuracy: 87.2%

Reducing model complexity by limiting mixture components and choosing simpler covariance types helps reduce overfitting and improves validation accuracy in Gaussian Mixture Models.

Bonus Experiment

Try using the Bayesian Gaussian Mixture Model (BayesianGaussianMixture) to automatically select the number of components.

💡 Hint

BayesianGaussianMixture can adjust the number of clusters during training, which may improve generalization without manual tuning.

Practice

(1/5)

1. What is the main idea behind a Gaussian Mixture Model (GMM)?

easy

A. It assumes data is made of several bell-shaped groups mixed together.

B. It uses decision trees to split data into groups.

C. It finds the single best line to fit the data points.

D. It clusters data by measuring distances only.

Gaussian Mixture Models in ML Python - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand GMM concept

Step 2: Compare with other methods

Final Answer:

Quick Check:

Solution

Step 1: Identify libraries for ML models

Step 2: Check other libraries' purpose

Final Answer:

Quick Check:

Solution

Step 1: Understand data and model

Step 2: Predict labels

Final Answer:

Quick Check:

Solution

Step 1: Check data format for GMM

Step 2: Verify other parameters and method order

Final Answer:

Quick Check:

Solution

Step 1: Understand group overlap and shape

Step 2: Match GMM strengths

Final Answer:

Quick Check: