Mutual information for feature selection in ML Python - ML Experiment: Train & Evaluate

Experiment - Mutual information for feature selection
Problem: We want to select the most useful features from a dataset to improve a classification model's performance. Currently, the model uses all features, but some may be irrelevant or noisy.
Current Metrics: Training accuracy: 95%, Validation accuracy: 78%
Issue: The model shows signs of overfitting. Validation accuracy is much lower than training accuracy, likely because irrelevant features add noise.
Your Task
Use mutual information to select the top 5 features that have the highest dependency with the target. Then retrain the model using only these features and improve validation accuracy to at least 85% while keeping training accuracy below 90%.
You can only change the feature selection step and retrain the model.
Do not change the model architecture or hyperparameters.
Use mutual information for feature selection.
Solution
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load data
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Compute mutual information scores
mi_scores = mutual_info_classif(X_train, y_train, random_state=42)

# Select top 5 features
# Use argsort and slice to get indices of top 5 features in descending order
top5_idx = np.argsort(mi_scores)[-5:][::-1]

# Filter training and validation data
X_train_selected = X_train[:, top5_idx]
X_val_selected = X_val[:, top5_idx]

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train_selected, y_train)

# Predict and evaluate
train_preds = model.predict(X_train_selected)
val_preds = model.predict(X_val_selected)
train_acc = accuracy_score(y_train, train_preds) * 100
val_acc = accuracy_score(y_val, val_preds) * 100

print(f"Training accuracy: {train_acc:.2f}%")
print(f"Validation accuracy: {val_acc:.2f}%")
Added mutual information feature scoring using sklearn's mutual_info_classif.
Selected top 5 features based on mutual information scores.
Retrained the RandomForestClassifier using only these selected features.
Evaluated training and validation accuracy on the reduced feature set.
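The same selection step can also be written with scikit-learn's SelectKBest transformer instead of manual argsort indexing. The sketch below is one possible variant, not part of the original solution; the variable names are illustrative, and functools.partial is used only to pin random_state so the scores are reproducible.

```python
from functools import partial

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pin random_state so repeated runs give the same mutual information scores.
score_func = partial(mutual_info_classif, random_state=42)

# Fit the selector on the training split only, so the validation split
# does not influence which features are kept.
selector = SelectKBest(score_func=score_func, k=5)
X_train_selected = selector.fit_transform(X_train, y_train)
X_val_selected = selector.transform(X_val)

# Indices of the kept columns, in original column order (not ranked by score).
selected_idx = selector.get_support(indices=True)
print("Selected feature indices:", selected_idx)
print("Shapes:", X_train_selected.shape, X_val_selected.shape)
```

Unlike the argsort approach, the kept columns stay in their original order, which does not matter for a random forest but is worth knowing when mapping columns back to feature names.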
Results Interpretation

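Before feature selection: Training accuracy: 95%, Validation accuracy: 78%

After feature selection: Training accuracy: 88.5%, Validation accuracy: 86.2%

These figures describe the exercise scenario. The solution code runs on scikit-learn's breast cancer dataset and prints its own accuracies, which may differ from the numbers above.

Using mutual information to keep only the most relevant features can reduce overfitting by removing noisy or irrelevant features. In this scenario the gap between training and validation accuracy narrows from 17 points to 2.3 points, which indicates better generalization.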
Bonus Experiment
Try selecting different numbers of top features (e.g., 3, 7, 10) using mutual information and observe how validation accuracy changes.
💡 Hint
Plot validation accuracy against the number of selected features to find the best trade-off between simplicity and performance.
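A minimal sketch of the bonus experiment, assuming the same dataset, split, and model as the solution. It ranks the features once and then retrains for each k; the dictionary name val_acc_by_k is illustrative. The resulting pairs can be passed to any plotting library.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Rank features once (best first), using the training split only.
mi_scores = mutual_info_classif(X_train, y_train, random_state=42)
ranking = np.argsort(mi_scores)[::-1]

val_acc_by_k = {}
for k in (3, 5, 7, 10):
    idx = ranking[:k]  # top-k feature indices
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train[:, idx], y_train)
    val_acc_by_k[k] = accuracy_score(y_val, model.predict(X_val[:, idx]))

for k, acc in val_acc_by_k.items():
    print(f"k={k:2d}  validation accuracy: {acc:.3f}")
```

A single validation split gives a noisy estimate, so small differences between values of k should be read with caution; cross-validation would give a steadier comparison.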

Practice

1. What does mutual information measure in feature selection?
easy
A. The amount of shared information between a feature and the target variable
B. The correlation coefficient between two features
C. The difference between feature means
D. The number of missing values in a feature

Solution

  1. Step 1: Understand mutual information concept

    Mutual information measures how much knowing one variable reduces uncertainty about another.
  2. Step 2: Apply to feature selection context

    In feature selection, it measures how much information a feature shares with the target variable.
  3. Final Answer:

    The amount of shared information between a feature and the target variable -> Option A
  4. Quick Check:

    Mutual information = shared info [OK]
Hint: Mutual info = shared info between feature and target [OK]
Common Mistakes:
  • Confusing mutual information with correlation
  • Thinking it measures missing data
  • Assuming it measures difference in means
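The difference between mutual information and correlation can be seen on synthetic data. The sketch below uses a made-up feature x whose relationship with the target is symmetric rather than linear, so the Pearson correlation is close to zero while the mutual information estimate is clearly positive. The data and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)

# The target depends on |x|: large magnitudes on either side map to class 1.
y = (np.abs(x) > 0.5).astype(int)

# Pearson correlation only captures linear dependence.
corr = np.corrcoef(x, y)[0, 1]

# Mutual information captures any statistical dependence (reported in nats).
mi = mutual_info_classif(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson correlation: {corr:.3f}")
print(f"Mutual information:  {mi:.3f} nats")
```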
2. Which Python function is used to compute mutual information for classification tasks?
easy
A. mutual_info_classif
B. mutual_info_regression
C. mutual_info_score
D. mutual_info_classifier

Solution

  1. Step 1: Recall mutual information functions in sklearn

    For classification, sklearn provides mutual_info_classif.
  2. Step 2: Differentiate from regression function

    mutual_info_regression is for regression, not classification.
  3. Final Answer:

    mutual_info_classif -> Option A
  4. Quick Check:

    Classification uses mutual_info_classif [OK]
Hint: Classification uses mutual_info_classif function [OK]
Common Mistakes:
  • Using mutual_info_regression for classification
  • Confusing function names
  • Using sklearn.metrics.mutual_info_score for feature scoring; it compares two discrete labelings, not a feature matrix against a target
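To make the distinction between the three real functions concrete, here is a short sketch on synthetic data. The arrays and names are invented for illustration; only the three scikit-learn functions themselves come from the library.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

y_class = (X[:, 0] > 0).astype(int)             # discrete target, driven by column 0
y_reg = X[:, 1] + 0.1 * rng.normal(size=200)    # continuous target, driven by column 1

# One score per feature, for a classification target.
mi_c = mutual_info_classif(X, y_class, random_state=0)

# One score per feature, for a regression target.
mi_r = mutual_info_regression(X, y_reg, random_state=0)

# mutual_info_score takes two 1-D discrete labelings and returns a single number.
labels_a = [0, 0, 1, 1]
labels_b = [0, 0, 1, 1]
mi_labels = mutual_info_score(labels_a, labels_b)

print("classif:   ", np.round(mi_c, 2))
print("regression:", np.round(mi_r, 2))
print("labelings: ", round(mi_labels, 3))
```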
3. Given this code snippet, what is the output?
from sklearn.feature_selection import mutual_info_classif
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 1, 0, 1])
mi = mutual_info_classif(X, y, discrete_features=[True, True])
print(np.round(mi, 2))
medium
A. [0.0 0.0]
B. [0.69 0.0]
C. [0.0 0.69]
D. [0.69 0.69]

Solution

  1. Step 1: Understand input data and parameters

    X has two discrete features, y is binary. Using mutual_info_classif with discrete_features=True for both.
  2. Step 2: Calculate mutual information values

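    Each feature takes a different value in every row, so as a discrete variable it identifies y exactly. The mutual information therefore equals the entropy of y, which for a balanced binary target is ln(2) ≈ 0.69 nats.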
  3. Final Answer:

    [0.69 0.69] -> Option D
  4. Quick Check:

    Both features share info with y ~0.69 [OK]
Hint: A discrete feature that fully determines a balanced binary target gives MI = ln(2) ≈ 0.69 [OK]
Common Mistakes:
  • Assuming zero mutual information for all features
  • Mixing up discrete_features parameter
  • Rounding errors in output
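The value 0.69 can be checked by hand using the identity MI(X; y) = H(y) - H(y | X) for discrete variables. The sketch below computes both entropies with NumPy and compares the result with mutual_info_classif; the helper names entropy and conditional_entropy are hypothetical and written only for this check.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 1, 0, 1])


def entropy(labels):
    """Shannon entropy (in nats) of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def conditional_entropy(target, feature):
    """H(target | feature) for two discrete 1-D arrays, in nats."""
    total = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight the entropy within each feature value by its frequency.
        total += mask.mean() * entropy(target[mask])
    return total


manual = np.array(
    [entropy(y) - conditional_entropy(y, X[:, j]) for j in range(X.shape[1])]
)
library = mutual_info_classif(X, y, discrete_features=[True, True])

print("manual: ", np.round(manual, 2))
print("library:", np.round(library, 2))
```

Because every feature value occurs exactly once, H(y | X) is zero here. With only four rows this is a small-sample effect rather than evidence of a genuinely predictive feature.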
4. Identify the error in this code snippet for mutual information feature selection:
from sklearn.feature_selection import mutual_info_classif
X = [[1, 2], [2, 3], [3, 4]]
y = [0, 1, 0]
mi = mutual_info_classif(X, y)
print(mi)
medium
A. y should be a 2D array, not 1D
B. X should be a numpy array, not a list of lists
C. mutual_info_classif requires discrete_features parameter
D. mutual_info_classif cannot handle integer data

Solution

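  1. Step 1: Check input data types

    scikit-learn accepts array-like inputs, including nested Python lists, and converts them to NumPy arrays internally, so this snippet runs as written. Integer data and a 1D y are both valid, and discrete_features is optional (it defaults to 'auto').
  2. Step 2: Identify the closest option

    None of the options describes a real error. Options A, C, and D are false statements, which leaves B as the intended answer: converting X to a NumPy array is a recommended habit (it enables column slicing such as X[:, idx] later), not a requirement.
  3. Final Answer:

    X should be a numpy array, not a list of lists -> Option B
  4. Quick Check:

    Use numpy arrays for X [OK]
Hint: Converting inputs to numpy arrays is good practice, though sklearn also accepts lists [OK]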
Common Mistakes:
  • Thinking y must be 2D
  • Assuming discrete_features is always required
  • Believing mutual_info_classif rejects integer data
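How the input type affects the call can be checked directly. The sketch below passes the same small, made-up dataset once as nested lists and once as NumPy arrays, with a fixed random_state so the two runs are comparable. It uses six rows rather than three so that each class has several samples.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X_list = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]]
y_list = [0, 1, 0, 1, 0, 1]

# Nested lists are array-likes; scikit-learn converts them internally.
mi_from_lists = mutual_info_classif(X_list, y_list, random_state=0)

# Explicit conversion gives the same scores for the same random_state.
X_arr = np.asarray(X_list)
y_arr = np.asarray(y_list)
mi_from_arrays = mutual_info_classif(X_arr, y_arr, random_state=0)

# Column slicing, needed later for feature selection, works on the array
# but would raise a TypeError on the nested list.
first_column = X_arr[:, 0]

print("from lists: ", mi_from_lists)
print("from arrays:", mi_from_arrays)
```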
5. You have a dataset with 10 features. After computing mutual information scores, you find two features have the highest scores but are highly correlated with each other. What is the best approach to select features?
hard
A. Select both features because they have the highest mutual information
B. Select features randomly to avoid bias
C. Select only one of the two correlated features with the highest mutual information
D. Discard both features to avoid redundancy

Solution

  1. Step 1: Understand mutual information and correlation

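    High mutual information means a feature is informative about the target, but mutual_info_classif scores each feature on its own. Two highly correlated features can therefore both rank at the top while carrying largely the same information.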
  2. Step 2: Choose features to reduce redundancy

    To avoid redundant information, select only one of the correlated features with the highest mutual information.
  3. Final Answer:

    Select only one of the two correlated features with the highest mutual information -> Option C
  4. Quick Check:

    Pick one correlated feature with highest MI [OK]
Hint: Avoid redundant features by picking one with highest MI [OK]
Common Mistakes:
  • Selecting both correlated features causing redundancy
  • Discarding informative features unnecessarily
  • Choosing features randomly without criteria
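One way to act on this answer is a greedy filter: walk the features from highest to lowest mutual information and skip any feature that is too strongly correlated with one already kept. The sketch below is an illustrative approach on synthetic data, not a scikit-learn API; the function select_non_redundant and the threshold of 0.9 are assumptions chosen for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def select_non_redundant(X, y, k, corr_threshold=0.9, random_state=0):
    """Pick up to k features by descending MI, skipping near-duplicates."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature correlations
    selected = []
    for j in np.argsort(mi)[::-1]:
        # Keep feature j only if it is not highly correlated with any kept one.
        if all(corr[j, s] < corr_threshold for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected


rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=n)
other = rng.normal(size=n)
X = np.column_stack([
    signal,                              # column 0: informative
    signal + 0.01 * rng.normal(size=n),  # column 1: near-duplicate of column 0
    other,                               # column 2: informative, independent
    rng.normal(size=n),                  # column 3: pure noise
])
y = ((signal + other) > 0).astype(int)

chosen = select_non_redundant(X, y, k=2)
print("Chosen feature indices:", chosen)
```

Pearson correlation only detects linear redundancy; features that are redundant in a nonlinear way would need a different check, such as mutual information between the features themselves.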