Practice


1. What does mutual information measure in feature selection?

easy

A. The amount of shared information between a feature and the target variable

B. The correlation coefficient between two features

C. The difference between feature means

D. The number of missing values in a feature

Solution

Step 1: Understand mutual information concept
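Mutual information measures how much knowing one variable reduces uncertainty about another: I(X; Y) = H(Y) − H(Y | X). It is zero exactly when the two variables are independent, and unlike correlation it also captures nonlinear relationships.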
Step 2: Apply to feature selection context
In feature selection, it measures how much information a feature shares with the target variable.
Final Answer:
The amount of shared information between a feature and the target variable -> Option A
Quick Check:
Mutual information = shared info [OK]

Hint: Mutual info = shared info between feature and target [OK]

Common Mistakes:

Confusing mutual information with correlation
Thinking it measures missing data
Assuming it measures difference in means
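A minimal sketch of the definition for discrete variables, assuming the plug-in estimate from empirical frequencies. The helper mutual_information is hypothetical, written only for illustration. It also shows why mutual information is not correlation: y = x² has zero Pearson correlation with x, yet x determines y completely. The result is cross-checked against sklearn.metrics.mutual_info_score, which also uses natural logarithms (nats).

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def mutual_information(x, y):
    """Hypothetical helper: plug-in estimate of I(X;Y) in nats for two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))  # empirical joint probability
            if p_xy > 0:
                p_x, p_y = np.mean(x == xv), np.mean(y == yv)  # marginals
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi


# Nonlinear dependence: y is a deterministic function of x, yet Pearson r is ~0
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2
print(np.corrcoef(x, y)[0, 1])   # ~0: no linear association
print(mutual_information(x, y))  # > 0: knowing x removes all uncertainty about y

# Independence: every (a, b) combination occurs equally often, so MI is 0
a = np.array([0, 0, 1, 1, 0, 1, 0, 1])
b = np.array([0, 1, 0, 1, 0, 0, 1, 1])
print(mutual_information(a, b))  # 0.0

# Cross-check against scikit-learn's contingency-table implementation
print(np.isclose(mutual_information(x, y), mutual_info_score(x, y)))
```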

2. Which Python function is used to compute mutual information for classification tasks?

easy

A. mutual_info_classif

B. mutual_info_regression

C. mutual_info_score

D. mutual_info_classifier

Solution

Step 1: Recall mutual information functions in sklearn
For classification targets, sklearn.feature_selection provides mutual_info_classif.
Step 2: Differentiate from regression function
mutual_info_regression is for regression, not classification.
Final Answer:
mutual_info_classif -> Option A
Quick Check:
Classification uses mutual_info_classif [OK]

Hint: Classification uses mutual_info_classif function [OK]

Common Mistakes:

Using mutual_info_regression for classification
Confusing function names
Confusing mutual_info_score (in sklearn.metrics, which compares two discrete label assignments) with the feature-selection functions
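The sketch below contrasts the two scikit-learn feature-selection functions on synthetic data; the data-generating choices are assumptions for illustration. random_state is fixed because the nearest-neighbor estimator adds a small amount of noise to continuous features, so scores can otherwise vary slightly between runs.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(0)
n = 500
x_signal = rng.normal(size=n)  # feature that drives both targets
x_noise = rng.normal(size=n)   # irrelevant feature
X = np.column_stack([x_signal, x_noise])

# Categorical target -> mutual_info_classif
y_class = (x_signal > 0).astype(int)
# Continuous target -> mutual_info_regression (nonlinear on purpose)
y_reg = x_signal ** 2 + 0.1 * rng.normal(size=n)

mi_class = mutual_info_classif(X, y_class, random_state=0)
mi_reg = mutual_info_regression(X, y_reg, random_state=0)

print("classif:", np.round(mi_class, 2))
print("regression:", np.round(mi_reg, 2))
```

In both cases the first (signal) column should score clearly higher than the noise column.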

3. Given this code snippet, what is the output?

from sklearn.feature_selection import mutual_info_classif
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 1, 0, 1])
mi = mutual_info_classif(X, y, discrete_features=[True, True])
print(np.round(mi, 2))

medium

A. [0.0 0.0]

B. [0.69 0.0]

C. [0.0 0.69]

D. [0.69 0.69]

Solution

Step 1: Understand input data and parameters
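X has two features, and discrete_features=[True, True] tells mutual_info_classif to treat both as discrete. y is a balanced binary target (two 0s, two 1s).
Step 2: Calculate mutual information values
Each feature takes four distinct values, one per sample, so knowing a feature's value pins down the label exactly: H(y | x) = 0. The mutual information therefore equals the entropy of the target, H(y) = ln 2 ≈ 0.693 nats (sklearn uses the natural logarithm), for both features. This comes from the high cardinality of the features, not from a real pattern: any labeling with two 0s and two 1s would give the same score.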
Final Answer:
[0.69 0.69] -> Option D
Quick Check:
Both features score H(y) = ln 2 ≈ 0.69 [OK]

Hint: For a balanced binary target, MI is at most ln 2 ≈ 0.69 nats, reached when a feature determines the label [OK]

Common Mistakes:

Assuming zero mutual information for all features
Misreading the discrete_features parameter
Rounding errors in output
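To check the reasoning, this sketch reruns the question's code and compares the scores with the target entropy computed directly. It also scores a different balanced labeling, y_other (a hypothetical alternative, not part of the question), which gives the same result because every feature value is unique.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 1, 0, 1])

# Entropy of the target in nats: two equally likely classes -> ln 2
p = np.bincount(y) / len(y)
h_y = -np.sum(p * np.log(p))

mi = mutual_info_classif(X, y, discrete_features=[True, True])
print(np.round(mi, 2))  # [0.69 0.69]
print(np.isclose(mi, h_y))

# A different balanced labeling scores the same: unique feature values
# identify every sample, so MI always equals H(y) here
y_other = np.array([1, 1, 0, 0])
mi_other = mutual_info_classif(X, y_other, discrete_features=[True, True])
print(np.round(mi_other, 2))
```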

4. Identify the error in this code snippet for mutual information feature selection:

from sklearn.feature_selection import mutual_info_classif
X = [[1, 2], [2, 3], [3, 4]]
y = [0, 1, 0]
mi = mutual_info_classif(X, y)
print(mi)

medium

A. y should be a 2D array, not 1D

B. X should be a numpy array, not a list of lists

C. mutual_info_classif requires discrete_features parameter

D. mutual_info_classif cannot handle integer data

Solution

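Step 1: Check input data types
mutual_info_classif accepts any array-like input. Lists of lists are converted to NumPy arrays by sklearn's input validation, and a 1D y is exactly what the function expects.
Step 2: Identify error cause
The snippet runs without raising an error. Options A, C, and D are plainly false: y must be 1D, discrete_features defaults to 'auto', and integer data is accepted. Option B describes good practice rather than a hard requirement: converting X to a NumPy array makes its shape and dtype explicit. The real weaknesses are statistical: with the default discrete_features='auto' and dense input, the integer columns are treated as continuous, three samples are far too few for the nearest-neighbor estimator, and the small random noise it adds makes scores vary unless random_state is set.
Final Answer:
X should be a numpy array, not a list of lists -> Option B (best practice rather than a strict requirement)
Quick Check:
Lists work, but NumPy arrays make X explicit [OK]

Hint: Converting inputs to NumPy arrays is good practice, although sklearn functions accept array-like inputs [OK]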

Common Mistakes:

Thinking y must be 2D
Assuming discrete_features is always required
Believing mutual_info_classif rejects integer data
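A quick sketch showing that a list of lists and a NumPy array give identical scores once the estimator's random noise is seeded. random_state=0 is an assumption added for reproducibility; the original call omits it. With only three samples the score itself is not meaningful.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X_list = [[1, 2], [2, 3], [3, 4]]
y_list = [0, 1, 0]

# Same data passed as plain Python lists and as NumPy arrays
mi_from_list = mutual_info_classif(X_list, y_list, random_state=0)
mi_from_array = mutual_info_classif(np.array(X_list), np.array(y_list), random_state=0)

print(mi_from_list, mi_from_array)
print(np.allclose(mi_from_list, mi_from_array))  # input type does not change the result
```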

5. You have a dataset with 10 features. After computing mutual information scores, you find two features have the highest scores but are highly correlated with each other. What is the best approach to select features?

hard

A. Select both features because they have the highest mutual information

B. Select features randomly to avoid bias

C. Keep only one of the two correlated features, the one with the higher mutual information

D. Discard both features to avoid redundancy

Solution

Step 1: Understand mutual information and correlation
High mutual information means a feature is informative about the target, while high correlation between two features means they carry largely the same information. Mutual information scores each feature on its own, so it does not detect this redundancy.
Step 2: Choose features to reduce redundancy
To avoid redundant information, keep only the member of the correlated pair with the higher mutual information score, then continue with the next-ranked features.
Final Answer:
Keep only one of the two correlated features, the one with the higher mutual information -> Option C
Quick Check:
Keep the higher-MI feature of the correlated pair [OK]

Hint: Avoid redundant features by keeping only the higher-MI member of a correlated pair [OK]

Common Mistakes:

Selecting both correlated features causing redundancy
Discarding informative features unnecessarily
Choosing features randomly without criteria
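One simple way to apply this idea, sketched under stated assumptions: rank features by mutual_info_classif, then walk down the ranking and skip any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold. The helper select_non_redundant and the 0.9 threshold are hypothetical choices; this greedy filter is not a specific published method such as mRMR, and Pearson correlation only catches linear redundancy.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def select_non_redundant(X, y, k, corr_threshold=0.9, random_state=0):
    """Hypothetical helper: greedily keep the highest-MI features, skipping any
    feature whose |Pearson r| with an already-kept feature exceeds corr_threshold."""
    mi = mutual_info_classif(X, y, random_state=random_state)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    selected = []
    for j in np.argsort(mi)[::-1]:  # highest MI first
        if all(corr[j, s] <= corr_threshold for s in selected):
            selected.append(int(j))
        if len(selected) == k:
            break
    return selected, mi


rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # near-duplicate of x1
x3 = rng.normal(size=n)              # weaker, independent signal
x4 = rng.normal(size=n)              # pure noise
X = np.column_stack([x1, x2, x3, x4])
y = (x1 + 0.8 * x3 > 0).astype(int)

selected, mi = select_non_redundant(X, y, k=2)
print(np.round(mi, 2))
print(selected)
```

Here x1 and x2 should both score highest, but only one of them is kept, so the second slot goes to x3 instead of the redundant copy.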

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.65	0.60	Model starts learning with moderate loss and accuracy
2	0.50	0.72	Loss decreases and accuracy improves
3	0.42	0.78	Model continues to improve
4	0.38	0.82	Loss decreases further, accuracy rises
5	0.35	0.85	Training converges with good accuracy

Mutual information for feature selection in ML Python - Model Pipeline Trace
