Which statement best describes the role of mutual information in feature selection?
Think about how mutual information captures relationships beyond just linear ones.
Mutual information measures how much knowing the value of one variable reduces uncertainty about another, capturing all types of dependencies, not just linear correlations.
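For instance, a variable and its square are dependent but linearly uncorrelated, so correlation misses the relationship while mutual information does not. A minimal plug-in estimate in nats, computed by hand (no scikit-learn needed) to keep the arithmetic visible:

```python
import math
from collections import Counter

def plug_in_mi(xs, ys):
    """Empirical (plug-in) mutual information in nats for two discrete sequences."""
    n = len(xs)
    px = Counter(xs)                 # marginal counts of x
    py = Counter(ys)                 # marginal counts of y
    pxy = Counter(zip(xs, ys))       # joint counts
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with probabilities as counts/n
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi

# x and y = x**2 are dependent but have zero linear correlation.
x = [-1, 0, 1, -1, 0, 1]
y = [v * v for v in x]
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
print(cov)                      # 0.0 -- no linear correlation
print(round(plug_in_mi(x, y), 2))  # 0.64 -- the dependency is still detected
```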
What is the output of the following Python code that calculates mutual information between features and a binary target?
```python
from sklearn.feature_selection import mutual_info_classif
import numpy as np

X = np.array([[1, 2, 3], [1, 3, 3], [0, 2, 1], [0, 3, 1]])
y = np.array([0, 1, 0, 1])
mi = mutual_info_classif(X, y, discrete_features=[True, True, True], random_state=0)
print([round(v, 2) for v in mi])
```
Mutual information is non-negative and measures dependency; check which feature varies with the target.
The second feature varies with the target, so it has positive mutual information; the others do not.
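To see where the magnitude of that positive score comes from: the second column, [2, 3, 2, 3], maps one-to-one onto y = [0, 1, 0, 1], so its mutual information with y equals the full entropy of y, which is ln 2 for a balanced binary target (for discrete features scikit-learn computes the exact plug-in value). A one-line check:

```python
import math

# y is balanced over two classes, so H(y) = ln 2 nats.
# The second feature determines y exactly, so I(X2; y) = H(y).
h_y = -(0.5 * math.log(0.5) + 0.5 * math.log(0.5))
print(round(h_y, 2))  # 0.69 -- the positive score; the other two features score 0.0
```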
You have 10 features and their mutual information scores with the target. Which approach best selects features to improve model performance?
Consider both relevance to target and redundancy among features.
Choosing features that are both relevant and not redundant helps improve model performance by providing diverse information.
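This idea can be sketched as a greedy relevance-minus-redundancy selection, in the spirit of mRMR. The feature names and all MI values below are hypothetical, chosen only to illustrate the trade-off:

```python
# Hypothetical MI scores: relevance = I(feature; target),
# redundancy = I(feature_i; feature_j) between pairs of features.
relevance = {"f0": 0.8, "f1": 0.75, "f2": 0.3}
redundancy = {("f0", "f1"): 0.7, ("f0", "f2"): 0.1, ("f1", "f2"): 0.05}

def pairwise(a, b):
    return redundancy.get((a, b), redundancy.get((b, a), 0.0))

def mrmr_select(k):
    """Greedily pick the feature maximizing relevance minus mean redundancy
    to the features already selected."""
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            return relevance[f] - sum(pairwise(f, s) for s in selected) / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

print(mrmr_select(2))  # ['f0', 'f2'] -- f2 beats the more relevant but redundant f1
```

Note that f1 has the second-highest relevance, yet f2 wins the second slot because f1 is highly redundant with the already-selected f0.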
When using mutual_info_classif from scikit-learn, which hyperparameter affects the smoothness of the mutual information estimate for continuous features?
Think about parameters controlling neighborhood size in nearest neighbor estimation.
The n_neighbors parameter controls the number of neighbors used in the k-nearest neighbors method for estimating mutual information, affecting smoothness.
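A quick way to see the effect is to rerun the estimator with different n_neighbors values on the same continuous feature; the synthetic data here is purely illustrative. Larger neighborhoods give a lower-variance but potentially more biased estimate:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 1))          # one continuous feature
y = (X[:, 0] > 0).astype(int)          # target depends on that feature

# Larger n_neighbors -> smoother (lower-variance, higher-bias) MI estimate.
for k in (3, 10, 50):
    mi = mutual_info_classif(X, y, discrete_features=False,
                             n_neighbors=k, random_state=0)
    print(k, round(mi[0], 3))
```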
Does the following code raise an error when calculating mutual information? Why or why not?
```python
from sklearn.feature_selection import mutual_info_classif
import numpy as np

X = np.array([[1.5, 2.3], [3.1, 4.7], [5.2, 6.8]])
y = np.array([0, 1, 0])
mi = mutual_info_classif(X, y, discrete_features=True)
print(mi)
```
Consider whether scikit-learn validates discrete_features against the data's type, and what happens when every unique continuous value is treated as its own category.
No error is raised: discrete_features=True simply tells scikit-learn to treat every feature as categorical. Because each float value here is unique, every feature appears to determine the target perfectly, so the mutual information estimates are badly inflated. For continuous features, discrete_features should be False (or left at the default 'auto'), which uses the k-nearest-neighbors estimator instead.
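This is easy to check directly. Assuming a standard recent scikit-learn installation, the call completes without raising, and because all float values are distinct, each feature is treated as a category that fully "explains" y:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.array([[1.5, 2.3], [3.1, 4.7], [5.2, 6.8]])
y = np.array([0, 1, 0])

# Runs without error: every unique float becomes its own category,
# so both features look perfectly informative about y.
mi = mutual_info_classif(X, y, discrete_features=True)
print(mi)  # both scores come out close to H(y), an inflated estimate
```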