Which statement best describes how feature importance is calculated in a decision tree model?
Think about how decision trees decide where to split data.
Decision trees calculate feature importance by summing how much each feature reduces impurity (such as Gini impurity or entropy) across all splits where it is used, weighted by the proportion of samples reaching each split.
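The calculation above can be sketched directly from a fitted tree's structure. This is a minimal illustration, assuming scikit-learn's `tree_` attribute (children, impurities, and weighted sample counts per node); it accumulates each split's weighted impurity decrease per feature and normalizes, which should match the built-in `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Recompute importance from the tree structure: for each internal node,
# weighted impurity decrease =
#   (n_node * impurity - n_left * impurity_left - n_right * impurity_right) / n_total,
# accumulated per split feature and normalized to sum to 1.
t = tree.tree_
n_total = t.weighted_n_node_samples[0]
importance = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, no impurity decrease
        continue
    decrease = (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    ) / n_total
    importance[t.feature[node]] += decrease
importance /= importance.sum()

print(np.allclose(importance, tree.feature_importances_))
```

If the reconstruction matches scikit-learn's internal computation, the final line prints True.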
What is the output of the following Python code using scikit-learn's RandomForestClassifier?
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier(random_state=0)
model.fit(X, y)
importances = model.feature_importances_
print([round(i, 2) for i in importances])
Random forests usually assign higher importance to features that better split the data.
The output shows the relative importance of each feature in the iris dataset as computed by the random forest. Sepal length and width have lower importance compared to petal length and petal width.
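Two properties of that output can be checked directly: the importances always sum to 1.0, and for the iris dataset the petal features dominate. A small sketch (exact values vary by scikit-learn version, so only the structure is asserted here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

# Pair each feature name with its importance; the values sum to 1.0,
# and petal length/width carry most of the importance on iris.
for name, imp in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {imp:.2f}")
print(f"sum: {model.feature_importances_.sum():.2f}")
```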
How does setting the max_features parameter to a low value in a Random Forest affect the computed feature importance?
Think about how limiting features at splits changes the model's view of data.
When max_features is low, each tree considers only a small random subset of features at each split, so the strongest features are sometimes unavailable and weaker ones get chosen instead. This tends to spread the computed importance more evenly across features rather than concentrating it on the few strongest ones.
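This effect can be observed by comparing two forests on the same data. A hypothetical comparison on iris (exact numbers depend on the data and random_state): with max_features=None every split sees all features, so the petal features dominate; with max_features=1 each split draws a single random feature, so the weaker sepal features get used, and importance spreads out.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# All four features available at every split vs. one random feature per split.
full = RandomForestClassifier(max_features=None, random_state=0).fit(X, y)
limited = RandomForestClassifier(max_features=1, random_state=0).fit(X, y)

print("max_features=None:", [round(i, 2) for i in full.feature_importances_])
print("max_features=1:   ", [round(i, 2) for i in limited.feature_importances_])
# With max_features=1, importance is typically spread more evenly, since the
# strongest features are often unavailable and weaker ones must be used.
```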
Which statement correctly describes what SHAP values represent in feature importance analysis?
SHAP values explain individual predictions by comparing to a baseline.
SHAP values quantify each feature's contribution to the difference between the model's average output and the specific prediction, helping explain model decisions.
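The defining additivity property (SHAP values sum to the prediction minus the baseline) can be illustrated without the shap library. For a linear model, assuming independent features, the exact SHAP value of feature i is coef_i * (x_i - mean(x_i)); this toy sketch verifies that their sum recovers the gap between a specific prediction and the average prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy illustration (not using the shap library): synthetic data and a
# linear model, where exact SHAP values have a closed form.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] - 1 * X[:, 1] + 0.5 * X[:, 2]

model = LinearRegression().fit(X, y)
x = X[0]
phi = model.coef_ * (x - X.mean(axis=0))  # per-feature SHAP values

baseline = model.predict(X).mean()          # model's average output
prediction = model.predict(x.reshape(1, -1))[0]
print(np.isclose(phi.sum(), prediction - baseline))  # → True
```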
What error will the following code raise when trying to get feature importances from a trained model?
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(max_iter=200)
model.fit(X, y)
print(model.feature_importances_)
Not all models provide feature importance attributes.
Accessing model.feature_importances_ raises an AttributeError, because LogisticRegression does not have that attribute. It exists on tree-based models like RandomForestClassifier but not on linear models, which expose their learned weights via coef_ instead.
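A short sketch confirming the error and showing two standard alternatives for linear models: inspecting coef_ directly, or using model-agnostic permutation importance from sklearn.inspection:

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

try:
    model.feature_importances_
except AttributeError as e:
    print(type(e).__name__)  # → AttributeError

# Alternative 1: inspect the learned coefficients (one row per class on iris).
print(model.coef_.shape)  # → (3, 4)

# Alternative 2: model-agnostic permutation importance.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print([round(i, 2) for i in result.importances_mean])
```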