How to Use feature_importances_ in sklearn with Python
In sklearn, you can use the feature_importances_ attribute of tree-based models like RandomForestClassifier to get the importance scores of each feature after training. This attribute shows how much each feature contributes to the model's decisions, helping you understand which features matter most.
Syntax
The feature_importances_ attribute is accessed from a trained sklearn model that supports it, such as RandomForestClassifier or GradientBoostingClassifier. It returns a numpy array of importance scores, one for each feature.
Syntax pattern:
```python
model = SomeTreeBasedModel()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Here, importances is an array where each value corresponds to a feature's importance.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Example
This example trains a Random Forest classifier on the Iris dataset and prints the feature importances. It shows which flower features are most important for classification.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature names with their importance
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```
Output
sepal length (cm): 0.096
sepal width (cm): 0.022
petal length (cm): 0.430
petal width (cm): 0.452
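If you want the features ranked by importance rather than listed in dataset order, you can sort the array with numpy's argsort. A minimal sketch, reusing the same Iris model as above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Indices of features sorted from most to least important
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(f"{iris.feature_names[i]}: {model.feature_importances_[i]:.3f}")
```

With the importances shown above, the petal features come first in the ranking.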
Common Pitfalls
- Using feature_importances_ before training: The attribute is only available after calling fit(). Accessing it earlier raises an AttributeError.
- Using models without feature_importances_: Not all sklearn models have this attribute (e.g., LogisticRegression does not; it exposes coef_ instead). Use tree-based models like Random Forest or Gradient Boosting.
- Misinterpreting values: The importances are relative and sum to 1. They show contribution to the model's predictions, not causation.
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=200)
model.fit(X, y)
# This would raise AttributeError:
# print(model.feature_importances_)

# Correct: use a tree-based model
model = RandomForestClassifier()
model.fit(X, y)
print(model.feature_importances_)
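To guard against the AttributeError generically, you can check for the attribute with Python's built-in hasattr before accessing it. A short sketch:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

for model in (LogisticRegression(max_iter=200), RandomForestClassifier(random_state=0)):
    model.fit(X, y)
    # feature_importances_ only exists on some fitted models; check first
    if hasattr(model, "feature_importances_"):
        print(type(model).__name__, model.feature_importances_)
    else:
        print(type(model).__name__, "has no feature_importances_")
```

This pattern is useful when the model type is chosen at runtime, e.g., in a model-comparison loop.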
Quick Reference
| Step | Description |
|---|---|
| Train model | Call fit() on a tree-based sklearn model |
| Access importances | Use model.feature_importances_ after training |
| Interpret | Higher values mean more important features |
| Limitations | Only available for some models; values are relative |
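Beyond inspection, importances can also drive feature selection. A sketch using sklearn's SelectFromModel, which keeps only the features whose importance exceeds a threshold (here the mean importance):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Keep features whose importance is above the mean importance
selector = SelectFromModel(RandomForestClassifier(random_state=42), threshold="mean")
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # typically only the petal features survive
```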
Key Takeaways
Use feature_importances_ only after fitting a tree-based sklearn model.
It returns an array showing the relative importance of each feature.
Not all sklearn models have feature_importances_; prefer RandomForest or GradientBoosting.
Feature importances help identify which features influence model decisions most.
Interpret importances as relative scores, not absolute causation.