How to Use Feature Importance in sklearn with Python
In sklearn, you can access feature importance through the feature_importances_ attribute of tree-based models such as RandomForestClassifier or GradientBoostingClassifier. After training the model, call model.feature_importances_ to get an array with one importance score per feature.
Syntax
The feature importance in sklearn is accessed via the feature_importances_ attribute of fitted tree-based models.
Syntax:
```python
model = SomeTreeModel()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Here, model is a trained tree-based model, and importances is a numpy array with importance scores for each feature.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Example
This example trains a Random Forest classifier on the iris dataset and prints the feature importance scores for each feature.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature names with their importance
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```
Output
sepal length (cm): 0.110
sepal width (cm): 0.022
petal length (cm): 0.430
petal width (cm): 0.438
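As a quick sanity check on results like these, the impurity-based importances that sklearn's tree ensembles expose are normalized, so they sum to 1 (a sketch reusing the iris example above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

importances = model.feature_importances_
# Impurity-based importances are normalized across features
print(round(importances.sum(), 6))
```

If the scores did not sum to roughly 1.0, that would suggest the array came from somewhere other than a fitted sklearn tree ensemble.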
Common Pitfalls
- Using feature_importances_ with non-tree models: Only tree-based models like RandomForest, GradientBoosting, and DecisionTree have feature_importances_. Linear models do not.
- Not fitting the model first: You must call fit() before accessing feature_importances_, or it will raise an error.
- Ignoring feature order: The importance array matches the order of features in your input data, so keep track of feature names.
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

# This will raise AttributeError because LogisticRegression has no feature_importances_
try:
    print(model.feature_importances_)
except AttributeError as e:
    print(f"Error: {e}")
```
Output
Error: 'LogisticRegression' object has no attribute 'feature_importances_'
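For estimators without feature_importances_, such as LogisticRegression, one alternative is sklearn's permutation_importance, which works with any fitted model. A minimal sketch on the iris data (not part of the original example):

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=1000)
model.fit(iris.data, iris.target)

# Shuffle each feature in turn and measure the drop in score
result = permutation_importance(
    model, iris.data, iris.target, n_repeats=10, random_state=42
)
for name, mean in zip(iris.feature_names, result.importances_mean):
    print(f"{name}: {mean:.3f}")
```

Unlike impurity-based scores, permutation importances are not normalized to sum to 1; they measure how much the model's score drops when a feature is shuffled.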
Quick Reference
| Step | Description |
|---|---|
| 1. Choose model | Use tree-based models like RandomForestClassifier or GradientBoostingClassifier |
| 2. Train model | Call model.fit(X_train, y_train) to train the model |
| 3. Access importances | Use model.feature_importances_ to get importance scores |
| 4. Interpret | Higher scores mean more important features |
| 5. Match features | Keep track of feature names to interpret scores correctly |
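The steps in the table above can be combined into a short sketch that ranks features from most to least important, using numpy.argsort to keep names and scores aligned:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Indices of features sorted from highest to lowest importance
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(f"{iris.feature_names[i]}: {model.feature_importances_[i]:.3f}")
```

Sorting by index rather than sorting the scores directly keeps each score paired with the right feature name.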
Key Takeaways
- Use tree-based sklearn models to access feature importance via feature_importances_.
- Always fit the model before accessing feature_importances_ to avoid errors.
- Feature importance scores correspond to the order of features in your input data.
- Linear models do not provide feature_importances_; use other methods for them.
- Higher feature importance means the feature has more impact on model predictions.