How to Use Feature Importance in sklearn with Python
In sklearn, you can access feature importance through the feature_importances_ attribute of tree-based models such as RandomForestClassifier or GradientBoostingClassifier. After training the model, call model.feature_importances_ to get an array with one importance score per feature.
Syntax
The feature importance in sklearn is accessed via the feature_importances_ attribute of fitted tree-based models.
Syntax:
```python
model = SomeTreeModel()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Here, model is a trained tree-based model, and importances is a numpy array with importance scores for each feature.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Example
This example trains a Random Forest classifier on the iris dataset and prints the feature importance scores for each feature.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature names with their importance
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```
Output
sepal length (cm): 0.110
sepal width (cm): 0.022
petal length (cm): 0.430
petal width (cm): 0.438
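As a quick sanity check on results like these, the impurity-based importances that sklearn's tree ensembles expose are normalized, so they sum to 1 (a sketch reusing the iris example above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

importances = model.feature_importances_
# Impurity-based importances are normalized across features
print(round(importances.sum(), 6))
```

If the scores did not sum to roughly 1.0, that would suggest the array came from somewhere other than a fitted sklearn tree ensemble.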
Common Pitfalls
- Using feature_importances_ with non-tree models: Only tree-based models like RandomForest, GradientBoosting, and DecisionTree have feature_importances_. Linear models do not.
- Not fitting the model first: You must call fit() before accessing feature_importances_, or it will raise an error.
- Ignoring feature order: The importance array matches the order of features in your input data, so keep track of feature names.
```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

# This will raise AttributeError because LogisticRegression has no feature_importances_
try:
    print(model.feature_importances_)
except AttributeError as e:
    print(f"Error: {e}")
```
Output
Error: 'LogisticRegression' object has no attribute 'feature_importances_'
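For estimators without feature_importances_, such as LogisticRegression, one alternative is sklearn's permutation_importance, which works with any fitted model. A minimal sketch on the iris data (not part of the original example):

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=1000)
model.fit(iris.data, iris.target)

# Shuffle each feature in turn and measure the drop in score
result = permutation_importance(
    model, iris.data, iris.target, n_repeats=10, random_state=42
)
for name, mean in zip(iris.feature_names, result.importances_mean):
    print(f"{name}: {mean:.3f}")
```

Unlike impurity-based scores, permutation importances are not normalized to sum to 1; they measure how much the model's score drops when a feature is shuffled.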
Quick Reference
| Step | Description |
|---|---|
| 1. Choose model | Use tree-based models like RandomForestClassifier or GradientBoostingClassifier |
| 2. Train model | Call model.fit(X_train, y_train) to train the model |
| 3. Access importances | Use model.feature_importances_ to get importance scores |
| 4. Interpret | Higher scores mean more important features |
| 5. Match features | Keep track of feature names to interpret scores correctly |
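The steps in the table above can be combined into a short sketch that ranks features from most to least important, using numpy.argsort to keep names and scores aligned:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Indices of features sorted from highest to lowest importance
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(f"{iris.feature_names[i]}: {model.feature_importances_[i]:.3f}")
```

Sorting by index rather than sorting the scores directly keeps each score paired with the right feature name.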
Key Takeaways
- Use tree-based sklearn models to access feature importance via feature_importances_.
- Always fit the model before accessing feature_importances_ to avoid errors.
- Feature importance scores correspond to the order of features in your input data.
- Linear models do not provide feature_importances_; use other methods for them.
- Higher feature importance means the feature has more impact on model predictions.