
How to Use Feature Importance in sklearn with Python

In sklearn, you can access feature importance using the feature_importances_ attribute of tree-based models like RandomForestClassifier or GradientBoostingClassifier. After training the model, simply call model.feature_importances_ to get an array showing the importance score of each feature.
📐

Syntax

Feature importance in sklearn is exposed through the feature_importances_ attribute of fitted tree-based models.

Syntax:

model = SomeTreeModel()
model.fit(X_train, y_train)
importances = model.feature_importances_

Here, model is a trained tree-based model, and importances is a numpy array with importance scores for each feature.

python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
💻

Example

This example trains a Random Forest classifier on the iris dataset and prints the feature importance scores for each feature.

python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature names with their importance
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
Output
sepal length (cm): 0.110
sepal width (cm): 0.022
petal length (cm): 0.430
petal width (cm): 0.438
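The raw score array can be hard to scan, so a common follow-up is to sort the name–score pairs before printing. This sketch (not part of the original example) reuses the same Random Forest on iris and ranks the features from most to least important:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Pair each importance score with its feature name, then sort descending
ranked = sorted(
    zip(iris.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Sorting this way makes it immediately obvious that the petal measurements dominate the prediction.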
⚠️

Common Pitfalls

  • Using feature_importances_ with non-tree models: Only tree-based models like RandomForest, GradientBoosting, and DecisionTree have feature_importances_. Linear models do not.
  • Not fitting the model first: You must call fit() before accessing feature_importances_, or it will raise an error.
  • Ignoring feature order: The importance array matches the order of features in your input data, so keep track of feature names.
python
from sklearn.linear_model import LogisticRegression

# X and y are the iris features and labels loaded in the example above
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# This will raise AttributeError because LogisticRegression has no feature_importances_
try:
    print(model.feature_importances_)
except AttributeError as e:
    print(f"Error: {e}")
Output
Error: 'LogisticRegression' object has no attribute 'feature_importances_'
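The second pitfall, accessing feature_importances_ before calling fit(), also fails. In recent sklearn versions a tree-based model raises NotFittedError (which subclasses AttributeError) in this case; the exact message may vary by version. A minimal sketch:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

model = RandomForestClassifier()

# Accessing importances before fit() raises NotFittedError
try:
    print(model.feature_importances_)
except NotFittedError as e:
    print(f"Error: {e}")
```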
📊

Quick Reference

| Step | Description |
| --- | --- |
| 1. Choose model | Use a tree-based model like RandomForestClassifier or GradientBoostingClassifier |
| 2. Train model | Call model.fit(X_train, y_train) to train the model |
| 3. Access importances | Use model.feature_importances_ to get importance scores |
| 4. Interpret | Higher scores mean more important features |
| 5. Match features | Keep track of feature names to interpret scores correctly |

Key Takeaways

  • Use tree-based sklearn models to access feature importance via feature_importances_.
  • Always fit the model before accessing feature_importances_ to avoid errors.
  • Feature importance scores correspond to the order of features in your input data.
  • Linear models do not provide feature_importances_; use other methods for them.
  • Higher feature importance means the feature has more impact on model predictions.
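For models without feature_importances_, one model-agnostic alternative is sklearn's permutation_importance, which measures how much the score drops when each feature is shuffled. A sketch using the iris data (max_iter=1000 is an assumption chosen here to ensure convergence):

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

iris = load_iris()
model = LogisticRegression(max_iter=1000)
model.fit(iris.data, iris.target)

# Shuffle each feature n_repeats times and record the mean score drop
result = permutation_importance(
    model, iris.data, iris.target, n_repeats=10, random_state=42
)
for name, score in zip(iris.feature_names, result.importances_mean):
    print(f"{name}: {score:.3f}")
```

Unlike feature_importances_, permutation importance works with any fitted estimator, including linear models and pipelines.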