How to Use feature_importances_ in sklearn with Python
In sklearn, you can use the feature_importances_ attribute of tree-based models like RandomForestClassifier to get the importance scores of each feature after training. This attribute shows how much each feature contributes to the model's decisions, helping you understand which features matter most.
Syntax
The feature_importances_ attribute is accessed from a trained sklearn model that supports it, such as RandomForestClassifier or GradientBoostingClassifier. It returns a numpy array of importance scores, one for each feature.
Syntax pattern:
```python
model = SomeTreeBasedModel()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Here, importances is an array where each value corresponds to a feature's importance.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)
importances = model.feature_importances_
```
Example
This example trains a Random Forest classifier on the Iris dataset and prints the feature importances. It shows which flower features are most important for classification.
```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Get feature importances
importances = model.feature_importances_

# Print feature names with their importance
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.3f}")
```
Output
sepal length (cm): 0.096
sepal width (cm): 0.022
petal length (cm): 0.430
petal width (cm): 0.452
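If you want the features ranked by importance rather than listed in dataset order, you can sort the array with numpy's argsort. A minimal sketch, reusing the same Iris model as above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=42)
model.fit(iris.data, iris.target)

# Indices of features sorted from most to least important
order = np.argsort(model.feature_importances_)[::-1]
for i in order:
    print(f"{iris.feature_names[i]}: {model.feature_importances_[i]:.3f}")
```

With the importances shown above, the petal features come first in the ranking.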
Common Pitfalls
- Using feature_importances_ before training: The attribute is only available after calling fit(). Accessing it earlier raises an AttributeError.
- Using models without feature_importances_: Not all sklearn models have this attribute (e.g., LogisticRegression does not; it exposes coef_ instead). Use tree-based models like Random Forest or Gradient Boosting.
- Misinterpreting values: The importances are relative and sum to 1. They show contribution to the model's predictions, not causation.
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=200)
model.fit(X, y)
# This would raise AttributeError:
# print(model.feature_importances_)

# Correct: use a tree-based model
model = RandomForestClassifier()
model.fit(X, y)
print(model.feature_importances_)
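To guard against the AttributeError generically, you can check for the attribute with Python's built-in hasattr before accessing it. A short sketch:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

for model in (LogisticRegression(max_iter=200), RandomForestClassifier(random_state=0)):
    model.fit(X, y)
    # feature_importances_ only exists on some fitted models; check first
    if hasattr(model, "feature_importances_"):
        print(type(model).__name__, model.feature_importances_)
    else:
        print(type(model).__name__, "has no feature_importances_")
```

This pattern is useful when the model type is chosen at runtime, e.g., in a model-comparison loop.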
Quick Reference
| Step | Description |
|---|---|
| Train model | Call fit() on a tree-based sklearn model |
| Access importances | Use model.feature_importances_ after training |
| Interpret | Higher values mean more important features |
| Limitations | Only available for some models; values are relative |
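Beyond inspection, importances can also drive feature selection. A sketch using sklearn's SelectFromModel, which keeps only the features whose importance exceeds a threshold (here the mean importance):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Keep features whose importance is above the mean importance
selector = SelectFromModel(RandomForestClassifier(random_state=42), threshold="mean")
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # typically only the petal features survive
```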
Key Takeaways
Use feature_importances_ only after fitting a tree-based sklearn model.
It returns an array showing the relative importance of each feature.
Not all sklearn models have feature_importances_; prefer RandomForest or GradientBoosting.
Feature importances help identify which features influence model decisions most.
Interpret importances as relative scores, not absolute causation.