What is Feature importance in regression in ML Python?

Introduction

Feature importance measures how much each input feature contributes to a model's predictions. It shows which parts of the data the model relies on most.

You want to know which features influence house prices the most.

You want to simplify a model by keeping only important features.

You want to explain to others why the model makes certain predictions.

You want to detect if some features are not useful or redundant.

You want to improve model performance by focusing on key features.

Syntax

ML Python

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_train, y_train)
importances = model.feature_importances_

feature_importances_ is an attribute, available only after fit() has run, that holds one importance score per feature, in the same order as the columns of X_train.

This example uses a Random Forest model, which naturally provides feature importance.
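The snippet above assumes X_train and y_train already exist. A minimal self-contained sketch follows, assuming synthetic data in which only the first two of five features carry signal; the data, the coefficients 3.0 and 2.0, and the names f0 to f4 are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration): only columns 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# One score per column of X, in column order.
importances = model.feature_importances_

# Rank features from most to least important.
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{feature_names[i]}: {importances[i]:.3f}")
```

Sorting with np.argsort keeps each score paired with its feature name, which is easier to read than the raw array.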

Examples

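Linear regression has no feature_importances_ attribute; its coefficients (coef_) are commonly used as a measure of feature importance instead. Coefficient sizes are only comparable when the features are on the same scale, so standardize the inputs before comparing them.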

ML Python

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
coefficients = model.coef_
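Raw coefficients depend on the units of each feature, so they can mislead when features have very different ranges. The sketch below illustrates this with two hypothetical features (area_sqft and rooms) generated from made-up distributions; StandardScaler is one common way to put them on the same scale, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (assumption for illustration).
rng = np.random.default_rng(0)
n = 500
area_sqft = rng.normal(1500, 400, size=n)  # values in the thousands
rooms = rng.normal(3, 1, size=n)           # values in single digits
X = np.column_stack([area_sqft, rooms])
y = 0.1 * area_sqft + 10.0 * rooms + rng.normal(scale=5.0, size=n)

# Raw coefficients: "rooms" looks far more important because its unit is larger.
raw = LinearRegression().fit(X, y)
print("raw coefficients:         ", raw.coef_)

# Standardized coefficients: effect of a one-standard-deviation change in each feature.
X_scaled = StandardScaler().fit_transform(X)
scaled = LinearRegression().fit(X_scaled, y)
print("standardized coefficients:", scaled.coef_)
```

After scaling, the ranking flips: a typical change in area moves the prediction more than a typical change in the number of rooms, even though the raw coefficient for rooms is much larger.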

Random Forest scores each feature by how much the splits on that feature reduce the training error (impurity), combined across all trees in the forest.

ML Python

from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)
importances = model.feature_importances_

Plotting feature importance helps visualize which features matter most.

ML Python

import matplotlib.pyplot as plt
plt.bar(feature_names, importances)
plt.title('Feature Importance')
plt.show()

Sample Program

This code trains a Random Forest regressor on scikit-learn's built-in diabetes dataset and prints the importance of each feature.

ML Python

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load data (load_boston was removed in scikit-learn 1.2; load_diabetes ships with the library)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = diabetes.feature_names

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Get feature importance
importances = model.feature_importances_

# Print feature importance
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.3f}")

Important Notes

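In scikit-learn's tree-based models, feature_importances_ values are relative scores that sum to 1. Linear regression coefficients are not normalized this way.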

Different models calculate importance differently; Random Forest uses how much the splits on each feature reduce error (impurity) during training, combined across all trees.

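High importance means the model's predictions depend strongly on that feature. It does not by itself show that the feature causes the target, and correlated features can share importance between them.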
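The notes above can be checked with a short experiment. The sketch below, on hypothetical synthetic data, confirms that a Random Forest's feature_importances_ sum to 1 and compares them with permutation importance from sklearn.inspection, a different, model-agnostic measure that this page does not otherwise cover. It is shown only to illustrate that two methods can produce different numbers for the same model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): only columns 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importance: computed from the training process, normalized to sum to 1.
impurity = model.feature_importances_
print("impurity-based:", np.round(impurity, 3), f"sum = {impurity.sum():.3f}")

# Permutation importance: drop in test score when one feature's values are shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print("permutation:   ", np.round(perm.importances_mean, 3))
```

The two methods usually agree on the ranking here, but the permutation scores are measured in units of the model's score (R² by default for regressors) and are not normalized to sum to 1.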

Summary

Feature importance shows which inputs affect the model most.

Random Forest models provide easy access to feature importance.

Use feature importance to explain, simplify, or improve models.
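As a sketch of the "simplify" use case, scikit-learn's SelectFromModel can keep only the features whose importance is above a threshold. The synthetic data and the choice of threshold="mean" below are assumptions for illustration; a suitable threshold depends on the problem.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption for illustration): only columns 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

# Keep features whose importance is at least the mean importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0),
    threshold="mean",
)
selector.fit(X, y)

kept = selector.get_support(indices=True)  # column indices that were kept
X_reduced = selector.transform(X)          # data with only those columns

print("kept columns:", kept)
print("shape before:", X.shape, "after:", X_reduced.shape)
```

In a real project, fit the selector on the training split only, then apply transform to the test split, so the test data does not influence which features are kept.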