Feature importance measures how much each input feature contributes to a model's predictions. It shows which parts of the data the model relies on most.
Feature importance in regression in ML Python
Introduction
You want to know which features influence house prices the most.
You want to simplify a model by keeping only important features.
You want to explain to others why the model makes certain predictions.
You want to detect if some features are not useful or redundant.
You want to improve model performance by focusing on key features.
Syntax
ML Python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X_train, y_train)
importances = model.feature_importances_
feature_importances_ is an attribute that gives importance scores for each feature after training.
This example uses a Random Forest model, which naturally provides feature importance.
Examples
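Linear regression uses coefficients as a measure of feature importance. Coefficients are only comparable across features when the features are on the same scale, so standardize the inputs before ranking by coefficient size.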
ML Python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
coefficients = model.coef_
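To make the point about coefficients concrete, here is a minimal sketch that standardizes the features before fitting, so each coefficient is expressed per one standard deviation of its feature. The synthetic data from make_regression, the artificial rescaling, and the names X_mixed_units and std_coefficients are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): 3 features, all informative.
X, y = make_regression(n_samples=200, n_features=3, n_informative=3,
                       noise=5.0, random_state=0)

# Put the features on very different scales, as real columns often are.
X_mixed_units = X * np.array([1.0, 100.0, 0.01])

# Standardize inside a pipeline so the coefficients become comparable.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_mixed_units, y)

# make_pipeline names each step after its lowercased class name.
std_coefficients = pipeline.named_steps["linearregression"].coef_

# Rank features by absolute standardized coefficient, largest first.
ranking = np.argsort(np.abs(std_coefficients))[::-1]
for idx in ranking:
    print(f"feature_{idx}: {std_coefficients[idx]:.3f}")
```

The sign of a coefficient still gives the direction of the effect; only the absolute value is used for ranking.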
Random Forest provides feature importance based on how much the splits on each feature reduce error (impurity), averaged over all trees in the forest.
ML Python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)
importances = model.feature_importances_
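Because the forest's score is an average over trees, it can help to look at how much the individual trees disagree. The sketch below reads feature_importances_ from each fitted tree in model.estimators_ and reports the spread. The synthetic data and the names per_tree and spread are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration): 5 features, 2 informative.
X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       noise=1.0, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Forest-level importance scores, one per feature.
importances = model.feature_importances_

# Each fitted tree exposes its own importance scores.
per_tree = np.array([tree.feature_importances_ for tree in model.estimators_])

# Standard deviation across trees: a rough indicator of how stable each score is.
spread = per_tree.std(axis=0)

for i, (mean, std) in enumerate(zip(importances, spread)):
    print(f"feature_{i}: {mean:.3f} +/- {std:.3f}")
```

A large spread relative to the mean suggests the ranking of that feature may change with a different random seed or more trees.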
Plotting feature importance helps visualize which features matter most.
ML Python
import matplotlib.pyplot as plt

plt.bar(feature_names, importances)
plt.title('Feature Importance')
plt.show()
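With more than a handful of features, a sorted horizontal bar chart is often easier to read than an unsorted one. The following is a minimal, self-contained sketch; the synthetic data and the placeholder names feature_0 to feature_5 are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration): 6 features, 3 informative.
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=1.0, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_

# Sort ascending so the most important feature ends up at the top of the chart.
order = np.argsort(importances)
sorted_names = [feature_names[i] for i in order]
sorted_values = importances[order]

fig, ax = plt.subplots()
ax.barh(sorted_names, sorted_values)
ax.set_xlabel("Importance")
ax.set_title("Feature Importance (sorted)")
fig.tight_layout()
plt.show()
```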
Sample Program
This code trains a Random Forest regressor on the California housing dataset and prints the importance of each feature. The Boston housing dataset used in older versions of this example was removed from scikit-learn in version 1.2, so load_boston no longer works. Note that fetch_california_housing downloads the data the first time it is called.
ML Python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Load data
housing = fetch_california_housing()
X, y = housing.data, housing.target
feature_names = housing.feature_names

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Get feature importance
importances = model.feature_importances_

# Print feature importance
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.3f}")
Important Notes
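Random Forest importance values are relative: scikit-learn normalizes them so they sum to 1. Linear regression coefficients are not normalized this way.
Different models calculate importance differently; Random Forest uses how much the splits on each feature reduce error, while linear regression uses coefficients.
High importance means the model relies heavily on that feature for its predictions. It does not by itself show that the feature causes the outcome, and correlated features can share or trade importance between them.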
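The notes above can be checked in a few lines. This sketch confirms that the Random Forest scores sum to 1 and then computes a second, differently defined measure, permutation importance, on held-out data using scikit-learn's permutation_importance. The synthetic data and variable names are assumptions for illustration, and permutation importance is shown as one alternative, not as the method this tutorial prescribes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): 5 features, 2 informative.
X, y = make_regression(n_samples=400, n_features=5, n_informative=2,
                       noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based scores from the fitted forest are normalized to sum to 1.
impurity_importances = model.feature_importances_
print("Sum of impurity-based importances:", impurity_importances.sum())

# Permutation importance: the drop in test score when one feature is shuffled.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)

for i in range(X.shape[1]):
    print(f"feature_{i}: impurity={impurity_importances[i]:.3f}, "
          f"permutation={result.importances_mean[i]:.3f}")
```

The two columns are on different scales, so compare the rankings rather than the raw numbers.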
Summary
Feature importance shows which inputs affect the model most.
Random Forest models provide easy access to feature importance.
Use feature importance to explain, simplify, or improve models.
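As one way to act on the "simplify" point, the sketch below uses scikit-learn's SelectFromModel to keep only the features whose importance is at least the mean importance. The synthetic data, the choice of threshold="mean", and the variable names are assumptions for illustration; a different threshold may suit real data better.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data (assumption for illustration): 10 features, 3 informative.
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# Fit a forest and keep features whose importance is >= the mean importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=0),
    threshold="mean",
)
selector.fit(X, y)

X_reduced = selector.transform(X)
kept = selector.get_support(indices=True)

print("Kept feature indices:", kept)
print("Shape before:", X.shape, "after:", X_reduced.shape)
```

After selecting features, retrain and evaluate the model on held-out data to confirm that dropping the others did not hurt performance.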