Recursive feature elimination helps find the most important features in your data by removing less useful ones step by step.
Recursive feature elimination in ML Python
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression()
    rfe = RFE(estimator=model, n_features_to_select=number_of_features)
    rfe.fit(X, y)
    selected_features = rfe.support_
estimator is the model used to judge feature importance.
n_features_to_select is how many features you want to keep.
    # Keep the top 3 features, judged by a decision tree
    from sklearn.feature_selection import RFE
    from sklearn.tree import DecisionTreeClassifier

    model = DecisionTreeClassifier()
    rfe = RFE(estimator=model, n_features_to_select=3)
    rfe.fit(X, y)
    print(rfe.support_)

    # Keep only the single most important feature
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression()
    rfe = RFE(estimator=model, n_features_to_select=1)
    rfe.fit(X, y)
    print(rfe.support_)

    # Edge case: n_features_to_select=0 is invalid, so fit() raises an error
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression()
    rfe = RFE(estimator=model, n_features_to_select=0)
    rfe.fit(X, y)
    print(rfe.support_)
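The last example above asks RFE to keep zero features, which is not a valid request. Here is a minimal sketch of what that looks like in practice. It assumes the iris dataset stands in for X and y (the snippets above leave them undefined) and that the failure surfaces as a ValueError, which is worth confirming on your scikit-learn version. The helper name fit_rfe is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Assumption: iris stands in for the undefined X and y of the snippets above.
X, y = load_iris(return_X_y=True)


def fit_rfe(n_features):
    """Fit RFE and return the support mask, or None if fitting is rejected."""
    rfe = RFE(
        estimator=LogisticRegression(max_iter=1000),
        n_features_to_select=n_features,
    )
    try:
        rfe.fit(X, y)
    except ValueError as exc:
        print(f"n_features_to_select={n_features} failed: {type(exc).__name__}")
        return None
    return rfe.support_


zero_result = fit_rfe(0)  # invalid: no features would be left to fit on
one_result = fit_rfe(1)   # valid: exactly one feature survives
print(one_result)
```

Catching the error this way is only for demonstration; in real code the simpler fix is to pass a positive integer.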
This program loads the iris flower data, uses recursive feature elimination to keep the two most important features, and then predicts the flower type using only those features.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Load data
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Create model
    model = LogisticRegression(max_iter=200)

    # Use RFE to select top 2 features
    rfe = RFE(estimator=model, n_features_to_select=2)
    rfe.fit(X, y)

    # Show which features were selected
    print('Selected features mask:', rfe.support_)

    # Show ranking of features (1 means selected)
    print('Feature ranking:', rfe.ranking_)

    # Predict using selected features
    X_selected = rfe.transform(X)
    predictions = rfe.estimator_.predict(X_selected)

    # Show first 5 predictions
    print('First 5 predictions:', predictions[:5])
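The last step of the program above (transform, then predict with the fitted inner estimator) can also be written as a single call, because a fitted RFE object exposes predict itself. The sketch below checks that the two routes agree on iris; the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

rfe = RFE(estimator=LogisticRegression(max_iter=200), n_features_to_select=2)
rfe.fit(X, y)

# Route 1: reduce X to the selected columns, then use the inner fitted model.
manual_predictions = rfe.estimator_.predict(rfe.transform(X))

# Route 2: let RFE apply the same column reduction internally.
direct_predictions = rfe.predict(X)

print("Same predictions:", np.array_equal(manual_predictions, direct_predictions))
```

The single-call form is harder to misuse, since it is not possible to pass the unreduced X to the inner estimator by accident.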
Time complexity depends on the estimator and the number of features: RFE refits the estimator after each elimination round, so it can be slow when there are many features.
Space complexity is similar to the estimator's requirements.
Common mistake: setting n_features_to_select to zero causes an error, and setting it above the total number of features means nothing is eliminated (all features are kept).
Use RFE when you want to reduce features based on model importance; use other methods if you want filter-based selection.
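The cost note above follows from how the procedure works: the model is refit every time features are dropped. The loop below is a simplified, hand-written sketch of that idea, not scikit-learn's actual implementation. It assumes one feature is removed per round and that importance is the sum of squared logistic-regression coefficients across classes. The function name manual_rfe is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def manual_rfe(X, y, n_features_to_select):
    """Drop the weakest feature one round at a time; return (mask, number of fits)."""
    remaining = list(range(X.shape[1]))  # column indices still in play
    n_fits = 0
    while len(remaining) > n_features_to_select:
        model = LogisticRegression(max_iter=1000)
        model.fit(X[:, remaining], y)
        n_fits += 1
        # Assumed importance: squared coefficients summed over classes.
        importance = (model.coef_ ** 2).sum(axis=0)
        weakest = int(np.argmin(importance))
        del remaining[weakest]
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[remaining] = True
    return mask, n_fits


X, y = load_iris(return_X_y=True)
mask, n_fits = manual_rfe(X, y, n_features_to_select=2)
print("Kept features:", mask)
print("Model fits needed:", n_fits)
```

Each round costs one full model fit, so the number of fits grows with the number of features that have to be removed.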
Recursive feature elimination removes less important features step by step.
It helps improve model speed and understanding by keeping only key features.
You need to choose a model to judge feature importance and how many features to keep.
Practice
What is the purpose of Recursive Feature Elimination (RFE) in machine learning?
Solution
Step 1: Understand the purpose of RFE
RFE works by removing less important features one at a time to keep only the best ones.
Step 2: Compare options to the purpose
Only "To select the most important features by removing less important ones step by step" describes this stepwise removal of less important features.
Final Answer:
To select the most important features by removing less important ones step by step -> Option A
Quick Check:
RFE = Stepwise feature removal [OK]
- Thinking RFE adds or creates features
- Confusing RFE with random feature shuffling
- Believing RFE increases feature count
Solution
Step 1: Recall the correct import statement
The class is named RFE and is in sklearn.feature_selection.
Step 2: Match options with correct syntax
from sklearn.feature_selection import RFE correctly imports RFE from sklearn.feature_selection.
Final Answer:
from sklearn.feature_selection import RFE -> Option B
Quick Check:
Correct import is 'from sklearn.feature_selection import RFE' [OK]
- Using wrong module name like sklearn.selection
- Trying to import full name RecursiveFeatureElimination
- Using incorrect import syntax
print(selected_features)?
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.feature_selection import RFE

    iris = load_iris()
    X, y = iris.data, iris.target
    model = LogisticRegression(max_iter=200)
    rfe = RFE(model, n_features_to_select=2)
    rfe.fit(X, y)
    selected_features = rfe.support_
    print(selected_features)
Solution
Step 1: Understand RFE output
The support_ attribute is a boolean array showing which features are selected.
Step 2: Run RFE with LogisticRegression on iris dataset
RFE selects the two most important features, which for iris are the last two features (petal length and petal width), so the output is [False False True True].
Final Answer:
[False False True True] -> Option D
Quick Check:
RFE selects last two iris features = [False False True True] [OK]
- Assuming first two features are selected
- Confusing support_ with ranking_
- Not setting max_iter causing convergence warnings
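Because support_ and ranking_ are easy to mix up, a short sketch may help. It reruns the quiz code on iris and shows how the two attributes relate: support_ is a boolean mask, ranking_ holds integers, and rank 1 marks the selected features. The feature names come from the dataset object; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# max_iter is set explicitly to avoid the convergence warning noted above.
rfe = RFE(LogisticRegression(max_iter=200), n_features_to_select=2)
rfe.fit(X, y)

# support_ is a boolean mask; ranking_ holds integers where 1 means "selected".
mask_from_ranking = rfe.ranking_ == 1
print("Masks agree:", np.array_equal(rfe.support_, mask_from_ranking))

# Pair each feature name with its mask entry and its rank.
for name, kept, rank in zip(iris.feature_names, rfe.support_, rfe.ranking_):
    print(f"{name}: selected={kept}, rank={rank}")
```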
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    model = LogisticRegression()
    rfe = RFE(model, n_features_to_select=0)
    rfe.fit(X, y)
Solution
Step 1: Check the n_features_to_select parameter
This parameter must be at least 1 or None; zero is invalid.
Step 2: Identify correct fix
Setting n_features_to_select to a positive integer fixes the error.
Final Answer:
n_features_to_select cannot be zero; set it to a positive integer -> Option A
Quick Check:
n_features_to_select > 0 required [OK]
- Setting n_features_to_select to zero
- Wrong import paths for LogisticRegression
- Thinking random_state is mandatory for RFE
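As a rough illustration of the fix, the sketch below fits RFE with two valid settings: a positive integer, and None, which keeps half of the features. It assumes iris (4 features) stands in for the X and y that the quiz code leaves undefined. The helper name count_selected is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Assumption: iris (4 features) stands in for the undefined X and y.
X, y = load_iris(return_X_y=True)


def count_selected(n_features_to_select):
    """Fit RFE with the given setting and return how many features it kept."""
    rfe = RFE(
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_features_to_select,
    )
    rfe.fit(X, y)
    return int(rfe.n_features_)


kept_with_int = count_selected(3)      # positive integer: keep exactly 3
kept_with_none = count_selected(None)  # None: keep half of the features
print(kept_with_int, kept_with_none)
```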
df and target in y?
Solution
Step 1: Check correct fit method usage
Features (df) must be the first argument and target (y) the second in fit.
Step 2: Select features using support_
Use the boolean mask rfe.support_ to get the selected features, then map them to column names.
Final Answer:
Code snippet A correctly fits and selects features using support_ mask -> Option C
Quick Check:
fit(df, y) + support_ mask = correct feature selection [OK]
- Swapping X and y in fit method
- Using ranking_ == 5 instead of support_
- Not converting boolean mask to column names
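The answer options for this question are not shown here, so the following is only a sketch of the pattern the solution describes, not the original snippet. It assumes pandas is installed and that iris, loaded as a DataFrame, stands in for the question's df and y.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Assumption: iris loaded as a DataFrame stands in for the question's df and y.
iris = load_iris(as_frame=True)
df, y = iris.data, iris.target

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(df, y)  # features first, target second

# Index the column labels with the boolean mask to get names back.
selected_columns = df.columns[rfe.support_].tolist()
print("Selected columns:", selected_columns)
```

Indexing df.columns with the mask keeps the result tied to column names, which is easier to read than a bare boolean array.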
