
Feature selection methods in ML Python

Introduction

Feature selection picks out the most informative features in your data, making models simpler and faster to train. It is most useful in situations like these:

When you have many data features and want to find the most useful ones.
When you want to reduce the time it takes to train a model.
When you want to avoid confusing the model with irrelevant data.
When you want to improve model accuracy by removing noise.
When you want to understand which features affect predictions the most.
Syntax
ML Python
from sklearn.feature_selection import SelectKBest, chi2

selector = SelectKBest(score_func=chi2, k=3)
X_new = selector.fit_transform(X, y)

SelectKBest picks the top k features based on a scoring function.

score_func can be swapped for different statistical tests, such as chi2 for classification (note that chi2 requires non-negative feature values).
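As a runnable sketch of this syntax (the built-in iris dataset is used purely for illustration; its features are non-negative, which chi2 requires):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris has 150 samples and 4 non-negative features
X, y = load_iris(return_X_y=True)

# Keep the 3 features with the highest chi-squared scores
selector = SelectKBest(score_func=chi2, k=3)
X_new = selector.fit_transform(X, y)

print(X.shape)      # (150, 4)
print(X_new.shape)  # (150, 3)
```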

Examples
This selects the top 2 features using ANOVA F-value for classification tasks.
ML Python
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
Recursive Feature Elimination (RFE) removes less important features step-by-step using a model.
ML Python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=3)
X_new = rfe.fit_transform(X, y)
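After fitting, RFE also tells you which features survived the elimination rounds. A self-contained sketch on the iris dataset (the dataset and the choice of 2 features are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Repeatedly fit the model and drop the weakest feature until 2 remain
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=2)
X_new = rfe.fit_transform(X, y)

print(rfe.support_)  # boolean mask of the kept features
print(rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier
```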
VarianceThreshold removes features with low variance (almost constant features).
ML Python
from sklearn.feature_selection import VarianceThreshold

selector = VarianceThreshold(threshold=0.1)
X_new = selector.fit_transform(X)
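To see the effect, here is a small sketch with made-up data in which one column is almost constant (the array and threshold are illustrative):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# The second column is nearly constant, so its variance is far below 0.1
X = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [3.0, 0.1],
              [4.0, 0.0]])

# Drop every feature whose variance is at or below the threshold
selector = VarianceThreshold(threshold=0.1)
X_new = selector.fit_transform(X)

print(X_new.shape)  # (4, 1) -- only the first column survives
```

Note that no target labels are needed here, since variance is computed from X alone.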
Sample Model

This code loads the iris dataset, selects the top 2 features using ANOVA F-value, and prints the results.

ML Python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load example data
X, y = load_iris(return_X_y=True)

# Select top 2 features using ANOVA F-value
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print('Original shape:', X.shape)
print('New shape after feature selection:', X_new.shape)
print('F-scores for all features:', selector.scores_)
print('Selected features mask:', selector.get_support())
Important Notes

Feature selection can improve model speed and reduce overfitting.

Always verify with a held-out evaluation (for example, cross-validation) that feature selection actually improves your model.

Some methods need target labels (supervised), others don't (unsupervised).
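To illustrate that distinction with the classes from this lesson (a minimal sketch using the iris dataset for convenience):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif

X, y = load_iris(return_X_y=True)

# Supervised: SelectKBest needs the target labels y to score features
supervised = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Unsupervised: VarianceThreshold looks only at X, no labels required
unsupervised = VarianceThreshold(threshold=0.1).fit_transform(X)

print(supervised.shape, unsupervised.shape)
```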

Summary

Feature selection picks the most useful data features for your model.

Common methods include SelectKBest, RFE, and VarianceThreshold.

Using feature selection can make models simpler, faster, and sometimes more accurate.