How to Use SelectKBest in sklearn for Feature Selection in Python
Use `SelectKBest` from `sklearn.feature_selection` to select the top k features based on a scoring function. Initialize it with a score function such as `f_classif` and the number of features `k`, then fit it to your data and transform your features.

Syntax
The basic syntax for using SelectKBest is:
- `SelectKBest(score_func, k)`: Creates a selector that picks the top `k` features based on the `score_func`.
- `score_func`: A function that scores each feature, e.g., `f_classif` for classification tasks.
- `k`: Number of top features to select. Use `k='all'` to keep all features.
- Use `fit(X, y)` to compute scores and `transform(X)` to reduce features.
```python
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=3)
X_new = selector.fit_transform(X, y)
```
Example
This example shows how to select the top 2 features from the Iris dataset using SelectKBest with the f_classif scoring function.
```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Select top 2 features
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print('Original shape:', X.shape)
print('Reduced shape:', X_new.shape)
print('Selected feature indices:', selector.get_support(indices=True))
```
Output
Original shape: (150, 4)
Reduced shape: (150, 2)
Selected feature indices: [2 3]
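To understand why those two features were kept, you can inspect the per-feature statistics that `SelectKBest` computes during `fit`: the `scores_` and `pvalues_` attributes, combined with `get_support(indices=True)` to map selected columns back to their names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# scores_ holds the ANOVA F-statistic per feature; higher means better class separation
for name, score, p in zip(iris.feature_names, selector.scores_, selector.pvalues_):
    print(f'{name}: F={score:.1f}, p={p:.3g}')

# Map the selected indices back to column names
selected = [iris.feature_names[i] for i in selector.get_support(indices=True)]
print('Selected:', selected)
```

The petal measurements score far higher than the sepal measurements, which is why indices 2 and 3 are selected.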
Common Pitfalls
Common mistakes when using SelectKBest include:
- Not fitting the selector with both features `X` and target `y`, which is required for scoring functions like `f_classif`.
- Choosing `k` larger than the number of features, which causes an error.
- Forgetting to transform the data after fitting, so the feature selection is not applied.
- Using an incompatible scoring function for the task (e.g., regression vs classification).
```python
from sklearn.feature_selection import SelectKBest, f_classif

# Wrong: forgetting y in fit
# selector = SelectKBest(score_func=f_classif, k=2)
# selector.fit(X)  # This will raise an error

# Right way:
selector = SelectKBest(score_func=f_classif, k=2)
selector.fit(X, y)
X_new = selector.transform(X)
```
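The last pitfall, using a scoring function that does not match the task, deserves its own example: `f_classif` assumes a categorical target, so for a continuous regression target you should use `f_regression` (or `mutual_info_regression`) instead. A minimal sketch using the built-in diabetes dataset:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

# Continuous target -> use a regression scoring function, not f_classif
X, y = load_diabetes(return_X_y=True)

selector = SelectKBest(score_func=f_regression, k=4)
X_new = selector.fit_transform(X, y)

print('Original shape:', X.shape)   # (442, 10)
print('Reduced shape:', X_new.shape)  # (442, 4)
```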
Quick Reference
| Parameter | Description |
|---|---|
| score_func | Function to score features (e.g., f_classif, chi2) |
| k | Number of top features to select (int or 'all') |
| fit(X, y) | Compute scores using features X and target y |
| transform(X) | Reduce X to selected features |
| get_support() | Boolean mask of selected features |
| get_support(indices=True) | Indices of selected features |
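The `k='all'` option from the table above is useful when you want to rank every feature by its score instead of dropping any; nothing is removed, but `scores_` is still populated. A short sketch:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
selector = SelectKBest(score_func=f_classif, k='all').fit(iris.data, iris.target)

# Rank all features from highest to lowest score without discarding any
ranking = np.argsort(selector.scores_)[::-1]
for i in ranking:
    print(iris.feature_names[i], round(float(selector.scores_[i]), 1))
```

This ranking can guide a later choice of `k` once you see how sharply the scores drop off.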
Key Takeaways
- SelectKBest selects the top features based on a scoring function and a number `k`.
- Always fit SelectKBest with both features and target to compute scores correctly.
- Use `transform()` after fitting to reduce your feature set.
- Choose a scoring function that matches your task type (classification or regression).
- Check selected feature indices with `get_support()` to understand which features remain.
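In practice, a common pattern (not covered above) is to wrap SelectKBest in a `Pipeline` so the selection is refit on each training fold during cross-validation, keeping test data out of the scoring step. A minimal sketch with logistic regression on the Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Selection happens inside each CV fold, so test folds never influence the scores
pipe = Pipeline([
    ('select', SelectKBest(score_func=f_classif, k=2)),
    ('clf', LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print('Mean CV accuracy:', round(float(scores.mean()), 3))
```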