Recall & Review
beginner
What is the main goal of feature selection in machine learning?
The main goal of feature selection is to choose the most important features from the data to improve model performance, reduce overfitting, and make the model simpler and faster.
beginner
Name three common types of feature selection methods.
The three common types are: 1) Filter methods, 2) Wrapper methods, and 3) Embedded methods.
intermediate
How do filter methods select features?
Filter methods select features based on their statistical relationship with the target variable, such as correlation or mutual information, without involving any machine learning model.
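As a minimal sketch of filter-style scoring, the snippet below ranks features by absolute Pearson correlation with the target and by estimated mutual information, without training a predictive model. The synthetic dataset from `make_regression` and all variable names are illustrative assumptions, not part of the original card.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression

# Synthetic data (assumption for illustration): 5 features, 2 of them informative.
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       noise=1.0, random_state=0)

# Filter score 1: absolute Pearson correlation of each feature with the target.
corr_scores = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)

# Filter score 2: estimated mutual information between each feature and the target.
mi_scores = mutual_info_regression(X, y, random_state=0)

# Rank feature indices from highest to lowest score.
print("by correlation:", np.argsort(corr_scores)[::-1])
print("by mutual info:", np.argsort(mi_scores)[::-1])
```

Either score vector could then be thresholded, or used to keep the top k features.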
intermediate
What is the difference between wrapper and embedded methods?
Wrapper methods use a machine learning model to evaluate feature subsets by training and testing repeatedly, while embedded methods perform feature selection during the model training process itself.
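The contrast can be sketched with two scikit-learn selectors: `SequentialFeatureSelector` as one possible wrapper method and `SelectFromModel` around an L1-penalized `Lasso` as one possible embedded method. The synthetic data, the choice of estimators, and the value `alpha=1.0` are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumption for illustration): 8 features, 3 informative.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

# Wrapper: forward selection refits the model with cross-validation for each
# candidate feature at each step and adds the one that scores best.
wrapper = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                    direction="forward", cv=5)
wrapper.fit(X, y)

# Embedded: the L1 penalty drives some coefficients to zero during a single
# fit; SelectFromModel keeps the features whose weights remain non-negligible.
embedded = SelectFromModel(Lasso(alpha=1.0))
embedded.fit(X, y)

print("wrapper keeps: ", wrapper.get_support(indices=True))
print("embedded keeps:", embedded.get_support(indices=True))
```

The wrapper trains many models to compare subsets, while the embedded selector reads its decision from one trained model, which is usually much cheaper.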
beginner
Why can feature selection help reduce overfitting?
Feature selection removes irrelevant or noisy features, which reduces the chance that the model learns patterns from noise, thus helping the model generalize better to new data.
Which feature selection method evaluates features using a model repeatedly?
A. Wrapper methods
B. Filter methods
C. Embedded methods
D. Dimensionality reduction
Wrapper methods select features by training and testing a model on different feature subsets repeatedly.
Which method selects features based on correlation with the target variable?
A. Filter methods
B. Wrapper methods
C. Embedded methods
D. Clustering
Filter methods use statistical measures like correlation to select features without involving a model.
What is a key advantage of embedded feature selection methods?
A. They do not require training a model
B. They always select all features
C. They use only statistical tests
D. They select features during model training
Embedded methods perform feature selection as part of the model training process.
Feature selection helps reduce overfitting by:
A. Increasing model complexity
B. Adding more features
C. Removing irrelevant features
D. Ignoring the target variable
Removing irrelevant or noisy features helps the model avoid learning noise, reducing overfitting.
Which of these is NOT a feature selection method?
A. Filter method
B. Principal Component Analysis
C. Embedded method
D. Wrapper method
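Principal Component Analysis is a dimensionality reduction technique: it builds new features as linear combinations of the originals rather than selecting a subset of them, so it is not a feature selection method.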
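The distinction can be checked directly. In the sketch below (the Iris dataset and `k=2` are assumptions chosen for illustration), the columns returned by `SelectKBest` are exact copies of original columns, while each PCA component mixes all of the original features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep 2 of the 4 original columns.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)

# Dimensionality reduction: build 2 new columns from all 4 originals.
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)

# The selected columns are unchanged copies of columns of X.
print(np.array_equal(X_selected, X[:, kept]))  # True

# Each row of components_ holds one weight per original feature.
print(pca.components_.shape)  # (2, 4)
```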
Explain the differences between filter, wrapper, and embedded feature selection methods.
Think about whether the method uses a model and when feature selection happens.
Describe why feature selection is important in building machine learning models.
Consider how fewer features affect model learning and performance.
Practice
1. Which of the following best describes the purpose of feature selection in machine learning?
easy
A. To choose the most important features to improve model performance
B. To increase the number of features in the dataset
C. To randomly remove features from the dataset
D. To convert features into labels for training
Solution
Step 1: Understand feature selection goal
Feature selection aims to pick the most useful features that help the model learn better.
Step 2: Evaluate options
Only option A correctly states that feature selection chooses important features to improve model performance; the other options add features, remove them at random, or confuse features with labels.
Final Answer:
To choose the most important features to improve model performance -> Option A
Quick Check:
Feature selection = pick important features [OK]
Hint: Feature selection picks useful features, not random or all [OK]
Common Mistakes:
Thinking feature selection adds features
Confusing feature selection with feature engineering
Believing feature selection changes labels
2. Which Python library provides the SelectKBest feature selection method?
easy
A. pandas
B. scikit-learn
C. numpy
D. matplotlib
Solution
Step 1: Recall common ML libraries
Scikit-learn is the main library for machine learning tools including feature selection.
Step 2: Match method to library
SelectKBest is part of scikit-learn's feature_selection module, not pandas, numpy, or matplotlib.
Final Answer:
scikit-learn -> Option B
Quick Check:
SelectKBest = scikit-learn [OK]
Hint: SelectKBest is from scikit-learn, not data or plotting libs [OK]
Common Mistakes:
Choosing pandas because it handles data
Confusing numpy with ML feature tools
Selecting matplotlib which is for plotting
3. What will be the output shape of features after applying VarianceThreshold(threshold=0.1) on a dataset with shape (100, 5) where only 3 features have variance above 0.1?
medium
A. (5, 100)
B. (100, 5)
C. (3, 100)
D. (100, 3)
Solution
Step 1: Understand VarianceThreshold effect
VarianceThreshold removes features with variance below the threshold, keeping only those above it.
Step 2: Apply to given data
Since 3 features have variance above 0.1, only those 3 remain. The number of samples (100) stays the same.
Final Answer:
(100, 3) -> Option D
Quick Check:
VarianceThreshold keeps features with variance > threshold [OK]
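The shape change in this question can be reproduced with a small sketch. The random data below is hypothetical: three columns are drawn with variance near 1 and two with variance near 0.0001, so that exactly three exceed the 0.1 threshold.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 5 features.
# Three columns have variance near 1.0; two have variance near 0.0001.
high_var = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
low_var = rng.normal(loc=0.0, scale=0.01, size=(100, 2))
X = np.hstack([high_var, low_var])

selector = VarianceThreshold(threshold=0.1)
X_reduced = selector.fit_transform(X)

print(X.shape)                 # (100, 5)
print(X_reduced.shape)         # (100, 3)
print(selector.get_support())  # boolean mask of the columns that were kept
```

Rows are never dropped; only the column count changes.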
4. Consider the following code:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=2)
rfe.fit(X, y)
selected = rfe.transform(X)
print(selected.shape)
If X has shape (50, 4), but the output shape is (50, 4), what is the likely error?
medium
A. RFE does not reduce features automatically
B. n_features_to_select is greater than number of features
C. RFE was not fitted before transform
D. LogisticRegression model is incompatible with RFE
Solution
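Step 1: Understand RFE usage
RFE must be fitted before transform. Once fitted with n_features_to_select=2, transform keeps 2 of the 4 columns, so the code as shown would print (50, 2).
Step 2: Rule out the unfitted case
In scikit-learn, calling transform on an unfitted RFE raises NotFittedError; it does not return X unchanged. The snippet also calls fit before transform, so option C cannot explain the output.
Step 3: Identify cause
An unchanged shape means no feature was eliminated. Of the options, only n_features_to_select being greater than the number of features has this effect: RFE then has nothing to remove and keeps all 4 columns. The value passed in the code that actually ran was therefore likely larger than 4, not 2.
Final Answer:
n_features_to_select is greater than number of features -> Option B
Quick Check:
n_features_to_select > number of features = no features removed [OK]
Hint: An unchanged shape after RFE means nothing was eliminated; check n_features_to_select [OK]
Common Mistakes:
Assuming transform on an unfitted RFE silently returns all features
Overlooking the value of n_features_to_select
Thinking model type causes shape issue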
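A runnable version of the snippet can help check how RFE behaves in each scenario. The synthetic data from `make_classification` stands in for X and y, which the question does not define, and the value 10 in the third case is an arbitrary assumption.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X and y (assumption; the question does not define them).
X, y = make_classification(n_samples=50, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Case 1: fit, then transform, asking for 2 of the 4 features.
rfe = RFE(LogisticRegression(), n_features_to_select=2)
rfe.fit(X, y)
print(rfe.transform(X).shape)  # (50, 2)

# Case 2: transform without fitting raises an error instead of returning X.
try:
    RFE(LogisticRegression(), n_features_to_select=2).transform(X)
    unfitted_raised = False
except NotFittedError:
    unfitted_raised = True
print(unfitted_raised)  # True

# Case 3: asking for more features than exist leaves all 4 columns in place.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer releases may warn about this setting
    rfe_all = RFE(LogisticRegression(), n_features_to_select=10).fit(X, y)
print(rfe_all.transform(X).shape)  # (50, 4)
```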
5. You have a dataset with 10 features, but 4 are highly correlated and 2 have very low variance. Which feature selection approach best improves model simplicity and speed?
hard
A. Apply VarianceThreshold to remove low variance, then use correlation filter to drop correlated features
B. Use RFE with all features and keep all 10
C. Use SelectKBest to pick top 6 features by univariate scores
D. Randomly drop 4 features to reduce dimensionality
Solution
Step 1: Identify problem features
Low variance features add little info; correlated features add redundancy.
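Step 2: Choose the approach
Option A applies VarianceThreshold to remove the low-variance features and then a correlation filter to drop the redundant correlated ones, so it addresses both problems and leaves a simpler, faster model. SelectKBest scores each feature on its own, so it can still keep several features that are correlated with each other.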
Final Answer:
Apply VarianceThreshold to remove low variance, then use correlation filter to drop correlated features -> Option A
Quick Check:
Remove low variance + correlated features = simpler model [OK]
Hint: Combine variance and correlation filters for best feature reduction [OK]
Common Mistakes:
Using only one method ignoring other feature issues
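The two-step approach can be sketched as follows. The dataset, the column names, and both thresholds (0.01 for variance, 0.9 for absolute correlation) are hypothetical choices for illustration; here the 4 correlated features form a single group, so 3 of them are dropped.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 200

# Hypothetical dataset with 10 features (names and sizes are assumptions).
base = rng.normal(size=n)
data = {f"ind_{i}": rng.normal(size=n) for i in range(4)}  # 4 independent
data["corr_0"] = base                                      # 4 highly correlated
for i in range(1, 4):
    data[f"corr_{i}"] = base + 0.01 * rng.normal(size=n)
for i in range(2):
    data[f"flat_{i}"] = 0.01 * rng.normal(size=n)          # 2 low variance
df = pd.DataFrame(data)

# Step 1: drop features whose variance does not exceed the threshold.
vt = VarianceThreshold(threshold=0.01).fit(df)
df_var = df.loc[:, vt.get_support()]

# Step 2: drop any column that is highly correlated with an earlier column.
corr = df_var.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_final = df_var.drop(columns=to_drop)

print(df.shape, df_var.shape, df_final.shape)  # (200, 10) (200, 8) (200, 5)
```

Keeping the first column of each correlated group is one simple convention; which member to keep is a modeling decision the question leaves open.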