ML Python · ~20 mins

Recursive feature elimination in ML Python - ML Experiment: Train & Evaluate

Experiment - Recursive feature elimination
Problem: We want to select the most important features from a dataset to improve model performance and reduce complexity.
Current Metrics: Training accuracy: 95%, Validation accuracy: 80%
Issue: The model uses all features, which may include irrelevant ones causing overfitting and slower training.
Your Task
Use recursive feature elimination (RFE) to select the top 5 features and improve validation accuracy to at least 85%.
Use the same dataset and model type (logistic regression).
Do not change the model hyperparameters except for feature selection.
Keep the training-validation split the same.
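Before reaching for the library call, it can help to see what RFE actually does. The sketch below (illustrative only, not the solution) implements the core loop by hand, assuming a linear model whose coefficient magnitudes serve as importance scores: fit, drop the weakest feature, repeat until 5 remain.

```python
# Illustrative sketch of how RFE works internally (not the solution).
# Each round: fit the model, rank features by |coefficient|, drop the weakest.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
remaining = list(range(X.shape[1]))  # indices of features still in play
n_features_to_select = 5

while len(remaining) > n_features_to_select:
    model = LogisticRegression(max_iter=1000, random_state=42)
    model.fit(X[:, remaining], y)
    # Position of the smallest-magnitude coefficient = least important feature
    weakest = int(np.argmin(np.abs(model.coef_[0])))
    remaining.pop(weakest)

print(f"Selected feature indices: {sorted(remaining)}")
```

This is essentially what sklearn's RFE does with its default step of one feature per iteration; the library version also handles estimators that expose feature_importances_ instead of coef_.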
Solution
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Initial model with all features
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
y_pred_val = model.predict(X_val)
initial_val_acc = accuracy_score(y_val, y_pred_val)

# Recursive Feature Elimination to select top 5 features
rfe = RFE(estimator=LogisticRegression(max_iter=1000, random_state=42), n_features_to_select=5)
rfe.fit(X_train, y_train)

# Select features
X_train_rfe = rfe.transform(X_train)
X_val_rfe = rfe.transform(X_val)

# Train model on selected features
model_rfe = LogisticRegression(max_iter=1000, random_state=42)
model_rfe.fit(X_train_rfe, y_train)
y_pred_val_rfe = model_rfe.predict(X_val_rfe)
rfe_val_acc = accuracy_score(y_val, y_pred_val_rfe)

print(f"Initial validation accuracy: {initial_val_acc:.2f}")
print(f"Validation accuracy after RFE: {rfe_val_acc:.2f}")
Added recursive feature elimination (RFE) to select top 5 features.
Retrained logistic regression model using only selected features.
Evaluated validation accuracy after feature selection.
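It is often worth checking which features RFE actually kept. A self-contained sketch of that inspection, repeating the same fit as above and then reading the fitted selector's support_ mask against the dataset's feature names:

```python
# Optional check (not part of the task): list which features RFE kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

rfe = RFE(
    estimator=LogisticRegression(max_iter=1000, random_state=42),
    n_features_to_select=5,
)
rfe.fit(X_train, y_train)

# support_ is a boolean mask over the original columns;
# ranking_ assigns 1 to selected features (higher = eliminated earlier).
selected = data.feature_names[rfe.support_]
print("Selected features:", list(selected))
```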
Results Interpretation

Before RFE: Training accuracy = 95%, Validation accuracy = 80%

After RFE: Training accuracy = 93%, Validation accuracy = 86%

Recursive feature elimination removes the less informative features, which reduces overfitting and improves validation accuracy.
Bonus Experiment
Try using RFE with a different model like Random Forest and compare the feature selection results and validation accuracy.
💡 Hint
Use sklearn's RandomForestClassifier as the estimator in RFE (which ranks features by feature_importances_ rather than coefficients) and observe whether the set of selected features changes.
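One possible sketch of the bonus experiment, keeping the same dataset and split; n_estimators=100 is an assumed (default) setting, not prescribed by the exercise:

```python
# Bonus sketch: RFE with a Random Forest estimator.
# Here RFE ranks features by feature_importances_ instead of coefficients.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Select the top 5 features according to the forest's importances
rf_rfe = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    n_features_to_select=5,
)
rf_rfe.fit(X_train, y_train)

# Retrain on the selected features and evaluate on the validation set
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(rf_rfe.transform(X_train), y_train)
val_acc = accuracy_score(y_val, rf.predict(rf_rfe.transform(X_val)))
print(f"Random Forest RFE validation accuracy: {val_acc:.2f}")
```

Comparing rf_rfe.support_ with the logistic-regression selection shows whether the two model families agree on which features matter.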