ML Python · ~20 mins

Pipeline with GridSearchCV in ML Python - ML Experiment: Train & Evaluate

Experiment - Pipeline with GridSearchCV
Problem: You want to build a model that classifies iris flowers into species using a pipeline that scales the data and applies a classifier. Currently, you use a fixed model without tuning its hyperparameters.
Current Metrics: Training accuracy: 95%, Validation accuracy: 90%
Issue: The model performs well but is not optimized. You want to improve validation accuracy by tuning hyperparameters with GridSearchCV inside a pipeline.
Your Task
Use a Pipeline with GridSearchCV to tune the classifier's hyperparameters and raise validation accuracy above 93%.
Use sklearn Pipeline and GridSearchCV.
Tune at least two hyperparameters of the classifier.
Use the iris dataset from sklearn.
Do not change the dataset or model type.
Solution
ML Python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

# Define parameter grid
param_grid = {
    'svc__C': [0.1, 1, 10],
    'svc__kernel': ['linear', 'rbf'],
    'svc__gamma': ['scale', 'auto']
}

# Setup GridSearchCV
grid_search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)

# Fit model
grid_search.fit(X_train, y_train)

# Predict and evaluate
y_train_pred = grid_search.predict(X_train)
y_val_pred = grid_search.predict(X_val)

train_acc = accuracy_score(y_train, y_train_pred) * 100
val_acc = accuracy_score(y_val, y_val_pred) * 100

print(f'Training accuracy: {train_acc:.2f}%')
print(f'Validation accuracy: {val_acc:.2f}%')
print(f'Best parameters: {grid_search.best_params_}')
Added a Pipeline combining StandardScaler and SVC classifier.
Defined a parameter grid to tune SVC hyperparameters: C, kernel, and gamma.
Used GridSearchCV with 5-fold cross-validation to find the best hyperparameters.
Evaluated the tuned model on training and validation sets.
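Beyond `best_params_`, a fitted GridSearchCV also exposes the best cross-validated score and the refit pipeline itself via `best_score_` and `best_estimator_`. A minimal sketch of inspecting these (using a smaller grid than the solution above so it runs quickly):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same data and pipeline setup as the solution, with a reduced grid
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
grid_search = GridSearchCV(pipe, {'svc__C': [0.1, 1, 10]}, cv=5)
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the winning parameter combination
print(f'Best CV accuracy: {grid_search.best_score_:.3f}')
print(f'Best params: {grid_search.best_params_}')

# best_estimator_ is the full pipeline, refit on all training data with the best params
best_pipe = grid_search.best_estimator_
print(f'Validation accuracy: {best_pipe.score(X_val, y_val):.3f}')
```

Because GridSearchCV refits the best pipeline on the whole training split by default, `best_estimator_` (or `grid_search.predict` directly, as in the solution) is ready to use without retraining.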
Results Interpretation

Before tuning: Training accuracy: 95%, Validation accuracy: 90%

After tuning with Pipeline and GridSearchCV: Training accuracy: 98.33%, Validation accuracy: 96.67%

Using a pipeline with GridSearchCV helps find the best hyperparameters automatically, improving model performance and making the workflow cleaner and more reliable.
Bonus Experiment
Try adding a different classifier like RandomForestClassifier to the pipeline and tune its hyperparameters with GridSearchCV.
💡 Hint
Replace SVC with RandomForestClassifier in the pipeline and define a parameter grid with 'n_estimators' and 'max_depth' to tune.
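A sketch of the bonus experiment along the lines of the hint, swapping SVC for RandomForestClassifier and tuning 'n_estimators' and 'max_depth' (the grid values here are illustrative choices, not prescribed by the exercise):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling is not required for tree-based models, but keeping the scaler
# preserves the same pipeline structure as the SVC version
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('rf', RandomForestClassifier(random_state=42))
])

param_grid = {
    'rf__n_estimators': [50, 100, 200],
    'rf__max_depth': [None, 3, 5],
}

grid_search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

print(f'Best parameters: {grid_search.best_params_}')
print(f'Validation accuracy: {grid_search.score(X_val, y_val) * 100:.2f}%')
```

Note the `rf__` prefix in the grid keys: GridSearchCV routes parameters to pipeline steps by the step name followed by a double underscore.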