Challenge - 5 Problems
Pipeline Mastery Badge
❓ Predict Output
Difficulty: intermediate
Output of GridSearchCV best parameters in pipeline
What will be the output of the following code snippet after fitting the GridSearchCV pipeline?
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
iris = load_iris()
X, y = iris.data, iris.target
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])
param_grid = {
    'svc__C': [0.1, 1],
    'svc__kernel': ['linear', 'rbf']
}
gs = GridSearchCV(pipe, param_grid, cv=3)
gs.fit(X, y)
print(gs.best_params_)
💡 Hint
Think about which kernel usually performs better on the iris dataset with default settings.
Explanation
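C=0.1 regularizes the SVC more strongly, and the RBF kernel with C=1 usually scores highest of the four candidates on the standardized iris data, so the expected output is {'svc__C': 1, 'svc__kernel': 'rbf'}. The candidates score close to one another on iris, though, so the exact winner can depend on the CV splits and the scikit-learn version.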
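Because the four candidates can score very close to each other on iris, it helps to look at the full ranking rather than only best_params_. The sketch below reruns the same search and prints each candidate's mean cross-validation accuracy and rank from cv_results_. The printed numbers depend on your scikit-learn version and the CV splits, so no specific values are claimed here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])
param_grid = {
    "svc__C": [0.1, 1],
    "svc__kernel": ["linear", "rbf"],
}
gs = GridSearchCV(pipe, param_grid, cv=3)
gs.fit(X, y)

# Print every candidate with its mean CV accuracy and its rank (1 = best),
# so you can see how close the scores are before trusting best_params_.
results = gs.cv_results_
for params, mean, rank in zip(
    results["params"], results["mean_test_score"], results["rank_test_score"]
):
    print(rank, f"{mean:.3f}", params)

print("best:", gs.best_params_)
```

best_params_ is simply the params entry at best_index_, which is the candidate ranked 1.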
❓ Model Choice
Difficulty: intermediate
Choosing the correct pipeline step for text data
You want to build a pipeline for text classification using GridSearchCV. Which pipeline step should you include before the classifier to convert text into numbers?
💡 Hint
Text data needs to be converted into numeric features before classification.
Explanation
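CountVectorizer (in sklearn.feature_extraction.text) converts raw text into a sparse matrix of token counts that the classifier can consume, which makes it a suitable first step in a text classification pipeline. TfidfVectorizer is a common alternative that weights the counts by TF-IDF.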
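A minimal sketch of such a pipeline, assuming a tiny hypothetical corpus (the texts and labels below are made up for illustration) and LogisticRegression as the classifier. Because CountVectorizer is a pipeline step, its own parameters, such as ngram_range, can be tuned in the same grid with the vect__ prefix, and the vocabulary is rebuilt on each training fold, so the test folds never leak into it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (illustration only): 1 = positive, 0 = negative.
texts = [
    "great movie, loved it",
    "wonderful acting and great story",
    "what a fantastic film",
    "loved the soundtrack",
    "terrible plot, hated it",
    "boring and far too long",
    "awful acting, waste of time",
    "hated the ending",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([
    ("vect", CountVectorizer()),     # raw strings -> sparse token-count matrix
    ("clf", LogisticRegression()),   # classifier sees only numeric features
])

# Step parameters use the "<step name>__<parameter>" convention.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],  # unigrams vs. unigrams + bigrams
    "clf__C": [0.1, 1, 10],
}
gs = GridSearchCV(pipe, param_grid, cv=2)
gs.fit(texts, labels)

print(gs.best_params_)
print(gs.predict(["loved this great film"]))
```

The fitted search accepts raw strings at prediction time because the vectorizer is part of the refit pipeline.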
❓ Hyperparameter
Difficulty: advanced
Correct hyperparameter name in pipeline for GridSearchCV
Given a pipeline named 'pipe' with steps [('scaler', StandardScaler()), ('clf', RandomForestClassifier())], which is the correct hyperparameter name to tune the number of trees in GridSearchCV?
💡 Hint
In pipelines, hyperparameters are prefixed by the step name and two underscores.
Explanation
To tune hyperparameters of a step in a pipeline, use 'stepname__parameter'. Here, 'clf' is the step name, so 'clf__n_estimators' is correct.
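To check a name before running a search, you can list the pipeline's tunable parameters with get_params(). The sketch below assumes the iris data and a three-value grid for n_estimators, both chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# get_params() lists every tunable name; parameters of a step appear
# as "<step name>__<parameter>".
print("clf__n_estimators" in pipe.get_params())  # True

param_grid = {"clf__n_estimators": [10, 50, 100]}  # illustrative values
gs = GridSearchCV(pipe, param_grid, cv=3)
gs.fit(X, y)
print(gs.best_params_)
```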
❓ Metrics
Difficulty: advanced
Evaluating GridSearchCV best score attribute
After fitting GridSearchCV with cv=5, what does the attribute 'best_score_' represent?
💡 Hint
GridSearchCV uses cross-validation to estimate performance.
Explanation
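'best_score_' is the mean cross-validated score of the best parameter set, that is, its score averaged over the 5 held-out test folds. It is computed with the search's scoring metric (when scoring is not set, the estimator's own score method, which is accuracy for classifiers).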
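The relationship can be checked directly: best_score_ equals the mean_test_score entry of cv_results_ at best_index_, which in turn is the average of the per-fold split{i}_test_score values. A sketch, assuming a scaled LogisticRegression pipeline on iris, a setup chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
gs = GridSearchCV(pipe, {"logreg__C": [0.1, 1, 10]}, cv=5)
gs.fit(X, y)

# Mean over the held-out folds for the best candidate, as stored in cv_results_.
mean_of_best = gs.cv_results_["mean_test_score"][gs.best_index_]
# The same value recomputed from the five individual fold scores.
fold_scores = [gs.cv_results_[f"split{i}_test_score"][gs.best_index_] for i in range(5)]

print(gs.best_score_, mean_of_best, np.mean(fold_scores))
```

Note that best_score_ is a validation estimate from the folds, not a score on a separate test set.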
🔧 Debug
Difficulty: expert
Identifying error in pipeline parameter grid
What error will the following GridSearchCV code raise?
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
# Example data (an assumption; any feature matrix X and labels y work here)
X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression())
])
param_grid = {
    'C': [0.1, 1, 10]
}
gs = GridSearchCV(pipe, param_grid)
gs.fit(X, y)
💡 Hint
In pipelines, hyperparameters must be prefixed by the step name.
Explanation
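The param_grid key 'C' is invalid because it lacks the step prefix 'logreg__'. When GridSearchCV applies the candidate to the pipeline with set_params, the pipeline rejects it with a ValueError reporting an invalid parameter 'C' for the Pipeline estimator. Using the key 'logreg__C' fixes it.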
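A corrected sketch, assuming the iris data for X and y: the grid key now carries the 'logreg__' prefix. It also calls set_params(C=1) on the pipeline directly, which mirrors how GridSearchCV applies each candidate, to show the same ValueError outside the search.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # example data (assumption)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(max_iter=1000)),
])

# Reproduce the original mistake: an unprefixed name is rejected by the pipeline.
try:
    pipe.set_params(C=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("ValueError:", error_message)

# Fix: prefix the key with the step name and a double underscore.
param_grid = {"logreg__C": [0.1, 1, 10]}
gs = GridSearchCV(pipe, param_grid)
gs.fit(X, y)
print(gs.best_params_)
```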