
Pipeline with GridSearchCV in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of GridSearchCV best parameters in pipeline
What will be the output of the following code snippet after fitting the GridSearchCV pipeline?
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC())
])

param_grid = {
    'svc__C': [0.1, 1],
    'svc__kernel': ['linear', 'rbf']
}

gs = GridSearchCV(pipe, param_grid, cv=3)
gs.fit(X, y)

print(gs.best_params_)
A. {'svc__C': 0.1, 'svc__kernel': 'rbf'}
B. {'svc__C': 0.1, 'svc__kernel': 'linear'}
C. {'svc__C': 1, 'svc__kernel': 'linear'}
D. {'svc__C': 1, 'svc__kernel': 'rbf'}
Hint: Think about which kernel usually performs better on the iris dataset with default settings.
Model Choice
intermediate
Choosing the correct pipeline step for text data
You want to build a pipeline for text classification using GridSearchCV. Which pipeline step should you include before the classifier to convert text into numbers?
A. CountVectorizer()
B. StandardScaler()
C. PCA()
D. MinMaxScaler()
Hint: Text data needs to be converted into numeric features before classification.
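As a rough sketch of a text pipeline tuned with GridSearchCV, the example below places a vectorizer before the classifier. The tiny corpus, the labels, the step names 'vect' and 'clf', and the choice of LogisticRegression are all illustrative assumptions, not part of the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
texts = [
    "great movie, loved it", "wonderful acting and story",
    "really enjoyable film", "a great and fun experience",
    "terrible plot, hated it", "boring and far too long",
    "awful acting, bad script", "a bad and dull experience",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The vectorizer turns raw strings into token-count features for the classifier.
text_pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", LogisticRegression()),
])

# Parameters of both steps are addressed as stepname__paramname.
param_grid = {
    "vect__binary": [False, True],
    "clf__C": [0.1, 1.0],
}

search = GridSearchCV(text_pipe, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)
```

Because the vectorizer lives inside the pipeline, its vocabulary is rebuilt from the training folds only during each cross-validation split.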
Hyperparameter
advanced
Correct hyperparameter name in pipeline for GridSearchCV
Given a pipeline named 'pipe' with steps [('scaler', StandardScaler()), ('clf', RandomForestClassifier())], which is the correct hyperparameter name to tune the number of trees in GridSearchCV?
A. 'n_estimators'
B. 'pipe__n_estimators'
C. 'clf__n_estimators'
D. 'scaler__n_estimators'
Hint: In pipelines, hyperparameters are prefixed by the step name and two underscores.
Metrics
advanced
Evaluating GridSearchCV best score attribute
After fitting GridSearchCV with cv=5, what does the attribute 'best_score_' represent?
A. The training score of the best estimator on the full training data
B. The mean cross-validation score of the best estimator across the 5 folds
C. The test score on unseen data
D. The highest score from a single fold during cross-validation
Hint: GridSearchCV uses cross-validation to estimate performance.
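One way to check what best_score_ holds is to recompute it from the per-fold scores stored in cv_results_. The sketch below assumes the iris dataset and a scaler plus SVC pipeline purely for illustration; any fitted GridSearchCV should behave the same way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

n_folds = 5
gs = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=n_folds)
gs.fit(X, y)

# Collect the validation score of the winning candidate on each fold.
fold_scores = [
    gs.cv_results_[f"split{i}_test_score"][gs.best_index_]
    for i in range(n_folds)
]
mean_of_folds = float(np.mean(fold_scores))

print(gs.best_score_, mean_of_folds)
```

The two printed numbers should agree up to floating-point rounding, since best_score_ is the mean validation score of the best candidate.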
🔧 Debug
expert
Identifying error in pipeline parameter grid
What error will the following GridSearchCV code raise?
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Load example data so that X and y are defined
X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression())
])

param_grid = {
    'C': [0.1, 1, 10]
}

gs = GridSearchCV(pipe, param_grid)
gs.fit(X, y)
A. ValueError because 'C' is not a valid parameter name for the pipeline
B. TypeError because LogisticRegression() is missing required arguments
C. ValueError because param_grid is empty
D. No error, code runs successfully
Hint: In pipelines, hyperparameters must be prefixed by the step name.

Practice

1. What is the main purpose of using a Pipeline in machine learning?
easy
A. To combine preprocessing steps and model training into one object
B. To speed up the training by using multiple CPUs
C. To automatically select the best model type
D. To visualize the model's decision boundaries

Solution

  1. Step 1: Understand what a Pipeline does

    A Pipeline chains preprocessing and model training steps so that they run in sequence as a single estimator with one fit and one predict call.
  2. Step 2: Identify the main benefit

    This chaining helps avoid mistakes such as fitting a preprocessing step on validation data during cross-validation, and it makes code cleaner by combining the steps into one object.
  3. Final Answer:

    To combine preprocessing steps and model training into one object -> Option A
  4. Quick Check:

    Pipeline = combine steps [OK]
Hint: Pipeline bundles steps to simplify workflow [OK]
Common Mistakes:
  • Thinking Pipeline speeds up training automatically
  • Confusing Pipeline with model selection
  • Believing Pipeline creates visualizations
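A minimal sketch of this idea, assuming the iris dataset and a LogisticRegression classifier (neither appears in the question): the scaler and the model are fit and applied through a single object.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# One object that holds both the preprocessing step and the model.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# A single fit call scales the data and then trains the classifier.
pipe.fit(X, y)

# A single predict call applies the fitted scaler before predicting.
predictions = pipe.predict(X[:5])

step_names = [name for name, _ in pipe.steps]
print(step_names)
```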
2. Which syntax correctly sets the parameter n_estimators of a RandomForest inside a pipeline named pipe for GridSearchCV?
easy
A. {'randomforest-n_estimators': [10, 50, 100]}
B. {'random_forest__n_estimators': [10, 50, 100]}
C. {'randomforest.n_estimators': [10, 50, 100]}
D. {'randomforest__n_estimators': [10, 50, 100]}

Solution

  1. Step 1: Recall parameter naming in Pipeline

    Parameters inside a pipeline step use double underscores: stepname__paramname.
  2. Step 2: Match step name and parameter

    If the step is named 'randomforest', then 'randomforest__n_estimators' is correct syntax.
  3. Final Answer:

    {'randomforest__n_estimators': [10, 50, 100]} -> Option D
  4. Quick Check:

    Use double underscores between step and param [OK]
Hint: Use double underscores between step and parameter [OK]
Common Mistakes:
  • Using single underscore instead of double
  • Using dot or dash instead of double underscore
  • Misspelling the pipeline step name
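When unsure about a key, one option is to list the names the pipeline accepts with get_params(). The sketch below assumes the step is named 'randomforest', as in the solution; the scaler step is added only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("randomforest", RandomForestClassifier()),
])

# Every key returned here is a valid key for param_grid.
valid_names = sorted(pipe.get_params().keys())

# Keep only the parameters that belong to the random forest step.
rf_names = [name for name in valid_names if name.startswith("randomforest__")]
print(rf_names)
```

Keys written with a dot, a dash, or a single underscore do not appear in this list, which is why GridSearchCV rejects them.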
3. Given the code below, what will grid.best_params_ output?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42))
])

param_grid = {'clf__n_estimators': [20], 'clf__max_depth': [4]}
grid = GridSearchCV(pipe, param_grid, cv=2)
grid.fit(X_train, y_train)

print(grid.best_params_)
medium
A. SyntaxError due to param_grid keys
B. {'clf__n_estimators': 10, 'clf__max_depth': 2}
C. {'clf__n_estimators': 20, 'clf__max_depth': 4}
D. KeyError because 'clf' is not a pipeline step

Solution

  1. Step 1: Understand pipeline and param_grid

    The pipeline has a step named 'clf' for RandomForestClassifier. The param_grid uses 'clf__' prefix correctly.
  2. Step 2: Determine the output

    Since param_grid specifies only one combination, GridSearchCV will select {'clf__n_estimators': 20, 'clf__max_depth': 4} as the best parameters.
  3. Final Answer:

    {'clf__n_estimators': 20, 'clf__max_depth': 4} -> Option C
  4. Quick Check:

    Best params match the only tested values [OK]
Hint: With single param values, they become best_params_ [OK]
Common Mistakes:
  • Confusing step name 'clf' with 'classifier'
  • Using single underscore in param_grid keys
  • Assuming syntax error without checking keys
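The snippet in the question leaves X_train and y_train undefined. A runnable version might look like the sketch below, which assumes the iris dataset and a train_test_split call as stand-ins for the missing data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed data: the question does not say where X_train and y_train come from.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# One value per parameter means exactly one candidate is evaluated.
param_grid = {"clf__n_estimators": [20], "clf__max_depth": [4]}
grid = GridSearchCV(pipe, param_grid, cv=2)
grid.fit(X_train, y_train)

n_candidates = len(grid.cv_results_["params"])
print(grid.best_params_, n_candidates)
```

The printed dictionary may list its keys in a different order than param_grid does; it still contains the same two key and value pairs.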
4. Identify the error in this pipeline and GridSearchCV setup:
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', RandomForestClassifier())
])

param_grid = {'randomforest__n_estimators': [10, 50]}
grid = GridSearchCV(pipe, param_grid)
grid.fit(X_train, y_train)
medium
A. The param_grid key should be 'model__n_estimators', not 'randomforest__n_estimators'
B. RandomForestClassifier cannot be used inside a pipeline
C. StandardScaler should not be the first step
D. GridSearchCV requires cv parameter

Solution

  1. Step 1: Check pipeline step names

    The pipeline step for RandomForestClassifier is named 'model', not 'randomforest'.
  2. Step 2: Match param_grid keys to pipeline steps

    Parameter keys must use the step name 'model' with double underscores, so 'model__n_estimators' is correct.
  3. Final Answer:

    The param_grid key should be 'model__n_estimators', not 'randomforest__n_estimators' -> Option A
  4. Quick Check:

    Param keys must match pipeline step names [OK]
Hint: Param keys must match pipeline step names exactly [OK]
Common Mistakes:
  • Using wrong step name in param_grid keys
  • Thinking RandomForest can't be in pipeline
  • Believing cv is mandatory (it defaults to 5)
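The mismatch can be reproduced without running a full search, because GridSearchCV applies each candidate through set_params. The sketch below assumes the iris dataset and fixed random_state values for illustration; it first triggers the error with the wrong key, then runs the search with the corrected key.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# Wrong prefix: there is no step called 'randomforest' in this pipeline.
try:
    pipe.set_params(randomforest__n_estimators=10)
    error_type = None
except ValueError as exc:
    error_type = type(exc).__name__
    print("Rejected:", str(exc)[:60])

# Correct prefix: the step is named 'model'. cv is omitted and defaults to 5.
good_grid = {"model__n_estimators": [10, 50]}
grid = GridSearchCV(pipe, good_grid)
grid.fit(X, y)
print(grid.best_params_)
```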
5. You want to tune both a scaler and a classifier in a pipeline using GridSearchCV. Which param_grid correctly tests StandardScaler with and without scaling, and RandomForest with 10 or 50 trees?
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=0))
])

param_grid = ?
hard
A. {'scaler__': [StandardScaler(), None], 'clf__n_estimators': [10, 50]}
B. {'scaler': [StandardScaler(), None], 'clf__n_estimators': [10, 50]}
C. {'scaler': [StandardScaler(), None], 'clf__n_estimators': [10, 50], 'clf__max_depth': [None]}
D. {'scaler__with_mean': [True, False], 'clf__n_estimators': [10, 50]}

Solution

  1. Step 1: Understand how to toggle scaler on/off in pipeline

    To test with and without scaling, replace the scaler step with StandardScaler() or None in param_grid using the step name 'scaler'.
  2. Step 2: Set classifier parameters correctly

    Use 'clf__n_estimators' to test 10 and 50 trees for the RandomForestClassifier step named 'clf'.
  3. Final Answer:

    {'scaler': [StandardScaler(), None], 'clf__n_estimators': [10, 50]} -> Option B
  4. Quick Check:

    Toggle scaler by replacing step, tune clf params with double underscores [OK]
Hint: Toggle steps by replacing with None in param_grid [OK]
Common Mistakes:
  • Trying to set scaler params with double underscores incorrectly
  • Using 'scaler__' key with no param name
  • Not using None to disable a step
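A short sketch of Option B in action, assuming the iris dataset and cv=3 for illustration. The grid should expand to four candidates: scaler on or off, combined with 10 or 50 trees.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=0)),
])

param_grid = {
    # Replace the whole 'scaler' step: keep it, or disable it with None.
    "scaler": [StandardScaler(), None],
    # Tune a parameter inside the 'clf' step with the double-underscore key.
    "clf__n_estimators": [10, 50],
}

grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(X, y)

n_candidates = len(grid.cv_results_["params"])
print(n_candidates, grid.best_params_)
```

The scikit-learn documentation also describes the string 'passthrough' as a way to skip a step; it can be used in place of None here.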