
Pipeline best practices in ML Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to create a simple pipeline that scales data and fits a model.

ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', [1]())
])
Options:
A. KNeighborsClassifier
B. LogisticRegression
C. RandomForestClassifier
D. SVC
Common Mistakes
Choosing a model class that is not imported or not suitable for the pipeline step.
Forgetting to instantiate the model with parentheses.
2. Fill in the blank
medium

Complete the code to split data into training and testing sets before building the pipeline.

ML Python
from sklearn.model_selection import [1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Options:
A. KFold
B. cross_val_score
C. GridSearchCV
D. train_test_split
Common Mistakes
Using cross-validation functions instead of splitting data.
Not importing the correct function.
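As a rough sketch of how the split fits into the wider workflow, the example below splits first, fits a pipeline on the training portion only, and scores it on the held-out portion. The synthetic dataset from make_classification and the name test_accuracy are assumptions for illustration; the exercise itself does not define X and y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the X and y the exercise assumes (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split before any fitting so the test rows never influence the scaler or the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# fit() learns the scaling statistics and the model from the training rows only.
pipeline.fit(X_train, y_train)

# score() reuses the fitted scaler on X_test, then reports classification accuracy.
test_accuracy = pipeline.score(X_test, y_test)
```

With test_size=0.2 on 200 rows, 160 rows go to training and 40 to testing.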
3. Fill in the blank
hard

Fix the error in the pipeline code by completing the missing import for the feature selection step.

ML Python
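from sklearn.feature_selection import [1]
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler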

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('selector', SelectKBest(k=10)),
    ('classifier', LogisticRegression())
])
Options:
A. VarianceThreshold
B. PCA
C. SelectKBest
D. RFE
Common Mistakes
Using PCA, which is a dimensionality reduction technique from sklearn.decomposition, not a feature selector.
Using classes not imported or incompatible with the pipeline.
4. Fill in the blank
hard

Fill both blanks to create a pipeline that scales data, then score it with cross-validation.

ML Python
from sklearn.model_selection import [1]
from sklearn.preprocessing import [2]
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('scaler', [2]()),
    ('classifier', LogisticRegression())
])
scores = [1](pipeline, X, y, cv=5)
Options:
A. cross_val_score
B. train_test_split
C. StandardScaler
D. MinMaxScaler
Common Mistakes
Confusing train_test_split with cross_val_score.
Using MinMaxScaler instead of StandardScaler when standardization is needed.
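The sketch below shows one way the whole pipeline can be passed to cross_val_score, so that the scaler is refit on the training folds of each split rather than on all of the data. The synthetic dataset and the name mean_score are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the X and y the exercise assumes (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# Each of the 5 folds clones the pipeline, fits scaler and model on the
# training part of the fold, and scores on the held-out part.
scores = cross_val_score(pipeline, X, y, cv=5)
mean_score = scores.mean()
```

Passing the pipeline, rather than pre-scaled data, is what keeps the held-out fold out of the scaling statistics.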
5. Fill in the blank
hard

Fill all three blanks to create a pipeline that imputes missing values, scales features, and fits a classifier.

ML Python
from sklearn.impute import [1]
from sklearn.preprocessing import [2]
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('imputer', [1]()),
    ('scaler', [2]()),
    ('classifier', LogisticRegression())
])
Options:
A. SimpleImputer
B. StandardScaler
C. MinMaxScaler
D. KNNImputer
Common Mistakes
Using KNNImputer without importing it from sklearn.impute.
Mixing up MinMaxScaler and StandardScaler in this context.
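A minimal sketch of the impute, scale, classify order follows, assuming mean imputation via SimpleImputer(strategy="mean") and a tiny hand-made array with missing values. The data and the choice of strategy are illustrative assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small illustrative dataset with two missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [3.0, np.nan],
    [4.0, 5.0],
    [5.0, 6.0],
    [6.0, 7.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),  # fill NaNs with column means
    ("scaler", StandardScaler()),                 # then standardize each column
    ("classifier", LogisticRegression()),         # then fit the model
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)

# The column means learned by the imputer, ignoring the NaN entries.
imputer_means = pipeline.named_steps["imputer"].statistics_

# Output of the preprocessing steps alone (everything except the classifier).
X_preprocessed = pipeline[:-1].transform(X)
```

Imputation comes first here because StandardScaler would otherwise pass NaN values through to LogisticRegression, which does not accept them.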

Practice

1. Why is it important to use a pipeline in machine learning projects?
easy
A. It organizes steps clearly and avoids mistakes
B. It makes the model run faster on GPUs
C. It automatically improves model accuracy
D. It replaces the need for data cleaning

Solution

  1. Step 1: Understand the purpose of pipelines

    Pipelines help organize the sequence of data processing and modeling steps clearly.
  2. Step 2: Identify benefits of pipelines

    They reduce human errors and make the process repeatable and easy to follow.
  3. Final Answer:

    It organizes steps clearly and avoids mistakes -> Option A
  4. Quick Check:

    Pipeline purpose = Organize steps [OK]
Hint: Pipelines keep steps tidy and reduce errors [OK]
Common Mistakes:
  • Thinking pipelines speed up model training
  • Believing pipelines improve accuracy automatically
  • Assuming pipelines replace data cleaning
2. Which of the following is the correct way to create a simple pipeline in scikit-learn?
easy
A. Pipeline('scale', StandardScaler(), 'model', LogisticRegression())
B. Pipeline({'scale': StandardScaler(), 'model': LogisticRegression()})
C. Pipeline([('scale', StandardScaler()), ('model', LogisticRegression())])
D. Pipeline(scale=StandardScaler(), model=LogisticRegression())

Solution

  1. Step 1: Recall scikit-learn pipeline syntax

    It requires a list of tuples with step name and transformer/model.
  2. Step 2: Match syntax to options

    Only Pipeline([('scale', StandardScaler()), ('model', LogisticRegression())]) uses a list of tuples correctly.
  3. Final Answer:

    Pipeline([('scale', StandardScaler()), ('model', LogisticRegression())]) -> Option C
  4. Quick Check:

    Pipeline syntax = list of tuples [OK]
Hint: Use list of (name, step) tuples for pipelines [OK]
Common Mistakes:
  • Using dictionary instead of list of tuples
  • Passing keyword arguments instead of list
  • Passing separate arguments without list
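The list-of-tuples form can be sketched as follows. For comparison, the example also uses make_pipeline from sklearn.pipeline, which is not mentioned in the question and is shown only as an alternative that generates step names from the lowercased class names.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit form: a list of (name, estimator) tuples, as in Option C.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
explicit_names = [name for name, _ in pipe.steps]

# Shorthand form: estimators are passed positionally and named automatically.
auto_pipe = make_pipeline(StandardScaler(), LogisticRegression())
auto_names = [name for name, _ in auto_pipe.steps]
```

Explicit names are usually easier to work with when steps are looked up later through named_steps or referenced in a parameter grid.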
3. Given the code below, what will be the output of print(pipe.named_steps['model'].coef_) after fitting?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
  ('scale', StandardScaler()),
  ('model', LogisticRegression())
])

X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]
pipe.fit(X, y)
print(pipe.named_steps['model'].coef_)
medium
A. A 2D array with coefficients for each feature
B. An error because 'coef_' is not available
C. A list of predicted labels
D. A scalar value representing accuracy

Solution

  1. Step 1: Understand pipeline fitting

    Calling fit on the pipeline fits the scaler, transforms X with it, then fits the logistic regression on the scaled data.
  2. Step 2: Access model coefficients

    After fitting, LogisticRegression has the attribute 'coef_', a 2D array of feature weights. For this binary problem with two features its shape is (1, 2).
  3. Final Answer:

    A 2D array with coefficients for each feature -> Option A
  4. Quick Check:

    Model coef_ = 2D array [OK]
Hint: Model coef_ holds feature weights after fit [OK]
Common Mistakes:
  • Expecting coef_ before fitting
  • Confusing coef_ with predictions
  • Trying to access coef_ on pipeline instead of model
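The snippet from the question can be run as is to inspect the shape of coef_. The variable name coef is an assumption added here for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 0, 1, 1]
pipe.fit(X, y)

# coef_ lives on the fitted final estimator, not on the pipeline object itself.
coef = pipe.named_steps["model"].coef_

# Binary classification: one row of weights, one column per input feature.
coef_shape = coef.shape
```

The weights apply to the scaled features, since the model only ever sees the scaler's output.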
4. What is wrong with this pipeline code snippet?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
  ('scale', StandardScaler()),
  ('model', LogisticRegression())
])

pipe.fit(X, y)
pipe.predict(X_test)

Assuming X, y, and X_test are defined correctly.
medium
A. The pipeline is missing a call to transform before predict
B. The pipeline steps are not in a list
C. The pipeline is missing a final estimator
D. Nothing is wrong; code runs fine

Solution

  1. Step 1: Check pipeline construction

    Pipeline steps are correctly given as a list of tuples with scaler and model.
  2. Step 2: Verify usage of fit and predict

    Calling fit and then predict on the pipeline is correct; predict applies the fitted scaler's transform and then the model's predict automatically.
  3. Final Answer:

    Nothing is wrong; code runs fine -> Option D
  4. Quick Check:

    Pipeline fit/predict usage = correct [OK]
Hint: Pipeline handles transform internally during predict [OK]
Common Mistakes:
  • Thinking transform must be called separately
  • Passing steps as dict instead of list
  • Missing final estimator in pipeline
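To illustrate that predict handles the transform internally, the sketch below compares the pipeline's predictions with a manual transform-then-predict using the same fitted steps. The synthetic data and the names X_all, y_all, pipeline_predictions, and manual_predictions are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the X, y, and X_test the question assumes.
X_all, y_all = make_classification(n_samples=100, n_features=4, random_state=0)
X, X_test, y, y_test = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)

# One call: the pipeline scales X_test with the fitted scaler, then predicts.
pipeline_predictions = pipe.predict(X_test)

# The same two steps done by hand with the fitted components.
scaler = pipe.named_steps["scale"]
model = pipe.named_steps["model"]
manual_predictions = model.predict(scaler.transform(X_test))
```

Both routes produce the same labels, so a separate transform call before predict is unnecessary.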
5. You want to build a pipeline that scales data, selects the top 3 features, and then fits a logistic regression model. Which pipeline setup is best practice?
hard
A. Pipeline([('model', LogisticRegression()), ('scale', StandardScaler()), ('select', SelectKBest(k=3))])
B. Pipeline([('scale', StandardScaler()), ('select', SelectKBest(k=3)), ('model', LogisticRegression())])
C. Pipeline([('select', SelectKBest(k=3)), ('scale', StandardScaler()), ('model', LogisticRegression())])
D. Pipeline([('scale', StandardScaler()), ('model', LogisticRegression()), ('select', SelectKBest(k=3))])

Solution

  1. Step 1: Determine correct order of steps

    Scaling goes first so that the feature selector and the model both work on standardized features.
  2. Step 2: Place model last in pipeline

    The model must be the final step: every step before the last has to be a transformer, and the final estimator is fit on the selected features.
  3. Final Answer:

    Pipeline([('scale', StandardScaler()), ('select', SelectKBest(k=3)), ('model', LogisticRegression())]) -> Option B
  4. Quick Check:

    Order: scale -> select -> model [OK]
Hint: Scale first, then select features, then model [OK]
Common Mistakes:
  • Selecting features before scaling
  • Putting model before preprocessing steps
  • Mixing order of pipeline steps
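The scale, select, model order from Option B can be sketched as below. The synthetic dataset with eight features is an assumption for illustration, and SelectKBest is left on its default scoring function (f_classif).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 8 features, 3 of them informative (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),       # 1. standardize all 8 features
    ("select", SelectKBest(k=3)),      # 2. keep the 3 highest-scoring features
    ("model", LogisticRegression()),   # 3. fit the model on those 3 features
])
pipe.fit(X, y)

# Boolean mask over the original 8 columns marking the 3 that were kept.
selected_mask = pipe.named_steps["select"].get_support()

# The model only sees the selected columns, so it has 3 coefficients.
model_coef_shape = pipe.named_steps["model"].coef_.shape
```

Because selection happens inside the pipeline, it is repeated on the training data of every fold when the pipeline is cross-validated.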