Challenge - 5 Problems

❓ Predict Output

intermediate

Output of a simple scikit-learn Pipeline

What is the output of the following code snippet when predicting with the pipeline?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(random_state=0))
])

X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])

pipeline.fit(X_train, y_train)

X_test = np.array([[1.5, 2.5]])
prediction = pipeline.predict(X_test)
print(prediction)

A. [0 1]

B. [0]

C. [1]

D. Raises a ValueError

❓ Model Choice

intermediate

Choosing the correct pipeline step for text data

You want to build a pipeline to classify text documents. Which step should you include before the classifier to convert text into numbers?

A. KMeans()

B. CountVectorizer()

C. PCA()

D. StandardScaler()
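
For reference, here is a minimal sketch of a text-classification pipeline whose first step turns raw strings into numeric features. The toy corpus, the labels, and the step names `vect` and `clf` are assumptions made up for illustration; they are not part of the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus and labels, invented for illustration only.
docs = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

text_pipeline = Pipeline([
    ("vect", CountVectorizer()),   # strings -> sparse matrix of token counts
    ("clf", LogisticRegression()),  # classifier trained on those counts
])
text_pipeline.fit(docs, labels)

# Inspect what the vectorizer step produces: one row per document,
# one column per distinct token in the corpus.
counts = text_pipeline.named_steps["vect"].transform(docs)
print(counts.shape)
```

Because the vectorizer sits inside the pipeline, `predict` accepts raw strings directly and the same vocabulary is reused at prediction time.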

❓ Hyperparameter

advanced

Setting hyperparameters in a pipeline

Given this pipeline:

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

How do you set the LogisticRegression parameter 'C' to 0.5 when using GridSearchCV?

A. {'scaler__C': [0.5]}

B. {'C': [0.5]}

C. {'pipeline__C': [0.5]}

D. {'clf__C': [0.5]}
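
The naming rule for nested parameters can be checked with a short sketch. scikit-learn addresses a parameter inside a pipeline step as `<step name>__<parameter name>`, and `get_params()` lists the valid keys. The toy data and `cv=2` below are assumptions chosen only so the search can run.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Valid grid keys are exactly the names reported by get_params().
print("clf__C" in pipeline.get_params())

# Hypothetical toy data, large enough for 2-fold stratified CV.
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The key combines the step name "clf" with the parameter name "C".
param_grid = {"clf__C": [0.5]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X, y)
print(search.best_params_)
```

A key that does not match a step name and parameter, such as a bare `C`, is rejected as an invalid parameter when the search is fitted.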

❓ Metrics

advanced

Evaluating pipeline performance with cross-validation

You run cross_val_score on a pipeline with a classifier and get these scores: [0.8, 0.85, 0.78, 0.82, 0.81]. What is the mean accuracy, rounded to two decimal places?

A. 0.81

B. 0.82

C. 0.80

D. 0.83
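
The arithmetic can be verified with a short sketch. `cross_val_score` returns a NumPy array, so the scores from the question are placed in an array by hand here rather than computed from a real model.

```python
import numpy as np

# Scores copied from the question; in practice this array would be
# the return value of cross_val_score(pipeline, X, y, cv=5).
scores = np.array([0.8, 0.85, 0.78, 0.82, 0.81])

# Mean accuracy across the five folds: 4.06 / 5.
mean_score = scores.mean()
print(f"{mean_score:.3f}")
```

Reporting `scores.std()` alongside the mean is a common way to show how much the accuracy varies between folds.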

🔧 Debug

expert

Identifying error in pipeline usage

What error does this code raise?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

X_test = [[1, 2], [3, 4]]
prediction = pipeline.predict(X_test)

A. ValueError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

B. AttributeError: 'Pipeline' object has no attribute 'predict'

C. TypeError: 'list' object is not callable

D. No error, outputs predictions
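
The behavior can be observed with a small sketch that calls `predict` before `fit` and records the exception. scikit-learn signals this situation with `sklearn.exceptions.NotFittedError`, which subclasses `ValueError`; the exact message text, including which estimator it names, may differ between scikit-learn versions, so the sketch checks only the exception type.

```python
import warnings

from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

raised = None
with warnings.catch_warnings():
    # Assumption: some scikit-learn versions also emit a warning here;
    # it is silenced so only the exception is examined.
    warnings.simplefilter("ignore")
    try:
        pipeline.predict([[1, 2], [3, 4]])  # predict() before fit()
    except NotFittedError as exc:
        raised = exc

print(type(raised).__name__)
print(isinstance(raised, ValueError))
```

Calling `pipeline.fit(X_train, y_train)` before `predict` avoids the error.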

Practice

1. What is the main purpose of using a Pipeline in scikit-learn?

easy

A. To manually split data into training and testing sets

B. To chain preprocessing steps and model training into one object

C. To visualize the data distribution

D. To increase the size of the dataset

scikit-learn Pipeline in ML Python - Practice Problems & Coding Challenges

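Solution

Step 1: Understand what a Pipeline does

A Pipeline applies a sequence of transformers followed by a final estimator, so a single call to fit or predict runs every step in order.

Step 2: Identify the main purpose

Splitting data, visualizing distributions, and enlarging a dataset are not things a Pipeline does. Its purpose is to chain preprocessing steps and model training into one object.

Final Answer: B. To chain preprocessing steps and model training into one object

Quick Check: Pipeline = preprocessing steps + model in one object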
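
A minimal sketch of what the chaining saves, reusing the small training set from the first challenge: the manual version fits and applies the scaler by hand, while the pipeline version does the same work through a single object. The variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1])

# Manual version: fit the scaler, transform the data, then fit the model.
scaler = StandardScaler().fit(X)
manual_clf = LogisticRegression(random_state=0).fit(scaler.transform(X), y)
manual_pred = manual_clf.predict(scaler.transform(X))

# Pipeline version: one fit call runs both steps in order.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=0)),
]).fit(X, y)
pipe_pred = pipe.predict(X)

# Both approaches produce the same predictions.
print(np.array_equal(manual_pred, pipe_pred))
```

Keeping the scaler inside the pipeline also means that, under cross-validation, it is refit on each training fold rather than on the full dataset.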
