scikit-learn Pipeline in ML Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of a simple scikit-learn Pipeline
What is the output of the following code snippet when predicting with the pipeline?
ML Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(random_state=0))
])

X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])

pipeline.fit(X_train, y_train)

X_test = np.array([[1.5, 2.5]])
prediction = pipeline.predict(X_test)
print(prediction)
A. [0 1]
B. [0]
C. [1]
D. Raises a ValueError
💡 Hint
Think about how the pipeline transforms data before prediction.
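To make the hint concrete, here is a minimal sketch of what `predict` does on a fitted pipeline: the input passes through each transformer's `transform`, and the final estimator's `predict` runs on the result. It reuses the data from the question; the variable names `via_pipeline`, `scaled`, `manual`, and `same_result` are illustrative assumptions. The predicted label is deliberately not shown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[1.5, 2.5]])

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=0)),
])
pipeline.fit(X_train, y_train)

# Path 1: let the pipeline handle scaling and prediction together.
via_pipeline = pipeline.predict(X_test)

# Path 2: apply the fitted steps by hand, in pipeline order.
scaled = pipeline.named_steps["scaler"].transform(X_test)
manual = pipeline.named_steps["clf"].predict(scaled)

# Both paths agree, and there is one label per test row.
same_result = np.array_equal(via_pipeline, manual)
print(same_result, via_pipeline.shape)
```

The scaler reuses the mean and variance learned from `X_train`; it is not refitted on `X_test`.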
Model Choice
intermediate
Choosing the correct pipeline step for text data
You want to build a pipeline to classify text documents. Which step should you include before the classifier to convert text into numbers?
A. KMeans()
B. CountVectorizer()
C. PCA()
D. StandardScaler()
💡 Hint
Text data needs to be converted into numeric features before classification.
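As a rough illustration of the hint, the sketch below puts a text vectorizer in front of a classifier so that raw strings become a numeric matrix. It uses `TfidfVectorizer` as one possible vectorizer, chosen here for illustration rather than taken from the question. The toy corpus, labels, and step names are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (1 = sports, 0 = cooking).
docs = [
    "the team won the match",
    "a great goal in the final match",
    "bake the bread in the oven",
    "add salt and bake the cake",
]
labels = [1, 1, 0, 0]

text_pipeline = Pipeline([
    ("vectorizer", TfidfVectorizer()),  # raw strings -> sparse numeric matrix
    ("clf", LogisticRegression()),
])
text_pipeline.fit(docs, labels)

# Inspect the numeric features the classifier actually receives.
features = text_pipeline.named_steps["vectorizer"].transform(docs)
print(features.shape)  # (number of documents, vocabulary size)

# New text goes through the same vectorizer before classification.
predictions = text_pipeline.predict(["bake a cake"])
```

Without a vectorizing step, the classifier would be handed Python strings and would fail when it tried to treat them as numbers.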
Hyperparameter
advanced
Setting hyperparameters in a pipeline
Given this pipeline:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])
How do you set the LogisticRegression parameter 'C' to 0.5 when using GridSearchCV?
A. {'scaler__C': [0.5]}
B. {'C': [0.5]}
C. {'pipeline__C': [0.5]}
D. {'clf__C': [0.5]}
💡 Hint
Use double underscores to access parameters of steps inside the pipeline.
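The double-underscore convention can be explored on a different pipeline. The sketch below is illustrative only: the step names `scale` and `svc`, the `SVC` estimator, and the iris dataset are assumptions, not part of the question. It lists the nested parameter names that `get_params()` exposes and reuses one of them as a grid key.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical pipeline; the step names "scale" and "svc" are arbitrary.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC()),
])

# Nested parameters are addressed as <step name>__<parameter name>.
nested_names = [name for name in pipe.get_params() if "__" in name]
print(nested_names)

# The same naming scheme is used for the keys of a search grid.
param_grid = {"svc__kernel": ["linear", "rbf"]}
X, y = load_iris(return_X_y=True)
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Printing `get_params()` before writing a grid is a quick way to catch a misspelled step or parameter name.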
Metrics
advanced
Evaluating pipeline performance with cross-validation
You run cross_val_score on a pipeline with a classifier and get these scores: [0.8, 0.85, 0.78, 0.82, 0.81]. What is the mean accuracy, rounded to two decimal places?
A. 0.81
B. 0.82
C. 0.80
D. 0.83
💡 Hint
Add all scores and divide by the number of scores.
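The calculation in the hint can be sketched on freshly computed scores rather than the ones in the question. The example assumes the iris dataset and a scaler plus logistic regression pipeline, both chosen here for illustration. For a classifier, `cross_val_score` uses accuracy by default.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# One accuracy value per fold; the whole pipeline is refitted on each split.
scores = cross_val_score(pipe, X, y, cv=5)

# Mean accuracy: add all scores and divide by the number of scores.
by_hand = sum(scores) / len(scores)
mean_accuracy = scores.mean()  # same result via NumPy

print(scores)
print(round(float(mean_accuracy), 2))
```

Reporting `scores.std()` next to the mean gives a sense of how much accuracy varies between folds.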
🔧 Debug
expert
Identifying error in pipeline usage
What error does this code raise?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])

X_test = [[1, 2], [3, 4]]
prediction = pipeline.predict(X_test)
A. ValueError (raised as its subclass NotFittedError): This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
B. AttributeError: 'Pipeline' object has no attribute 'predict'
C. TypeError: 'list' object is not callable
D. No error, outputs predictions
💡 Hint
You must train the pipeline before predicting.
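A minimal sketch of the fit-then-predict order follows. It catches `sklearn.exceptions.NotFittedError`, which subclasses `ValueError`, rather than matching on the message text, since the exact wording may differ between scikit-learn versions. The training data and the `raised` flag are assumptions added for illustration.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Hypothetical training data, added so the pipeline can be fitted.
X_train = [[1, 2], [2, 3], [3, 4], [4, 5]]
y_train = [0, 0, 1, 1]
X_test = [[1, 2], [3, 4]]

# Predicting before fitting fails.
raised = False
try:
    pipeline.predict(X_test)
except NotFittedError as exc:
    raised = True
    print(type(exc).__name__, "is a ValueError:", isinstance(exc, ValueError))

# Fitting first makes the same call succeed.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(len(predictions))
```

Because `NotFittedError` inherits from `ValueError`, an `except ValueError` clause would catch it as well.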