Challenge - 5 Problems
Pipeline Master
❓ Predict Output
Difficulty: intermediate
Output of a simple scikit-learn Pipeline
What is the output of the following code snippet when predicting with the pipeline?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import numpy as np

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(random_state=0))
])

X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])
pipeline.fit(X_train, y_train)

X_test = np.array([[1.5, 2.5]])
prediction = pipeline.predict(X_test)
print(prediction)
💡 Hint
Think about how the pipeline transforms data before prediction.
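Explanation
The pipeline first scales the input with StandardScaler, then applies LogisticRegression. The test point [1.5, 2.5] lies below the training mean [2.5, 3.5] on both features, on the same side as the class 0 samples, so the model predicts class 0 and the code prints [0].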
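To make the two stages visible, the sketch below refits the same pipeline and inspects what the classifier receives after scaling. `named_steps` and `decision_function` are standard scikit-learn calls; the variable names `X_scaled`, `score`, and `manual_prediction` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=0)),
])

X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])
pipeline.fit(X_train, y_train)

X_test = np.array([[1.5, 2.5]])

# Stage 1: the scaler standardizes the test point with the training mean and std.
X_scaled = pipeline.named_steps["scaler"].transform(X_test)

# Stage 2: the classifier scores the scaled point; a negative score means class 0.
score = pipeline.named_steps["clf"].decision_function(X_scaled)
manual_prediction = pipeline.named_steps["clf"].predict(X_scaled)

print(X_scaled)           # both features are roughly -0.894
print(score)              # negative decision score
print(manual_prediction)  # same result as pipeline.predict(X_test)
```

Running the two stages by hand gives the same class as calling `pipeline.predict(X_test)` directly, which is the point of wrapping them in a Pipeline.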
❓ Model Choice
Difficulty: intermediate
Choosing the correct pipeline step for text data
You want to build a pipeline to classify text documents. Which step should you include before the classifier to convert text into numbers?
💡 Hint
Text data needs to be converted into numeric features before classification.
Explanation
CountVectorizer converts text documents into a matrix of token counts, which can be used by classifiers.
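A minimal sketch of such a text pipeline follows. The four documents, their labels, and the step names `vect` and `clf` are made up for illustration, and LogisticRegression is just one possible classifier for the final step.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus (hypothetical): 1 = spam, 0 = not spam.
docs = [
    "cheap pills buy now",
    "meeting agenda attached",
    "buy cheap watches now",
    "project meeting notes attached",
]
labels = [1, 0, 1, 0]

text_pipeline = Pipeline([
    ("vect", CountVectorizer()),    # text -> sparse matrix of token counts
    ("clf", LogisticRegression()),  # classifier trained on those counts
])
text_pipeline.fit(docs, labels)

# The vectorizer learned 10 distinct tokens, so each document becomes a 10-column row.
counts = text_pipeline.named_steps["vect"].transform(["buy cheap pills"])
print(counts.shape)

# Raw strings go straight into predict; the pipeline vectorizes them first.
new_prediction = text_pipeline.predict(["buy cheap pills"])
print(new_prediction)
```

Because the vectorizer lives inside the pipeline, it is fitted only on the training documents and reused unchanged at prediction time.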
❓ Hyperparameter
Difficulty: advanced
Setting hyperparameters in a pipeline
Given this pipeline:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])
How do you set the LogisticRegression parameter 'C' to 0.5 when using GridSearchCV?
💡 Hint
Use double underscores to access parameters of steps inside the pipeline.
Explanation
Parameters of a pipeline step are addressed as 'stepname__parameter', with two underscores. Here the classifier step is named 'clf', so the key is 'clf__C', for example param_grid = {'clf__C': [0.5]}.
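The naming rule can be sketched as follows. The synthetic dataset from `make_classification` and the `cv=5` setting are assumptions added only so the search has something to fit; the pipeline itself matches the one in the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only (not part of the original question).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# Key format: <step name> + double underscore + <parameter name>.
param_grid = {"clf__C": [0.5]}

search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_estimator_.named_steps["clf"].C)
```

The same key works outside a grid search, for instance `pipeline.set_params(clf__C=0.5)`. In practice the list would usually hold several candidate values, such as `[0.1, 0.5, 1.0]`.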
❓ Metrics
Difficulty: advanced
Evaluating pipeline performance with cross-validation
You run cross_val_score on a pipeline with a classifier and get these scores: [0.8, 0.85, 0.78, 0.82, 0.81]. What is the mean accuracy?
💡 Hint
Add all scores and divide by the number of scores.
Explanation
Mean accuracy = (0.8 + 0.85 + 0.78 + 0.82 + 0.81) / 5 = 0.812, rounded to 0.81.
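The same arithmetic in code, as a small sketch. The scores are hard-coded from the question rather than produced by an actual `cross_val_score` call, which would return a NumPy array of the same shape.

```python
import numpy as np

# Fold scores copied from the question (not computed here).
scores = np.array([0.8, 0.85, 0.78, 0.82, 0.81])

# Sum of the scores divided by the number of folds.
mean_accuracy = scores.mean()

print(round(mean_accuracy, 3))
print(round(mean_accuracy, 2))
```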
🔧 Debug
Difficulty: expert
Identifying error in pipeline usage
What error does this code raise?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])
X_test = [[1, 2], [3, 4]]
prediction = pipeline.predict(X_test)
💡 Hint
You must train the pipeline before predicting.
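Explanation
The pipeline is never fitted before predict is called, so scikit-learn raises NotFittedError. This exception class inherits from both ValueError and AttributeError, so code that catches ValueError will also catch it.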
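The failure can be reproduced and caught as in the sketch below. `NotFittedError` is imported from `sklearn.exceptions`; the `caught` variable is only there to keep the exception around for inspection.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

X_test = [[1, 2], [3, 4]]

caught = None
try:
    # No fit() call has happened, so there are no learned statistics to apply.
    pipeline.predict(X_test)
except NotFittedError as exc:
    caught = exc
    print(type(exc).__name__)

# NotFittedError derives from ValueError, so a broader handler would catch it too.
print(isinstance(caught, ValueError))
```

Calling `pipeline.fit(X_train, y_train)` on labelled training data before `predict` removes the error.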