
Feature engineering pipelines in MLOps - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of a feature engineering pipeline in MLOps?

Choose the best description of why we use feature engineering pipelines in machine learning operations.

A. To collect raw data from various sources.
B. To deploy machine learning models to production environments.
C. To automate and standardize the process of transforming raw data into features for models.
D. To monitor the performance of models after deployment.
💡 Hint

Think about what happens before training a model with raw data.

💻 Command Output
intermediate
Output of a feature pipeline step using scikit-learn's ColumnTransformer

Given the following Python code snippet, what is the output shape of X_transformed?

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import numpy as np

X = np.array([[25, 'red'], [30, 'blue'], [22, 'green']])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), [0]),
        ('cat', OneHotEncoder(), [1])
    ])

X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)
A. (3, 3)
B. (3, 4)
C. (2, 4)
D. (3, 2)
💡 Hint

Count numeric and categorical features after transformation.

🔀 Workflow
advanced
Order the steps to build a feature engineering pipeline for a new dataset

Arrange the following steps in the correct order to create a feature engineering pipeline.

A. 3, 1, 2, 4
B. 2, 1, 3, 4
C. 1, 3, 2, 4
D. 1, 2, 3, 4
💡 Hint

Think about understanding data first, then defining transformations, then implementation, then testing.
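
The four phases named in the hint (understand the data, define transformations, implement, test) might look like the following in code. This is a rough sketch on a hypothetical two-column dataset. The column names, the median imputation strategy, and the specific checks are illustrative assumptions, not part of the original problem.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with one missing numeric value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 22.0, 41.0],
    "color": ["red", "blue", "green", "blue"],
})

# Understand the data: split columns by type.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Define transformations for each column group.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_steps = OneHotEncoder(handle_unknown="ignore")

# Implement the pipeline by wiring the transformations to their columns.
feature_pipeline = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Test the output: same number of rows, no missing values left.
features = feature_pipeline.fit_transform(df)
if hasattr(features, "toarray"):  # convert a sparse result to dense for the checks
    features = features.toarray()
assert features.shape[0] == len(df)
assert not np.isnan(features).any()
```

In a real project the final checks would more likely live in a unit test suite than in inline assertions.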

Troubleshoot
advanced
Why does this feature pipeline raise a ValueError during fit?

Consider this code snippet that raises an error during fit. What is the most likely cause?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import numpy as np

X = np.array([[1, 2], [np.nan, 3], [7, 6]])

pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

pipeline.fit(X)
A. Pipeline steps are in wrong order; imputer should come before scaler.
B. SimpleImputer requires categorical data, but numeric data was given.
C. StandardScaler cannot handle NaN values before imputation.
D. The input array X has inconsistent row lengths.
💡 Hint

Think about which step should handle missing values first.
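
One way to check the premise of this problem is to run the snippet inside a `try`/`except` and inspect the result. The sketch below assumes a recent scikit-learn release (0.20 or later, where `StandardScaler` disregards NaN values during fit). In that setting the pipeline as listed, with the imputer placed before the scaler, is expected to fit without raising, so the error described in the question may not reproduce. The variable names `fit_error` and `X_out` are introduced here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [np.nan, 3], [7, 6]])

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),  # fills NaN with the column mean
    ("scaler", StandardScaler()),                 # then standardizes each column
])

# Capture a ValueError instead of letting it stop the script.
try:
    X_out = pipeline.fit_transform(X)
    fit_error = None
except ValueError as exc:
    X_out = None
    fit_error = exc

print("error:", fit_error)
if X_out is not None:
    print("NaN left after transform:", bool(np.isnan(X_out).any()))
```

Placing the imputer first remains the safer ordering in general, because many other estimators reject NaN input outright.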

Best Practice
expert
Which practice ensures feature engineering pipelines support model reproducibility?

Choose the best practice that helps maintain reproducibility of machine learning models when using feature engineering pipelines.

A. Version control the pipeline code and store pipeline artifacts with the model.
B. Run the pipeline only on training data and ignore test data transformations.
C. Manually apply transformations outside the pipeline for flexibility.
D. Use random transformations without fixing seeds to increase data variety.
💡 Hint

Think about how to keep track of changes and ensure the same transformations are applied later.
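
Keeping a fitted pipeline as an artifact could be sketched as follows: serialize it with `joblib`, record a checksum so the exact artifact can be identified later, and confirm that the reloaded pipeline produces the same features. The file name `feature_pipeline.joblib`, the temporary directory, and the sample arrays are assumptions for illustration. A real setup would typically write to a model registry or artifact store alongside the model.

```python
import hashlib
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data and a later, unseen sample.
X_train = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
X_new = np.array([[4.0, np.nan]])

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
]).fit(X_train)

with tempfile.TemporaryDirectory() as artifact_dir:
    artifact_path = Path(artifact_dir) / "feature_pipeline.joblib"

    # Persist the fitted pipeline (learned means and scales included).
    joblib.dump(pipeline, artifact_path)

    # Record a checksum that can be stored next to the model version.
    artifact_sha256 = hashlib.sha256(artifact_path.read_bytes()).hexdigest()

    # Reload, as a serving or retraining job would.
    restored = joblib.load(artifact_path)

# The restored pipeline should apply exactly the same transformation.
same_features = np.allclose(pipeline.transform(X_new), restored.transform(X_new))
print("artifact sha256:", artifact_sha256)
print("restored pipeline matches:", same_features)
```

Artifacts saved with `joblib` are generally tied to the library versions that produced them, so pinning the scikit-learn version alongside the pipeline code is a common companion practice.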