Practice


1. What is the main purpose of a feature engineering pipeline in MLOps?

easy

A. To automate and standardize data preparation steps

B. To deploy machine learning models to production

C. To monitor model performance after deployment

D. To collect raw data from external sources

Solution

Step 1: Understand the role of feature engineering pipelines
Feature engineering pipelines automate the process of transforming raw data into features for model training and testing.
Step 2: Differentiate from other MLOps tasks
Model deployment, performance monitoring, and raw data collection are separate MLOps tasks, so options B, C, and D do not describe a feature engineering pipeline.
Final Answer:
To automate and standardize data preparation steps -> Option A
Quick Check:
Feature engineering pipeline = automate data prep [OK]

Hint: Feature pipelines automate data prep, not deployment or monitoring [OK]

Common Mistakes:

Confusing feature pipelines with model deployment
Thinking pipelines collect raw data
Mixing up monitoring with feature engineering
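
A minimal sketch of the idea in Step 1, assuming scikit-learn and a tiny made-up dataset (the arrays and the name `prep` are illustrative, not from the question): the preparation steps are fitted once on training data and then reused unchanged on new data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data; np.nan marks a missing value.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

# The pipeline bundles the preparation steps so they always run in the same order.
prep = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# The statistics (median, mean, standard deviation) are learned from the training data only...
train_features = prep.fit_transform(X_train)
# ...and reused as-is when new data arrives.
new_features = prep.transform(X_new)
```

Because the same fitted object handles both calls, training and serving data go through identical transformations, which is the standardization the question refers to.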

2. Which of the following is the correct way to define a simple feature engineering pipeline using scikit-learn's Pipeline?

easy

A. pipeline = Pipeline([('scaler', StandardScaler()), ('pca', PCA(n_components=2))])

B. pipeline = Pipeline('scaler', StandardScaler(), 'pca', PCA(n_components=2))

C. pipeline = Pipeline({'scaler': StandardScaler(), 'pca': PCA(n_components=2)})

D. pipeline = Pipeline(StandardScaler(), PCA(n_components=2))

Solution

Step 1: Recall scikit-learn Pipeline syntax
Pipeline expects a list of tuples, each tuple with a name and a transformer object.
Step 2: Check each option's syntax
pipeline = Pipeline([('scaler', StandardScaler()), ('pca', PCA(n_components=2))]) correctly uses a list of tuples. Options B, C, and D use incorrect argument formats.
Final Answer:
pipeline = Pipeline([('scaler', StandardScaler()), ('pca', PCA(n_components=2))]) -> Option A
Quick Check:
Pipeline needs list of (name, transformer) tuples [OK]

Hint: Pipeline needs list of (name, transformer) tuples [OK]

Common Mistakes:

Passing arguments without list brackets
Using dict instead of list of tuples
Omitting step names in pipeline
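
As a small illustration of the syntax above, the sketch below builds the pipeline from Option A and reads back its step names. It also shows `make_pipeline`, a scikit-learn helper that does accept transformers as positional arguments and names the steps automatically; the variable names are chosen here for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit form: a list of (name, transformer) tuples, as in Option A.
pipeline = Pipeline([("scaler", StandardScaler()), ("pca", PCA(n_components=2))])
step_names = [name for name, _ in pipeline.steps]

# make_pipeline takes the transformers as positional arguments and
# generates the step names from the lowercased class names.
auto_pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
auto_names = [name for name, _ in auto_pipeline.steps]
```

The positional style of Option D is therefore valid only with `make_pipeline`, not with the `Pipeline` constructor itself.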

3. Given the following pipeline code, what will be the output of pipeline.transform([[0, 0], [1, 1]])?

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

pipeline = Pipeline([
  ('scaler', StandardScaler()),
  ('pca', PCA(n_components=1))
])
pipeline.fit([[0, 0], [1, 1]])
result = pipeline.transform([[0, 0], [1, 1]])
print(result)

medium

A. Error: PCA requires more than one sample

B. [[0. 0.] [1. 1.]]

C. [[0.5] [0.5]]

D. [[-1.41421356] [ 1.41421356]]

Solution

Step 1: Understand pipeline steps
First, data is scaled to zero mean and unit variance, then PCA reduces to 1 component.
Step 2: Calculate transformed output
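Each column of [[0, 0], [1, 1]] has mean 0.5 and standard deviation 0.5, so scaling gives [[-1, -1], [1, 1]]. Both scaled points lie on the line y = x, so the single principal component points along (1, 1)/sqrt(2). Projecting onto it gives values of magnitude sqrt(2), so the output is approximately [[-1.41421356], [1.41421356]]. The overall sign of a principal component is arbitrary, so some scikit-learn versions may print the two signs swapped.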
Final Answer:
[[-1.41421356] [ 1.41421356]] -> Option D
Quick Check:
Scaling + PCA output = [[-1.41421356] [ 1.41421356]] [OK]

Hint: Scaling centers data; PCA output is principal component values [OK]

Common Mistakes:

Expecting original data as output
Confusing PCA output shape
Assuming error due to small data
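
The arithmetic in Step 2 can be checked by hand with NumPy. This is a sketch of the calculation, not of scikit-learn's internals: the principal direction is written down directly from the geometry rather than computed by an SVD, and its sign is chosen arbitrarily.

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0]])

# StandardScaler step: subtract the column mean, divide by the population std (ddof=0).
scaled = (X - X.mean(axis=0)) / X.std(axis=0)  # [[-1, -1], [1, 1]]

# The scaled points lie on the line y = x, so the first principal
# direction is the unit vector along that line (its sign is arbitrary).
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)

# PCA step: project the (already centered) data onto that direction.
projected = scaled @ direction  # magnitudes equal sqrt(2), about 1.41421356
```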

4. You have this pipeline code but it raises an error: ValueError: Expected 2D array, got 1D array instead. What is the likely cause?

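from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

pipeline = Pipeline([
  ('scaler', StandardScaler()),
  ('pca', PCA(n_components=1))
])
pipeline.fit([1, 2, 3, 4])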

medium

A. Pipeline steps must be functions, not classes

B. Input to fit should be 2D array, not 1D list

C. StandardScaler requires integer inputs only

D. PCA cannot have n_components=1

Solution

Step 1: Analyze error message
The error says input is 1D but 2D is expected for fit method.
Step 2: Check input format
Input [1, 2, 3, 4] is a 1D list; fit expects a 2D array of shape (n_samples, n_features), such as [[1], [2], [3], [4]] for four samples of one feature.
Final Answer:
Input to fit should be 2D array, not 1D list -> Option B
Quick Check:
fit input shape must be 2D [OK]

Hint: fit() needs 2D array shape, not flat list [OK]

Common Mistakes:

Passing 1D list instead of 2D array
Misunderstanding PCA parameter limits
Thinking StandardScaler restricts input types
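
One possible fix, assuming the four numbers are four samples of a single feature, is to reshape the flat list into a column before fitting. The names `values` and `X` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=1)),
])

values = [1, 2, 3, 4]

# Treat the flat list as four samples of a single feature: shape (4, 1).
X = np.array(values, dtype=float).reshape(-1, 1)

pipeline.fit(X)
transformed = pipeline.transform(X)
```

If the list were instead one sample with four features, `reshape(1, -1)` would be the matching shape, although a single sample is not enough to fit a scaler or PCA meaningfully.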

5. You want to create a feature engineering pipeline that handles missing values by filling them with the median, then scales features, and finally selects the top 3 features using a model-based selector. Which pipeline setup is correct?

hard

A. Pipeline([('scaler', StandardScaler()), ('imputer', SimpleImputer(strategy='median')), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3))])

B. Pipeline([('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3)), ('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])

C. Pipeline([('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3))])

D. Pipeline([('imputer', SimpleImputer(strategy='mean')), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3)), ('scaler', StandardScaler())])

Solution

Step 1: Order pipeline steps logically
Missing values must be handled first, then scaling, then feature selection.
Step 2: Check each option's correctness
Option C imputes with the median, then scales, then selects features, which matches the required order.
Option A scales before imputing, so the steps are not in the required order.
Option B runs the selector first, before missing values have been filled.
Option D imputes with the mean instead of the median and selects features before scaling.
Note that max_features=3 caps the selection at 3 features; with the default threshold, SelectFromModel may keep fewer.
Final Answer:
Pipeline([('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3))]) -> Option C
Quick Check:
Impute -> scale -> select features [OK]

Hint: Impute missing -> scale -> select features in pipeline order [OK]

Common Mistakes:

Placing scaler before imputer
Selecting features before imputing missing values
Using mean instead of median when median is required
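
The pipeline from Option C can be tried end to end as sketched below. The synthetic data, the random seeds, and `n_estimators=50` are assumptions made for illustration. The sketch also sets `threshold=-np.inf`, which is not part of Option C: it makes `SelectFromModel` rank purely by importance so that exactly 3 features are kept rather than at most 3.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 100 samples, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels computed before values are removed
X[rng.random(X.shape) < 0.05] = np.nan   # knock out about 5% of the values

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("selector", SelectFromModel(
        estimator=RandomForestClassifier(n_estimators=50, random_state=0),
        max_features=3,
        threshold=-np.inf,  # rank by importance only, so exactly 3 features are kept
    )),
])

selected = pipeline.fit_transform(X, y)
kept_mask = pipeline.named_steps["selector"].get_support()
```

`kept_mask` is a boolean array over the 6 input columns that marks which ones survived the selection step.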

Feature engineering pipelines in MLOps - Time & Space Complexity

Input Size (n)	Approx. Operations
10	About 10 sets of transformations and combinations
100	About 100 sets of transformations and combinations
1000	About 1000 sets of transformations and combinations
