MLOps · devops · ~30 mins

Feature engineering pipelines in MLOps - Mini Project: Build & Apply

Feature Engineering Pipelines

📖 Scenario: You are working on a machine learning project and need to prepare your data for training. You will build a feature engineering pipeline that transforms the raw columns automatically, so that every dataset fed to the model is prepared the same way.

🎯 Goal: Build a simple feature engineering pipeline using scikit-learn that scales numeric features and encodes categorical features.

📋 What You'll Learn

Create a dataset dictionary with numeric and categorical features

Define a configuration variable for numeric feature names

Build a pipeline that scales numeric features and encodes categorical features

Print the transformed feature array

💡 Why This Matters

🌍 Real World

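Feature engineering pipelines automate data preparation steps. Because the same fitted steps are applied to the training data and to any new data, workflows become faster to run and less error-prone than preparing each dataset by hand.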

💼 Career

Understanding how to build and use feature engineering pipelines is essential for MLOps engineers and data scientists to deploy reliable ML models.
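The "less error-prone" point comes from fitting the pipeline once and then reusing the fitted object on new data. Below is a minimal sketch under stated assumptions: the new_rows data is hypothetical, and handle_unknown='ignore' is an optional OneHotEncoder setting added here (it is not used in the mini project) so that a department never seen during fitting is encoded as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Training data with the same schema as the mini project
train = pd.DataFrame({
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing'],
})

# Hypothetical new rows arriving later, e.g. at inference time
new_rows = pd.DataFrame({
    'age': [29],
    'salary': [55000],
    'department': ['hr'],  # category never seen during fitting
})

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age', 'salary']),
        # Assumption: ignore unseen categories (encoded as all zeros)
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['department']),
    ]
)

# Learn means, standard deviations, and the category list from training data only
train_features = preprocessor.fit_transform(train)

# Reuse the already-fitted steps on new data: transform only, no re-fitting
new_features = preprocessor.transform(new_rows)
print(new_features)
```

Calling fit_transform on new_rows instead would recompute the scaling statistics and categories from those rows alone, which is exactly the kind of silent inconsistency a fitted pipeline avoids.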

Step 1: Create the initial dataset dictionary

Create a dictionary called data with these exact entries: 'age': [25, 32, 47], 'salary': [50000, 60000, 80000], and 'department': ['sales', 'engineering', 'marketing'].

# Your code here

Hint

Use a dictionary with keys 'age', 'salary', and 'department' and assign lists of values exactly as shown.

Step 2: Define numeric feature names

Create a list called numeric_features containing the strings 'age' and 'salary'.

data = {
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing']
}
# Your code here

Hint

Define a list named numeric_features containing exactly the two strings 'age' and 'salary'.
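Listing numeric_features by hand keeps the pipeline's column selection explicit. As a sketch (assuming pandas), you can cross-check that list against the column dtypes and derive the remaining categorical columns; the names inferred and categorical_features are illustrative only and are not part of the exercise.

```python
import pandas as pd

data = {
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing'],
}
numeric_features = ['age', 'salary']

df = pd.DataFrame(data)

# Cross-check: columns that pandas stores with a numeric dtype
inferred = df.select_dtypes(include='number').columns.tolist()
print(inferred)  # ['age', 'salary']

# Everything not listed as numeric is treated as categorical here
categorical_features = [col for col in df.columns if col not in numeric_features]
print(categorical_features)  # ['department']
```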

Step 3: Build the feature engineering pipeline

Import ColumnTransformer from sklearn.compose, and StandardScaler and OneHotEncoder from sklearn.preprocessing. Create a ColumnTransformer called preprocessor that applies StandardScaler() to the columns in numeric_features and OneHotEncoder() to the 'department' column.

data = {
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing']
}
numeric_features = ['age', 'salary']
# Your code here

Hint

Pass ColumnTransformer a transformers list containing two (name, transformer, columns) tuples: one applying StandardScaler() to numeric_features, and one applying OneHotEncoder() to the categorical column, given as the list ['department'].

Step 4: Transform and print the processed features

Convert data to a pandas DataFrame called df. Use preprocessor.fit_transform(df) to transform the data and assign it to features. Print features.

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import pandas as pd

data = {
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing']
}
numeric_features = ['age', 'salary']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(), ['department'])
    ]
)
# Your code here

Hint

Use pd.DataFrame(data) to create df, then call preprocessor.fit_transform(df) and print the result.
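One way to complete the final step, assembled from the starter code above (a sketch; the exact number formatting of the printed array depends on your NumPy settings):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

data = {
    'age': [25, 32, 47],
    'salary': [50000, 60000, 80000],
    'department': ['sales', 'engineering', 'marketing'],
}
numeric_features = ['age', 'salary']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(), ['department']),
    ]
)

# Build the DataFrame, then fit and transform in a single call
df = pd.DataFrame(data)
features = preprocessor.fit_transform(df)
print(features)
print(features.shape)  # (3, 5): 2 scaled numeric columns + 3 one-hot columns
```

Each row is one employee. The first two columns are age and salary standardized to mean 0 and standard deviation 1 across the three rows; the last three are one-hot columns for engineering, marketing, and sales, in that order. With this small, mostly dense output ColumnTransformer returns a NumPy array; when one-hot columns dominate, it can return a SciPy sparse matrix instead, depending on its sparse_threshold parameter.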

Practice

1. What is the main purpose of a feature engineering pipeline in MLOps?

easy

A. To automate and standardize data preparation steps

B. To deploy machine learning models to production

C. To monitor model performance after deployment

D. To collect raw data from external sources

5. You want to create a feature engineering pipeline that handles missing values by filling them with the median, then scales features, and finally selects the top 3 features using a model-based selector. Which pipeline setup is correct?

hard

A. Pipeline([('scaler', StandardScaler()), ('imputer', SimpleImputer(strategy='median')), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3))])

B. Pipeline([('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3)), ('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])

C. Pipeline([('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler()), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3))])

D. Pipeline([('imputer', SimpleImputer(strategy='mean')), ('selector', SelectFromModel(estimator=RandomForestClassifier(), max_features=3)), ('scaler', StandardScaler())])

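Solution to question 1

Step 1: Understand the role of feature engineering pipelines

A feature engineering pipeline chains data preparation steps, such as scaling numeric columns and encoding categorical ones, so they run the same way on every dataset.

Step 2: Differentiate from other MLOps tasks

Deploying models (B), monitoring them after deployment (C), and collecting raw data (D) are separate MLOps stages, not feature engineering.

Final Answer: A. To automate and standardize data preparation steps

Quick Check: pipeline = repeatable data preparation → A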

Solution

Step 1: Recall scikit-learn Pipeline syntax

Step 2: Check each option's syntax

Final Answer:

Quick Check:

Solution

Step 1: Understand pipeline steps

Step 2: Calculate transformed output

Final Answer:

Quick Check:

Solution

Step 1: Analyze error message

Step 2: Check input format

Final Answer:

Quick Check:

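Solution to question 5

Step 1: Order pipeline steps logically

Impute missing values with the median first, so the later steps receive complete data; then scale the features; then select features with the model-based selector.

Step 2: Check each option's correctness

A scales before imputing. B selects features before imputing and scaling. D imputes with the mean instead of the median and selects before scaling. Only C applies median imputation, then scaling, then selection.

Final Answer: C

Quick Check: impute (median) → scale → select → C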
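A runnable sketch of option C, using hypothetical synthetic data from make_classification with about 5% of values knocked out. One detail the question glosses over: with its default threshold, SelectFromModel keeps only features whose importance is above the mean importance, so max_features=3 acts as an upper bound. Setting threshold=-np.inf, as below (an addition not in option C), makes it keep exactly the top 3. The selector also needs the labels y during fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: 100 rows, 8 numeric features
X, y = make_classification(n_samples=100, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of the values

pipe = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),  # 1. fill gaps with column medians
    ('scaler', StandardScaler()),                   # 2. standardize each feature
    ('selector', SelectFromModel(                   # 3. keep the most important features
        estimator=RandomForestClassifier(random_state=0),
        max_features=3,
        threshold=-np.inf,  # rank by importance only, so exactly 3 are kept
    )),
])

# y is passed through so the random forest inside the selector can be trained
X_selected = pipe.fit_transform(X, y)
print(X_selected.shape)  # (100, 3)
```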