What is ColumnTransformer for mixed types in ML Python?

ML Pythonml~5 mins

ColumnTransformer for mixed types in ML Python

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Introduction

We use ColumnTransformer to apply different data changes to different columns in one step. This helps when data has numbers and words mixed together.

You have a table with some columns as numbers and others as words.

You want to change numbers by scaling and words by turning them into numbers.

You want to prepare data quickly before teaching a computer to learn.

You want to keep your data changes organized and easy to repeat.

Syntax

ML Python

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

transformer = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['num_column1', 'num_column2']),
        ('cat', OneHotEncoder(), ['cat_column1', 'cat_column2'])
    ]
)

Each transformer has a name, a method, and the columns it changes.

Transformers run in parallel and combine results automatically.

Examples

Scale 'age' and 'income' columns, and one-hot encode 'city' column.

ML Python

ColumnTransformer(
    transformers=[
        ('scale', StandardScaler(), ['age', 'income']),
        ('encode', OneHotEncoder(), ['city'])
    ]
)

Scale 'height' and encode 'color' with safe handling of new categories.

ML Python

ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['height']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['color'])
    ]
)

Sample Model

This example shows how to use ColumnTransformer to scale numbers and encode categories before training a logistic regression model. It splits data, trains, predicts, and shows accuracy.

ML Python

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Sample data with mixed types
data = pd.DataFrame({
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 72000, 90000],
    'city': ['New York', 'Paris', 'Paris', 'London', 'New York'],
    'target': [0, 1, 0, 1, 0]
})

X = data.drop('target', axis=1)
y = data['target']

# Define ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age', 'income']),
        ('cat', OneHotEncoder(), ['city'])
    ]
)

# Create a pipeline with preprocessing and model
model = Pipeline(steps=[('preprocessor', preprocessor),
                        ('classifier', LogisticRegression())])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Print results
print('Predictions:', predictions)
print('Test labels:', y_test.values)
print(f'Test accuracy: {model.score(X_test, y_test):.2f}')

OutputSuccess

Important Notes

ColumnTransformer keeps your data changes clear and easy to manage.

Always match column names exactly when specifying columns.

Use pipelines to combine preprocessing and model training smoothly.

Summary

ColumnTransformer lets you change different columns in different ways at once.

It is useful when your data has both numbers and words.

Use it with pipelines to prepare data and train models easily.

Practice

(1/5)

1. What is the main purpose of using ColumnTransformer in machine learning?

easy

A. To train multiple models on the same dataset

B. To apply different preprocessing steps to different columns in a dataset

C. To visualize data distributions

D. To split data into training and testing sets

ColumnTransformer for mixed types in ML Python

Start learning this pattern below

Practice

Solution

Step 1: Understand the role of ColumnTransformer

Step 2: Compare with other options

Final Answer:

Quick Check:

Solution

Step 1: Recall the module for ColumnTransformer

Step 2: Verify other options

Final Answer:

Quick Check:

Solution

Step 1: Understand ColumnTransformer setup

Step 2: Predict output structure

Final Answer:

Quick Check:

Solution

Step 1: Check columns assigned to StandardScaler

Step 2: Understand why this causes an error

Final Answer:

Quick Check:

Solution

Step 1: Identify correct transformers for each column type

Step 2: Match columns to transformers correctly

Final Answer:

Quick Check: