ColumnTransformer applies different preprocessing steps to different columns in a single step. This helps when a dataset mixes numeric columns with categorical (text) columns.
ColumnTransformer for mixed types in ML Python
Introduction
You have a table where some columns hold numbers and others hold text categories.
You want to scale the numeric columns and convert the categorical columns into numbers.
You want to prepare data quickly before training a machine learning model.
You want to keep your data changes organized and easy to repeat.
Syntax
ML Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

transformer = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['num_column1', 'num_column2']),
        ('cat', OneHotEncoder(), ['cat_column1', 'cat_column2'])
    ]
)
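Each entry in transformers is a tuple: a name, a transformer object, and the list of columns it applies to.
Each transformer is applied independently to its own columns, and the outputs are joined side by side into one feature matrix. Columns that are not listed are dropped by default (remainder='drop'). Set n_jobs if you want the transformers to run in parallel.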
Examples
Scale 'age' and 'income' columns, and one-hot encode 'city' column.
ML Python
ColumnTransformer(
    transformers=[
        ('scale', StandardScaler(), ['age', 'income']),
        ('encode', OneHotEncoder(), ['city'])
    ]
)
Scale 'height' and encode 'color' with safe handling of new categories.
ML Python
ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['height']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['color'])
    ]
)
Sample Model
This example shows how to use ColumnTransformer to scale numbers and encode categories before training a logistic regression model. It splits data, trains, predicts, and shows accuracy.
ML Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
import pandas as pd

# Sample data with mixed types
data = pd.DataFrame({
    'age': [25, 32, 47, 51, 62],
    'income': [50000, 60000, 80000, 72000, 90000],
    'city': ['New York', 'Paris', 'Paris', 'London', 'New York'],
    'target': [0, 1, 0, 1, 0]
})

X = data.drop('target', axis=1)
y = data['target']

# Define ColumnTransformer
# handle_unknown='ignore' avoids an error if the test split contains
# a city that did not appear in the training split
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age', 'income']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['city'])
    ]
)

# Create a pipeline with preprocessing and model
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression())
])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train model
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Print results
print('Predictions:', predictions)
print('Test labels:', y_test.values)
print(f'Test accuracy: {model.score(X_test, y_test):.2f}')
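With so few rows, a city can easily show up at prediction time that was never seen during training. The sketch below illustrates how OneHotEncoder(handle_unknown='ignore') treats such a value. The data and the 'Tokyo' row are made up for illustration, and sparse_threshold=0 is set only so that the result is a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

train = pd.DataFrame({
    'age': [25, 32, 47],
    'city': ['New York', 'Paris', 'London'],
})
# Hypothetical new row with a city that was not in the training data.
new_rows = pd.DataFrame({'age': [40], 'city': ['Tokyo']})

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['age']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['city']),
    ],
    sparse_threshold=0,  # always return a dense array
)
preprocessor.fit(train)

encoded = preprocessor.transform(new_rows)
city_part = encoded[0, 1:]  # the first column is the scaled age
print(city_part)  # every one-hot column is zero for the unseen city
```

The unseen city is encoded as all zeros instead of stopping the run. With the default handle_unknown='error', the same transform call would raise an error.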
Important Notes
ColumnTransformer keeps your data changes clear and easy to manage.
Always match column names exactly when specifying columns.
Use pipelines to combine preprocessing and model training smoothly.
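As an illustration of the note about column names, here is a small sketch of what can happen when a name does not match. The data is made up, and the exact error message may differ between scikit-learn versions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical data: note the capital 'A' in 'Age'.
df = pd.DataFrame({'Age': [25, 32, 47], 'income': [50000, 60000, 80000]})

# 'age' does not match the actual column name 'Age' (names are case-sensitive).
transformer = ColumnTransformer(
    transformers=[('num', StandardScaler(), ['age', 'income'])]
)

try:
    transformer.fit_transform(df)
    failed = False
except ValueError as err:
    failed = True
    print('Column lookup failed:', err)
```

Checking the names against df.columns before fitting is a simple way to catch this kind of mismatch early.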
Summary
ColumnTransformer lets you change different columns in different ways at once.
It is useful when your data has both numbers and words.
Use it with pipelines to prepare data and train models easily.