
ColumnTransformer for mixed types in ML Python - Model Pipeline Trace

Model Pipeline - ColumnTransformer for mixed types

This pipeline shows how to preprocess different types of columns separately using ColumnTransformer. Numeric columns are scaled, and categorical columns are one-hot encoded into numeric indicator columns. The results are concatenated into a single numeric feature matrix that the model can train on.
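The idea above can be sketched with scikit-learn as follows. This is a minimal illustration, not the page's original code: the four-row DataFrame is made up, the column names are borrowed from the sample record shown in the data flow, and StandardScaler is an assumption, since the page only says the numeric columns are "scaled".

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up dataset: 3 numeric and 2 categorical columns.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [50000, 64000, 120000, 83000],
    "score": [0.8, 0.4, 0.6, 0.9],
    "gender": ["M", "F", "F", "M"],
    "city": ["NY", "LA", "NY", "LA"],
})

numeric_cols = ["age", "income", "score"]
categorical_cols = ["gender", "city"]

# Each transformer is applied only to the columns listed for it;
# the outputs are concatenated side by side.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)  # 3 scaled numeric columns + 2 + 2 one-hot columns
```

With two categories in each categorical column, the five input columns become seven output columns, which matches the shapes listed in the data flow.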

Data Flow - 4 Stages
Stage 1: Raw Data Input
  Input: 1000 rows x 5 columns
  Operation: Original dataset with 3 numeric and 2 categorical columns
  Output: 1000 rows x 5 columns
  Example: [{'age': 25, 'income': 50000, 'score': 0.8, 'gender': 'M', 'city': 'NY'}, ...]

Stage 2: Preprocessing with ColumnTransformer
  Input: 1000 rows x 5 columns
  Operation: Scale numeric columns and one-hot encode categorical columns
  Output: 1000 rows x 7 columns
  Example: [[0.1, -0.2, 0.5, 1, 0, 0, 1], ...] # scaled numeric + encoded categorical

Stage 3: Train/Test Split
  Input: 1000 rows x 7 columns
  Operation: Split data into 800 training and 200 testing rows
  Output: Train: 800 rows x 7 columns, Test: 200 rows x 7 columns
  Example: Train features shape: (800, 7), Test features shape: (200, 7)

Stage 4: Model Training
  Input: 800 rows x 7 columns
  Operation: Train logistic regression model on processed data
  Output: Trained model
  Example: Model learns weights for 7 input features

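The four stages could be wired together roughly as below. This is a sketch under stated assumptions: the 1000-row dataset and its labels are synthetic, and StandardScaler, the random seeds, and `max_iter` are illustrative choices. One deliberate difference from the stage order above: the sketch splits first and fits the preprocessor inside a Pipeline on the training rows only, so that test-set statistics do not influence the scaler. The resulting shapes are the same.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stage 1: synthetic raw data (assumption), 1000 rows x 5 columns.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "income": rng.normal(60000, 15000, size=n),
    "score": rng.random(n),
    "gender": rng.choice(["M", "F"], size=n),
    "city": rng.choice(["NY", "LA"], size=n),
})
# Made-up binary target that depends on one numeric and one categorical column.
y = (df["score"] + (df["gender"] == "M") * 0.2 > 0.6).astype(int)

# Stage 3 (done first here): 800 training rows, 200 test rows.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0
)

# Stage 2: per-column-type preprocessing.
preprocess = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "income", "score"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "city"]),
    ],
    sparse_threshold=0,  # dense output
)

# Stage 4: logistic regression on the processed features.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

train_features = model.named_steps["preprocess"].transform(X_train)
test_features = model.named_steps["preprocess"].transform(X_test)
print("Train features shape:", train_features.shape)
print("Test features shape:", test_features.shape)
print("Learned weights:", model.named_steps["clf"].coef_.shape)
print("Test accuracy:", model.score(X_test, y_test))
```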
Training Trace - Epoch by Epoch
Epoch  Loss ↓  Accuracy ↑  Observation
1      0.65    0.60        Model starts with moderate loss and accuracy
2      0.50    0.75        Loss decreases and accuracy improves
3      0.40    0.82        Model continues to learn patterns
4      0.35    0.85        Loss decreases steadily, accuracy rises
5      0.30    0.88        Training converges with good accuracy
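A per-epoch trace like the one above can be recorded by training logistic regression with plain gradient descent and logging loss and accuracy after each pass. The page does not say which optimizer produced its numbers, so the following is only an illustrative sketch: the 800 x 7 feature matrix is random stand-in data, the learning rate is an arbitrary choice, and the printed values will not match the table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for the 800 x 7 preprocessed training matrix (assumption).
X = rng.normal(size=(800, 7))
true_w = rng.normal(size=7)
y = (X @ true_w + rng.normal(scale=0.5, size=800) > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


w = np.zeros(7)   # one weight per input feature
b = 0.0           # intercept
lr = 0.5          # learning rate (arbitrary choice)
history = []

for epoch in range(1, 6):
    # Full-batch gradient step on the mean log loss.
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

    # Metrics measured after the update.
    p = sigmoid(X @ w + b)
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    acc = np.mean((p >= 0.5) == (y == 1))
    history.append((epoch, loss, acc))
    print(f"epoch {epoch}: loss={loss:.3f} accuracy={acc:.3f}")
```

Note that scikit-learn's LogisticRegression with its default solver does not expose epochs in this way, which is why the sketch uses a hand-written loop.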
Prediction Trace - 4 Layers
Layer 1: Input sample
Layer 2: ColumnTransformer preprocessing
Layer 3: Model prediction (logistic regression)
Layer 4: Threshold decision
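The four layers can be followed for a single sample as sketched below. The eight training rows and their labels are invented so that the example is self-contained, and the 0.5 threshold is an assumption, since the page does not state which cutoff is used.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small made-up training set so the pipeline can be fitted.
train = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "income": [50000, 64000, 120000, 83000, 72000, 41000, 95000, 58000],
    "score": [0.8, 0.4, 0.6, 0.9, 0.3, 0.7, 0.2, 0.5],
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "city": ["NY", "LA", "NY", "LA", "NY", "LA", "LA", "NY"],
})
labels = [1, 0, 1, 1, 0, 1, 0, 0]

pipe = Pipeline([
    ("preprocess", ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), ["age", "income", "score"]),
            ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "city"]),
        ],
        sparse_threshold=0,  # dense output
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(train, labels)

# Layer 1: input sample (the record shown in the data flow).
sample = pd.DataFrame([
    {"age": 25, "income": 50000, "score": 0.8, "gender": "M", "city": "NY"}
])

# Layer 2: ColumnTransformer preprocessing -> 1 row x 7 numeric features.
features = pipe.named_steps["preprocess"].transform(sample)

# Layer 3: logistic regression outputs the probability of class 1.
proba = pipe.named_steps["clf"].predict_proba(features)[0, 1]

# Layer 4: threshold decision (0.5 is an assumed cutoff).
label = int(proba >= 0.5)

print(features.shape, round(float(proba), 3), label)
```

In normal use, `pipe.predict(sample)` performs layers 2 to 4 in one call; the steps are separated here only to make each layer visible.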
Model Quiz - 3 Questions
Test your understanding
What does the ColumnTransformer do to categorical columns?
A. Converts them into numbers using one-hot encoding
B. Scales them between 0 and 1
C. Drops them from the dataset
D. Leaves them unchanged
Key Insight
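ColumnTransformer lets one pipeline apply the right transformation to each column type: scaling for numeric columns and one-hot encoding for categorical columns. The model then receives a single, fully numeric feature matrix, which typically helps both training and prediction accuracy on mixed data.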