
Weight decay (L2 regularization) in PyTorch - Model Pipeline Trace


This pipeline trains a simple neural network on a small dataset using weight decay, also called L2 regularization. Weight decay helps the model avoid overfitting by adding a penalty proportional to the squared magnitude of the weights, which nudges every weight toward zero at each optimizer step.
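In PyTorch, weight decay is enabled through the optimizer's `weight_decay` argument rather than by editing the loss. A minimal sketch of that setup (the layer sizes, learning rate, and decay strength here are illustrative assumptions, not values from this trace):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A single linear layer standing in for the pipeline's network (assumed shape).
model = nn.Linear(10, 2)

# weight_decay adds lambda * w to each parameter's gradient at step time,
# which for plain SGD is equivalent to an L2 penalty on the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

x = torch.randn(32, 10)          # a batch of 32 feature vectors
y = torch.randint(0, 2, (32,))   # binary class labels

loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()                 # the update includes the decay term
```

The same `weight_decay` keyword is accepted by other PyTorch optimizers such as `torch.optim.Adam`, so the regularization strength can be tuned without touching the training loop.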

Data Flow - 6 Stages
1. Data In (1000 rows x 10 columns → 1000 rows x 10 columns)
   Raw input features and labels.
   Example: Feature vector: [0.5, 1.2, -0.3, ..., 0.7], Label: 1
2. Preprocessing (1000 rows x 10 columns → 1000 rows x 10 columns)
   Normalize features to zero mean and unit variance.
   Example: Normalized feature vector: [0.1, -0.2, 0.0, ..., 0.3]
3. Feature Engineering (1000 rows x 10 columns → 1000 rows x 10 columns)
   No additional features added.
   Example: Same normalized features passed forward.
4. Model Trains (1000 rows x 10 columns → 1000 rows x 2 columns, class scores)
   Feedforward neural network with weight decay applied during the optimizer step.
   Example: Output logits: [1.2, -0.5]
5. Metrics Improve (1000 rows x 2 columns → scalar loss and accuracy values)
   Calculate loss and accuracy on the training data.
   Example: Loss: 0.45, Accuracy: 0.82
6. Prediction (1 row x 10 columns → 1 row x 2 columns, probabilities)
   Model predicts class probabilities using softmax.
   Example: Predicted probabilities: [0.75, 0.25]
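The preprocessing stage above (zero mean, unit variance) can be sketched in a few lines of PyTorch. The synthetic data here is an assumption used only to show the mechanics:

```python
import torch

torch.manual_seed(0)

# Synthetic raw features with a nonzero mean and inflated scale (assumed data).
X = torch.randn(1000, 10) * 3 + 1.5

# Standardize each of the 10 feature columns independently.
mean = X.mean(dim=0)
std = X.std(dim=0)
X_norm = (X - mean) / std
```

After this step each column of `X_norm` has mean approximately 0 and standard deviation approximately 1, matching the "zero mean and unit variance" description of stage 2.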
Training Trace - Epoch by Epoch
Loss
1.2 |*       
1.0 | *      
0.8 |  *     
0.6 |   *    
0.4 |    *   
    +---------
     1 2 3 4 5
     Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
1     | 1.20   | 0.50       | Initial loss is high; accuracy is at chance level.
2     | 0.85   | 0.65       | Loss decreases and accuracy improves as the model learns.
3     | 0.65   | 0.75       | Model continues to improve with weight decay controlling complexity.
4     | 0.55   | 0.80       | Loss decreases steadily; accuracy rises.
5     | 0.48   | 0.83       | Training converges with good accuracy and controlled loss.
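An epoch-by-epoch trace like the one above can be produced with a short full-batch training loop. This is a sketch under assumed data and hyperparameters; the exact loss and accuracy numbers will differ from the table:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic dataset with learnable structure (an assumption for the demo):
# the label depends on the sum of the first two features.
X = torch.randn(1000, 10)
y = (X[:, 0] + X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

history = []
for epoch in range(5):
    optimizer.zero_grad()
    logits = model(X)
    loss = loss_fn(logits, y)
    loss.backward()
    optimizer.step()
    accuracy = (logits.argmax(dim=1) == y).float().mean().item()
    history.append((epoch + 1, loss.item(), accuracy))
```

Each `history` entry mirrors one table row: epoch number, training loss, and training accuracy, with the loss shrinking as the epochs progress.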
Prediction Trace - 4 Layers
Layer 1: Input Layer
Layer 2: Hidden Layer (ReLU)
Layer 3: Output Layer (Linear)
Layer 4: Softmax
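The four-layer prediction trace can be sketched directly as an `nn.Sequential` stack. The hidden width of 16 is an assumption; the trace does not specify it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(10, 16),   # Layers 1-2: input features into the hidden layer
    nn.ReLU(),           #              with ReLU activation
    nn.Linear(16, 2),    # Layer 3: linear output layer producing logits
    nn.Softmax(dim=1),   # Layer 4: softmax turns logits into probabilities
)

x = torch.randn(1, 10)   # one 10-feature row, as in the Prediction stage
probs = model(x)         # 1 row x 2 columns of class probabilities
```

Note that during training, `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally, so the explicit `nn.Softmax` layer is typically reserved for inference, as in this prediction trace.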
Model Quiz - 3 Questions
Test your understanding
What is the main purpose of weight decay in this training pipeline?
A. To add more layers to the neural network
B. To increase the learning rate during training
C. To keep model weights small and prevent overfitting
D. To normalize the input data
Key Insight
Weight decay (L2 regularization) helps the model keep weights small, which reduces overfitting and leads to smoother training with steadily improving accuracy and decreasing loss.
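This equivalence between weight decay and L2 regularization can be checked numerically: for plain SGD, `weight_decay=lam` produces exactly the same update as adding the explicit penalty `(lam / 2) * ||w||^2` to the loss. A small sketch with an assumed toy loss:

```python
import torch

torch.manual_seed(0)
lam, lr = 0.1, 0.01

# Two copies of the same parameter vector and a toy loss (w . x)^2.
w1 = torch.nn.Parameter(torch.randn(5))
w2 = torch.nn.Parameter(w1.detach().clone())
x = torch.randn(5)

# Path 1: optimizer-level weight decay.
opt1 = torch.optim.SGD([w1], lr=lr, weight_decay=lam)
loss1 = (w1 * x).sum() ** 2
loss1.backward()
opt1.step()

# Path 2: explicit L2 penalty in the loss; its gradient is lam * w,
# the same term weight decay adds to the gradient in path 1.
opt2 = torch.optim.SGD([w2], lr=lr)
loss2 = (w2 * x).sum() ** 2 + (lam / 2) * (w2 ** 2).sum()
loss2.backward()
opt2.step()
```

After one step, `w1` and `w2` match to floating-point precision. For adaptive optimizers such as Adam the two formulations diverge, which is why decoupled weight decay (`torch.optim.AdamW`) exists as a separate optimizer.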