PyTorch · ~12 mins

Learning rate differential in PyTorch - Model Pipeline Trace


This pipeline shows how assigning different learning rates to different parts of a neural network can improve training. It trains a simple two-layer model, with a separate learning rate for each layer.

Data Flow - 6 Stages
1. Data in — synthetic dataset creation with 10 features and binary labels
   In: 1000 rows x 10 columns → Out: 1000 rows x 10 columns
   Sample: [[0.5, -1.2, 0.3, ..., 0.1], label=1]
2. Preprocessing — normalize features to zero mean and unit variance
   In: 1000 rows x 10 columns → Out: 1000 rows x 10 columns
   Sample: [[0.0, -0.8, 0.2, ..., 0.05], label=1]
3. Feature Engineering — no additional features added
   In: 1000 rows x 10 columns → Out: 1000 rows x 10 columns
   Sample: [[0.0, -0.8, 0.2, ..., 0.05], label=1]
4. Model Trains — train 2-layer neural network with differential learning rates: 0.01 for the first layer, 0.001 for the second
   In: 1000 rows x 10 columns → Out: model weights updated
   Sample: Layer1 weights shape: (5, 10), Layer2 weights shape: (1, 5)
5. Metrics Improve — loss decreases and accuracy increases over 10 epochs
   In: training epochs → Out: final loss ~0.25, accuracy ~88%
   Sample: Epoch 10: loss=0.25, accuracy=0.88
6. Prediction — model predicts a probability for the binary class
   In: 1 row x 10 columns → Out: 1 row x 1 column (probability)
   Sample: [0.85]
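Stages 1–3 can be sketched in PyTorch as follows. The shapes come from the trace above; the label rule is an assumption for illustration, since the trace does not say how the binary labels are generated.

```python
import torch

torch.manual_seed(0)

# Stage 1: Data in - synthetic dataset, 1000 rows x 10 features, binary labels
X = torch.randn(1000, 10)
y = (X[:, :1] > 0).float()  # hypothetical label rule, not specified in the trace

# Stage 2: Preprocessing - normalize each feature to zero mean, unit variance
X = (X - X.mean(dim=0)) / X.std(dim=0)

# Stage 3: Feature engineering - no additional features added; X passes through
```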
Training Trace - Epoch by Epoch
Loss
1.0 | *
0.9 |  *
0.8 |   *
0.7 |    *
0.6 |     *
0.5 |      *
0.4 |       *
0.3 |        *
0.2 |         *
    +----------------
     1 2 3 4 5 6 7 8 9 10 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
    1 |  0.85  |    0.55    | High loss and low accuracy at start
    2 |  0.65  |    0.68    | Loss decreases, accuracy improves
    3 |  0.50  |    0.75    | Model learns important patterns
    4 |  0.40  |    0.80    | Steady improvement
    5 |  0.35  |    0.83    | Learning rate differential helps stabilize training
    6 |  0.30  |    0.85    | Loss continues to decrease
    7 |  0.28  |    0.86    | Accuracy improves slowly
    8 |  0.27  |    0.87    | Model converging
    9 |  0.26  |    0.87    | Small improvements
   10 |  0.25  |    0.88    | Training stabilizes with good accuracy
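The training behavior above can be reproduced in sketch form with PyTorch optimizer parameter groups, which assign lr=0.01 to the first layer and lr=0.001 to the second. The data and label rule are assumptions based on the trace, so exact loss values will differ from the table.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed synthetic data matching the trace's shapes
X = torch.randn(1000, 10)
y = (X.sum(dim=1, keepdim=True) > 0).float()

layer1 = nn.Linear(10, 5)   # weights shape (5, 10)
layer2 = nn.Linear(5, 1)    # weights shape (1, 5)
model = nn.Sequential(layer1, nn.ReLU(), layer2, nn.Sigmoid())

# Differential learning rates via optimizer parameter groups
optimizer = torch.optim.SGD([
    {"params": layer1.parameters(), "lr": 0.01},   # faster updates
    {"params": layer2.parameters(), "lr": 0.001},  # slower, more stable
])
loss_fn = nn.BCELoss()

losses = []
for epoch in range(10):
    optimizer.zero_grad()
    pred = model(X)
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

With full-batch gradient descent and these small learning rates, the loss decreases each epoch, mirroring the downward trend in the table.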
Prediction Trace - 5 Layers
Layer 1: Input Layer
Layer 2: First Layer (learning rate 0.01)
Layer 3: Activation (ReLU)
Layer 4: Second Layer (learning rate 0.001)
Layer 5: Sigmoid Activation
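The five prediction layers listed above map onto an nn.Sequential sketch (layer sizes assumed from the trace; the input "layer" is just the incoming tensor, not a module):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(10, 5),   # Layer 2: first layer (trained with lr 0.01)
    nn.ReLU(),          # Layer 3: activation
    nn.Linear(5, 1),    # Layer 4: second layer (trained with lr 0.001)
    nn.Sigmoid(),       # Layer 5: sigmoid squashes output to a probability
)

x = torch.randn(1, 10)  # Layer 1: one input row with 10 features
prob = model(x)         # 1 row x 1 column (probability)
```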
Model Quiz - 3 Questions
Test your understanding
Why use different learning rates for different layers?
A. To make training slower overall
B. To allow faster learning in some layers and stable updates in others
C. To reduce the number of layers
D. To increase the model size
Key Insight
Using different learning rates for different layers helps the model learn faster in some parts while keeping other parts stable. This balance improves training speed and final accuracy.