PyTorch · ML · ~12 mins

Linear (fully connected) layers in PyTorch - Model Pipeline Trace

Model Pipeline - Linear (fully connected) layers

This pipeline shows how data moves through a simple neural network with linear layers. Linear layers connect every input to every output, like a full team passing a ball to every player.

Data Flow - 5 Stages
1. Input Data (1000 rows x 10 columns)
Raw input features representing 10 measurements per example.
[[0.5, 1.2, -0.3, ..., 0.7], [1.0, -0.5, 0.0, ..., 1.1], ...]
2. Linear Layer 1 (1000 x 10 in, 1000 x 5 out)
Multiply inputs by a weight matrix (10x5) and add a bias (5).
[[0.1, -0.2, 0.3, 0.0, 0.5], [0.4, 0.1, -0.1, 0.2, 0.3], ...]
3. Activation (ReLU) (1000 x 5 in, 1000 x 5 out)
Apply ReLU: keep positive values, zero out negatives.
[[0.1, 0.0, 0.3, 0.0, 0.5], [0.4, 0.1, 0.0, 0.2, 0.3], ...]
4. Linear Layer 2 (1000 x 5 in, 1000 x 3 out)
Multiply by a weight matrix (5x3) and add a bias (3).
[[0.2, -0.1, 0.4], [0.5, 0.0, 0.1], ...]
5. Output (1000 rows x 3 columns)
Final scores for 3 classes.
[[2.1, 0.5, -1.0], [1.5, 0.7, 0.3], ...]
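The five stages above can be written directly in PyTorch. This is a minimal sketch: the layer sizes follow the pipeline, but the variable names are illustrative and the random input stands in for real data.

```python
import torch
import torch.nn as nn

# Model matching the pipeline: 10 features -> 5 hidden units -> 3 class scores
model = nn.Sequential(
    nn.Linear(10, 5),  # Linear Layer 1: maps 10 inputs to 5 outputs, plus bias (5)
    nn.ReLU(),         # Activation: zero out negative values
    nn.Linear(5, 3),   # Linear Layer 2: maps 5 inputs to 3 outputs, plus bias (3)
)

x = torch.randn(1000, 10)  # Stage 1: 1000 rows x 10 columns
scores = model(x)          # Stage 5: final scores for 3 classes
print(scores.shape)        # torch.Size([1000, 3])
```

`nn.Sequential` simply chains the stages, so one call to `model(x)` runs the whole data flow.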
Training Trace - Epoch by Epoch
Loss
1.2 |*       
0.9 | *      
0.7 |  *     
0.55|   *    
0.45|    *   
    +---------
     1 2 3 4 5 Epochs
Epoch | Loss ↓ | Accuracy ↑ | Observation
------|--------|------------|------------------------------------------------------
1     | 1.2    | 0.45       | Loss starts high, accuracy low as the model begins learning
2     | 0.9    | 0.60       | Loss decreases, accuracy improves as weights adjust
3     | 0.7    | 0.72       | Model learns better patterns, accuracy rises
4     | 0.55   | 0.80       | Loss continues to drop, accuracy nears good performance
5     | 0.45   | 0.85       | Training converges with lower loss and higher accuracy
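A loop like the one traced above can be sketched as follows. This is an assumption-laden example: the synthetic data, learning rate, and seed are made up, so the printed loss values will not match the table, but the downward trend is the same.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Synthetic but learnable data: the label is the index of the largest
# of the first three features, so the model has a real pattern to find.
X = torch.randn(1000, 10)
y = X[:, :3].argmax(dim=1)

losses = []
for epoch in range(5):
    optimizer.zero_grad()          # clear gradients from the previous epoch
    loss = criterion(model(X), y)  # forward pass + loss
    loss.backward()                # compute gradients
    optimizer.step()               # adjust the weights
    losses.append(loss.item())
    print(f"Epoch {epoch + 1}: loss {loss.item():.3f}")
```

Each epoch repeats the same forward/backward/step cycle; the falling loss is what the ASCII chart above visualizes.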
Prediction Trace - 5 Layers
Layer 1: Input Sample
Layer 2: Linear Layer 1
Layer 3: ReLU Activation
Layer 4: Linear Layer 2
Layer 5: Output Scores
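The five-layer prediction trace can be reproduced by running each layer by hand and inspecting the intermediate tensors (a sketch; layer names are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1, act, fc2 = nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 3)

x = torch.randn(1, 10)  # Layer 1: one input sample
h1 = fc1(x)             # Layer 2: Linear Layer 1, shape (1, 5)
h2 = act(h1)            # Layer 3: ReLU, negatives become 0
out = fc2(h2)           # Layers 4-5: Linear Layer 2, output scores of shape (1, 3)
for name, t in [("input", x), ("linear1", h1), ("relu", h2), ("scores", out)]:
    print(name, tuple(t.shape))
```

Tracing layer by layer like this is a quick way to check shapes and to see ReLU zeroing out negative activations.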
Model Quiz - 3 Questions
Test your understanding
What does the first linear layer do to the input data?
A. It normalizes the input data
B. It multiplies inputs by weights and adds a bias to reduce dimensions
C. It applies a non-linear activation function
D. It splits data into training and testing sets
Key Insight
Linear layers connect every input to every output, allowing the model to learn weighted sums of features. Adding activation functions like ReLU introduces non-linearity, helping the model learn complex patterns. Training shows loss going down and accuracy going up, meaning the model is learning well.
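The ReLU behavior described in this insight can be checked directly on the example row from stage 3 of the data flow:

```python
import torch

# One row of Linear Layer 1's output from the pipeline trace
row = torch.tensor([0.1, -0.2, 0.3, 0.0, 0.5])
print(torch.relu(row))  # negatives zeroed: tensor([0.1000, 0.0000, 0.3000, 0.0000, 0.5000])
```

Without this non-linearity, stacking two linear layers would collapse into a single linear map and the model could only learn weighted sums.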