PyTorch · ML · ~15 mins

Linear (fully connected) layers in PyTorch - Deep Dive

Overview - Linear (fully connected) layers
What is it?
A linear layer is a basic building block in neural networks that connects every input to every output with a weight. It multiplies the input by a matrix of weights and adds a bias to produce the output. This layer helps the model learn relationships between input features and output predictions. It is often called a fully connected or dense layer.
Why it matters
Linear layers allow neural networks to learn complex patterns by combining input features in flexible ways. Without them, models would be unable to mix information from different inputs, limiting their ability to solve real-world problems like image recognition or language understanding. They form the foundation for most deep learning models.
Where it fits
Before learning linear layers, you should understand basic matrix multiplication and vectors. After mastering linear layers, you can learn activation functions, convolutional layers, and how to build deeper neural networks.
Mental Model
Core Idea
A linear layer transforms input data by multiplying it with weights and adding biases to create new features that help the model learn.
Think of it like...
Imagine a chef mixing ingredients in fixed amounts to create a new dish flavor. Each ingredient (input) is combined with a specific amount (weight), and a pinch of salt (bias) is added to adjust the taste.
Input Vector (x) ──▶ [Weights Matrix (W)] ──▶ Multiply ──▶ Add Bias (b) ──▶ Output Vector (y)

  x (1×n) × W (n×m) + b (1×m) = y (1×m)
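The mental model above can be checked in a few lines of PyTorch. This is a minimal sketch with hypothetical hand-picked numbers (n = 3 inputs, m = 2 outputs), not a trained layer:

```python
import torch

# Hypothetical example: n = 3 input features, m = 2 output features.
x = torch.tensor([[1.0, 2.0, 3.0]])   # input vector, shape (1, 3)
W = torch.tensor([[0.1, 0.2],
                  [0.3, 0.4],
                  [0.5, 0.6]])        # weight matrix, shape (3, 2)
b = torch.tensor([0.01, 0.02])        # bias, shape (2,)

y = x @ W + b                         # multiply, then add bias
print(y)                              # tensor([[2.2100, 2.8200]])
```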
Build-Up - 7 Steps
1
Foundation: Understanding vectors and matrices
Concept: Learn what vectors and matrices are and how they multiply.
A vector is a list of numbers, like [2, 3]. A matrix is a table of numbers, like [[1, 2], [3, 4]]. Multiplying a row vector by a matrix combines the vector's entries with each column of the matrix to produce a new vector. For example, multiplying [2, 3] by [[1, 2], [3, 4]] gives [2*1 + 3*3, 2*2 + 3*4] = [11, 16].
Result
You can multiply vectors and matrices to transform data.
Understanding vector-matrix multiplication is key because linear layers use this operation to transform inputs into outputs.
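The worked example above can be spelled out in plain Python, assuming the row-vector-times-matrix convention used in the text:

```python
x = [2, 3]          # vector
M = [[1, 2],
     [3, 4]]        # matrix, stored as a list of rows

# Each output entry pairs the vector with one column of the matrix.
y = [sum(x[i] * M[i][j] for i in range(len(x)))
     for j in range(len(M[0]))]
print(y)  # [11, 16]
```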
2
Foundation: What is a linear layer in neural networks
Concept: A linear layer applies weights and biases to inputs to produce outputs.
In a neural network, a linear layer takes an input vector, multiplies it by a weight matrix, and adds a bias vector. This creates a new output vector. The weights and biases are learned during training to help the network make predictions.
Result
A linear layer transforms inputs into outputs using learned parameters.
Knowing that weights and biases are parameters the model learns helps you understand how the network adapts to data.
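A linear layer "by hand" might look like the sketch below, with hypothetical fixed weights and biases standing in for learned parameters:

```python
import torch

x = torch.tensor([[1.0, -1.0]])        # one sample with 2 features
W = torch.tensor([[2.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0]])    # maps 2 inputs to 3 outputs
b = torch.tensor([0.5, 0.5, 0.5])      # one bias per output

y = x @ W + b                          # the whole layer in one line
print(y)  # tensor([[ 2.5000, -2.5000,  0.5000]])
```

During training, gradient descent would adjust W and b; here they are fixed so the arithmetic is easy to follow.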
3
Intermediate: Implementing linear layers in PyTorch
🤔 Before reading on: do you think PyTorch's Linear layer requires manual weight initialization, or does it handle it automatically? Commit to your answer.
Concept: PyTorch provides a built-in Linear layer that automatically manages weights and biases.
In PyTorch, you create a linear layer with torch.nn.Linear(in_features, out_features). It initializes the weights and biases for you. When you pass input data through this layer, it performs the matrix multiplication and adds the bias internally. Example:

import torch
import torch.nn as nn

layer = nn.Linear(3, 2)  # 3 inputs, 2 outputs
input_tensor = torch.tensor([[1.0, 2.0, 3.0]])
output = layer(input_tensor)
print(output)
Result
Output is a tensor of shape (1, 2) with values computed by the linear layer.
Using PyTorch's Linear layer simplifies building models by handling parameters and computations internally.
4
Intermediate: Role of bias in linear layers
🤔 Before reading on: do you think removing bias from a linear layer always reduces model performance? Commit to your answer.
Concept: Bias allows the model to shift outputs independently of inputs, improving flexibility.
The bias vector adds a constant value to each output neuron. Without bias, the output is always zero when inputs are zero, which can limit learning. Sometimes bias is disabled for specific reasons, but usually it helps the model fit data better.
Result
Bias improves the model's ability to fit data by allowing output shifts.
Understanding bias clarifies why linear layers can represent more complex functions than just weighted sums.
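One way to see the effect of bias is to feed an all-zero input to two layers, one with and one without bias (a small sketch; the seed is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
with_bias = nn.Linear(4, 2)              # bias=True by default
no_bias = nn.Linear(4, 2, bias=False)

zeros = torch.zeros(1, 4)
print(no_bias(zeros))     # always tensor([[0., 0.]])
print(with_bias(zeros))   # equals the bias vector, generally nonzero
```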
5
Intermediate: Batch inputs and linear layers
🤔 Before reading on: do you think linear layers process each input in a batch separately or combine them? Commit to your answer.
Concept: Linear layers process each input in a batch independently but efficiently in parallel.
When you pass a batch of inputs (multiple samples) to a linear layer, it treats each sample separately but computes all outputs in one matrix operation. For example, input shape (batch_size, input_features) multiplied by weights shape (input_features, output_features) produces output shape (batch_size, output_features).
Result
Linear layers efficiently handle multiple inputs at once without mixing them.
Knowing batch processing helps you design models that train faster and handle many samples simultaneously.
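The shape bookkeeping can be confirmed directly (a sketch with arbitrary sizes: a batch of 8 samples, 5 features in, 3 out):

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 3)
batch = torch.randn(8, 5)      # 8 independent samples

out = layer(batch)
print(out.shape)               # torch.Size([8, 3])

# Samples do not mix: running one row alone gives the same result.
single = layer(batch[0:1])
print(torch.allclose(single, out[0:1]))  # True
```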
6
Advanced: Weight initialization in linear layers
🤔 Before reading on: do you think random weight initialization is enough, or does it require special methods? Commit to your answer.
Concept: Proper weight initialization helps training start well and converge faster.
By default, PyTorch's nn.Linear applies a Kaiming-style uniform initialization, which sets weights to values that keep signal magnitudes stable as they pass through layers; Xavier (Glorot) initialization is another common choice. Poor initialization can make training slow or unstable. You can customize the initialization if needed.
Result
Good initialization leads to faster and more stable training.
Understanding initialization prevents common training problems like vanishing or exploding gradients.
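If you do want a custom scheme, torch.nn.init can overwrite the defaults in place. This sketch swaps in Xavier (Glorot) uniform weights and zero biases for an arbitrarily sized layer:

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 128)            # already gets a sensible default init

nn.init.xavier_uniform_(layer.weight)  # overwrite in place
nn.init.zeros_(layer.bias)

# Xavier uniform draws from (-a, a) with a = sqrt(6 / (fan_in + fan_out)).
bound = (6 / (256 + 128)) ** 0.5
print(layer.weight.abs().max() <= bound)  # tensor(True)
```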
7
Expert: Linear layers in deep networks and bottlenecks
🤔 Before reading on: do you think adding more linear layers always improves model performance? Commit to your answer.
Concept: Stacking linear layers with nonlinearities creates deep models, but linear layers alone cannot model complex functions.
Linear layers by themselves only perform linear transformations. To learn complex patterns, they are combined with activation functions like ReLU. Also, linear layers can create bottlenecks by reducing dimensions, forcing the model to compress information. This helps with feature extraction and regularization.
Result
Deep networks use linear layers plus nonlinearities to model complex data effectively.
Knowing the limits of linear layers alone guides you to build powerful models by combining them properly.
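The "linear layers alone stay linear" claim can be verified numerically: two stacked Linear layers with no activation in between collapse into a single affine map (a sketch; sizes and seed are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10)

# Two linear layers with NO activation in between...
f = nn.Sequential(nn.Linear(10, 32), nn.Linear(32, 10))

# ...equal one combined linear layer: W = W2 @ W1, b = b1 @ W2.T + b2.
W1, b1 = f[0].weight, f[0].bias
W2, b2 = f[1].weight, f[1].bias
collapsed = x @ (W2 @ W1).T + (b1 @ W2.T + b2)

print(torch.allclose(f(x), collapsed, atol=1e-6))  # True

# Inserting a ReLU breaks the collapse and adds real expressive power.
g = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))
```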
Under the Hood
A linear layer stores a weight matrix and a bias vector as parameters. When input data arrives, it performs a matrix multiplication of the input with the weight matrix, then adds the bias vector. This operation is highly optimized using low-level libraries and runs efficiently on CPUs or GPUs. During training, gradients flow back through this operation to update weights and biases.
Why designed this way?
Linear layers are designed as simple affine transformations because they are mathematically easy to compute and differentiate. This simplicity allows stacking many layers and combining with nonlinearities to approximate complex functions. Alternatives like nonlinear layers alone are harder to optimize and interpret.
Input (x) ──▶ [Weight Matrix (W)] ──▶ Multiply ──▶ + Bias (b) ──▶ Output (y)

Parameters:
 ┌─────────────┐
 │ Weight (W)  │
 │ Bias (b)    │
 └─────────────┘

Forward pass:
 x (batch_size×input_dim) × W (input_dim×output_dim) + b (output_dim) = y (batch_size×output_dim)
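One implementation detail worth knowing: nn.Linear stores its weight as (out_features, in_features) and computes y = x @ W.T + b, so the forward pass above can be reproduced by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
print(layer.weight.shape)   # torch.Size([2, 3]) -- (out_features, in_features)

x = torch.randn(4, 3)
manual = x @ layer.weight.T + layer.bias   # the affine map done by hand
print(torch.allclose(layer(x), manual))    # True
```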
Myth Busters - 4 Common Misconceptions
Quick: Does a linear layer alone create nonlinear decision boundaries? Commit to yes or no.
Common Belief: A single linear layer can model any complex pattern by itself.
Reality: A linear layer alone can only model linear relationships; it cannot create nonlinear decision boundaries.
Why it matters: Believing this leads to underpowered models that fail on complex tasks requiring nonlinear transformations.
Quick: Is bias always necessary in a linear layer? Commit to yes or no.
Common Belief: Bias is optional and usually does not affect model performance.
Reality: Bias often significantly improves model flexibility by allowing output shifts; removing it can hurt performance.
Why it matters: Ignoring bias can cause models to underfit data, especially when inputs can be zero.
Quick: Does increasing the number of outputs in a linear layer always improve accuracy? Commit to yes or no.
Common Belief: More output neurons always mean better model performance.
Reality: More outputs increase model capacity but can cause overfitting and higher computation cost if not managed properly.
Why it matters: Overestimating output size can waste resources and reduce generalization.
Quick: Does PyTorch require manual weight initialization for linear layers? Commit to yes or no.
Common Belief: You must always manually initialize weights in PyTorch linear layers.
Reality: PyTorch automatically initializes weights with proven methods, so manual initialization is usually unnecessary.
Why it matters: Unnecessary manual initialization can introduce errors or inconsistencies.
Expert Zone
1
Weight sharing across linear layers can reduce model size but requires careful design to maintain performance.
2
Bias terms can sometimes be merged into weights by augmenting inputs with a constant 1, simplifying computations.
3
Linear layers can be viewed as projections in high-dimensional space, which helps understand their role in feature extraction.
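Expert note 2 (folding the bias into the weights) can be checked in a few lines, using the math convention of a weight matrix with shape (in, out) and arbitrary random numbers:

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3)     # weights, math convention: (in_features, out_features)
b = torch.randn(3)
x = torch.randn(2, 4)

# Append a constant-1 feature to x and a matching row (the bias) to W.
W_aug = torch.cat([W, b.unsqueeze(0)], dim=0)    # shape (5, 3)
x_aug = torch.cat([x, torch.ones(2, 1)], dim=1)  # shape (2, 5)

print(torch.allclose(x @ W + b, x_aug @ W_aug, atol=1e-6))  # True
```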
When NOT to use
Linear layers are not suitable alone for tasks requiring nonlinear decision boundaries; use them combined with activation functions or convolutional layers. For sequence data, recurrent or transformer layers are often better alternatives.
Production Patterns
In production, linear layers are often used as final classification or regression layers. They are combined with batch normalization and dropout for better generalization. Weight pruning and quantization are applied to linear layers to optimize model size and speed.
Connections
Matrix multiplication
Linear layers perform matrix multiplication as their core operation.
Understanding matrix multiplication deeply helps grasp how linear layers transform data efficiently.
Affine transformations in geometry
Linear layers implement affine transformations, shifting and scaling input space.
Knowing affine transformations from geometry clarifies how linear layers manipulate input features.
Linear regression
Linear layers generalize linear regression by learning weights and biases for multiple outputs.
Recognizing linear layers as a multi-output linear regression helps connect classical statistics with deep learning.
Common Pitfalls
#1: Passing input with the wrong shape to a linear layer
Wrong approach:
layer = nn.Linear(4, 3)
input_tensor = torch.randn(2, 5)  # wrong input size
output = layer(input_tensor)  # raises a runtime shape-mismatch error
Correct approach:
layer = nn.Linear(4, 3)
input_tensor = torch.randn(2, 4)  # correct input size
output = layer(input_tensor)
Root cause: Mismatch between the input feature size and the layer's expected input dimension causes runtime errors.
#2: Forgetting to flatten input before a linear layer for images
Wrong approach:
layer = nn.Linear(28*28, 10)
input_tensor = torch.randn(32, 1, 28, 28)  # batch of images
output = layer(input_tensor)  # error: last dimension is 28, not 784
Correct approach:
layer = nn.Linear(28*28, 10)
input_tensor = torch.randn(32, 1, 28, 28)
input_flat = input_tensor.view(32, -1)  # flatten images to (32, 784)
output = layer(input_flat)
Root cause: A linear layer matches its weights against the input's last dimension, so image tensors of shape (batch, channels, height, width) must be flattened to (batch, features) first.
#3: Disabling bias without reason
Wrong approach:
layer = nn.Linear(10, 5, bias=False)  # bias disabled without justification
Correct approach:
layer = nn.Linear(10, 5)  # bias enabled by default
Root cause: Removing bias reduces model flexibility and can hurt learning unless it is specifically needed (for example, directly before a batch normalization layer, which applies its own shift).
Key Takeaways
Linear layers transform inputs by multiplying with weights and adding biases to create new features.
They perform simple affine transformations but cannot model nonlinear patterns alone.
PyTorch's nn.Linear handles weights, biases, and computations automatically for ease of use.
Proper input shape and flattening are essential to avoid runtime errors with linear layers.
Combining linear layers with nonlinear activations enables deep networks to learn complex data.