Kernel size, stride, and padding control how a filter moves over an image during convolution. Together they determine the output size and which part of the image each output value is computed from.
Kernel size, stride, padding in PyTorch
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
kernel_size is the size of the filter (e.g., 3 means 3x3).
stride is how many pixels the filter moves each step (default is 1).
padding adds pixels around the input edges (default is 0).
conv = torch.nn.Conv2d(1, 10, kernel_size=3)
conv = torch.nn.Conv2d(1, 10, kernel_size=5, stride=2)
conv = torch.nn.Conv2d(1, 10, kernel_size=3, padding=1)
This code shows how kernel size, stride, and padding affect output size and values. We use a simple 5x5 input, set all weights to 1, and set the bias to 0 so that each output value is just the sum of the pixels under the filter.
import torch
import torch.nn as nn

# Create a sample input: batch size 1, 1 channel, 5x5 image
input_tensor = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# Define conv layers with different kernel_size, stride, padding
conv1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)
conv2 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)
conv3 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)

# Initialize weights to 1 and bias to 0 for easy understanding
for conv in [conv1, conv2, conv3]:
    nn.init.constant_(conv.weight, 1.0)
    nn.init.constant_(conv.bias, 0.0)

# Apply convolutions
output1 = conv1(input_tensor)
output2 = conv2(input_tensor)
output3 = conv3(input_tensor)

# Print shapes and outputs
print(f"Output1 shape: {output1.shape}")
print(output1)
print(f"Output2 shape: {output2.shape}")
print(output2)
print(f"Output3 shape: {output3.shape}")
print(output3)
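The numbers this example prints can be checked by hand. The sketch below is a plain-Python stand-in, not PyTorch code: the helper name sum_filter is hypothetical, it assumes a square image, and it treats padded cells as 0. It slides an all-ones 3x3 window over the same 5x5 input and sums each window, which is what a Conv2d layer with weights 1 and bias 0 computes.

```python
def sum_filter(image, kernel_size, stride=1, padding=0):
    """Slide an all-ones square kernel over a square 2-D list and sum each window."""
    n = len(image)
    # Surround the image with `padding` rows and columns of zeros.
    size = n + 2 * padding
    padded = [[0.0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + padding][c + padding] = image[r][c]
    # Top-left corners of every window that fits inside the padded image.
    starts = range(0, size - kernel_size + 1, stride)
    return [
        [
            sum(padded[r + i][c + j]
                for i in range(kernel_size)
                for j in range(kernel_size))
            for c in starts
        ]
        for r in starts
    ]

# Same 5x5 input as the PyTorch example: values 0..24 laid out row by row.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]

out1 = sum_filter(image, 3)             # kernel 3, stride 1, padding 0
out2 = sum_filter(image, 3, stride=2)   # kernel 3, stride 2, padding 0
out3 = sum_filter(image, 3, padding=1)  # kernel 3, stride 1, padding 1

print(len(out1), len(out1[0]))  # 3 3
print(len(out2), len(out2[0]))  # 2 2
print(len(out3), len(out3[0]))  # 5 5
print(out1[0])                  # [54.0, 63.0, 72.0]
print(out2)                     # [[54.0, 72.0], [144.0, 162.0]]
```

Stride 2 keeps every other window from the stride 1 result, and padding 1 restores the 5x5 size at the cost of border windows that include zeros (for example, the top-left value of out3 is 0 + 1 + 5 + 6 = 12).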
With stride 1 and an odd kernel size, padding of (kernel_size - 1) / 2 keeps the output the same size as the input.
Stride greater than 1 reduces output size by skipping positions.
Kernel size controls the area the filter looks at each step.
Kernel size is the filter size that scans the input.
Stride controls how far the filter moves each step.
Padding adds pixels around input to control output size and edge info.
Practice
What does the stride parameter control in a convolutional layer in PyTorch?
Solution
Step 1: Understand stride in convolution
Stride defines the step size the filter moves when scanning the input image or feature map.
Step 2: Differentiate stride from other parameters
Kernel size is the filter size, padding adds pixels around input, and number of filters controls output depth.
Final Answer:
How far the filter moves on the input each step -> Option A
Quick Check:
Stride = step size of filter movement [OK]
- Confusing stride with kernel size
- Thinking stride controls padding
- Mixing stride with number of filters
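To make the idea of "step size" concrete, here is a small sketch that lists where the left edge of the filter lands along one axis. The helper window_starts is a hypothetical name used only for illustration, and it assumes no padding.

```python
def window_starts(input_size, kernel_size, stride):
    """Positions of the filter's left edge along one axis (no padding)."""
    # The last valid start is input_size - kernel_size, so the filter never
    # runs past the edge of the input.
    return list(range(0, input_size - kernel_size + 1, stride))

print(window_starts(5, 3, 1))  # [0, 1, 2]
print(window_starts(5, 3, 2))  # [0, 2]
print(window_starts(7, 3, 3))  # [0, 3]
```

The number of start positions is the output size along that axis, which is why a larger stride gives a smaller output.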
Solution
Step 1: Check PyTorch Conv2d parameter names
Correct parameters are in_channels, out_channels, kernel_size, stride, and padding.
Step 2: Match values to question
Kernel size=3, stride=2, padding=1 matches only nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=2, padding=1) exactly.
Final Answer:
nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=2, padding=1) -> Option A
Quick Check:
Correct parameter names and values used [OK]
- Using wrong parameter names like kernel or pad
- Mixing stride and padding values
- Wrong kernel size or stride values
Solution
Step 1: Use output size formula for Conv2d
Output size = floor((Input + 2*padding - kernel_size)/stride) + 1
Step 2: Calculate output height and width
Input=7, padding=1, kernel=3, stride=2
Output = floor((7 + 2*1 - 3)/2) + 1 = floor((7 + 2 - 3)/2) + 1 = floor(6/2) + 1 = 3 + 1 = 4
Final Answer:
(1, 1, 4, 4) -> Option D
Quick Check:
Output size formula applied correctly [OK]
- Forgetting to add padding twice
- Using ceil instead of floor
- Mixing stride and kernel size in formula
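The formula and two of the mistakes listed above can be checked with a few lines of plain Python. This is a sketch: conv_output_size is a hypothetical helper, not a PyTorch function, and the batch size of 1 and single output channel are taken from the answer's shape.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis: floor((in + 2*pad - k) / stride) + 1."""
    # Integer floor division implements the floor in the formula.
    return (input_size + 2 * padding - kernel_size) // stride + 1

# The quiz case: 7x7 input, kernel 3, stride 2, padding 1.
side = conv_output_size(7, kernel_size=3, stride=2, padding=1)
print((1, 1, side, side))  # (1, 1, 4, 4)

# Mistake: adding the padding once instead of on both sides.
padding_once = (7 + 1 - 3) // 2 + 1
print(padding_once)  # 3

# Mistake: rounding up instead of down. It shows when the division is
# not exact, for example with an 8x8 input: (8 + 2 - 3) / 2 = 3.5.
with_floor = conv_output_size(8, kernel_size=3, stride=2, padding=1)
with_ceil = math.ceil((8 + 2 * 1 - 3) / 2) + 1
print(with_floor, with_ceil)  # 4 5
```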
nn.Conv2d(1, 10, kernel_size=3, stride=2, padding=0) on input size (1, 1, 1, 1). What is the likely cause?
Solution
Step 1: Check output size with given parameters
Output size = floor((1 + 2*0 - 3)/2) + 1 = floor((-2)/2) + 1 = floor(-1) + 1 = -1 + 1 = 0 (invalid)
Step 2: Consider if padding causes error
With padding=0, the 3x3 kernel is larger than the 1x1 input, so the filter has no valid position and the calculated output size is 0. PyTorch raises a runtime error in this case; the input needs more padding for the kernel to fit.
Final Answer:
Padding is too small for the input size causing negative output dimension -> Option B
Quick Check:
Padding too small -> invalid output size [OK]
- Assuming stride must be 1 for kernel 3
- Thinking kernel size must be even
- Swapping input and output channels
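The check behind this answer can be sketched in plain Python. Both helpers below are hypothetical and only mirror the arithmetic; PyTorch performs its own validation internally.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis: floor((in + 2*pad - k) / stride) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def kernel_fits(input_size, kernel_size, padding=0):
    """True if the kernel is no larger than the padded input along one axis."""
    return kernel_size <= input_size + 2 * padding

# The quiz case: 1x1 input, kernel 3, stride 2, padding 0.
print(conv_output_size(1, 3, stride=2, padding=0))  # 0
print(kernel_fits(1, 3, padding=0))                 # False

# With padding 1 the padded input is 3x3, so the kernel fits exactly once.
print(kernel_fits(1, 3, padding=1))                 # True
print(conv_output_size(1, 3, stride=2, padding=1))  # 1
```

A quick pre-check like this before building a layer can catch configurations where the kernel cannot fit, instead of finding out at the first forward pass.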
Solution
Step 1: Use formula for output size with stride=1
Output size = Input size if padding = (kernel_size - 1) / 2
Step 2: Calculate padding
Padding = (5 - 1) / 2 = 4 / 2 = 2
Final Answer:
2 -> Option C
Quick Check:
Padding = (kernel_size - 1)/2 keeps size same [OK]
- Using padding 0 or 1 incorrectly
- Forgetting stride must be 1 for same size
- Using padding larger than needed
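The rule can be verified for several kernel sizes at once. In this sketch, same_padding is a hypothetical helper and the input size of 32 is an arbitrary assumption; the rule applies only to odd kernel sizes with stride 1.

```python
def same_padding(kernel_size):
    """Padding that keeps the size unchanged at stride 1 (odd kernels only)."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric same-size padding needs an odd kernel size")
    return (kernel_size - 1) // 2

input_size = 32  # arbitrary example size
for k in (1, 3, 5, 7):
    p = same_padding(k)
    # Output size formula with stride 1.
    out = (input_size + 2 * p - k) // 1 + 1
    print(k, p, out)
# 1 0 32
# 3 1 32
# 5 2 32
# 7 3 32
```

For an even kernel such as 4, no single symmetric padding value works at stride 1: padding 1 gives 31 and padding 2 gives 33.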
