How to Use Conv2d in PyTorch: Syntax and Example
In PyTorch, use torch.nn.Conv2d to create a 2D convolutional layer by specifying the input channels, output channels, and kernel size. Apply it to input tensors of shape (batch_size, channels, height, width) to extract spatial features.
Syntax
The torch.nn.Conv2d constructor takes these main arguments:
- in_channels: Number of input channels (e.g., 3 for RGB images).
- out_channels: Number of filters (output channels) the layer will produce.
- kernel_size: Size of the convolution filter (e.g., 3 for 3x3).
- stride (optional): Step size for sliding the filter (default is 1).
- padding (optional): Number of pixels added around input (default is 0).
After creating the layer, call it like a function on a 4D input tensor with shape (batch_size, in_channels, height, width).
```python
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=0)
output = conv(input_tensor)
```
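Because Conv2d expects a 4D input, a single image without a batch dimension will raise a shape error. A minimal sketch of adding the batch dimension with unsqueeze (the tensor names here are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# A single RGB image has shape (3, 64, 64) -- only 3 dimensions
image = torch.randn(3, 64, 64)

# unsqueeze(0) prepends a batch dimension: (1, 3, 64, 64)
batched = image.unsqueeze(0)

output = conv(batched)
print(output.shape)  # torch.Size([1, 16, 62, 62])
```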
Example
This example creates a Conv2d layer with 3 input channels and 6 output channels using a 3x3 kernel. It applies the layer to a random input tensor shaped like a batch of 1 RGB image of size 32x32. The output shape shows how the convolution changes the spatial dimensions.
```python
import torch
import torch.nn as nn

# Create Conv2d layer
conv_layer = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

# Create a random input tensor: batch size 1, 3 channels, 32x32 image
input_tensor = torch.randn(1, 3, 32, 32)

# Apply convolution
output_tensor = conv_layer(input_tensor)

# Print output shape
print('Output shape:', output_tensor.shape)
```
Output
Output shape: torch.Size([1, 6, 30, 30])
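The 32x32 input shrinking to 30x30 follows the output-size formula given in the Conv2d documentation. A small helper to predict the spatial size ahead of time (the helper name is our own, not part of PyTorch):

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Formula from the Conv2d docs:
    # out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv2d_output_size(32, kernel_size=3))                       # 30, matching the example above
print(conv2d_output_size(32, kernel_size=3, stride=2, padding=1))  # 16, a typical downsampling layer
```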
Common Pitfalls
Common mistakes when using Conv2d include:
- Not matching in_channels to the input tensor's channel count.
- Forgetting that input tensors must be 4D: (batch_size, channels, height, width).
- Ignoring how kernel_size, stride, and padding affect the output size.
- Using incorrect padding, causing the output size to shrink unexpectedly.
Always check tensor shapes before and after convolution to avoid shape mismatch errors.
```python
import torch
import torch.nn as nn

# Wrong: input channels mismatch
conv_wrong = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
input_wrong = torch.randn(1, 3, 28, 28)  # 3 channels but conv expects 1
try:
    output_wrong = conv_wrong(input_wrong)
except Exception as e:
    print('Error:', e)

# Right: matching input channels
conv_right = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
output_right = conv_right(input_wrong)
print('Output shape:', output_right.shape)
```
Output
Error: Given groups=1, weight of size [4, 1, 3, 3], expected input[1, 3, 28, 28] to have 1 channels, but got 3 channels instead
Output shape: torch.Size([1, 4, 26, 26])
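The shrink from 28x28 to 26x26 above is the padding pitfall in action. Setting padding explicitly is one way to keep spatial dimensions unchanged; a short sketch assuming a 3x3 kernel, where padding=1 compensates for the border lost to the kernel:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# padding=1 with a 3x3 kernel preserves height and width
conv_same = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
print(conv_same(x).shape)  # torch.Size([1, 4, 28, 28])
```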
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| in_channels | Number of channels in input image | Required |
| out_channels | Number of filters (output channels) | Required |
| kernel_size | Size of convolution kernel (int or tuple) | Required |
| stride | Step size for sliding kernel | 1 |
| padding | Zero-padding added to both sides | 0 |
| dilation | Spacing between kernel elements | 1 |
| groups | Number of blocked connections | 1 |
| bias | If True, adds a learnable bias | True |
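The less common parameters in the table above can be illustrated with a brief sketch (the tensor sizes here are arbitrary, chosen only to make the shapes easy to check):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)

# dilation=2 spreads a 3x3 kernel over a 5x5 receptive field,
# so the 16x16 input shrinks to 12x12 (16 - 5 + 1)
dilated = nn.Conv2d(8, 8, kernel_size=3, dilation=2)
print(dilated(x).shape)  # torch.Size([1, 8, 12, 12])

# groups=in_channels gives a depthwise convolution: one filter per
# input channel; bias=False drops the learnable bias term
depthwise = nn.Conv2d(8, 8, kernel_size=3, groups=8, bias=False)
print(depthwise(x).shape)  # torch.Size([1, 8, 14, 14])
```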
Key Takeaways
- Use torch.nn.Conv2d with in_channels matching your input tensor's channel count.
- Input to Conv2d must be 4D: (batch_size, channels, height, width).
- Kernel size, stride, and padding control the output spatial dimensions.
- Check tensor shapes before and after convolution to avoid errors.
- Conv2d layers extract spatial features by sliding filters over images.