How to Use nn.Conv2d in PyTorch: Syntax and Example
In PyTorch, nn.Conv2d creates a 2D convolutional layer that processes image-like data. You define it by specifying input channels, output channels, kernel size, and optional parameters such as stride and padding, then pass your input tensor through the layer to get the convolved output.
Syntax
The nn.Conv2d layer is initialized with key parameters:
- in_channels: Number of input channels (e.g., 3 for RGB images).
- out_channels: Number of filters to apply, which determines output channels.
- kernel_size: Size of the filter window (e.g., 3 for a 3x3 filter).
- stride: Step size for moving the filter (default is 1).
- padding: Number of pixels added around input edges (default is 0).
These control how the convolution scans the input and produces output.
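As a quick sanity check, the output spatial size follows directly from these parameters. The helper below is a minimal sketch (not part of PyTorch's API) implementing the formula from the nn.Conv2d documentation:

```python
import math

def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output height/width of nn.Conv2d along one spatial dimension."""
    return math.floor(
        (size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride
    ) + 1

# A 5x5 input with a 3x3 kernel, stride 1, padding 1 keeps its size:
print(conv2d_output_size(5, kernel_size=3, stride=1, padding=1))  # 5
# Without padding, the output shrinks:
print(conv2d_output_size(5, kernel_size=3))  # 3
```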
```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
```
Example
This example creates a Conv2d layer and applies it to a random input tensor simulating a batch of one RGB image of size 5x5 pixels.
```python
import torch
import torch.nn as nn

# Create a Conv2d layer: 3 input channels, 2 output channels, 3x3 kernel
conv_layer = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=1, padding=1)

# Create a random input tensor: batch size 1, 3 channels, 5x5 image
input_tensor = torch.randn(1, 3, 5, 5)

# Apply convolution
output_tensor = conv_layer(input_tensor)
print('Input shape:', input_tensor.shape)
print('Output shape:', output_tensor.shape)
```
Output
Input shape: torch.Size([1, 3, 5, 5])
Output shape: torch.Size([1, 2, 5, 5])
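Building on the example above, increasing the stride downsamples the output. A minimal sketch, keeping the same layer configuration but with stride 2:

```python
import torch
import torch.nn as nn

# Same layer as before, but stride 2 halves the spatial resolution
conv_strided = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=2, padding=1)

input_tensor = torch.randn(1, 3, 5, 5)
output_tensor = conv_strided(input_tensor)
print('Output shape:', output_tensor.shape)  # torch.Size([1, 2, 3, 3])
```

The 5x5 input becomes 3x3 because the filter now skips every other position: floor((5 + 2 - 3) / 2) + 1 = 3.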
Common Pitfalls
- Wrong input shape: Conv2d expects input as (batch_size, channels, height, width). Passing (batch_size, height, width, channels) will cause errors.
- Ignoring padding: Without padding, the output size shrinks. Use padding to keep the output the same size as the input.
- Kernel size too large: Kernel size larger than input spatial dimensions causes errors.
- Forgetting batch dimension: Input must include batch size, even if 1.
```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# Wrong input shape (channels last) - will error
try:
    wrong_input = torch.randn(1, 5, 5, 3)
    conv(wrong_input)
except Exception as e:
    print('Error with wrong input shape:', e)

# Correct input shape
correct_input = torch.randn(1, 3, 5, 5)
output = conv(correct_input)
print('Output shape with correct input:', output.shape)
```
Output
Error with wrong input shape: Given groups=1, weight of size [1, 3, 3, 3], expected input[1, 5, 5, 3] to have 3 channels, but got 5 channels instead
Output shape with correct input: torch.Size([1, 1, 3, 3])
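If your data arrives channels-last (common when loading images with other libraries) or without a batch dimension, both pitfalls can be fixed with a reshape before the convolution. A sketch:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 1, 3)

# Channels-last tensor, e.g. (batch, height, width, channels)
channels_last = torch.randn(1, 5, 5, 3)
# Reorder to the (batch, channels, height, width) layout Conv2d expects
channels_first = channels_last.permute(0, 3, 1, 2)
print('Permuted output shape:', conv(channels_first).shape)  # torch.Size([1, 1, 3, 3])

# A single image without a batch dimension also fails; unsqueeze adds one
single_image = torch.randn(3, 5, 5)
batched = single_image.unsqueeze(0)  # shape becomes (1, 3, 5, 5)
print('Batched output shape:', conv(batched).shape)  # torch.Size([1, 1, 3, 3])
```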
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| in_channels | Number of input channels | Required |
| out_channels | Number of output channels (filters) | Required |
| kernel_size | Size of convolution kernel (int or tuple) | Required |
| stride | Stride of convolution | 1 |
| padding | Zero-padding added to both sides | 0 |
| dilation | Spacing between kernel elements | 1 |
| groups | Number of blocked connections | 1 |
| bias | If True, adds a learnable bias | True |
| padding_mode | Type of padding: 'zeros', 'reflect', etc. | 'zeros' |
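To illustrate one of the less common parameters from the table, the sketch below compares a standard convolution with a depthwise one (groups equal to in_channels), where each input channel gets its own filter. The weight shapes shown follow Conv2d's (out_channels, in_channels / groups, kernel_h, kernel_w) layout:

```python
import torch.nn as nn

# Standard conv: every filter sees all 3 input channels
standard = nn.Conv2d(3, 3, kernel_size=3, bias=False)

# Depthwise conv: groups=3 gives each input channel its own 3x3 filter
depthwise = nn.Conv2d(3, 3, kernel_size=3, groups=3, bias=False)

print(standard.weight.shape)   # torch.Size([3, 3, 3, 3]) -> 81 weights
print(depthwise.weight.shape)  # torch.Size([3, 1, 3, 3]) -> 27 weights
```

Grouped convolutions like this trade expressiveness for far fewer parameters, which is why they appear in efficiency-focused architectures.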
Key Takeaways
- Use nn.Conv2d to create 2D convolution layers for image data in PyTorch.
- Input tensors must have shape (batch_size, channels, height, width).
- Adjust kernel_size, stride, and padding to control output size and features.
- Common errors come from a wrong input shape or a forgotten batch dimension.
- Padding keeps the output the same size as the input; without it, the output shrinks.