
How to Use Conv2d in PyTorch: Syntax and Example

In PyTorch, use torch.nn.Conv2d to create a 2D convolutional layer by specifying input channels, output channels, and kernel size. Apply it to input tensors with shape (batch_size, channels, height, width) to extract spatial features.

Syntax

The torch.nn.Conv2d constructor requires these main arguments:

  • in_channels: Number of input channels (e.g., 3 for RGB images).
  • out_channels: Number of filters (output channels) the layer will produce.
  • kernel_size: Size of the convolution filter (e.g., 3 for 3x3).
  • stride (optional): Step size for sliding the filter (default is 1).
  • padding (optional): Number of pixels added around input (default is 0).

After creating the layer, call it like a function on a 4D input tensor with shape (batch_size, in_channels, height, width).

python
import torch

input_tensor = torch.randn(8, 3, 64, 64)  # batch of 8 RGB images, 64x64
conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=0)
output = conv(input_tensor)
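As a quick sketch of how padding interacts with kernel size: with a 3x3 kernel, setting padding=1 keeps the spatial dimensions unchanged (so-called "same" padding).

```python
import torch
import torch.nn as nn

# padding=1 with a 3x3 kernel preserves height and width ("same" padding)
conv_same = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)
y = conv_same(x)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```

With padding=0 the same layer would instead produce a 30x30 output, losing one pixel on each side.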

Example

This example creates a Conv2d layer with 3 input channels and 6 output channels using a 3x3 kernel. It applies the layer to a random input tensor shaped like a batch of 1 RGB image of size 32x32. The output shape shows how the convolution changes the spatial dimensions.

python
import torch
import torch.nn as nn

# Create Conv2d layer
conv_layer = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

# Create a random input tensor: batch size 1, 3 channels, 32x32 image
input_tensor = torch.randn(1, 3, 32, 32)

# Apply convolution
output_tensor = conv_layer(input_tensor)

# Print output shape
print('Output shape:', output_tensor.shape)
Output
Output shape: torch.Size([1, 6, 30, 30])

Common Pitfalls

Common mistakes when using Conv2d include:

  • Not matching in_channels to the input tensor's channel size.
  • Forgetting that input tensors must be 4D: (batch_size, channels, height, width).
  • Ignoring how kernel_size, stride, and padding affect output size.
  • Setting padding incorrectly, so the output size shrinks unexpectedly.

Always check tensor shapes before and after convolution to avoid shape mismatch errors.
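One way to check shapes in advance is to compute the output size yourself. The helper below (conv2d_out_size is a hypothetical name, not a PyTorch function) applies the standard Conv2d output-size formula along one spatial dimension and compares it against the layer's actual output.

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Standard Conv2d output-size formula along one spatial dimension
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Predict: 32x32 input, 3x3 kernel, stride 1, no padding -> 30x30
predicted = conv2d_out_size(32, kernel_size=3)
actual = nn.Conv2d(3, 6, kernel_size=3)(torch.randn(1, 3, 32, 32)).shape[-1]
print(predicted, actual)  # 30 30
```

Computing the size up front makes it easy to spot when a stack of convolutions will shrink a feature map more than intended.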

python
import torch
import torch.nn as nn

# Wrong: input channels mismatch
conv_wrong = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
input_wrong = torch.randn(1, 3, 28, 28)  # 3 channels but conv expects 1

try:
    output_wrong = conv_wrong(input_wrong)
except Exception as e:
    print('Error:', e)

# Right: matching input channels
conv_right = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
output_right = conv_right(input_wrong)
print('Output shape:', output_right.shape)
Output
Error: Given groups=1, weight of size [4, 1, 3, 3], expected input[1, 3, 28, 28] to have 1 channels, but got 3 channels instead
Output shape: torch.Size([1, 4, 26, 26])

Quick Reference

Parameter     Description                                 Default
in_channels   Number of channels in the input image       Required
out_channels  Number of filters (output channels)         Required
kernel_size   Size of convolution kernel (int or tuple)   Required
stride        Step size for sliding the kernel            1
padding       Zero-padding added to both sides            0
dilation      Spacing between kernel elements             1
groups        Number of blocked connections               1
bias          If True, adds a learnable bias              True
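A short sketch exercising the optional parameters from the table: stride=2 with padding=1 roughly halves the spatial dimensions, and bias=False drops the learnable bias term entirely.

```python
import torch
import torch.nn as nn

# stride=2 halves the 32x32 spatial size to 16x16; bias=False removes the bias parameter
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=2, padding=1, bias=False)
x = torch.randn(1, 3, 32, 32)
print(conv(x).shape)  # torch.Size([1, 8, 16, 16])
print(conv.bias)      # None
```

Strided convolutions like this are a common alternative to pooling layers for downsampling feature maps.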

Key Takeaways

  • Use torch.nn.Conv2d with in_channels matching your input tensor's channels.
  • Input to Conv2d must be 4D: (batch_size, channels, height, width).
  • kernel_size, stride, and padding control the output spatial dimensions.
  • Check tensor shapes before and after convolution to avoid errors.
  • Conv2d layers extract spatial features by sliding filters over images.