How to Use Conv2d in PyTorch: Syntax and Example
In PyTorch, use torch.nn.Conv2d to create a 2D convolutional layer by specifying the input channels, output channels, and kernel size. Apply it to input tensors of shape (batch_size, channels, height, width) to extract spatial features.
Syntax
The torch.nn.Conv2d constructor takes these main arguments:
- in_channels: Number of input channels (e.g., 3 for RGB images).
- out_channels: Number of filters (output channels) the layer will produce.
- kernel_size: Size of the convolution filter (e.g., 3 for 3x3).
- stride (optional): Step size for sliding the filter (default is 1).
- padding (optional): Number of pixels added around input (default is 0).
After creating the layer, call it like a function on a 4D input tensor with shape (batch_size, in_channels, height, width).
```python
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=0)
output = conv(input_tensor)
```
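Because Conv2d expects a 4D input, a single image without a batch dimension will raise a shape error. A minimal sketch of adding the batch dimension with unsqueeze (the tensor names here are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# A single RGB image has shape (3, 64, 64) -- only 3 dimensions
image = torch.randn(3, 64, 64)

# unsqueeze(0) prepends a batch dimension: (1, 3, 64, 64)
batched = image.unsqueeze(0)

output = conv(batched)
print(output.shape)  # torch.Size([1, 16, 62, 62])
```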
Example
This example creates a Conv2d layer with 3 input channels and 6 output channels using a 3x3 kernel. It applies the layer to a random input tensor shaped like a batch of 1 RGB image of size 32x32. The output shape shows how the convolution changes the spatial dimensions.
```python
import torch
import torch.nn as nn

# Create Conv2d layer
conv_layer = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

# Create a random input tensor: batch size 1, 3 channels, 32x32 image
input_tensor = torch.randn(1, 3, 32, 32)

# Apply convolution
output_tensor = conv_layer(input_tensor)

# Print output shape
print('Output shape:', output_tensor.shape)
```
Output
Output shape: torch.Size([1, 6, 30, 30])
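The 32x32 input shrinking to 30x30 follows the output-size formula given in the Conv2d documentation. A small helper to predict the spatial size ahead of time (the helper name is our own, not part of PyTorch):

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Formula from the Conv2d docs:
    # out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv2d_output_size(32, kernel_size=3))                       # 30, matching the example above
print(conv2d_output_size(32, kernel_size=3, stride=2, padding=1))  # 16, a typical downsampling layer
```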
Common Pitfalls
Common mistakes when using Conv2d include:
- Not matching in_channels to the input tensor's channel count.
- Forgetting that input tensors must be 4D: (batch_size, channels, height, width).
- Ignoring how kernel_size, stride, and padding affect the output size.
- Using incorrect padding, causing the output size to shrink unexpectedly.
Always check tensor shapes before and after convolution to avoid shape mismatch errors.
```python
import torch
import torch.nn as nn

# Wrong: input channels mismatch
conv_wrong = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
input_wrong = torch.randn(1, 3, 28, 28)  # 3 channels but conv expects 1
try:
    output_wrong = conv_wrong(input_wrong)
except Exception as e:
    print('Error:', e)

# Right: matching input channels
conv_right = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
output_right = conv_right(input_wrong)
print('Output shape:', output_right.shape)
```
Output
Error: Given groups=1, weight of size [4, 1, 3, 3], expected input[1, 3, 28, 28] to have 1 channels, but got 3 channels instead
Output shape: torch.Size([1, 4, 26, 26])
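The shrink from 28x28 to 26x26 above is the padding pitfall in action. Setting padding explicitly is one way to keep spatial dimensions unchanged; a short sketch assuming a 3x3 kernel, where padding=1 compensates for the border lost to the kernel:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# padding=1 with a 3x3 kernel preserves height and width
conv_same = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
print(conv_same(x).shape)  # torch.Size([1, 4, 28, 28])
```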
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| in_channels | Number of channels in input image | Required |
| out_channels | Number of filters (output channels) | Required |
| kernel_size | Size of convolution kernel (int or tuple) | Required |
| stride | Step size for sliding kernel | 1 |
| padding | Zero-padding added to both sides | 0 |
| dilation | Spacing between kernel elements | 1 |
| groups | Number of blocked connections | 1 |
| bias | If True, adds a learnable bias | True |
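The less common parameters in the table above can be illustrated with a brief sketch (the tensor sizes here are arbitrary, chosen only to make the shapes easy to check):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)

# dilation=2 spreads a 3x3 kernel over a 5x5 receptive field,
# so the 16x16 input shrinks to 12x12 (16 - 5 + 1)
dilated = nn.Conv2d(8, 8, kernel_size=3, dilation=2)
print(dilated(x).shape)  # torch.Size([1, 8, 12, 12])

# groups=in_channels gives a depthwise convolution: one filter per
# input channel; bias=False drops the learnable bias term
depthwise = nn.Conv2d(8, 8, kernel_size=3, groups=8, bias=False)
print(depthwise(x).shape)  # torch.Size([1, 8, 14, 14])
```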
Key Takeaways
- Use torch.nn.Conv2d with in_channels matching your input tensor's channel count.
- Input to Conv2d must be 4D: (batch_size, channels, height, width).
- Kernel size, stride, and padding control the output spatial dimensions.
- Check tensor shapes before and after convolution to avoid errors.
- Conv2d layers extract spatial features by sliding filters over images.