ResNet introduced skip connections to help deep neural networks. What is the main reason for using these skip connections?
Think about what happens to gradients when networks get very deep.
Skip connections let gradients flow directly through the network, which helps avoid the vanishing gradient problem and allows training of very deep networks.
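As a small illustration (not part of the original question), a tiny autograd example shows why the identity path keeps gradients alive: for y = x + f(x), dy/dx = 1 + f'(x), so the gradient through the block never falls below the identity contribution even when f'(x) is tiny.

```python
import torch

# y = x + f(x) with a deliberately tiny "residual branch" f(x) = 0.01 * x.
# With the skip connection, dy/dx = 1 + 0.01; without it, dy/dx = 0.01.
x = torch.tensor(2.0, requires_grad=True)
y = x + 0.01 * x          # residual formulation: identity path + branch
y.backward()
print(x.grad)             # tensor(1.0100): gradient preserved by the skip

x2 = torch.tensor(2.0, requires_grad=True)
y2 = 0.01 * x2            # branch only, no skip connection
y2.backward()
print(x2.grad)            # tensor(0.0100): gradient nearly vanishes
```

Stacking many such blocks multiplies these factors, which is why the identity term matters in very deep networks.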
Consider a ResNet block where the input tensor has shape (batch_size=32, height=64, width=64, channels=64). The block applies two convolution layers with padding='same' and keeps the number of channels the same. What will be the output shape after adding the skip connection?
import tensorflow as tf

input_tensor = tf.random.normal([32, 64, 64, 64])
conv1 = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')(input_tensor)
conv2 = tf.keras.layers.Conv2D(64, 3, padding='same')(conv1)
output = tf.keras.layers.Add()([input_tensor, conv2])
print(output.shape)
Padding='same' keeps height and width unchanged. The skip connection adds tensors of the same shape.
Since padding='same' keeps the spatial dimensions unchanged and the channel count stays at 64, adding the input tensor and the conv2 output produces a tensor with the same shape as the input: (32, 64, 64, 64).
In ResNet, when the input and output channels differ, a skip connection cannot be a simple addition. Which option correctly handles this channel mismatch?
Think about how to change the input tensor shape to match the output tensor shape for addition.
A 1x1 convolution (a projection shortcut) on the skip path adjusts the input's channel count to match the output channels, enabling element-wise addition in the skip connection.
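A minimal PyTorch sketch of such a projection shortcut (class and attribute names here are illustrative, not from the snippet below):

```python
import torch
import torch.nn as nn

class ProjectionBlock(nn.Module):
    """Residual block whose shortcut uses a 1x1 conv when channels differ."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # 1x1 conv projects the shortcut to the main path's channel count.
        if in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + self.shortcut(x)  # shapes now match, addition succeeds
        return self.relu(out)

block = ProjectionBlock(64, 128)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 128, 32, 32])
```

The 1x1 kernel changes only the channel dimension, so spatial size is untouched and the extra cost is small.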
When training a very deep ResNet with skip connections, how does the training loss curve typically compare to a similar deep network without skip connections?
Consider how skip connections help gradients during backpropagation.
Skip connections help gradients flow, so the network with them typically shows a smoother, faster-decreasing training loss curve and reaches a lower final loss; a comparably deep plain network often converges more slowly and plateaus at a higher training loss (the degradation problem).
Examine the following PyTorch code snippet for a ResNet block with a skip connection. What error will occur when running this code?
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out += x  # skip connection
        out = self.relu(out)
        return out

block = ResNetBlock(64, 128)
input_tensor = torch.randn(1, 64, 32, 32)
output = block(input_tensor)
Check the shapes of tensors before the addition in the forward method.
The input tensor has 64 channels but the output of conv2 has 128 channels, so the in-place addition `out += x` tries to add a (1, 128, 32, 32) tensor and a (1, 64, 32, 32) tensor and raises a RuntimeError for the shape mismatch.
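To confirm the failure mode, the block from the snippet can be re-declared (so this sketch runs standalone) and the error caught; the shapes annotated in the comments trace why the addition fails:

```python
import torch
import torch.nn as nn

class ResNetBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))  # (1, 128, 32, 32) after conv1
        out = self.conv2(out)           # still (1, 128, 32, 32)
        out += x                        # x is (1, 64, 32, 32): channels differ
        return self.relu(out)

block = ResNetBlock(64, 128)
x = torch.randn(1, 64, 32, 32)
try:
    block(x)
except RuntimeError as e:
    print("RuntimeError:", e)  # tensor sizes 128 vs 64 at dimension 1
```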