Computer Vision · ~20 mins

U-Net architecture in Computer Vision - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · intermediate
Understanding U-Net's Skip Connections

In the U-Net architecture, what is the main purpose of the skip connections between the encoder and decoder parts?

A. To reduce the number of parameters by sharing weights between encoder and decoder layers.
B. To combine low-level spatial information from the encoder with high-level features in the decoder to improve segmentation accuracy.
C. To increase the depth of the network by adding more convolutional layers in the decoder.
D. To perform max pooling operations to downsample the feature maps.
💡 Hint

Think about how the network keeps details from the input image while reconstructing the output.
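The mechanism behind the skip connection can be sketched in a few lines of PyTorch: the decoder upsamples its feature map, then concatenates the matching encoder feature map along the channel dimension before convolving. This is a minimal illustrative sketch, not taken from any particular U-Net implementation; the block and variable names are assumptions.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Upsample, concatenate the encoder's skip features, then convolve."""
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        # Transposed conv doubles height and width while halving channels.
        self.up = nn.ConvTranspose2d(in_channels, in_channels // 2,
                                     kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels // 2 + skip_channels, out_channels,
                      kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # restore spatial resolution
        x = torch.cat([x, skip], dim=1)  # skip connection: reuse encoder detail
        return self.conv(x)

decoder = DecoderBlock(in_channels=128, skip_channels=64, out_channels=64)
bottleneck = torch.randn(1, 128, 32, 32)  # deep, low-resolution features
skip = torch.randn(1, 64, 64, 64)         # shallow, high-resolution features
out = decoder(bottleneck, skip)           # shape: (1, 64, 64, 64)
```

The concatenation is the key step: it hands the decoder the fine spatial detail that was lost during downsampling.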

Predict Output · intermediate
Output Shape After U-Net Encoder Block

Given an input tensor of shape (batch_size=1, channels=3, height=128, width=128), what is the output shape after one encoder block in a U-Net that applies two 3x3 convolutions (padding=1, stride=1) followed by a 2x2 max pooling (stride=2)?

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU()
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.conv(x)
        p = self.pool(x)
        return x, p

x = torch.randn(1, 3, 128, 128)
block = EncoderBlock(3, 64)
features, pooled = block(x)
output_shape = pooled.shape
A. (1, 64, 64, 64)
B. (1, 64, 128, 128)
C. (1, 3, 64, 64)
D. (1, 64, 32, 32)
💡 Hint

Remember that padding keeps spatial size after convolution, and max pooling halves height and width.
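The hint's arithmetic can be checked directly with the standard convolution/pooling output-size formula. A quick sketch (the helper name is illustrative):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard output-size formula: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

h = 128
h = conv_out(h, kernel=3, stride=1, padding=1)  # 3x3 conv, padding=1: size unchanged
h = conv_out(h, kernel=3, stride=1, padding=1)  # second conv: still unchanged
h = conv_out(h, kernel=2, stride=2)             # 2x2 max pool, stride 2: halved
print(h)  # 64
```

The same formula applies to the width, so the pooled spatial size ends up at half the input resolution.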

Model Choice · advanced
Choosing the Right Activation for U-Net Output

For a U-Net model designed to perform binary segmentation (classifying each pixel as foreground or background), which activation function is most appropriate to use in the final layer?

A. Softmax activation
B. ReLU activation
C. Tanh activation
D. Sigmoid activation
💡 Hint

Consider the output as a probability for each pixel belonging to the foreground class.
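The hint translates directly to code: a 1×1 convolution produces one logit per pixel, and a sigmoid squashes it into a (0, 1) foreground probability. A minimal sketch (the channel count and shapes are illustrative; in practice one would often keep raw logits during training and use `nn.BCEWithLogitsLoss` for numerical stability):

```python
import torch
import torch.nn as nn

# Final layer of a binary-segmentation U-Net: one output channel per pixel.
head = nn.Sequential(
    nn.Conv2d(64, 1, kernel_size=1),  # 64 decoder channels -> 1 logit per pixel
    nn.Sigmoid(),                     # map each logit into (0, 1)
)

features = torch.randn(1, 64, 128, 128)  # decoder output (illustrative shape)
probs = head(features)                   # per-pixel foreground probabilities
mask = (probs > 0.5).float()             # threshold into a binary mask
```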

Metrics · advanced
Evaluating U-Net Segmentation with Dice Coefficient

Which of the following formulas correctly computes the Dice coefficient for evaluating the overlap between predicted and ground truth segmentation masks?

A. Dice = (2 * |Prediction ∩ GroundTruth|) / (|Prediction| + |GroundTruth|)
B. Dice = |Prediction ∩ GroundTruth| / |Prediction ∪ GroundTruth|
C. Dice = |Prediction ∪ GroundTruth| / (|Prediction| + |GroundTruth|)
D. Dice = |Prediction| / |GroundTruth|
💡 Hint

Dice coefficient measures similarity by doubling the intersection over the sum of sizes.
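The formula in the hint is straightforward to implement for binary masks, where the intersection is an element-wise product. A sketch (the epsilon term is a common convention to avoid division by zero on empty masks, not part of the definition itself):

```python
import torch

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = (pred * target).sum()
    return (2 * intersection / (pred.sum() + target.sum() + eps)).item()

pred   = torch.tensor([[1, 1, 0, 0]], dtype=torch.float32)
target = torch.tensor([[1, 0, 0, 0]], dtype=torch.float32)
score = dice_coefficient(pred, target)  # 2*1 / (2 + 1) ≈ 0.667
```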

🔧 Debug · expert
Identifying the Cause of Dimension Mismatch in U-Net Decoder

In a U-Net decoder block, a concatenation of the upsampled feature map and the corresponding encoder feature map fails with a dimension mismatch error. Given that the encoder feature map has shape (batch_size, 64, 64, 64) and the upsampled decoder feature map has shape (batch_size, 64, 65, 65), what is the most likely cause?

A. The encoder feature map has wrong channel size causing mismatch during concatenation.
B. The batch size is different between encoder and decoder feature maps.
C. The upsampling operation increased spatial dimensions incorrectly, causing a mismatch with encoder features.
D. The concatenation axis is set incorrectly to the batch dimension.
💡 Hint

Check how upsampling changes height and width compared to encoder features.
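A common defensive fix for exactly this failure is to resize the upsampled decoder map to the skip tensor's spatial size before concatenating. A sketch, assuming bilinear interpolation is acceptable for the task (cropping the larger tensor is the other common option):

```python
import torch
import torch.nn.functional as F

skip = torch.randn(2, 64, 64, 64)  # encoder feature map
up = torch.randn(2, 64, 65, 65)    # upsampled decoder map, off by one pixel

# Resize the decoder map to the skip map's spatial size before concatenating.
if up.shape[2:] != skip.shape[2:]:
    up = F.interpolate(up, size=skip.shape[2:],
                       mode="bilinear", align_corners=False)

merged = torch.cat([up, skip], dim=1)  # shape: (2, 128, 64, 64)
```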