
Vision Transformer (ViT) in Computer Vision - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: Fill in the blank (easy)

Complete the code to import the Vision Transformer model from the torchvision library.

from torchvision.models import [1]
Options:
A. alexnet
B. resnet50
C. vgg16
D. vit_b_16
Common Mistakes
Choosing a CNN model like resnet50 instead of the Vision Transformer.
Confusing ViT with older architectures like alexnet or vgg16.
Task 2: Fill in the blank (medium)

Complete the code to create a Vision Transformer model pretrained on ImageNet.

model = [1](pretrained=True)
Options:
A. densenet121
B. resnet50
C. vit_b_16
D. mobilenet_v2
Common Mistakes
Using a CNN model instead of ViT.
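Forgetting pretrained=True, which leaves the model with randomly initialized weights. In torchvision 0.13 and later, pretrained is deprecated in favor of weights=ViT_B_16_Weights.DEFAULT.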
Task 3: Fill in the blank (hard)

Complete the code to split the input image tensor x, shaped (B, C, H, W), into non-overlapping patches for ViT patch embedding. Every blank takes the same value: the patch size of ViT-B/16.

patches = x.unfold(2, [1], [1]).unfold(3, [1], [1])
Options:
A. 8
B. 16
C. 32
D. 64
Common Mistakes
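Using a patch size other than 16. ViT-B/16 is built for 16x16 patches, so another size changes both the number of patches and the flattened patch length, which then no longer match the model's patch projection and position embeddings.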
Confusing patch size with image size.
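The unfold call can be extended into a full patch-flattening step. A minimal sketch, assuming 224x224 RGB input and a hypothetical helper name `patchify`: `unfold(dim, size, step)` with `step == size` yields non-overlapping windows, and a permute plus reshape turns each patch into one token vector.

```python
import torch


def patchify(x: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    """Hypothetical helper: (B, C, H, W) -> (B, num_patches, C * patch_size**2)."""
    b, c, h, w = x.shape
    assert h % patch_size == 0 and w % patch_size == 0, "H and W must be divisible by patch_size"
    # Non-overlapping windows along H, then W: (B, C, H/p, W/p, p, p)
    patches = x.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # Group the grid position first, then flatten channels and pixels of each patch.
    patches = patches.permute(0, 2, 3, 1, 4, 5)  # (B, H/p, W/p, C, p, p)
    return patches.reshape(b, -1, c * patch_size * patch_size)


x = torch.randn(2, 3, 224, 224)
tokens = patchify(x)
print(tokens.shape)  # torch.Size([2, 196, 768])
```

A linear layer such as `nn.Linear(768, 768)` would then project each flattened patch to the embedding dimension; torchvision's ViT does the equivalent with a strided convolution (`conv_proj`).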
Task 4: Fill in the blank (hard)

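Fill both blanks to complete the code that applies multi-head self-attention in ViT, where self.attn is a torch.nn.MultiheadAttention module.

attention_output, attn_weights = self.attn(query, key, value, [1]=mask, [2]=True)
Options:
A. attn_mask
B. key_padding_mask
C. need_weights
D. dropout
Common Mistakes
Using key_padding_mask instead of attn_mask for the mask parameter: key_padding_mask only marks padded key positions for each batch element, while attn_mask controls which query/key pairs may attend to each other.
Passing batch_first=True in the forward call: batch_first is a constructor argument (nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)), so forward() rejects it.
Assigning the result to a single variable: forward() returns a tuple of the attention output and the attention weights.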
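A minimal runnable sketch of this call on `torch.nn.MultiheadAttention`, with arbitrary sizes (embed_dim=64, 4 heads, 10 tokens). The mask is hypothetical and only illustrates what `attn_mask` does; a standard ViT encoder lets every token attend to every other token and passes no mask.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len, batch = 64, 4, 10, 2

# batch_first is set at construction: inputs are (batch, seq, embed_dim).
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(batch, seq_len, embed_dim)  # e.g. CLS token + patch tokens

# Hypothetical boolean mask of shape (L, S): True means "may NOT attend".
# Here no query is allowed to attend to the last token.
mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
mask[:, -1] = True

# Self-attention: query, key and value are the same tensor.
# forward() returns (output, weights); weights are averaged over heads by default.
out, weights = attn(x, x, x, attn_mask=mask, need_weights=True)
print(out.shape, weights.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 10])
```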
Task 5: Fill in the blank (hard)

Fill all three blanks to complete the code that computes the classification output from the ViT model.

cls_token = x[:, [1]].unsqueeze(1)
output = self.mlp_head(cls_token).squeeze([2])
loss = criterion(output, [3])
Options:
A. 0
B. 1
C. labels
D. 2
Common Mistakes
Using the wrong index for the CLS token, which ViT prepends at position 0.
Squeezing the wrong dimension, which causes shape errors.
Passing predictions instead of labels to the loss function.
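A self-contained sketch of the three lines above, assuming random encoder outputs, 10 classes, and a head built from LayerNorm plus Linear (one common choice, not necessarily the head this exercise has in mind; torchvision's ViT applies its final LayerNorm inside the encoder and uses a Linear head). It also shows that `x[:, 0]` alone already has shape (B, D), so the unsqueeze/squeeze pair is optional.

```python
import torch
import torch.nn as nn

batch, num_patches, dim, num_classes = 4, 196, 768, 10

# Encoder output: CLS token at index 0, followed by the patch tokens.
x = torch.randn(batch, num_patches + 1, dim)
labels = torch.randint(0, num_classes, (batch,))  # integer class indices

# Assumed classification head.
mlp_head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_classes))
criterion = nn.CrossEntropyLoss()

cls_token = x[:, 0].unsqueeze(1)          # (B, 1, D)
output = mlp_head(cls_token).squeeze(1)   # (B, 1, num_classes) -> (B, num_classes)
loss = criterion(output, labels)          # scalar, averaged over the batch

# Equivalent without the extra dimension: x[:, 0] is already (B, D).
direct = mlp_head(x[:, 0])
```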