EfficientNet uses a compound scaling method to scale up the model. What does this method do?
Think about how EfficientNet balances different model dimensions together.
Compound scaling uniformly scales depth, width, and input resolution with fixed coefficients to balance model size and accuracy.
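As a rough sketch of how the scaling works (using the base coefficients α=1.2, β=1.1, γ=1.15 reported in the EfficientNet paper), the three multipliers for a given compound coefficient φ can be computed like this:

```python
# Compound scaling sketch: depth, width, and input resolution are all
# scaled together by a single compound coefficient phi. The base
# coefficients below are the ones reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases

def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for a given phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The paper constrains alpha * beta^2 * gamma^2 to be roughly 2, so each
# unit increase of phi approximately doubles the FLOPS.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
print(compound_scale(1))
print(round(flops_factor, 2))  # close to 2
```

This illustrates the key point of the answer: a single coefficient φ controls all three dimensions at once, instead of tuning depth, width, and resolution independently.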
You want to train an EfficientNet model on a GPU with limited memory. Which variant should you choose to balance accuracy and memory use?
Smaller variants use less memory but have lower accuracy.
EfficientNet-B0 is the smallest variant; it requires the least memory, making it suitable for GPUs with limited resources.
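For reference, a minimal lookup sketch of parameter counts and default input resolutions across the family (figures are approximate values as reported in the EfficientNet paper):

```python
# Approximate parameter counts (millions) and default input resolutions
# for the EfficientNet family, as reported in the EfficientNet paper.
VARIANTS = {
    "b0": {"params_m": 5.3, "resolution": 224},
    "b1": {"params_m": 7.8, "resolution": 240},
    "b2": {"params_m": 9.2, "resolution": 260},
    "b3": {"params_m": 12.0, "resolution": 300},
    "b4": {"params_m": 19.0, "resolution": 380},
    "b5": {"params_m": 30.0, "resolution": 456},
    "b6": {"params_m": 43.0, "resolution": 528},
    "b7": {"params_m": 66.0, "resolution": 600},
}

def smallest_variant() -> str:
    """Pick the variant with the fewest parameters (lowest memory use)."""
    return min(VARIANTS, key=lambda v: VARIANTS[v]["params_m"])

print(smallest_variant())  # b0
```

Note that memory use during training also depends on input resolution and batch size, not just parameter count, so the larger variants are doubly expensive: more weights and bigger activations.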
Given the following PyTorch code snippet, what is the output shape of the tensor after the feature extractor?
import torch
from torchvision.models import efficientnet_b0

model = efficientnet_b0()
input_tensor = torch.randn(1, 3, 224, 224)
output = model.features(input_tensor)
print(output.shape)  # torch.Size([1, 1280, 7, 7])
Look at the output channels (1280) and spatial size after the feature extractor in EfficientNet-B0.
The features part of EfficientNet-B0 outputs a tensor with shape [batch, 1280, 7, 7] before the avgpool and classifier.
What is the main effect of increasing the input image resolution in EfficientNet's compound scaling?
Think about what happens when you feed larger images into a convolutional network.
Increasing input resolution increases feature map sizes, allowing the model to capture more detail but requiring more computation.
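To make this concrete: EfficientNet-B0's feature extractor downsamples by an overall factor of 32, so the final feature map's spatial size grows with the input resolution. A rough sketch (exact sizes depend on the padding and rounding of the strided convolutions):

```python
import math

# Overall downsampling factor of the EfficientNet backbone.
STRIDE = 32

def final_feature_size(resolution: int) -> int:
    """Approximate final feature-map side length for a square input."""
    return math.ceil(resolution / STRIDE)

for res in (224, 300, 380):
    print(res, "->", final_feature_size(res))
# 224 -> 7, 300 -> 10, 380 -> 12
```

Larger feature maps mean more activations per layer and more multiply-adds, which is why compound scaling must grow resolution together with depth and width under a FLOPS budget rather than increasing it in isolation.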
Which EfficientNet variant has approximately 19 billion FLOPS and achieves around 84.0% top-1 accuracy on ImageNet?
Recall that FLOPS and accuracy both increase with the variant number.
EfficientNet-B6 has about 19B FLOPS and achieves ~84.0% top-1 accuracy on ImageNet.