Pre-trained models have already learned useful features from large datasets. How does this help reduce the time needed to train a new model?
Think about what it means to start learning from scratch versus starting with some knowledge.
Pre-trained models have weights already tuned on large datasets, so they have learned general features (edges, textures, shapes) that transfer to many tasks. When you train on a new task, you only need to fine-tune these weights, which takes far less time than training from scratch.
Consider this PyTorch code snippet that loads a pre-trained ResNet18 model and fine-tunes it on a new dataset. What will be the output of the printed statement?
```python
import torch
import torchvision.models as models

# Load ResNet18 with ImageNet weights (the older pretrained=True flag is deprecated)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pre-trained parameters
for param in model.parameters():
    param.requires_grad = False

# New output layer for 10 classes
model.fc = torch.nn.Linear(model.fc.in_features, 10)

trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable_params)
```
Only the new final layer's parameters are trainable. Calculate the number of parameters in the new linear layer.
The original ResNet18's final fully connected layer takes 512 input features. All original parameters are frozen (requires_grad=False), so only the new Linear(512, 10) layer is trainable: 512*10 = 5120 weights + 10 biases = 5130 trainable parameters, which is what the print statement outputs.
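That arithmetic can be checked directly by constructing the same layer on its own and counting its parameters:

```python
import torch

# The new head from the question: Linear(512, 10)
fc = torch.nn.Linear(512, 10)

# weight matrix (10 x 512 = 5120) plus bias vector (10)
n_params = sum(p.numel() for p in fc.parameters())
print(n_params)  # 5130
```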
You want to build an image classifier for a small dataset of 500 images. Which pre-trained model choice will likely give the best balance of accuracy and training speed?
Consider model size, dataset size, and training time.
MobileNetV2 is a small, efficient model pre-trained on ImageNet. It adapts well to small datasets and trains quickly. Large models like ResNet152 may overfit 500 images or take too long to train. Training from scratch (random initialization) usually needs far more data and time.
When fine-tuning a pre-trained model, which learning rate strategy is usually best?
Think about how much you want to change the pre-trained weights versus new layers.
Pre-trained layers already encode useful features, so a low learning rate prevents destroying them; newly added layers start from random weights and need a higher learning rate to learn quickly. Using one rate for everything or freezing all layers is less effective.
You fine-tune a pre-trained model on a new task. After 10 epochs, training accuracy is 98% but validation accuracy is 70%. What does this indicate?
Think about what it means when training accuracy is high but validation accuracy is low.
High training accuracy with low validation accuracy means the model has learned the training data too well, including noise and details that don't generalize to new data. This is called overfitting. Common remedies include more data augmentation, stronger regularization, or stopping training earlier.
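One standard response to overfitting is early stopping; a minimal sketch, using a hypothetical helper and made-up validation-accuracy numbers:

```python
def early_stop(val_history, patience=3):
    """Stop when validation accuracy has not improved for `patience` epochs."""
    best_epoch = max(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience

# Validation accuracy peaks at epoch 2, then degrades (illustrative values)
val_acc = [0.62, 0.68, 0.70, 0.69, 0.68, 0.67, 0.66]
print(early_stop(val_acc))  # True: 4 epochs without improvement
```

Monitoring the gap between training and validation accuracy this way halts training before the model memorizes the training set.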