Imagine you have two models: one with many layers and one with fewer layers. Which statement best explains how the number of layers affects training speed?
Think about how adding more steps in a recipe takes more time.
More layers mean more computation in every forward and backward pass, so training takes longer. This limits how well the model scales to larger datasets or to constrained hardware.
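A minimal sketch of this idea, assuming a stack of dense layers of equal width where each layer costs roughly width × width multiply-adds (the function name and widths are illustrative, not from any particular framework):

```python
# Hypothetical sketch: count multiply-add operations for a stack of
# equally sized dense layers.
def dense_stack_flops(num_layers, width):
    # One square dense layer of width w needs roughly w * w multiply-adds.
    return num_layers * width * width

shallow = dense_stack_flops(num_layers=4, width=256)
deep = dense_stack_flops(num_layers=16, width=256)
print(deep / shallow)  # 4.0 -- four times the layers, four times the compute
```

Under these assumptions compute grows linearly with depth, which is why the deeper model trains proportionally slower on the same hardware.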
You have a very large dataset and limited computing power. Which model architecture choice helps scalability best?
Think about what works well when your computer has limited memory.
Simpler models with fewer parameters use less memory and compute, making them more scalable on limited hardware.
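One way to see this is to estimate parameter memory directly. The sketch below assumes plain dense layers with float32 (4-byte) weights; the layer widths are made-up examples:

```python
# Hypothetical sketch: estimate parameter memory for a dense network,
# assuming float32 (4-byte) weights and biases.
def param_bytes(layer_widths):
    total = 0
    for w_in, w_out in zip(layer_widths, layer_widths[1:]):
        total += (w_in * w_out + w_out) * 4  # weights + biases
    return total

wide = param_bytes([1024, 1024, 1024, 10])   # wide hidden layers
narrow = param_bytes([1024, 128, 128, 10])   # narrower hidden layers
print(wide // narrow)  # the wide model needs several times more memory
```

Shrinking the hidden widths cuts parameter memory sharply, which is exactly what makes the simpler model easier to fit on limited hardware.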
Consider training a model with different batch sizes. Which batch size choice best supports scalability on limited GPU memory?
Think about how much data you can hold in your hands at once.
Smaller batch sizes use less memory, allowing training on limited GPUs and improving scalability.
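The memory effect can be sketched with a rough activation-memory estimate. This assumes float32 activations of shape (batch_size, features) saved at every layer for the backward pass; the numbers are illustrative:

```python
# Hypothetical sketch: estimate activation memory held during training,
# assuming float32 activations of shape (batch_size, features) are kept
# at each layer for the backward pass.
def activation_bytes(batch_size, features, num_layers):
    return batch_size * features * num_layers * 4

big = activation_bytes(batch_size=256, features=512, num_layers=8)
small = activation_bytes(batch_size=32, features=512, num_layers=8)
print(big / small)  # 8.0 -- activation memory scales linearly with batch size
```

Because activation memory grows linearly with batch size under these assumptions, cutting the batch size is the most direct lever when a GPU runs out of memory.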
You train two models on the same dataset. Model A takes 2 hours, Model B takes 6 hours. Both have similar accuracy. What does this say about their scalability?
Think about which model can handle bigger data faster.
Faster training with similar accuracy means better scalability, as the model can handle larger data or more experiments efficiently.
Given a model whose training slows down drastically when the input size doubles, which architectural choice is most likely causing the bottleneck?
Think about which layer type grows computation most with input size.
Fully connected layers scale poorly with input size because they connect every input to every neuron, so their computation grows multiplicatively as inputs get larger.
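A small sketch of the contrast, assuming a square dense layer (output width grows with input width) versus a 1-D convolution with a fixed kernel; both cost functions are simplified illustrations:

```python
# Hypothetical sketch: compare compute growth when the input size doubles.
def fc_flops(n):
    # A square fully connected layer links every input to every output: n * n.
    return n * n

def conv_flops(n, kernel=3):
    # A 1-D convolution touches each input a fixed number of times.
    return n * kernel

print(fc_flops(2000) / fc_flops(1000))      # 4.0 -- quadruples when input doubles
print(conv_flops(2000) / conv_flops(1000))  # 2.0 -- merely doubles
```

Under these assumptions, doubling the input quadruples the dense layer's compute but only doubles the convolution's, which is why fully connected layers are the usual suspect for this kind of bottleneck.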