Experiment - Gradient accumulation and zeroing
Problem: You are training a neural network on a small GPU that cannot fit a large batch size. You currently use a batch size of 16, and the gradients are zeroed after every batch, so each optimizer update sees only 16 samples. This limits the effective batch size and slows learning.
Current Metrics: Training loss decreases slowly; validation accuracy reaches about 70% after 10 epochs.
Issue: The model trains slowly because each update is based on only 16 samples. Since the gradients are zeroed after every batch, the model never accumulates gradients across several small batches, which would simulate a larger batch size at no extra memory cost.
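A minimal sketch of the idea, using NumPy and a toy linear model (the sizes, the helper `grad`, and the linear-regression setup are illustrative assumptions, not part of the experiment): summing each micro-batch gradient scaled by `1/accum_steps`, and only stepping after the last micro-batch, reproduces the gradient of one large batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one "large" batch of 64 samples for a linear model.
X = rng.normal(size=(64, 8))
w_true = rng.normal(size=8)
y = X @ w_true
w = np.zeros(8)

def grad(xb, yb, w):
    """Mean-squared-error gradient for the linear model on one micro-batch."""
    err = xb @ w - yb
    return 2 * xb.T @ err / len(xb)

# Gradient from the full batch of 64 in one shot.
g_full = grad(X, y, w)

# Same gradient accumulated over 4 micro-batches of 16: scale each
# micro-batch gradient by 1/accum_steps and defer the update.
accum_steps = 4
g_accum = np.zeros_like(w)
for i in range(accum_steps):
    xb = X[i * 16:(i + 1) * 16]
    yb = y[i * 16:(i + 1) * 16]
    g_accum += grad(xb, yb, w) / accum_steps

print(np.allclose(g_full, g_accum))  # → True
```

In a framework like PyTorch the same pattern would look like dividing the loss by `accum_steps`, calling `backward()` on every micro-batch, and calling `optimizer.step()` followed by `optimizer.zero_grad()` only once every `accum_steps` batches.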