Model Pipeline - Gradient accumulation and zeroing
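This pipeline shows how gradient accumulation and gradient zeroing let you train with a larger effective batch size when the per-step batch size is limited by memory. Instead of updating the weights after every batch, the gradients from several smaller batches (micro-batches) are summed, and a single weight update is applied once all of them have been processed. The effective batch size is the micro-batch size multiplied by the number of accumulation steps. After each update, the gradients are reset to zero so that the next accumulation cycle starts fresh; without this reset, gradients from earlier cycles would leak into later updates.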
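A minimal sketch of the idea in plain Python with no ML framework, assuming a hypothetical toy model `y_hat = w * x + b` trained with mean squared error. The names `grad_mse`, `train_with_accumulation` and the `zero_grads` flag are illustrative, not part of any library. Each micro-batch gradient is scaled by `1 / accum_steps`, so with equal-sized micro-batches the accumulated gradient equals the gradient of one large batch. Setting `zero_grads=False` shows what goes wrong when the reset is skipped.

```python
# Hypothetical toy model for illustration: y_hat = w * x + b, MSE loss.

def grad_mse(w, b, batch):
    """Return (dL/dw, dL/db) of the mean squared error over one batch."""
    n = len(batch)
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb


def train_with_accumulation(data, micro_batch_size, accum_steps, lr,
                            w=0.0, b=0.0, zero_grads=True):
    """One pass of SGD where each weight update uses gradients accumulated
    over `accum_steps` micro-batches (effective batch size =
    micro_batch_size * accum_steps). An incomplete final cycle is dropped."""
    acc_w = acc_b = 0.0  # gradient buffers start at zero
    step_in_cycle = 0
    for start in range(0, len(data), micro_batch_size):
        batch = data[start:start + micro_batch_size]
        gw, gb = grad_mse(w, b, batch)
        # Scale so the accumulated sum is the average over the cycle.
        acc_w += gw / accum_steps
        acc_b += gb / accum_steps
        step_in_cycle += 1
        if step_in_cycle == accum_steps:
            w -= lr * acc_w  # single weight update per cycle
            b -= lr * acc_b
            if zero_grads:
                acc_w = acc_b = 0.0  # reset before the next cycle
            step_in_cycle = 0
    return w, b


if __name__ == "__main__":
    # 16 points on the line y = 3x + 1, x = -2.0, -1.75, ..., 1.75
    data = [(x, 3.0 * x + 1.0) for x in (i / 4 - 2 for i in range(16))]
    # 4 micro-batches of 2 per update vs. plain batches of 8
    print(train_with_accumulation(data, micro_batch_size=2, accum_steps=4, lr=0.1))
    print(train_with_accumulation(data, micro_batch_size=8, accum_steps=1, lr=0.1))
    # Forgetting to zero: the second update also includes the first cycle's gradients
    print(train_with_accumulation(data, 2, 4, 0.1, zero_grads=False))
```

The first two calls should agree up to floating-point rounding, because both apply two updates over the same groups of 8 examples. The third differs, because the un-zeroed buffers carry the first cycle's gradients into the second update. In a framework such as PyTorch the same pattern appears as calling `loss.backward()` on several micro-batches (which adds into each parameter's `.grad`), then `optimizer.step()`, then `optimizer.zero_grad()`.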