Experiment - Transformer decoder
Problem:You are training a Transformer decoder model for a sequence prediction task. The model currently overfits the training data.
Current Metrics:Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.9
Issue:The model overfits: training accuracy is very high but validation accuracy is low, indicating poor generalization.