Experiment - Transformer architecture
Problem: We want to train a Transformer model to classify short text sentences into categories. The current model fits the training data well but performs poorly on the validation data.
Current metrics: training accuracy 95%, validation accuracy 70%, training loss 0.15, validation loss 0.65.
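Issue: The model is overfitting. It fits the training data closely but does not generalize to unseen data: validation accuracy is 25 points lower than training accuracy, and validation loss is more than four times the training loss.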
