PyTorch · ML · ~20 mins

Hugging Face integration basics in PyTorch - ML Experiment: Train & Evaluate

Experiment - Hugging Face integration basics
Problem: You want to fine-tune a Hugging Face transformer model for text classification, but the model overfits: training accuracy is 98% while validation accuracy is only 70%.
Current Metrics: Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.8
Issue: The model is overfitting the training data and does not generalize well to the validation data.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
You can only modify the model training code (no changes to dataset).
Use PyTorch and Hugging Face transformers.
Keep the same transformer architecture.
Solution
PyTorch
import torch
from torch.optim import AdamW  # transformers' AdamW is deprecated; use torch's
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig, get_scheduler
from datasets import load_dataset

# Load dataset
raw_datasets = load_dataset('imdb')

# Load tokenizer and model
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name, num_labels=2)
# DistilBERT's config uses `dropout` and `attention_dropout`
# (BERT-style configs use `hidden_dropout_prob` / `attention_probs_dropout_prob`)
config.dropout = 0.3
config.attention_dropout = 0.3
model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)

# Tokenize function
def tokenize_function(examples):
    tokenized = tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)
    tokenized['labels'] = examples['label']
    return tokenized

# Tokenize datasets and set PyTorch tensor format so the DataLoader yields tensors
encoded_datasets = raw_datasets.map(tokenize_function, batched=True)
encoded_datasets.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])

# Prepare dataloaders
train_dataset = encoded_datasets['train'].shuffle(seed=42).select(range(2000))
val_dataset = encoded_datasets['test'].shuffle(seed=42).select(range(500))

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

# Optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
num_epochs = 3
num_training_steps = num_epochs * len(train_loader)
scheduler = get_scheduler('linear', optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)

# Training loop with early stopping
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

best_val_acc = 0
patience = 2
patience_counter = 0

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    # Validation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in val_loader:
            batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
            outputs = model(**batch)
            predictions = outputs.logits.argmax(dim=-1)
            correct += (predictions == batch['labels']).sum().item()
            total += batch['labels'].size(0)
    val_acc = correct / total * 100
    print(f'Epoch {epoch+1}: Validation Accuracy: {val_acc:.2f}%')

    # Early stopping
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print('Early stopping triggered')
            break
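Note that the loop above stops early but keeps whatever weights the final epoch produced. A common refinement is to snapshot the best-performing weights and restore them after training. A minimal, framework-agnostic sketch (the `BestKeeper` helper and its names are illustrative, not part of the solution above; `state` stands in for `model.state_dict()`):

```python
import copy

class BestKeeper:
    """Track the best validation metric and keep a deep copy of the
    corresponding model state (e.g. a state_dict)."""

    def __init__(self, patience=2):
        self.best_metric = float('-inf')
        self.best_state = None
        self.patience = patience
        self.counter = 0

    def update(self, metric, state):
        """Record `state` if `metric` improved; return True when training should stop."""
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_state = copy.deepcopy(state)  # snapshot the best weights
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience

# Toy usage with accuracies standing in for per-epoch validation results:
keeper = BestKeeper(patience=2)
stops = [keeper.update(acc, {'acc': acc}) for acc in [70.0, 85.0, 84.0, 83.0]]
print(stops)               # [False, False, False, True]
print(keeper.best_metric)  # 85.0
```

After the loop you would call `model.load_state_dict(keeper.best_state)` so evaluation uses the best checkpoint rather than the last one.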
Key Changes
- Added a dropout rate of 0.3 in the transformer config to reduce overfitting.
- Reduced the batch size to 16 for better generalization.
- Added weight decay (0.01) to the AdamW optimizer to regularize the weights.
- Added a learning rate scheduler with linear decay.
- Added early stopping with a patience of 2 epochs, halting training when validation accuracy stops improving.
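The linear scheduler's effect can be checked by hand: with no warmup, `get_scheduler('linear', ...)` scales the base learning rate by `(total - step) / total` at each step. A small sketch of that decay rule (pure Python, independent of the transformers implementation; the step count assumes the 2000-example subset above):

```python
def linear_lr(step, base_lr=5e-5, warmup=0, total=375):
    """Linear schedule with optional warmup, matching the shape of
    transformers' 'linear' scheduler: ramp up, then decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

# With 2000 examples, batch size 16 and 3 epochs: 3 * (2000 // 16) = 375 steps
print(linear_lr(0))    # 5e-05 (full learning rate at the first step, no warmup)
print(linear_lr(375))  # 0.0   (decayed to zero by the final step)
```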
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.8

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.35

Adding dropout, weight decay, and early stopping reduces overfitting: the model generalizes better, which improves validation accuracy even though training accuracy drops slightly. Note that the validation loss falling from 0.8 to 0.35 confirms the gap between training and validation has narrowed.
Bonus Experiment
Try using data augmentation techniques like back translation or synonym replacement to further improve validation accuracy.
💡 Hint
Augmenting text data can increase diversity and help the model learn more robust features.
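A minimal sketch of synonym replacement, using a tiny hand-written synonym table instead of a real thesaurus such as WordNet (the `SYNONYMS` table and `augment` helper are illustrative only, not a production pipeline):

```python
import random

# Toy synonym table; a real pipeline would draw from a thesaurus (e.g. WordNet)
SYNONYMS = {
    'great': ['excellent', 'wonderful'],
    'bad': ['terrible', 'awful'],
    'movie': ['film'],
}

def augment(text, p=0.5, seed=42):
    """Replace each word found in SYNONYMS with a random synonym, with probability p."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        choices = SYNONYMS.get(word.lower())
        if choices and rng.random() < p:
            out.append(rng.choice(choices))
        else:
            out.append(word)
    return ' '.join(out)

print(augment('a great movie with a bad ending', p=1.0))
```

Augmented copies of the training texts would then be tokenized and appended to the training set; the validation set is left untouched so the metric stays comparable.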