0
0
Prompt Engineering / GenAIml~20 mins

Hugging Face fine-tuning in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Choose your learning style9 modes available
Experiment - Hugging Face fine-tuning
Problem:Fine-tune a pre-trained text classification model on a small custom dataset to classify movie reviews as positive or negative.
Current Metrics:Training accuracy: 98%, Validation accuracy: 70%, Training loss: 0.05, Validation loss: 0.85
Issue:The model is overfitting: training accuracy is very high but validation accuracy is much lower, indicating poor generalization.
Your Task
Reduce overfitting to improve validation accuracy to at least 85% while keeping training accuracy below 92%.
Use the Hugging Face Transformers library and datasets.
Keep the same pre-trained model architecture (e.g., 'distilbert-base-uncased').
Do not increase the dataset size.
Hint 1
Hint 2
Hint 3
Hint 4
Hint 5
Solution
Prompt Engineering / GenAI
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, predictions)}

# Load dataset
raw_datasets = load_dataset('imdb', split='train[:5%]').train_test_split(test_size=0.2)

# Load tokenizer and model
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize function
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)

# Tokenize datasets
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

# Set format for PyTorch
tokenized_datasets.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

# Training arguments with lower learning rate, dropout, and early stopping
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',
    save_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model='accuracy',
    save_total_limit=1,
    seed=42
)

# Increase dropout rate by modifying the model config
model.config.dropout = 0.3
model.config.attention_dropout = 0.3

# Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    compute_metrics=compute_metrics
)

# Train model
trainer.train()

# Evaluate model
metrics = trainer.evaluate()
print(metrics)
Moved dropout setting to model.config.dropout and model.config.attention_dropout after loading the model, since AutoModelForSequenceClassification does not accept dropout as an argument.
Lowered learning rate from default to 2e-5 for smoother training.
Reduced number of epochs to 4 to avoid over-training.
Enabled evaluation and saving best model at each epoch.
Used a small subset of dataset for faster experimentation.
Results Interpretation

Before: Training accuracy 98%, Validation accuracy 70%, Training loss 0.05, Validation loss 0.85

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.35

Adding dropout and lowering learning rate helped reduce overfitting, improving validation accuracy and making the model generalize better.
Bonus Experiment
Try fine-tuning the same model using a learning rate scheduler and gradient clipping to further stabilize training and improve validation accuracy.
💡 Hint
Use the 'get_scheduler' function from transformers and set gradient clipping in TrainingArguments.