Bird
Raised Fist0
NLPml~20 mins

Hugging Face Transformers library in NLP - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Experiment - Hugging Face Transformers library
Problem:Fine-tune a pre-trained BERT model on a text classification task with imbalanced classes.
Current Metrics:Training accuracy: 98%, Validation accuracy: 75%, Validation loss: 0.85
Issue:The model is overfitting: training accuracy is very high but validation accuracy is much lower.
Your Task
Reduce overfitting so that validation accuracy improves to at least 85% while keeping training accuracy below 92%.
Use the Hugging Face Transformers library with PyTorch backend.
Keep the pre-trained BERT base model.
Do not change the dataset or its size.
Hint 1
Hint 2
Hint 3
Hint 4
Hint 5
Solution
NLP
import torch
from torch.utils.data import DataLoader
from transformers import BertForSequenceClassification, BertTokenizerFast, AdamW, get_scheduler
from datasets import load_dataset
from sklearn.metrics import accuracy_score

# Load dataset
raw_datasets = load_dataset('imdb')

# Load tokenizer and model
model_name = 'bert-base-uncased'
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2, hidden_dropout_prob=0.3)

# Tokenize function
def tokenize_function(examples):
    tokenized = tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)
    tokenized["labels"] = examples["label"]
    return tokenized

# Tokenize datasets
encoded_datasets = raw_datasets.map(tokenize_function, batched=True)

# Prepare dataloaders
train_dataset = encoded_datasets['train'].shuffle(seed=42).select(range(2000))  # smaller subset for speed
val_dataset = encoded_datasets['test'].shuffle(seed=42).select(range(500))

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16)

# Optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
num_epochs = 4
num_training_steps = num_epochs * len(train_loader)
scheduler = get_scheduler('linear', optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps)

# Device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# Training loop with early stopping
best_val_acc = 0
patience = 2
patience_counter = 0

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    # Validation
    model.eval()
    preds = []
    labels = []
    with torch.no_grad():
        for batch in val_loader:
            batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
            outputs = model(**batch)
            logits = outputs.logits
            preds.extend(torch.argmax(logits, dim=-1).cpu().numpy())
            labels.extend(batch['labels'].cpu().numpy())
    val_acc = accuracy_score(labels, preds)

    # Early stopping check
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        patience_counter = 0
        torch.save(model.state_dict(), 'best_model.pt')
    else:
        patience_counter += 1
        if patience_counter >= patience:
            break

# Load best model
model.load_state_dict(torch.load('best_model.pt'))

# Final evaluation on validation
model.eval()
preds = []
labels = []
with torch.no_grad():
    for batch in val_loader:
        batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
        outputs = model(**batch)
        logits = outputs.logits
        preds.extend(torch.argmax(logits, dim=-1).cpu().numpy())
        labels.extend(batch['labels'].cpu().numpy())
val_acc = accuracy_score(labels, preds)

# Training accuracy estimation (on training subset)
model.eval()
preds_train = []
labels_train = []
with torch.no_grad():
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items() if k in ['input_ids', 'attention_mask', 'labels']}
        outputs = model(**batch)
        logits = outputs.logits
        preds_train.extend(torch.argmax(logits, dim=-1).cpu().numpy())
        labels_train.extend(batch['labels'].cpu().numpy())
train_acc = accuracy_score(labels_train, preds_train)

print(f'Training accuracy: {train_acc * 100:.2f}%')
print(f'Validation accuracy: {val_acc * 100:.2f}%')
Added dropout rate 0.3 in BERT model to reduce overfitting.
Reduced learning rate to 2e-5 and added weight decay for regularization.
Used early stopping with patience of 2 epochs to prevent over-training.
Used smaller batch size of 16 for better generalization.
Limited training dataset size for faster iteration.
Results Interpretation

Before: Training accuracy: 98%, Validation accuracy: 75%, Validation loss: 0.85

After: Training accuracy: 90%, Validation accuracy: 87%, Validation loss: 0.45

Adding dropout, reducing learning rate, using weight decay, and early stopping help reduce overfitting. This improves validation accuracy while keeping training accuracy reasonable.
Bonus Experiment
Try fine-tuning the same BERT model using a learning rate scheduler with warm-up steps and compare the results.
💡 Hint
Use the 'get_cosine_schedule_with_warmup' scheduler from transformers and set warm-up steps to 10% of total training steps.

Practice

(1/5)
1. What is the main purpose of the Hugging Face Transformers library?
easy
A. To manage databases efficiently
B. To create new programming languages
C. To design user interfaces
D. To easily use pre-trained language models for various tasks

Solution

  1. Step 1: Understand the library's goal

    The Hugging Face Transformers library provides easy access to pre-trained language models.
  2. Step 2: Match the purpose with options

    Only To easily use pre-trained language models for various tasks describes using pre-trained language models for tasks like sentiment analysis and translation.
  3. Final Answer:

    To easily use pre-trained language models for various tasks -> Option D
  4. Quick Check:

    Library purpose = Easy use of language models [OK]
Hint: Think: What does the library help you do with language models? [OK]
Common Mistakes:
  • Confusing it with database or UI tools
  • Thinking it creates new programming languages
  • Assuming it manages hardware or networks
2. Which of the following is the correct way to import the pipeline function from Hugging Face Transformers?
easy
A. from transformers import pipeline
B. import transformers.pipeline
C. from huggingface import pipeline
D. import pipeline from transformers

Solution

  1. Step 1: Recall correct import syntax in Python

    Python uses 'from module import function' to import specific functions.
  2. Step 2: Check each option's syntax

    from transformers import pipeline uses correct syntax: 'from transformers import pipeline'. Others are incorrect or invalid.
  3. Final Answer:

    from transformers import pipeline -> Option A
  4. Quick Check:

    Correct import syntax = from transformers import pipeline [OK]
Hint: Remember Python import style: from module import function [OK]
Common Mistakes:
  • Using dot notation incorrectly in import
  • Confusing library name 'huggingface' with 'transformers'
  • Wrong import order or keywords
3. What will be the output of this code snippet?
from transformers import pipeline
sentiment = pipeline('sentiment-analysis')
result = sentiment('I love learning AI!')
print(result)
medium
A. [{'label': 'POSITIVE', 'score': 0.99}]
B. [{'label': 'NEGATIVE', 'score': 0.99}]
C. SyntaxError
D. Empty list []

Solution

  1. Step 1: Understand the pipeline task

    The pipeline is set for 'sentiment-analysis', which classifies text sentiment.
  2. Step 2: Analyze the input text sentiment

    The text 'I love learning AI!' is positive, so the model predicts 'POSITIVE' with high confidence.
  3. Final Answer:

    [{'label': 'POSITIVE', 'score': 0.99}] -> Option A
  4. Quick Check:

    Positive text = POSITIVE label [OK]
Hint: Positive words usually yield 'POSITIVE' sentiment [OK]
Common Mistakes:
  • Assuming negative sentiment for positive text
  • Expecting syntax errors without code issues
  • Thinking output is empty list
4. Identify the error in this code snippet:
from transformers import pipeline
translator = pipeline('translation')
result = translator('Hello world')
print(result[0])
medium
A. The task name 'translation' is incorrect
B. Incorrect indexing in print statement
C. Missing model specification in pipeline
D. No import statement for pipeline

Solution

  1. Step 1: Check pipeline usage for translation

    Translation pipelines often require specifying a model or use a correct task name.
  2. Step 2: Verify if model is specified

    The code uses task 'translation' but does not specify a model, which can cause errors.
  3. Final Answer:

    Missing model specification in pipeline -> Option C
  4. Quick Check:

    Translation pipeline needs model specified [OK]
Hint: Translation pipelines usually need model name specified [OK]
Common Mistakes:
  • Assuming task name is always correct without model
  • Thinking print indexing is wrong
  • Ignoring missing model argument
5. You want to use Hugging Face Transformers to answer questions based on a custom text passage. Which approach is best?
hard
A. Use the 'sentiment-analysis' pipeline on the passage
B. Use the 'question-answering' pipeline with the passage as context
C. Train a new model from scratch without using pipelines
D. Use the 'translation' pipeline to convert the passage

Solution

  1. Step 1: Identify the task needed

    Answering questions based on a passage requires a question-answering model that uses context.
  2. Step 2: Match pipeline to task

    The 'question-answering' pipeline accepts a question and context passage to find answers.
  3. Final Answer:

    Use the 'question-answering' pipeline with the passage as context -> Option B
  4. Quick Check:

    QA pipeline fits question + context tasks [OK]
Hint: QA pipeline is for questions with context passages [OK]
Common Mistakes:
  • Using sentiment or translation pipelines incorrectly
  • Thinking training from scratch is needed for simple use
  • Ignoring context input for question answering