from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
import torch
import numpy as np
from sklearn.metrics import accuracy_score
from datasets import load_dataset
# Load general pretrained model and tokenizer
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 attaches a fresh binary-classification head on top of the
# pretrained encoder; the head's weights are randomly initialized until trained.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
# Load domain-specific dataset (example: medical texts)
# NOTE(review): assumes both CSVs have 'text' and 'label' columns — that is
# what tokenize_function reads below; confirm against the actual files.
dataset = load_dataset('csv', data_files={'train': 'domain_train.csv', 'validation': 'domain_val.csv'})
# Tokenize function
def tokenize_function(examples):
    """Tokenize a batch of examples and attach their labels.

    Args:
        examples: Batch dict from ``Dataset.map(batched=True)`` containing a
            'text' list and a parallel 'label' list.

    Returns:
        dict: Tokenizer output (input_ids, attention_mask, ...) plus a
        'labels' key — the field name the Trainer expects for loss targets.
    """
    # padding='max_length' pads every example to the model's maximum length;
    # truncation=True clips anything longer to that same limit.
    tokenized = tokenizer(examples['text'], padding='max_length', truncation=True)
    # Rename 'label' -> 'labels' so Trainer can compute the loss.
    tokenized['labels'] = examples['label']
    return tokenized
# Tokenize the whole dataset up front; batched=True hands tokenize_function
# chunks of examples at a time, which is much faster than one-by-one.
dataset = dataset.map(tokenize_function, batched=True)
# Compute metrics
def compute_metrics(eval_pred):
    """Compute validation accuracy for the Trainer.

    Args:
        eval_pred: Pair of (logits, labels) numpy arrays as supplied by
            ``Trainer.evaluate`` — logits of shape (n_examples, n_classes)
            and integer class labels of shape (n_examples,).

    Returns:
        dict: {'accuracy': fraction of correctly classified examples}.
    """
    logits, labels = eval_pred
    # Predicted class = argmax over the last (class) dimension.
    predictions = np.argmax(logits, axis=-1)
    # Mean of exact matches — identical to sklearn.metrics.accuracy_score
    # for integer labels, without pulling in the sklearn dependency.
    accuracy = float(np.mean(predictions == np.asarray(labels)))
    return {"accuracy": accuracy}
# Set training arguments
# Hyperparameters plus the checkpoint/evaluation schedule for the Trainer.
# NOTE(review): 'evaluation_strategy' was renamed 'eval_strategy' in newer
# transformers releases — confirm the installed version accepts this keyword.
training_args = TrainingArguments(
output_dir='./results',  # checkpoints and the final model are written here
num_train_epochs=5,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
evaluation_strategy='epoch',  # run validation at the end of every epoch
save_strategy='epoch',  # checkpoint on the same schedule (required by load_best_model_at_end)
learning_rate=2e-5,
weight_decay=0.01,
logging_dir='./logs',
load_best_model_at_end=True,  # restore the best checkpoint once training finishes
metric_for_best_model='accuracy',  # 'best' = highest eval accuracy ...
greater_is_better=True  # ... because higher accuracy is better
)
# Define Trainer
# Wire the model, hyperparameters, tokenized splits, and metric together.
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset['train'],
eval_dataset=dataset['validation'],
compute_metrics=compute_metrics  # surfaces 'eval_accuracy' in evaluation output
)
# Fine-tune the model
# Fine-tune the model on the domain data, then report validation metrics.
trainer.train()

# evaluate() returns a dict keyed by 'eval_<metric>' names.
eval_result = trainer.evaluate()
accuracy_pct = eval_result['eval_accuracy'] * 100
print(f"Validation Accuracy: {accuracy_pct:.2f}%")
print(f"Validation Loss: {eval_result['eval_loss']:.4f}")