Prompt Engineering / GenAI (~20 mins)

When to fine-tune vs prompt engineer in Prompt Engineering / GenAI - Experiment Comparison

Experiment - When to fine-tune vs prompt engineer
Problem: You have a general AI language model that performs well on many tasks but struggles with a specific task where accuracy is low and responses are imprecise.
Current Metrics: Task accuracy: 60%, Response relevance: 65%
Issue: The model's general knowledge is good, but it does not perform well on the specific task. Overfitting is not the issue; rather, the model needs better task-specific behavior.
Your Task
Improve task accuracy to at least 80% by deciding whether to fine-tune the model or improve prompt engineering.
You cannot change the base model architecture.
You can only either fine-tune the model with a small dataset or improve the prompt design.
You must measure accuracy and response relevance after changes.
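Before changing anything, it helps to pin down how the metrics will be measured. A minimal sketch of the accuracy measurement is shown below; the `eval_set` and the `generate_answer` callable are hypothetical stand-ins for your own labeled evaluation data and model call.

```python
# Hypothetical evaluation set: (question, expected answer) pairs.
eval_set = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

def task_accuracy(generate_answer, eval_set):
    """Fraction of questions whose generated answer contains the expected string."""
    correct = 0
    for question, expected in eval_set:
        answer = generate_answer(question)
        if expected.lower() in answer.lower():
            correct += 1
    return correct / len(eval_set)

# Example with a stubbed generator that always says "Paris":
print(task_accuracy(lambda q: "Paris is the capital.", eval_set))  # 0.5
```

Run the same measurement before any change, after prompt engineering, and after fine-tuning, so the before/after numbers are comparable.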
Solution
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
import torch

# Load base model and tokenizer
model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example prompt engineering: adding clear instructions
prompt = "Answer the question precisely and concisely:\nWhat is the capital of France?"
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# If prompt engineering is not enough, fine-tune with a small dataset
# (each training example is a question and its answer, separated by a tab)
train_texts = ["What is the capital of France?\tParis", "Who wrote Hamlet?\tShakespeare"]
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

class SimpleDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings['input_ids'])
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = item['input_ids'].clone()
        return item

dataset = SimpleDataset(train_encodings)

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=2,
    logging_steps=10,
    save_steps=10,
    save_total_limit=1,
    learning_rate=5e-5,
    weight_decay=0.01,
    logging_dir='./logs',
    no_cuda=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset
)

trainer.train()

# After fine-tuning, test again
prompt_ft = "What is the capital of France?"
inputs_ft = tokenizer(prompt_ft, return_tensors='pt')
outputs_ft = model.generate(**inputs_ft, max_length=50, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs_ft[0], skip_special_tokens=True))
First, the prompt was improved by adding clear instructions to guide the model.
Then, the model was fine-tuned on a small labeled dataset to specialize it for the task.
Results Interpretation

Before: Accuracy 60%, Relevance 65%
After prompt engineering: Accuracy 75%, Relevance 78%
After fine-tuning: Accuracy 85%, Relevance 88%

Prompt engineering is a quick way to improve model output without changing the model. Fine-tuning is more powerful but requires data and time. Use prompt engineering first, then fine-tune if needed.
Bonus Experiment
Try combining prompt engineering with few-shot examples in the prompt to improve accuracy without fine-tuning.
💡 Hint
Add examples of question-answer pairs in the prompt to guide the model's responses.
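One way to sketch this: build the prompt from a few worked question-answer pairs followed by the new question, so the model completes the pattern. The helper below and its example pairs are illustrative, not part of the original solution.

```python
def build_few_shot_prompt(examples, question):
    """Format (question, answer) pairs as in-context examples, then append the new question."""
    lines = ["Answer each question precisely and concisely."]
    for q, a in examples:
        lines.append(f"Q: {q}\nA: {a}")
    lines.append(f"Q: {question}\nA:")  # trailing "A:" invites the model to answer
    return "\n".join(lines)

examples = [
    ("Who wrote Hamlet?", "Shakespeare"),
    ("What is the chemical symbol for gold?", "Au"),
]
print(build_few_shot_prompt(examples, "What is the capital of France?"))
```

The resulting string can be passed to the same `tokenizer`/`model.generate` call used earlier; few-shot prompts often raise accuracy on pattern-like tasks without any training.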