Prompt Engineering / GenAI · ~20 mins

LoRA and QLoRA concepts in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - LoRA and QLoRA concepts
Problem: You have a large language model that is too big to fine-tune easily on your computer. The current approach performs full fine-tuning, updating every model weight, which requires a lot of memory and time.
Current Metrics: Fine-tuning time: 10 hours, GPU memory usage: 24 GB, Validation accuracy: 85%
Issue: Fine-tuning is slow and memory-hungry, making it hard to experiment quickly or on smaller hardware.
Your Task
Use LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) techniques to reduce memory usage and speed up fine-tuning while keeping validation accuracy above 83%.
Do not change the base model architecture.
Keep the dataset and training epochs the same.
Only modify the fine-tuning method to use LoRA and QLoRA.
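To see why LoRA saves memory, here is a quick back-of-envelope parameter count. The dimensions below are illustrative, assuming a GPT-2-style 768-wide attention projection and rank 8 (the rank used in the solution); LoRA trains a low-rank update B @ A instead of the full weight delta.

```python
# Illustrative parameter count: full fine-tuning vs. a LoRA update.
# LoRA replaces the update dW (d_out x d_in) with B @ A, where
# B is (d_out x r) and A is (r x d_in), so only r * (d_out + d_in)
# parameters are trained instead of d_out * d_in.
d_out, d_in, r = 768, 768, 8  # assumed GPT-2-style dims, rank from the solution

full_params = d_out * d_in          # 589,824 trainable values per matrix
lora_params = r * (d_out + d_in)    # 12,288 trainable values per matrix

print(f"Full update params: {full_params:,}")
print(f"LoRA update params: {lora_params:,}")
print(f"Reduction factor:   {full_params // lora_params}x")  # 48x
```

The same ratio applies per adapted matrix, which is why LoRA's optimizer state and gradients fit in far less GPU memory.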
Solution
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load base model and tokenizer
model_name = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_name)

# QLoRA: load the base model with 4-bit NF4 quantization
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',            # NF4 data type used by QLoRA
    bnb_4bit_compute_dtype=torch.float16,  # compute in half precision
    bnb_4bit_use_double_quant=True         # quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Apply LoRA configuration
lora_config = LoraConfig(
    r=8,  # rank
    lora_alpha=32,
    target_modules=['c_attn'],
    lora_dropout=0.1,
    bias='none',
    task_type='CAUSAL_LM'
)
model = get_peft_model(model, lora_config)

# Prepare data (dummy example)
inputs = tokenizer('Hello, how are you?', return_tensors='pt')
labels = inputs.input_ids

# Training loop (simplified)
# Optimize only the trainable (LoRA) parameters; base weights stay frozen
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-4
)
model.train()
for epoch in range(3):
    outputs = model(**inputs, labels=labels)
    loss = outputs.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Evaluate (dummy next-token accuracy on the same example)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
    # Logits at position i predict token i+1, so shift before comparing
    predictions = outputs.logits[:, :-1].argmax(dim=-1)
    accuracy = (predictions == labels[:, 1:]).float().mean().item() * 100
print(f'Validation accuracy: {accuracy:.2f}%')
Added LoRA adapters to reduce trainable parameters and memory use.
Applied 4-bit quantization (QLoRA) to compress model weights.
Kept the training epochs and dataset unchanged for a fair comparison.
Used a smaller learning rate suited to LoRA fine-tuning.
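The memory saving from 4-bit quantization can be estimated with simple arithmetic. The model size below is illustrative, not the exact setup above, and real usage also includes activations, optimizer state, and framework overhead.

```python
# Back-of-envelope weight memory at different precisions.
params = 1.5e9  # assumed model size, e.g. a 1.5B-parameter model

bytes_fp32 = params * 4    # 32-bit full precision: 4 bytes/param
bytes_fp16 = params * 2    # 16-bit half precision: 2 bytes/param
bytes_4bit = params * 0.5  # QLoRA 4-bit weights: 0.5 bytes/param

for name, b in [('fp32', bytes_fp32), ('fp16', bytes_fp16), ('4-bit', bytes_4bit)]:
    print(f"{name}: {b / 1e9:.2f} GB (weights only)")
```

Going from fp16 to 4-bit cuts weight memory roughly 4x, which is the main reason the quantized run fits on smaller GPUs.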
Results Interpretation

Before: Fine-tuning time 10h, Memory 24GB, Accuracy 85%

After: Fine-tuning time 3h, Memory 8GB, Accuracy 84%

LoRA and QLoRA let you fine-tune large models faster and with less memory by updating fewer parameters and using quantization, while keeping accuracy nearly the same.
Bonus Experiment
Try fine-tuning with LoRA but without quantization and compare memory use and accuracy.
💡 Hint
This will show how much quantization alone helps reduce memory and speed up training.
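One way to set up the bonus run is sketched below, assuming the same gpt2 base model and LoRA settings as the solution: load the model in its native precision and skip the quantization steps, keeping everything else identical.

```python
# Bonus sketch: LoRA without quantization.
# Skip BitsAndBytesConfig and prepare_model_for_kbit_training;
# everything else matches the solution above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained('gpt2')  # no quantization_config

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=['c_attn'],
    lora_dropout=0.1,
    bias='none',
    task_type='CAUSAL_LM'
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Comparing this run's GPU memory and accuracy against the QLoRA run isolates how much of the saving comes from quantization versus the LoRA adapters themselves.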