LoRA and QLoRA are methods for fine-tuning large AI models with less memory and compute. LoRA freezes the original weights and trains small low-rank adapter matrices; QLoRA does the same on top of a quantized (compressed) base model. The key metrics to check are task performance after training (accuracy, F1, and similar) and efficiency (memory use and training time). The goal is to keep accuracy close to the full model's while cutting memory and training time.
LoRA and QLoRA concepts in Prompt Engineering / GenAI - Model Metrics & Evaluation
For classification tasks, a confusion matrix shows how well the model predicts each class. For LoRA and QLoRA, the confusion matrix before and after applying these methods helps us see if accuracy dropped.
Confusion Matrix Example:
                 Predicted
                 P      N
Actual    P     90     10
          N     15     85
Total samples = 200
If LoRA or QLoRA keeps similar numbers here, it means they preserved accuracy well.
LoRA and QLoRA reduce memory use and training cost but might slightly reduce accuracy. To judge that tradeoff, look at precision and recall as well as overall accuracy:
- Precision: How many predicted positives are correct?
- Recall: How many actual positives did the model find?
Example: In spam detection, if LoRA reduces recall, some spam emails might be missed. But if precision stays high, fewer good emails are wrongly marked spam. Depending on the task, you decide which metric to prioritize.
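As a rough illustration, these metrics can be computed directly from the example confusion matrix (TP = 90, FN = 10, FP = 15, TN = 85). The function name `classification_metrics` is a placeholder chosen for this sketch, not part of any library.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the example confusion matrix.
metrics = classification_metrics(tp=90, fn=10, fp=15, tn=85)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For the example matrix this gives accuracy 0.875, precision about 0.857, recall 0.900, and F1 about 0.878. Running the same calculation on the matrix produced after LoRA or QLoRA fine-tuning shows which of these metrics moved.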
Good: Accuracy or F1 score close to the original full model (e.g., within 1-2%), with much lower memory use and faster training.
Bad: Large drops in accuracy or recall (e.g., more than 5%), meaning the model misses many correct answers, even if it is smaller or faster.
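One possible way to apply these rules of thumb is a small comparison against the baseline. The sketch below assumes the percentages are absolute differences (0.02 means 2 percentage points). The function name `judge_adapter`, the "borderline" label for drops between the two thresholds, and the sample numbers are all hypothetical.

```python
def judge_adapter(baseline, adapted, good_drop=0.02, bad_drop=0.05):
    """Label each metric by how far the adapted model fell below the baseline.

    Drops are absolute differences; the default thresholds are illustrative only.
    """
    verdicts = {}
    for name, base_value in baseline.items():
        drop = base_value - adapted[name]
        if drop <= good_drop:
            verdicts[name] = "good"
        elif drop > bad_drop:
            verdicts[name] = "bad"
        else:
            verdicts[name] = "borderline"
    return verdicts


# Hypothetical numbers for illustration only, not measured results.
baseline = {"accuracy": 0.91, "recall": 0.88}
adapted = {"accuracy": 0.90, "recall": 0.81}
print(judge_adapter(baseline, adapted))
```

Here accuracy stays within the "good" range while recall falls by about 7 points and is flagged as "bad", which is why each metric is checked separately rather than relying on accuracy alone.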
- Ignoring accuracy drop: Focusing only on speed or size but losing too much accuracy.
- Data leakage: Testing on data the model saw during training, making metrics look better than real.
- Overfitting: Model performs well on training data but poorly on new data, hiding true performance.
- Not comparing to baseline: Without the original model's metrics, it's hard to judge if LoRA or QLoRA helped or hurt.
Your model uses QLoRA and has 98% accuracy but only 12% recall on fraud cases. Is it good for production? Why or why not?
Answer: No, it is not good. Even though accuracy is high, the very low recall means the model misses most fraud cases. For fraud detection, recall is critical because missing fraud is costly. So this model would not be reliable in real use.
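To see how 98% accuracy and 12% recall can coexist, consider an imbalanced dataset. The counts below are hypothetical, chosen only so that the two percentages match the question.

```python
# Hypothetical counts chosen only so that accuracy = 98% and recall = 12%.
tp, fn = 24, 176     # 200 actual fraud cases, of which 24 are caught
fp, tn = 24, 9776    # 9,800 legitimate transactions, of which 24 are wrongly flagged

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)
missed_fraud = fn

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
print(f"missed fraud cases = {missed_fraud} of {tp + fn}")
```

Because only 2% of the transactions are fraud, the model scores high accuracy mostly by labeling legitimate transactions correctly, while 176 of the 200 fraud cases go undetected.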
Practice
Solution
Step 1: Understand LoRA's role in model training
LoRA adds small trainable parts to a big model instead of retraining the whole model, making training easier and cheaper.
Step 2: Compare options with LoRA's purpose
Options A, C, and D describe changing model size or structure, which is not what LoRA does.
Final Answer:
To add small trainable parts that make training easier and cheaper -> Option B
Quick Check:
LoRA = small trainable parts for easier training [OK]
- Thinking LoRA replaces the whole model
- Confusing LoRA with model size increase
- Assuming LoRA removes layers
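To make "small trainable parts" concrete: LoRA keeps a weight matrix W (d x k) frozen and learns two low-rank matrices B (d x r) and A (r x k), so the effective weight is W + BA. The sketch below only counts parameters. The layer size and rank are picked purely for illustration, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d, k, r):
    """Compare trainable parameters for full fine-tuning vs. a rank-r LoRA adapter."""
    full = d * k          # every entry of W is trainable
    lora = r * (d + k)    # B is d x r and A is r x k; W itself stays frozen
    return full, lora


# Illustrative sizes (assumed, not taken from a specific model).
full, lora = lora_param_counts(d=4096, k=4096, r=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r=8):       {lora:,} trainable parameters")
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
```

With these sizes, the adapter has 65,536 trainable parameters against 16,777,216 for the full matrix, well under 1%. The base model is neither replaced nor shrunk; the adapter is added alongside it.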
Solution
Step 1: Recall QLoRA's definition
QLoRA combines LoRA with quantization (number compression) to reduce memory use and speed up training.
Step 2: Eliminate incorrect options
Options B, C, and D contradict QLoRA's purpose by ignoring compression or removing LoRA parts.
Final Answer:
A method that combines LoRA with quantization to save memory -> Option A
Quick Check:
QLoRA = LoRA + quantization for memory saving [OK]
- Ignoring quantization in QLoRA
- Thinking QLoRA removes LoRA parts
- Believing QLoRA increases model size
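A back-of-the-envelope sketch of why quantization saves memory: storing the frozen base weights in 4 bits instead of 16 cuts weight memory to roughly a quarter. The parameter count below is a hypothetical example. The estimate ignores quantization metadata, the adapter weights, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 7_000_000_000  # hypothetical 7B-parameter model

fp16_gb = weight_memory_gb(num_params, 16)  # 16-bit base weights, as in a plain LoRA setup
int4_gb = weight_memory_gb(num_params, 4)   # 4-bit base weights, as in a QLoRA-style setup

print(f"16-bit base weights: {fp16_gb:.1f} GB")
print(f"4-bit base weights:  {int4_gb:.1f} GB")
```

Under these assumptions the base weights drop from about 14 GB to about 3.5 GB, while the LoRA adapter is still added on top and trained.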
    model_size = 1000  # in MB
    lora_size = 10  # LoRA adds 10 MB
    quantization_factor = 0.25  # QLoRA compresses to 25%
    lora_model_size = model_size + lora_size
    qlora_model_size = int(lora_model_size * quantization_factor)
    print(qlora_model_size)
What is the printed output?
Solution
Step 1: Calculate LoRA model size
LoRA adds 10 MB to 1000 MB, so lora_model_size = 1000 + 10 = 1010 MB.
Step 2: Apply QLoRA compression
QLoRA compresses to 25%, so qlora_model_size = int(1010 * 0.25) = int(252.5) = 252 MB.
Final Answer:
252 -> Option A
Quick Check:
1010 * 0.25 = 252.5 -> 252 [OK]
- Multiplying before adding LoRA size
- Rounding incorrectly
- Using 0.2 instead of 0.25 for compression
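On the rounding point, a short sketch of how Python's common conversions treat 252.5 may help. All of these are standard built-ins or `math` functions.

```python
import math

value = 1010 * 0.25  # 252.5

print(int(value))         # int() truncates toward zero
print(math.floor(value))  # floor rounds down
print(math.ceil(value))   # ceil rounds up
print(round(value))       # round() sends exact halves to the nearest even integer
```

Only `math.ceil` gives 253 here; `int`, `math.floor`, and `round` all give 252, the last because Python rounds exact halves to the nearest even integer.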
    model_size = 800
    lora_size = 20
    quantization_factor = 0.3
    qlora_model_size = model_size + lora_size * quantization_factor
    print(qlora_model_size)
What is the error and how to fix it?
Solution
Step 1: Identify operator precedence issue
Multiplication (*) happens before addition (+), so only lora_size is multiplied by quantization_factor, not the sum.
Step 2: Fix with parentheses
Use (model_size + lora_size) * quantization_factor to multiply the total size by compression factor.
Final Answer:
Missing parentheses; fix with (model_size + lora_size) * quantization_factor -> Option D
Quick Check:
Parentheses fix operator order [OK]
- Ignoring operator precedence
- Changing variable names incorrectly
- Using wrong operators like //
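Running both versions side by side makes the precedence effect visible. This is a sketch using the same values as the question; the names `without_parens` and `with_parens` are introduced here for illustration.

```python
model_size = 800
lora_size = 20
quantization_factor = 0.3

# Without parentheses, only lora_size is scaled: 800 + (20 * 0.3)
without_parens = model_size + lora_size * quantization_factor

# With parentheses, the combined size is scaled: (800 + 20) * 0.3
with_parens = (model_size + lora_size) * quantization_factor

print(f"without parentheses: {without_parens:.1f}")
print(f"with parentheses:    {with_parens:.1f}")
```

The first expression evaluates to about 806 and the second to about 246, so the missing parentheses leave the base model size uncompressed.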
Solution
Step 1: Understand resource limits
Small laptops have limited memory, so full model training or full precision is too heavy.
Step 2: Choose best method
QLoRA combines LoRA's small trainable parts with quantization compression, saving memory and speeding training.
Step 3: Compare options
Options B and D ignore memory limits; A lacks compression benefits.
Final Answer:
Use QLoRA to compress the model and add LoRA layers for efficient training -> Option C
Quick Check:
QLoRA = LoRA + compression for small devices [OK]
- Ignoring compression benefits
- Trying full model training on small memory
- Using only LoRA without compression
