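LoRA and QLoRA are parameter-efficient fine-tuning methods. LoRA freezes the base model's weights and trains small low-rank adapter matrices; QLoRA does the same on top of a base model quantized to 4-bit precision. Both reduce the memory and compute needed for fine-tuning rather than shrinking the model's architecture. The key metrics to check are task performance after fine-tuning (accuracy, F1) and efficiency (memory usage, training time). The goal is to keep task performance high while reducing memory and training time.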
LoRA and QLoRA concepts in Prompt Engineering / GenAI - Model Metrics & Evaluation
For classification tasks, a confusion matrix shows how well the model predicts each class. Comparing the confusion matrix of the fully fine-tuned baseline with that of the LoRA or QLoRA model, on the same test set, shows whether accuracy dropped and which classes are affected.
Confusion Matrix Example:

              Predicted P    Predicted N
Actual P          90             10
Actual N          15             85

Total samples = 200
If LoRA or QLoRA keeps similar numbers here, it means they preserved accuracy well.
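As a minimal sketch, the headline metrics can be derived directly from the four counts in the example matrix. The function name `summarize` is illustrative, not part of any library.

```python
# Counts taken from the example confusion matrix.
tp, fn = 90, 10   # actual positives: predicted P, predicted N
fp, tn = 15, 85   # actual negatives: predicted P, predicted N


def summarize(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 for a binary confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of predicted positives, how many are correct
    recall = tp / (tp + fn)      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = summarize(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.875
# precision: 0.857
# recall: 0.900
# f1: 0.878
```

Running the same function on the baseline's counts and on the LoRA or QLoRA model's counts gives two sets of numbers that can be compared side by side.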
LoRA and QLoRA reduce the number of trainable parameters and the memory needed for fine-tuning, but they might slightly reduce accuracy. To judge this tradeoff, look beyond overall accuracy at:
- Precision: How many predicted positives are correct?
- Recall: How many actual positives did the model find?
Example: In spam detection, if LoRA reduces recall, some spam emails might be missed. But if precision stays high, fewer good emails are wrongly marked spam. Depending on the task, you decide which metric to prioritize.
Good: Accuracy or F1 score close to the original full model (e.g., within 1-2%), with much lower memory use and faster training.
Bad: Large drops in accuracy or recall (e.g., more than 5%), meaning the model misses many correct answers, even if it is smaller or faster.
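The rule of thumb above can be sketched as a small check. This assumes the thresholds are absolute percentage points (2 and 5), which the text does not state explicitly; the "borderline" label for drops in between and the function name `judge_drop` are also assumptions made for illustration.

```python
def judge_drop(baseline: float, adapted: float,
               good_within: float = 0.02, bad_beyond: float = 0.05) -> str:
    """Classify the metric drop from the full model to the LoRA/QLoRA model.

    Scores are fractions in [0, 1]. Thresholds are treated as absolute
    differences (an assumption, not something the rule of thumb specifies).
    """
    drop = baseline - adapted
    if drop <= good_within:
        return "good"
    if drop > bad_beyond:
        return "bad"
    return "borderline"


baseline_f1 = 0.90  # hypothetical score of the fully fine-tuned model

print(judge_drop(baseline_f1, 0.89))  # good
print(judge_drop(baseline_f1, 0.80))  # bad
print(judge_drop(baseline_f1, 0.87))  # borderline
```

The check only covers the quality side; memory use and training time still need to be measured separately to confirm the efficiency gain.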
- Ignoring accuracy drop: Focusing only on speed or size but losing too much accuracy.
- Data leakage: Testing on data the model saw during training, which makes metrics look better than they really are.
- Overfitting: Model performs well on training data but poorly on new data, hiding true performance.
- Not comparing to baseline: Without the original model's metrics, it's hard to judge if LoRA or QLoRA helped or hurt.
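One of these pitfalls, data leakage, can be partly screened for in code. The sketch below is a rough check that only catches test examples duplicating a training example after lowercasing and whitespace normalization; near-duplicates and paraphrases would slip through. The function name `find_leaked` and the sample texts are hypothetical.

```python
def find_leaked(train_texts, test_texts):
    """Return test examples that also appear in the training set.

    Matching is exact after lowercasing and collapsing whitespace, so this
    is a lower bound on leakage, not a complete check.
    """
    def normalize(text):
        return " ".join(text.lower().split())

    seen = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in seen]


# Hypothetical toy data for illustration only.
train = ["Win a free prize now", "Meeting moved to 3pm"]
test = ["win a FREE prize   now", "Lunch on Friday?"]

leaked = find_leaked(train, test)
print(f"{len(leaked)} of {len(test)} test examples also appear in training data")
```

If any overlap is found, those examples should be removed from the test set before comparing the LoRA or QLoRA model against the baseline.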
Your model uses QLoRA and has 98% accuracy but only 12% recall on fraud cases. Is it good for production? Why or why not?
Answer: No, it is not good. Even though accuracy is high, the very low recall means the model misses most fraud cases. For fraud detection, recall is critical because missing fraud is costly. So this model would not be reliable in real use.
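To see how 98% accuracy and 12% recall can coexist, here is a sketch with hypothetical counts chosen to reproduce those two figures. The dataset size and class balance are assumptions; the question itself gives no counts.

```python
# Hypothetical test set: 1250 transactions, 25 of them fraud (2%).
total = 1250
tp, fn = 3, 22                 # fraud cases caught vs. missed
fp = 3                         # legitimate transactions flagged as fraud
tn = total - tp - fn - fp      # legitimate transactions correctly passed

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

# A model that never predicts fraud gets every legitimate case right.
always_negative_accuracy = (tn + fp) / total

print(f"accuracy: {accuracy:.2f}")                       # accuracy: 0.98
print(f"recall:   {recall:.2f}")                         # recall:   0.12
print(f"never-flag baseline accuracy: {always_negative_accuracy:.2f}")  # 0.98
```

With a class this rare, a model that flags nothing at all reaches the same accuracy, which is why recall on the fraud class is the number to watch here.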