Model Pipeline - Model optimization (distillation, quantization)
This pipeline shows how a large language model is made smaller and faster using two techniques: distillation and quantization. Distillation trains a small "student" model to reproduce a larger "teacher" model's output distribution, transferring the teacher's knowledge into far fewer parameters. Quantization stores the model's weights (and optionally activations) in lower-precision numbers, for example 8-bit integers instead of 32-bit floats, which cuts memory use and speeds up inference.
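The two techniques can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline's actual implementation: `distillation_loss` is the standard temperature-softened KL divergence between teacher and student logits, and `quantize_int8` is simple symmetric per-tensor int8 quantization; all function names here are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # The student is trained to minimize this, copying the teacher's
    # "soft" predictions, not just its top answer.
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T / len(p))

def quantize_int8(w):
    # Symmetric per-tensor quantization: map float weights onto the
    # int8 range [-127, 127] using a single scale factor.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights; error is bounded by the scale.
    return q.astype(np.float32) * scale

# Usage: identical logits give zero distillation loss, and int8
# round-tripping recovers weights to within one quantization step.
logits = np.array([[1.0, 2.0, 3.0]])
print(distillation_loss(logits, logits))
w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, s = quantize_int8(w)
print(dequantize(q, s))
```

In practice, a production pipeline would use a framework's built-in tooling (e.g. PyTorch's quantization APIs) rather than hand-rolled NumPy, but the arithmetic is the same.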