When optimizing models by pruning or quantization, the key metric to watch is accuracy, or the relevant task-specific measure (such as classification accuracy or mean average precision). Both techniques shrink the model and speed up inference, but applied too aggressively they degrade accuracy. The goal is to preserve accuracy while making the model smaller and faster.
In addition, model size (memory footprint) and inference latency should be measured to quantify the benefits of the optimization.
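The size/accuracy trade-off can be made concrete with a small sketch. The snippet below is illustrative only: it applies simple symmetric int8 quantization to a toy list of float32 weights, then compares memory footprint and reconstruction error (a stand-in for the accuracy cost); real frameworks use per-channel scales, calibration data, and task-level evaluation.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in quantized]

# Toy "model": a handful of float32 weights.
weights = [0.8, -1.2, 0.05, 0.6, -0.33]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Model size: 4 bytes per float32 weight vs. 1 byte per int8 code
# (plus 4 bytes for the single stored scale factor).
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights) + 4

# Worst-case per-weight error is bounded by half a quantization step,
# a rough proxy for the accuracy impact.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

Even in this toy case the pattern holds: the int8 representation is smaller, and the price is a bounded per-weight error whose effect on task accuracy must be verified empirically.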