For batch inference, throughput is the primary metric: you process many inputs at once, so what matters is how quickly the entire job finishes rather than how long any single request takes.
For real-time inference, latency is the most important metric, because a user is waiting on each individual response.
Accuracy, precision, and recall still determine model quality, but performance metrics like latency and throughput decide whether the system actually meets user needs.
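The distinction above can be made concrete with a small measurement sketch. The snippet below times a hypothetical `predict` function (a stand-in for a real model call, not part of the original text) and reports both per-request latency and overall throughput, illustrating why the two numbers can diverge:

```python
import time

def predict(x):
    # Hypothetical stand-in for a real model's inference call.
    return x * 2

def measure(inputs):
    """Measure average per-request latency and overall throughput."""
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(latencies) / elapsed              # requests per second
    avg_latency_ms = 1000 * sum(latencies) / len(latencies)
    return throughput, avg_latency_ms

tput, lat_ms = measure(range(1000))
print(f"throughput: {tput:.0f} req/s, avg latency: {lat_ms:.4f} ms")
```

A batch system would tune for the first number (e.g. by processing inputs in large batches on an accelerator), while a real-time system would tune for the second, even at some cost to total throughput.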