Latency is the time a model or system takes to respond, measured from when a request arrives to when the answer is returned. Lower latency means quicker answers, which matters for real-time applications such as chat assistants or self-driving cars.
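Latency varies from one call to the next, so it is usually reported as a distribution rather than a single number. The sketch below is a minimal illustration that assumes the model is reached through a hypothetical Python callable `respond`. It times repeated calls with `time.perf_counter` and reports the median and a nearest-rank 95th percentile (p95), so occasional slow responses are not hidden by the typical case. `fake_model` is a placeholder that only sleeps; it is not a real model.

```python
import math
import statistics
import time


def measure_latency(respond, request, runs=20):
    """Time `respond(request)` over several runs; return (median, p95) in milliseconds.

    `respond` is a hypothetical stand-in for whatever calls the model or system.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        respond(request)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    median = statistics.median(samples)
    # Nearest-rank 95th percentile: about 95% of samples are at or below this value.
    p95 = samples[max(0, math.ceil(0.95 * len(samples)) - 1)]
    return median, p95


def fake_model(prompt):
    time.sleep(0.01)  # placeholder: pretend the model takes about 10 ms
    return prompt.upper()


median_ms, p95_ms = measure_latency(fake_model, "hello")
print(f"median {median_ms:.1f} ms, p95 {p95_ms:.1f} ms")
```

Reporting a tail percentile alongside the median is a common choice because users notice the slow responses, not just the average ones.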
Cost is the money and computing resources (hardware time, energy) needed to run the model. Lower cost means spending less money and using less energy.
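One simple way to turn hardware spending into a per-request cost is to divide the hourly price of the hardware by the number of requests it serves per hour. The sketch below assumes flat hourly billing and a machine that stays fully busy at a steady throughput; the function name `cost_per_request` and the $2.00/hour and 5 requests/second figures are hypothetical, chosen only to show the arithmetic.

```python
def cost_per_request(hourly_price, requests_per_second):
    """Estimate the cost of serving one request.

    Assumes the hardware is billed at a flat `hourly_price` and stays fully busy
    at `requests_per_second`; both inputs are illustrative, not measured values.
    """
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour


# Hypothetical example: a machine billed at $2.00/hour serving 5 requests/second.
per_request = cost_per_request(2.00, 5)
print(f"${per_request:.6f} per request, ${per_request * 1_000_000:.2f} per million requests")
# prints: $0.000111 per request, $111.11 per million requests
```

If the machine sits idle part of the time, the real cost per request is higher than this estimate, since the hourly price is paid regardless of load.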
We focus on both because a fast model that costs too much is not practical, and a cheap model that is too slow can frustrate users.
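One way to make this trade-off concrete is to treat latency as a constraint and cost as the quantity to minimize: discard any option that is too slow, then take the cheapest of the rest. This is only one framing (you could instead cap cost and minimize latency, or weight the two). The sketch below assumes each candidate has already been measured; the `Candidate` class, the model names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    p95_latency_ms: float
    cost_per_1k_requests: float


def pick_model(candidates, latency_budget_ms):
    """Return the cheapest candidate whose p95 latency fits the budget, or None."""
    fast_enough = [c for c in candidates if c.p95_latency_ms <= latency_budget_ms]
    if not fast_enough:
        return None  # nothing is fast enough; the budget or the options must change
    return min(fast_enough, key=lambda c: c.cost_per_1k_requests)


# Hypothetical measurements for three model sizes.
candidates = [
    Candidate("large", p95_latency_ms=120.0, cost_per_1k_requests=4.00),
    Candidate("medium", p95_latency_ms=250.0, cost_per_1k_requests=1.50),
    Candidate("small", p95_latency_ms=900.0, cost_per_1k_requests=0.40),
]
choice = pick_model(candidates, latency_budget_ms=300.0)
print(choice.name)  # medium
```

Here "small" is the cheapest but too slow for a 300 ms budget, and "large" is fast but costs more than "medium", which meets the budget at lower cost.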