Top-K accuracy in Computer Vision - Model Metrics & Evaluation

Top-K accuracy measures whether the correct answer appears among the model's K highest-ranked guesses. It is useful when there are many classes, such as recognizing objects in images: it tells us whether the model is close even when it is not exactly right. For example, with K=5 the model counts as correct if the true label is anywhere in its 5 best guesses. This matters when an exact match is hard but a near miss is still valuable.
True label: Cat
Model top-1 guess: Dog (wrong)
Model top-5 guesses: [Dog, Cat, Rabbit, Fox, Horse]
Top-1 accuracy: 0 (missed)
Top-5 accuracy: 1 (hit, because Cat is in top 5)
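The single-example check above can be sketched in a few lines of Python. The ranking below is the hypothetical one from the example, not real model output:

```python
# Hypothetical example: the true label and the model's guesses ranked
# from most to least confident.
true_label = "Cat"
ranked_guesses = ["Dog", "Cat", "Rabbit", "Fox", "Horse"]

top1_hit = int(ranked_guesses[0] == true_label)   # 0: the top guess is Dog
top5_hit = int(true_label in ranked_guesses[:5])  # 1: Cat appears in the top 5

print(top1_hit, top5_hit)  # prints: 0 1
```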
A confusion matrix is less useful here, because Top-K considers multiple guesses per sample rather than a single predicted class. Instead, we simply count, over all samples, how often the true label appears among the top K predictions.
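That counting procedure can be sketched as a small function. The scores below are a made-up toy batch, and `top_k_accuracy` is an illustrative helper, not a library API:

```python
def top_k_accuracy(score_rows, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    score_rows: per-sample lists of class scores; labels: true class indices.
    """
    hits = 0
    for scores, label in zip(score_rows, labels):
        # Rank class indices by score, highest first.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Tiny toy batch (made-up scores): 3 samples, 4 classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: top-1 hit
    [0.5, 0.1, 0.3, 0.1],  # true class 2: top-1 miss, top-2 hit
    [0.4, 0.3, 0.2, 0.1],  # true class 3: missed even at k=3
]
labels = [1, 2, 3]
print(top_k_accuracy(scores, labels, 1))  # 1/3 of samples hit at k=1
print(top_k_accuracy(scores, labels, 2))  # 2/3 of samples hit at k=2
```

Note that the metric can only stay the same or improve as K grows, since a larger K never removes a guess from the list.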
Top-K accuracy behaves like recall: how often the true label is found among the top K guesses. Increasing K can only raise this recall, but it lowers precision, because each extra guess adds more wrong labels to the list shown.
Example: in a photo app, showing the top 3 guesses helps users find the right label even when the top guess is wrong. But showing too many guesses (a large K) can overwhelm users. So choose K to balance ease of finding the label (higher recall) against clarity of the suggestion list (higher precision).
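The recall/precision trade-off can be sketched for the earlier example. With exactly one true label per image, at most one of the K shown guesses can be right, so precision@K is at most 1/K whenever the label is found. The ranking below is the hypothetical one from the example:

```python
true_label = "Cat"
ranked_guesses = ["Dog", "Cat", "Rabbit", "Fox", "Horse"]  # hypothetical ranking

for k in (1, 3, 5):
    hit = true_label in ranked_guesses[:k]
    recall_at_k = 1.0 if hit else 0.0            # single-label recall@K
    precision_at_k = (1.0 / k) if hit else 0.0   # at most 1 of the K guesses is right
    print(k, recall_at_k, round(precision_at_k, 2))
# k=1 misses; k=3 and k=5 hit, but precision shrinks as K grows
```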
Good Top-1 accuracy means the model often guesses exactly right: for example, 80% Top-1 accuracy means the first guess is correct 8 times out of 10. Good Top-5 accuracy might be 95%, meaning the true label is almost always somewhere in the top 5 guesses. Bad values are low numbers, such as 30% Top-1 and 50% Top-5, which show the model struggles to find the right label even when given several guesses.
- Relying only on Top-1 accuracy can hide near misses that are useful in practice.
- Choosing too large a K inflates accuracy without necessarily helping real users.
- Data leakage can inflate Top-K accuracy when the test data is too similar to the training data.
- Overfitting can produce high Top-K accuracy on training data but poor real-world results.
Your image classifier has 60% Top-1 accuracy but 90% Top-5 accuracy. Is this good?
Answer: it depends on the use case. If users can pick from the top 5 guesses, 90% means the model is helpful. But 60% Top-1 means the first guess is often wrong, so if an exact match is required, the model may not be good enough.