When searching for the best model design, we want metrics that show how well a candidate learns and, more importantly, how well it generalizes. Common choices are validation accuracy and validation loss, which measure performance on held-out data rather than the data the model was trained on.
For classification tasks, accuracy is simple and interpretable. On imbalanced data, though, it can be misleading: a model that always predicts the majority class scores well while learning nothing useful. Metrics such as the F1 score or AUC give a clearer picture of performance in that setting.
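The pitfall above can be shown with a tiny sketch. The metric implementations and the 90/10 class split here are illustrative, not from the original text: a predictor that always outputs the majority class reaches 90% accuracy yet scores zero F1, because it never identifies a positive example.

```python
# Illustrative sketch: accuracy vs. F1 on an imbalanced binary problem.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced dataset: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # always predict the majority class

print(accuracy(y_true, y_pred))  # 0.9 -- looks strong
print(f1(y_true, y_pred))        # 0.0 -- reveals the model finds no positives
```

In practice one would use a metrics library rather than hand-rolled functions, but the contrast between the two numbers is the point.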
We focus on validation metrics because an architecture search evaluates many candidate designs, and selecting by training performance would favor those that merely overfit. Instead, we pick the design that performs best on data it has not seen before.
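The selection step itself is simple once each candidate has a validation score. The architecture names and scores below are made up for illustration; the point is that the search compares candidates on held-out performance and keeps the best one.

```python
# Hypothetical search results: validation accuracy per candidate design.
# All names and numbers are invented for illustration.
results = {
    "2-layer, 64 units":  0.88,
    "3-layer, 128 units": 0.91,
    "4-layer, 256 units": 0.90,  # larger, but no better on held-out data
}

# Select the architecture with the highest validation accuracy.
best = max(results, key=results.get)
print(best, results[best])  # 3-layer, 128 units 0.91
```

Note that the largest model is not chosen: on the validation set it generalizes no better than the mid-sized one.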