When working with system prompts and role setting in AI models, the key metric is accuracy: the fraction of the model's responses that match the intended role or instruction. The system prompt guides the model's behavior, so measuring how well the output aligns with it shows whether the model is following instructions correctly.
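As a rough illustration, accuracy here can be computed as the share of responses judged to follow the system prompt. The sketch below assumes each response has already been judged (by a human reviewer or an automated check) and recorded as True or False; the function name and the sample judgments are hypothetical.

```python
def role_adherence_accuracy(judgments):
    """Return the fraction of responses judged to follow the system prompt."""
    if not judgments:
        raise ValueError("need at least one judged response")
    # True counts as 1 and False as 0, so the sum is the number of adherent responses.
    return sum(judgments) / len(judgments)


# Hypothetical judgments: True means the response stayed in the intended role.
judgments = [True, True, False, True]
accuracy = role_adherence_accuracy(judgments)
print(f"accuracy = {accuracy:.2f}")
```

How each judgment is produced is left open here; the metric is only as reliable as the check that decides whether a response matches the role.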
Precision and recall also matter when the task involves classification, such as identifying specific intents from prompts. Precision is the fraction of the responses the model flags as relevant to the role (or labels with a given intent) that really are relevant, while recall is the fraction of all truly relevant cases that the model captures.
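To make the difference concrete, here is a minimal sketch that computes precision and recall for one intent from paired lists of true and predicted labels. The intent names and the sample data are made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Compute precision and recall for one label treated as the positive class."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive and truth == positive:
            tp += 1  # correctly predicted the intent
        elif predicted == positive:
            fp += 1  # predicted the intent, but it was something else
        elif truth == positive:
            fn += 1  # missed a case that really was the intent
    # Assumed convention: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical intent labels for five prompts.
true_intents = ["refund", "greeting", "refund", "greeting", "greeting"]
predicted_intents = ["refund", "refund", "refund", "refund", "greeting"]

precision, recall = precision_recall(true_intents, predicted_intents, "refund")
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up data the model over-predicts "refund": it catches every real refund request, so recall is high, but half of its "refund" predictions are wrong, so precision is low.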