Description
The idea would be to allow evaluation of question-answer pairs without needing to author unit tests for them. In addition to the question-answer pairs to be evaluated, the JSONL dataset could also include the list of evaluators to execute, along with the configuration for setting up the LLM or Azure AI Foundry evaluation service connection. The feature could then internally execute the configured evaluators against each question-answer pair and produce a report containing all of the evaluation scores.
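To make the shape concrete, a single dataset record might look something like the following. This is a hypothetical sketch only; field names such as `evaluators` and `serviceConnection` are illustrative and not an existing schema:

```jsonl
{"question": "What is the boiling point of water at sea level?", "answer": "100 °C (212 °F).", "evaluators": ["Relevance", "Coherence"], "serviceConnection": {"endpoint": "https://<resource>.services.ai.azure.com", "deployment": "<deployment-name>"}}
```

And a minimal sketch of the internal execution loop, assuming records shaped as above. The evaluator names and the local scoring heuristics here are placeholders; in practice each evaluator would call the configured LLM or the Azure AI Foundry evaluation service:

```python
import json
from typing import Callable

# Hypothetical evaluator registry. The lambdas below are toy stand-ins so the
# sketch is self-contained; real evaluators would score via an LLM call.
EVALUATORS: dict[str, Callable[[str, str], float]] = {
    # Toy heuristic: fraction of question words echoed in the answer.
    "Relevance": lambda q, a: len(set(q.lower().split()) & set(a.lower().split()))
    / max(len(set(q.lower().split())), 1),
    # Toy heuristic: rewards non-trivial answer length, capped at 1.0.
    "Coherence": lambda q, a: min(len(a.split()) / 20.0, 1.0),
}


def run_evaluation(dataset_path: str) -> list[dict]:
    """Run the evaluators configured in each JSONL record and collect scores."""
    report = []
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            scores = {
                name: EVALUATORS[name](record["question"], record["answer"])
                for name in record.get("evaluators", [])
                if name in EVALUATORS
            }
            report.append({"question": record["question"], "scores": scores})
    return report


if __name__ == "__main__":
    # Prints one row of evaluation scores per question-answer pair.
    for row in run_evaluation("dataset.jsonl"):
        print(row)
```

The resulting report could then be rendered in whatever format the tooling already supports; the list-of-dicts structure above is just one possible intermediate representation.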