[AI Evaluation] How should the AI.Eval library be used for runtime AI evaluation?

All of the samples I've seen are just one-off test methods. Can we have an explicit sample showing how a real app might use the library?

For example, if I have a chat agent that a customer is talking to, how should I evaluate the conversation? Should I run the evaluation after every AI response with the same iteration ID, so that each run overwrites the previous partial-conversation result and the full conversation is eventually evaluated?
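
The pattern described above can be sketched roughly as follows. This is a library-agnostic illustration in Python, not the AI.Eval API: `ResultStore`, `evaluate_turn`, `toy_evaluator`, and the `"conversation-123"` iteration ID are all hypothetical names invented for this sketch, and the metric is a placeholder. It only shows the intended semantics: each turn re-evaluates the whole history and writes under the same key, so only the latest result survives.

```python
from dataclasses import dataclass, field
from typing import Callable

Message = tuple[str, str]  # (role, text)


@dataclass
class ResultStore:
    """Hypothetical stand-in for an evaluation result store keyed by iteration ID."""
    results: dict[str, dict] = field(default_factory=dict)

    def write(self, iteration_id: str, result: dict) -> None:
        # Same key on every turn, so the newest result replaces the previous one.
        self.results[iteration_id] = result


def toy_evaluator(history: list[Message]) -> float:
    """Placeholder metric (assumption): fraction of assistant replies that are non-empty."""
    replies = [text for role, text in history if role == "assistant"]
    if not replies:
        return 0.0
    return sum(1 for text in replies if text.strip()) / len(replies)


def evaluate_turn(
    store: ResultStore,
    iteration_id: str,
    history: list[Message],
    evaluator: Callable[[list[Message]], float],
) -> dict:
    """Evaluate the conversation so far and overwrite the stored result for this ID."""
    result = {"turns_evaluated": len(history), "score": evaluator(history)}
    store.write(iteration_id, result)
    return result


# Simulated chat: evaluate after every assistant response, reusing one iteration ID.
store = ResultStore()
history: list[Message] = []
exchanges = [
    ("Hi", "Hello! How can I help?"),
    ("Where is my order?", "It ships tomorrow."),
]
for user_text, assistant_text in exchanges:
    history.append(("user", user_text))
    history.append(("assistant", assistant_text))
    evaluate_turn(store, "conversation-123", list(history), toy_evaluator)
```

After the loop, the store holds a single entry for the conversation that reflects the full history. Whether the real library's result storage behaves this way when an iteration ID is reused is exactly the open question in this issue.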

[AI Evaluation] How should the AI.Eval library be used for runtime AI evaluation? #6343
