Incorporate evaluation script and GitHub workflow #103

aprilk-ms · 2025-05-02T19:05:24Z

evaluate.py script output looks like this:

merge with main

aprilk-ms · 2025-05-02T19:42:48Z

.github/workflows/ai-evaluation.yml

+        with:
+          azure-aiproject-connection-string: ${{ vars.AZURE_EXISTING_AIPROJECT_CONNECTION_STRING || vars.AZURE_AIPROJECT_CONNECTION_STRING }}
+          deployment-name: ${{ vars.AZURE_AI_AGENT_DEPLOYMENT_NAME }}
+          agent-ids: ${{ vars.AZURE_EXISTING_AGENT_ID || vars.AZURE_AI_AGENT_ID }}


Can we make sure AZURE_EXISTING_AGENT_ID gets written to .env during startup?
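
A minimal sketch of what that could look like, assuming the startup code is Python and `.env` is a flat `KEY=VALUE` file. The helper names `upsert_env_var` and `persist_agent_id` are hypothetical and are not part of this PR:

```python
import os
from pathlib import Path


def upsert_env_var(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a dotenv-style file, replacing an existing entry if present."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    entry = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        # Compare the full key so that keys sharing a prefix are left alone.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = entry
            replaced = True
    if not replaced:
        lines.append(entry)
    env_path.write_text("\n".join(lines) + "\n")


def persist_agent_id(env_path: Path) -> bool:
    """Copy AZURE_EXISTING_AGENT_ID from the process environment into .env, if set."""
    agent_id = os.environ.get("AZURE_EXISTING_AGENT_ID")
    if not agent_id:
        # Nothing to write; the workflow expression above would then
        # fall back to AZURE_AI_AGENT_ID.
        return False
    upsert_env_var(env_path, "AZURE_EXISTING_AGENT_ID", agent_id)
    return True
```

Calling `persist_agent_id(Path(".env"))` once during startup would keep the file in sync without duplicating the entry on repeated runs.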

aprilk-ms · 2025-05-02T19:43:12Z

evals/test-data.json

@@ -0,0 +1,10 @@
+[
+    {


TODO: Update to queries more relevant for this app

aprilk-ms · 2025-05-02T19:44:41Z

evals/evaluate.py

+    tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)
+    intent_resolution = IntentResolutionEvaluator(model_config=model_config)
+    task_adherence = TaskAdherenceEvaluator(model_config=model_config)
+    results = evaluate(


Need to make sure the user (whoever ran azd up) has access to storage; otherwise uploading to AI Foundry won't work out of the box.

aprilk-ms added 3 commits May 2, 2025 03:27

Add draft evaluate script

fafdddc

Add eval github workflow

96d9123

Merge branch 'main' into aprilk/evals

9da4a4f

merge with main

dataset updates

96f6fe0
