
Add call_with_metrics endpoint to baml-cli serve #1989


Open · wants to merge 1 commit into base: canary

Conversation


@DomWilliamsEE commented May 27, 2025

My initial attempt at implementing #1887, comments welcome


Important

Add /call_with_metrics/:msg endpoint to BAML CLI server to handle calls with metrics using Collector.

  • Behavior:
    • Adds /call_with_metrics/:msg endpoint in mod.rs to handle calls with metrics.
    • Uses Collector to gather metrics and include them in the response.
  • Functions:
    • Adds baml_call_with_metrics_axum() in mod.rs to process requests to the new endpoint.
    • Modifies baml_call() to accept BamlCallCollector and include metrics in the response if applicable.
  • Dependencies:
  • Updates the `either` crate in Cargo.toml and Cargo.lock to enable its serde feature.

This description was created by Ellipsis for cd11c9b.
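The endpoint behavior the bullets above describe can be sketched roughly like this. This is a minimal, hypothetical stand-in: the real `BamlCallCollector` and `Collector` types live in engine/baml-runtime/src/cli/serve/mod.rs and go through axum and serde, whereas this sketch uses simplified metrics fields and string-built JSON purely to illustrate the two response shapes.

```rust
// Hypothetical sketch of the collector-aware response shaping; names and
// fields are simplified assumptions, not the actual types from mod.rs.

#[derive(Debug)]
struct Metrics {
    input_tokens: u64,
    output_tokens: u64,
}

#[allow(dead_code)]
#[derive(Debug)]
enum BamlCallCollector {
    // Plain /call/:msg behaviour: return the function result unchanged.
    Plain,
    // /call_with_metrics/:msg behaviour: wrap the result with usage metrics.
    WithCollector(Metrics),
}

// Shape the JSON body for the HTTP response, attaching metrics only when a
// collector was used (a toy string-based stand-in for the serde logic).
fn shape_response(result: &str, mode: &BamlCallCollector) -> String {
    match mode {
        BamlCallCollector::Plain => result.to_string(),
        BamlCallCollector::WithCollector(m) => format!(
            "{{\"response\":{},\"metrics\":{{\"input_tokens\":{},\"output_tokens\":{}}}}}",
            result, m.input_tokens, m.output_tokens
        ),
    }
}

fn main() {
    let mode = BamlCallCollector::WithCollector(Metrics {
        input_tokens: 101,
        output_tokens: 75,
    });
    println!("{}", shape_response("{\"name\":\"Vaibhav Gupta\"}", &mode));
}
```

The key design point is that the plain `/call/:msg` path stays untouched, while the new endpoint only adds a wrapper around the same result.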


vercel bot commented May 27, 2025

@DomWilliamsEE is attempting to deploy a commit to the Gloo Team on Vercel.

A member of the Team first needs to authorize it.

@ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to cd11c9b in 1 minute and 42 seconds.
  • Reviewed 166 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 draft comments. View those below.
1. engine/baml-runtime/src/cli/serve/mod.rs:176
  • Draft comment:
    Consider adding documentation to the BamlCallCollector enum to clarify its purpose and how it affects response behavior (e.g. metrics collection).
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
2. engine/baml-runtime/src/cli/serve/mod.rs:205
  • Draft comment:
    The 'enforce_auth' function treats an unset 'BAML_PASSWORD' as a reason to skip authentication (returning NoEnforcement). Verify that this behavior is acceptable for your security requirements.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
3. engine/baml-runtime/src/cli/serve/mod.rs:377
  • Draft comment:
    The nested match in the 'baml_call' response handling (especially for the WithCollector branch) is a bit dense. Consider refactoring this logic into a helper function to improve readability and maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
4. engine/baml-runtime/src/cli/serve/mod.rs:586
  • Draft comment:
    There's a TODO comment indicating that streaming is broken in 'baml_stream_axum2'. Consider either opening an issue or revisiting the streaming implementation to ensure the response is returned promptly.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
5. engine/baml-runtime/src/cli/serve/mod.rs:713
  • Draft comment:
    In the 'parse_args' function, the variable 'args' is created and then shadowed by a conversion to an IndexMap. Using distinct variable names may improve clarity of these conversion steps.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
6. engine/baml-runtime/src/cli/serve/mod.rs:820
  • Draft comment:
    The test 'test_parse_baml_options_with_env' directly sets environment variables which may cause side effects. Consider using a scoped helper for environment variables to isolate test changes.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
7. engine/baml-runtime/src/cli/serve/mod.rs:375
  • Draft comment:
    Typographical suggestion: Consider changing "parse-able" to "parsable" in the comment for improved readability.
  • Reason this comment was not posted:
    Comment was on unchanged code.
8. engine/baml-runtime/src/cli/serve/mod.rs:379
  • Draft comment:
    There is a minor spacing issue: consider adding a space between the closing parenthesis and the opening brace (i.e., BamlCallCollector::WithCollector(collector) {) for consistency.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_gDpYLtNjw4jNJIze


@hellovai (Contributor) left a comment


Appreciate the contribution here! But I'm not sure if this will work.

Currently the collector lives in memory, so sending it across the wire won't solve the issue, as it can't be serialized.

That said, if you are able to add some tests and show this working, I'm happy to merge it in!

You also likely need to update the generated code we have.

@DomWilliamsEE (Author)

Those types do derive Serialize; I tested locally that it does work with baml-cli serve:

$ curl -X POST "http://localhost:2024/call_with_metrics/ExtractResume" \
    -H "Content-Type: application/json" \
    -d '{"resume": "      Vaibhav Gupta [email protected] Experience: - Founder at BoundaryML - CV Engineer at Google - CV Engineer at Microsoft Skills: - Rust - C++"}'
{
  "Right": {
    "response": {
      "name": "Vaibhav Gupta",
      "email": "[email protected]",
      "experience": [
        "Founder at BoundaryML",
        "CV Engineer at Google",
        "CV Engineer at Microsoft"
      ],
      "skills": [
        "Rust",
        "C++"
      ]
    },
    "metrics": {
      "usage": {
        "input_tokens": 101,
        "output_tokens": 75
      },
      "timing": {
        "start_time_utc_ms": 1748353793893,
        "duration_ms": 1910,
        "time_to_first_parsed_ms": null
      }
    }
  }
}

I will try to add a test for it

@hellovai (Contributor)

Sorry, let me rephrase! I was on my phone earlier!

What I meant was that we should likely write a small design doc for what things we expose (raw prompt, raw HTTP response, tokens, etc.) before we go about this change, and then update both:

  1. the API endpoint we expose
  2. the codegen we do in https://github.com/BoundaryML/baml/blob/canary/engine/language_client_codegen/src/openapi.rs to provide an interface

I think the idea of exposing metrics alone is interesting, but I wonder if we should do the design here such that we expose all of it.

Additionally, the current API returns the result nested in a key called Right; we likely wouldn't want that.
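For illustration, unwrapping that Right wrapper before serializing would give a top-level shape like the following. This is only a sketch reusing the field names from the curl output above; the final shape would come out of the proposed design doc.

```json
{
  "response": {
    "name": "Vaibhav Gupta"
  },
  "metrics": {
    "usage": {
      "input_tokens": 101,
      "output_tokens": 75
    }
  }
}
```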

If you're down to write a brief spec for what you think the end-user behavior should be, that would be great!

Our general take as a team is that before we write user-facing code, we write the end-user docs for how someone would interface with it!

That said, we do appreciate the proactiveness here; it's just that we have a language, and every new thing we add becomes another thing people have to learn, so writing the docs first helps us ensure we stay on track with establishing clarity.

The docs are in the fern folder. To run them locally you can do this:

cd fern
pnpm install
pnpm dev

@DomWilliamsEE (Author) commented May 29, 2025

Ah right, makes sense, thanks for the detail. I will write some docs about how it could look and be used. This first version and its (lack of) design was mainly to get the ball rolling and determine if it's technically possible.
Big derp on the "Right" in the response!

@DomWilliamsEE (Author)

I wrote up a little proposal in the original issue 👌
