Support for Extension Server in standalone mode #5918
This is probably related to #5767
cool!!!
I tried the current main branch, but it seems it's still not working yet.
Looks like gateway/internal/cmd/server.go, line 164 (commit 0752df1).
mathetake added a commit to envoyproxy/ai-gateway that referenced this issue on May 10, 2025:
**Commit Message**

This commit is a relatively large refactoring of internals to make Envoy AI Gateway's API more aligned with Envoy Gateway's BackendTrafficPolicy as well as HTTPRoute. Specifically, the main objective is to allow failover and retries to work well across multiple AIServiceBackends. One of the most notable changes in this commit is that we split the extproc's logic into two phases: one is executed at the normal router level and selects a route (as opposed to the backend selection previously), and the other runs as an upstream filter that performs auth and transformation. In other words, Envoy AI Gateway configures two external processing filters. As a result, users are now able to configure failover as well as retry/fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, this allows us to support the case where the primary cluster is Azure OpenAI and, when it is failing, the AI Gateway falls back to AWS Bedrock using standard Envoy Gateway configuration.

**Background**

At the Envoy configuration level, Envoy Gateway translates multiple backends in a single HTTPRoute rule into a single Envoy cluster whose load assignment consists of multiple endpoint sets (called `LocalityLbEndpoints` in the Envoy API [1]), where each set corresponds to a Backend with a priority configured.

For example, very roughly speaking, the following pseudo HTTPRoute

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
    - backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: primary-backend
        - group: gateway.envoyproxy.io
          kind: Backend
          name: secondary-backend
      matches:
        - path:
            type: PathPrefix
            value: /
```

will be translated as follows, when `secondary-backend` is marked as `fallback: true` in its Backend definition ([2]):

```yaml
- cluster:
    '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    loadAssignment:
      clusterName: httproute/default/provider-fallback/rule/0
      endpoints:
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: primary.com
                    portValue: 443
          priority: 0
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: secondary.com
                    portValue: 443
          priority: 1
```

where priority 0 and 1 are configured for the primary and secondary backends respectively. When retry or passive health checking is configured, Envoy will retry against, or fall back to, the secondary endpoint set.

In our API, transformation as well as upstream authentication must be performed per Backend, so this logic must be inserted after an endpoint set (`LocalityLbEndpoints`, to be precise) is chosen by Envoy. For example, primary.com and secondary.com might have different API schemas, authentication, etc. Envoy has a specific HTTP filter chain that is executed at exactly this stage, called "upstream filters"; if we insert an extproc that performs this logic there, we can properly do authn/z and transformation in response to retry attempts by Envoy natively. From the perspective of the upstream-filter-level external processor, it needs to know exactly which backend was chosen by Envoy's cluster load balancing logic. We add additional metadata to each endpoint via EG's extension server so that the extproc can retrieve this information.

We also use the extension server to insert the upstream extproc filter, since this is not currently supported by EG. This logic in our extension server can be eliminated when the corresponding functionality becomes available in EG ([3], [4]).

**Caveats**

* Due to a limitation of EG's extension server API, AIServiceBackends that reference a Kubernetes Service cannot be supported, so we have to drop support for them. Since there is a workaround, this should be fine; moreover, EG can be fixed easily, so the version after the next release should be able to revive the support.
* `aigw run` is temporarily disabled until [5] is resolved.
* Inference Extension support is temporarily disabled but will be revived before the next release.

[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918

**Related Issues/PRs (if applicable)**

Partially resolves the provider-level fallbacks for #34

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
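For reference, a user-facing configuration matching the commit message above might look roughly like the following sketch: a secondary Backend marked `fallback: true` plus a BackendTrafficPolicy attaching a retry policy to the generated HTTPRoute. This is an illustrative assumption, not taken from the PR; the resource names (`secondary-backend`, `provider-fallback-retry`) and the specific retry values are hypothetical.

```yaml
# Hypothetical sketch: mark the secondary Backend as a fallback target ([2]).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: secondary-backend
spec:
  fallback: true
  endpoints:
    - fqdn:
        hostname: secondary.com
        port: 443
---
# Hypothetical sketch: attach a retry policy to the HTTPRoute so Envoy
# retries (and eventually falls over to the priority-1 endpoint set).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: provider-fallback-retry
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: provider-fallback
  retry:
    numRetries: 2
    retryOn:
      triggers:
        - retriable-status-codes
      httpStatusCodes:
        - 500
```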
#5984 fixes this partially, in the sense that in reality I don't think we need TLS for the standalone mode. At least that works for AIGW.
Description:
It seems that the extension server is not supported in the standalone mode.
Relevant Links:
I encountered this in envoyproxy/ai-gateway#599, where the core logic in EAIGW switched to making the extension server mandatory.