Support for Extension Server in standalone mode #5918
This is probably related to #5767
cool!!!
I tried the current main branch, but it seems it's still not working yet.
Looks like gateway/internal/cmd/server.go, line 164 (commit 0752df1).
mathetake added a commit to envoyproxy/ai-gateway that referenced this issue on May 10, 2025:
**Commit Message**

This commit is a relatively large refactoring of internals to make Envoy AI Gateway's API more aligned with Envoy Gateway's BackendTrafficPolicy as well as HTTPRoute. Specifically, the main objective is to allow failover and retries to work well across multiple AIServiceBackends. One of the most notable changes in this commit is that we split the extproc's logic into two phases: one is executed at the normal router level and selects a route (as opposed to the backend selection previously), and the other runs as an upstream filter that performs auth and transformation. In other words, Envoy AI Gateway configures two external processing filters. As a result, users are now able to configure failover as well as retry/fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, this allows us to support the case where the primary cluster is Azure OpenAI and, when it is failing, the AI Gateway falls back to AWS Bedrock using standard Envoy Gateway configuration.

**Background**

At the Envoy configuration level, Envoy Gateway translates multiple backends in a single HTTPRoute rule into a single Envoy cluster whose load assignment consists of multiple endpoint sets (called `LocalityLbEndpoints` in the Envoy API [1]), where each set corresponds to a Backend with a priority configured.

For example, very roughly speaking, the following pseudo HTTPRoute

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
    - backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: primary-backend
        - group: gateway.envoyproxy.io
          kind: Backend
          name: secondary-backend
      matches:
        - path:
            type: PathPrefix
            value: /
```

will be translated as follows, when `secondary-backend` is marked as `fallback: true` in its Backend definition ([2]):

```yaml
- cluster:
    '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    loadAssignment:
      clusterName: httproute/default/provider-fallback/rule/0
      endpoints:
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: primary.com
                    portValue: 443
          priority: 0
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: secondary.com
                    portValue: 443
          priority: 1
```

where priority 0 and 1 are configured for the primary and secondary backends respectively. When retry or passive health checking is configured, Envoy will retry against, or fall back to, the secondary endpoint set.

In our API, transformation as well as upstream authentication must be performed per Backend, so this logic must be inserted after an endpoint set (`LocalityLbEndpoints`, to be precise) is chosen by Envoy. For example, primary.com and secondary.com might have different API schemas, authentication, etc. Envoy has a specific HTTP filter chain that is executed at exactly this stage, called "upstream filters"; if we insert an extproc that performs this logic there, we can properly do authn/z and transformation in response to retry attempts by Envoy natively. From the perspective of the upstream-filter-level external processor, it needs to know exactly which backend was chosen by Envoy's cluster load balancing logic. We add additional metadata to each endpoint via EG's extension server so that the extproc can retrieve this information.

We also use the extension server to insert the upstream extproc filter, since this is not currently supported by EG. This logic in our extension server can be eliminated when the corresponding functionality becomes available in EG ([3], [4]).

**Caveats**

* Due to a limitation of EG's extension server API, AIServiceBackends that reference a Kubernetes Service cannot be supported, so we have to drop support for them. Since there is a workaround, this should be fine; moreover, EG can be fixed easily, so the version after the next release should be able to revive the support.
* `aigw run` is temporarily disabled until [5] is resolved.
* Inference Extension support is temporarily disabled but will be revived before the next release.

[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918

**Related Issues/PRs (if applicable)**

Partially resolves the provider-level fallbacks for #34

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
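For reference, a user-facing configuration matching the commit message above might look roughly like the following sketch: a secondary Backend marked `fallback: true` plus a BackendTrafficPolicy attaching a retry policy to the generated HTTPRoute. This is an illustrative assumption, not taken from the PR; the resource names (`secondary-backend`, `provider-fallback-retry`) and the specific retry values are hypothetical.

```yaml
# Hypothetical sketch: mark the secondary Backend as a fallback target ([2]).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: secondary-backend
spec:
  fallback: true
  endpoints:
    - fqdn:
        hostname: secondary.com
        port: 443
---
# Hypothetical sketch: attach a retry policy to the HTTPRoute so Envoy
# retries (and eventually falls over to the priority-1 endpoint set).
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: provider-fallback-retry
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: provider-fallback
  retry:
    numRetries: 2
    retryOn:
      triggers:
        - retriable-status-codes
      httpStatusCodes:
        - 500
```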
#5984 fixes this partially, in the sense that in reality I don't think we need TLS for the standalone mode. At least that works for AIGW.
Description:
It seems that the extension server is not supported in the standalone mode.
Relevant Links:
I encountered this in envoyproxy/ai-gateway#599, where the core logic in EAIGW switched to making the extension server mandatory.