Support for Extension Server in standalone mode #5918

Open
mathetake opened this issue May 4, 2025 · 6 comments · May be fixed by #5984
@mathetake
Member

Description:

Describe the desired behavior, what scenario it enables and how it
would be used.

It seems that the extension server is not supported in the standalone mode.

[optional Relevant Links:]

Any extra documentation required to understand the issue.

I encountered this in envoyproxy/ai-gateway#599, where the core logic in Envoy AI Gateway (EAIGW) was changed to make the extension server mandatory.

@mathetake
Member Author

cc @arkodg @shawnh2

@shawnh2
Contributor

shawnh2 commented May 5, 2025

This is probably related to #5767

@mathetake
Member Author

cool!!!

@mathetake
Member Author

I tried the current main branch, but it still doesn't seem to work.

@arkodg
Contributor

arkodg commented May 9, 2025

looks like

`if cfg.EnvoyGateway.Provider.Type == egv1a1.ProviderTypeKubernetes {`

some work is left to support it, specifically for TLS

`if ext.Service.TLS != nil {`

mathetake added a commit to envoyproxy/ai-gateway that referenced this issue May 10, 2025
**Commit Message**

This commit is a relatively large refactoring of internals to make Envoy
AI Gateway's API more aligned with Envoy Gateway's BackendTrafficPolicy
as well as HTTPRoute. Specifically, the main objective here is to allow
failover and retries to work well across multiple AIServiceBackend resources.

One of the most notable changes in this commit is that we split the
extproc's logic into two phases; one is executed at the normal router
level that selects a route (as opposed to the backend selection
previously) and the other as the upstream filter that performs auth and
transformation. In other words, Envoy AI Gateway configures two external
processing filters.

As a result, users are now able to configure failover as well as the
retry/fallback using Envoy Gateway's BackendTrafficPolicy attached to
HTTPRoute generated by the Envoy AI Gateway. For example, this allows us
to support the case where the primary cluster is Azure OpenAI and, when
it is failing, the AI Gateway falls back to AWS Bedrock with the standard
Envoy Gateway configuration.
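
As a rough illustration of the paragraph above, a BackendTrafficPolicy that enables retries on a generated HTTPRoute might look like the following sketch. The policy name, the target route name, and the retry values are assumptions for illustration and are not taken from this commit; the Envoy Gateway API reference is the authoritative source for the fields.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: provider-fallback-retry   # hypothetical name
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: provider-fallback       # assumed name of the generated HTTPRoute
  retry:
    numRetries: 2                 # illustrative value
    retryOn:
      triggers:
      - connect-failure
      - retriable-status-codes
      httpStatusCodes:
      - 503
```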

**Background**
At the Envoy configuration level, Envoy Gateway translates multiple
backends in a single HTTPRoute rule into a single Envoy cluster whose
endpoints consist of multiple endpoint sets (called
`LocalityLbEndpoints` in the Envoy API [1]), where each set corresponds to a
Backend with a priority configured. For example, very roughly speaking,
the following pseudo HTTPRoute

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
  - backendRefs:
    - group: gateway.envoyproxy.io
      kind: Backend
      name: primary-backend
    - group: gateway.envoyproxy.io
      kind: Backend
      name: secondary-backend
    matches:
    - path:
        type: PathPrefix
        value: /
```

will be translated as, when `secondary-backend` is marked as `fallback:
true` in its Backend definition ([2]):

```yaml
- cluster:
  '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
  loadAssignment:
    clusterName: httproute/default/provider-fallback/rule/0
    endpoints:
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: primary.com
              portValue: 443
      priority: 0
    - lbEndpoints:
      - endpoint:
          address:
            socketAddress:
              address: secondary.com
              portValue: 443
      priority: 1
```

where the priority is set to 0 and 1 for the primary and secondary
backend, respectively. When retry or passive health checking is configured, Envoy will
retry against or fall back to the secondary backend.
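
For reference, the `fallback: true` marking mentioned above lives on the Backend resource ([2]). A minimal sketch of the secondary Backend, assuming the hostname `secondary.com` and port 443 from the cluster example, could look like this:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: Backend
metadata:
  name: secondary-backend
spec:
  fallback: true          # endpoints of this backend get the lower priority (priority 1 above)
  endpoints:
  - fqdn:
      hostname: secondary.com
      port: 443
```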

In our API, transformation as well as upstream authentication must be
performed per Backend, so this logic must be inserted after the
endpoint set (or LocalityLbEndpoints, to be precise) is chosen by Envoy.
For example, primary.com and secondary.com might have different API
schemas, authentication, etc. Envoy has a specific HTTP filter chain,
called "upstream filters", that is executed at this stage, so
if we insert the extproc that performs this logic there, we can properly do
authn/z and transformation in response to Envoy's retry attempts
natively.

From the perspective of the upstream-filter-level external processor, it
needs to know exactly which backend was chosen by Envoy's cluster
load balancing logic. We add some additional metadata to
the endpoint with EG's extension server so that the extproc can retrieve
this information. We also use the extension server to insert the
upstream extproc filter, since EG does not currently support it. This
logic in our extension server can be eliminated when the corresponding
functionality becomes available in EG ([3],[4]).
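
To make the two modifications above more concrete, the following is a hedged sketch of what a cluster might contain after the extension server has patched it. It is not the actual output of Envoy AI Gateway: the metadata namespace, the metadata key, and the extproc cluster name are hypothetical, and only the Envoy field names and type URLs come from the Envoy API.

```yaml
# Fields of an envoy.config.cluster.v3.Cluster (sketch only).
loadAssignment:
  endpoints:
  - priority: 0
    lbEndpoints:
    - endpoint:
        address:
          socketAddress:
            address: primary.com
            portValue: 443
      metadata:
        filterMetadata:
          example.backend:                  # hypothetical metadata namespace
            backend_name: primary-backend   # hypothetical key read by the extproc
typedExtensionProtocolOptions:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    '@type': type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicitHttpConfig:
      httpProtocolOptions: {}
    httpFilters:                            # upstream HTTP filter chain
    - name: envoy.filters.http.ext_proc
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
        grpcService:
          envoyGrpc:
            clusterName: example-extproc    # hypothetical cluster name
    - name: envoy.filters.http.upstream_codec   # must be the last upstream filter
      typedConfig:
        '@type': type.googleapis.com/envoy.extensions.filters.http.upstream_codec.v3.UpstreamCodec
```

Because the upstream filter chain runs after an endpoint is selected, it runs again on each retry attempt, which is what lets per-backend auth and transformation follow the failover.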

**Caveats**
* Due to a limitation of EG's extension server API, an AIServiceBackend
that references a k8s Service cannot be supported, so we have to drop
support for it. Since there is a workaround, this should be fine;
plus, EG can be fixed easily, so the version after the next release should
be able to revive the support.
* `aigw run` is temporarily disabled until [5] is resolved.
* Inference Extension support is temporarily disabled but will be revived
before the next release.

[1]
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2]
https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918


**Related Issues/PRs (if applicable)**

Partially resolves the provider level fallbacks for #34

---------

Signed-off-by: Takeshi Yoneda <[email protected]>
@mathetake
Member Author

#5984 fixes this partially in the sense that in reality i don't think we need TLS for the standalone mode. At least that works for AIGW
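
For context, a standalone-mode Envoy Gateway configuration that points at an extension server without TLS might look roughly like the sketch below. The resource path, hostname, port, and the chosen hook are assumptions for illustration; whether the `extensionManager` section is honored in standalone mode depends on the fix discussed in this issue.

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
provider:
  type: Custom                  # standalone (non-Kubernetes) mode
  custom:
    resource:
      type: File
      file:
        paths: ["/tmp/envoy-gateway-resources"]   # hypothetical path
    infrastructure:
      type: Host
      host: {}
extensionManager:
  hooks:
    xdsTranslator:
      post:
      - Translation
  service:
    fqdn:
      hostname: localhost       # hypothetical extension server address
      port: 1063                # hypothetical port
    # no `tls` block: plaintext connection to the extension server
```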
