Support V1 API #14

ilya-lavrenov · 2025-05-20T16:28:24Z

See https://github.com/vllm-project/vllm/tree/main/vllm/v1

popovaan · 2025-05-21T12:49:20Z

V1 currently supports only NVIDIA GPUs of the Ampere generation or later: https://blog.vllm.ai/2025/01/27/v1-alpha-release.html
CPU is not supported by V1, so we can't support it yet.

ilya-lavrenov · 2025-05-21T20:53:28Z

I suppose this feature depends on the backend.
V1 or V0 is just an API, which each backend can implement independently. For example, here https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/worker/worker_v1.py we can see that the V1 worker and model runner are already implemented. So it seems we are not blocked on the OpenVINO side?

popovaan · 2025-05-22T09:06:29Z

What I mean is that LLMEngine fails the device check before the worker is even started. But I agree that, in general, we should be able to override this device check. I will look into this more deeply.
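A minimal sketch of the kind of override discussed above, assuming the engine's device check could be delegated to the backend instead of being hard-coded to a device type. All names here (`Backend`, `OpenVINOBackend`, `supports_v1`, `create_engine`, `UnsupportedDeviceError`) are hypothetical illustrations, not vLLM's or the OpenVINO plugin's actual API.

```python
from dataclasses import dataclass


class UnsupportedDeviceError(RuntimeError):
    """Hypothetical error raised when a backend cannot run the V1 engine."""


@dataclass
class Backend:
    """Hypothetical description of a device backend (not a vLLM class)."""
    name: str
    device_type: str

    def supports_v1(self) -> bool:
        # Default mirrors a hard-coded device check: only CUDA qualifies.
        return self.device_type == "cuda"


class OpenVINOBackend(Backend):
    """Hypothetical out-of-tree backend that ships its own V1 worker."""

    def supports_v1(self) -> bool:
        # The backend declares V1 support itself, so the engine no longer
        # rejects it based on the device type alone.
        return True


def create_engine(backend: Backend, use_v1: bool = True) -> str:
    """Choose the engine version, asking the backend whether V1 is allowed."""
    if use_v1:
        if not backend.supports_v1():
            raise UnsupportedDeviceError(
                f"V1 engine is not supported on {backend.device_type!r}"
            )
        return f"V1 engine on {backend.name}"
    return f"V0 engine on {backend.name}"


if __name__ == "__main__":
    print(create_engine(OpenVINOBackend("openvino", "cpu")))
    try:
        create_engine(Backend("generic-cpu", "cpu"))
    except UnsupportedDeviceError as err:
        print(f"rejected: {err}")
```

In this sketch a plain CPU backend still fails the check, while a backend that provides its own V1 worker can opt in by overriding `supports_v1`, which is the shape of override the comment suggests.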

ilya-lavrenov assigned popovaan May 20, 2025
