feat(multimodal): Video understanding

It should now be possible to extend the existing vision support to video understanding. Projects such as
https://github.com/Efficient-Large-Model/VILA
https://github.com/LLaVA-VL/LLaVA-NeXT
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct?s=09

already make this possible. Since OpenAI has announced GPT-4o, it makes sense to start looking into open solutions that we can plug into the API through specific backends.

llama.cpp: https://github.com/ggerganov/llama.cpp/pull/9165
vLLM: https://github.com/mudler/LocalAI/issues/3670

feat(multimodal): Video understanding #2318

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

feat(multimodal): Video understanding #2318

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions