feat(multimodal): Video understanding #2318

Closed
@mudler

It should now be possible to extend the vision support to understand videos. Projects such as these make it feasible:
https://github.com/Efficient-Large-Model/VILA
https://github.com/LLaVA-VL/LLaVA-NeXT
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct?s=09

Since OpenAI has announced GPT-4o, it makes sense to start looking into open solutions that we can plug into the API with specific backends.

llama.cpp: ggml-org/llama.cpp#9165
vLLM: #3670

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), roadmap, up for grabs (Tickets that no-one is currently working on)
