feat(multimodal): Video understanding #2318

mudler · 2024-05-13T20:40:22Z

It should now be possible to expand the vision support to understand videos. Projects such as
https://github.com/Efficient-Large-Model/VILA
https://github.com/LLaVA-VL/LLaVA-NeXT

already make this feasible. Since OpenAI has announced GPT-4o, it makes sense to start looking into open solutions that we can plug into the API with specific backends.

mudler added the "enhancement" (New feature or request) label May 13, 2024

mudler mentioned this issue May 13, 2024

[EPIC] Model support dashboard (v2) #1126

Open

89 tasks

mudler added the "roadmap" and "up for grabs" (Tickets that no-one is currently working on) labels May 13, 2024
