Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Updated May 15, 2024 - Rust
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Video Foundation Models & Data for Multimodal Understanding
jetson-examples: run AI models and applications on NVIDIA Jetson devices with a one-line command.
End-to-end Training for Multimodal Recommendation Systems
A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt engineering templates, and multimodality.
[IEEE TMM 2024] IGReg: Image-Geometry-Assisted Point Cloud Registration via Selective Correlation Fusion
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high performance and flexibility.
Robust multimodal brain registration via keypoints
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
Auto-updating paper list
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.