multimodal

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated May 15, 2024
Python

OpenGVLab / InternVideo

Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering masked-autoencoder temporal-action-localization contrastive-learning spatio-temporal-action-localization zero-shot-retrieval video-clip vision-transformer zero-shot-classification foundation-models instruction-tuning

Updated May 15, 2024
Python

Seeed-Projects / jetson-examples

jetson-examples: Run AI models and applications on NVIDIA Jetson devices with a one-line command.

nvidia llama gpt jetson multimodal llm jetson-orin llava llama3 jetson-examples

Updated May 15, 2024
Shell

modelscope / swift

ms-swift: Use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs

Updated May 15, 2024
Python

westlake-repl / IDvs.MoRec

End-to-end Training for Multimodal Recommendation Systems

end-to-end multimodal multimodal-deep-learning image-recommendation foundation-models llm large-language-model foundation-recommendation-model text-recommendation transferable-recommendation multimodal-recommendation multimodal-recommendation-dataset llm-recommendation modality-based-recommendation

Updated May 15, 2024
Python

smalltong02 / keras-llm-robot

A web UI project for learning about large language models. It includes features such as chat, quantization, fine-tuning, prompt-engineering templates, and multimodality.

text-to-speech chatbot gemini knowledgebase speech-to-text vectorization multimodal faiss rag milvus streamlit llm code-interpreter chatgpt pgvector fastchat

Updated May 15, 2024
Python

Jiang0903 / IGReg

[IEEE TMM 2024] IGReg: Image-Geometry-Assisted Point Cloud Registration via Selective Correlation Fusion

computer-vision point-cloud-registration multimodal

Updated May 15, 2024
Python

zeyofu / BLINK_Benchmark

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390

benchmark natural-language-processing ai computer-vision multimodal-learning multimodal vision-and-language

Updated May 15, 2024
Python

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.

image-to-text clip text-to-image dit multimodal sora text-to-video aigc stable-diffusion controlnet llava blip2 minigpt4 sd-xl ppdiffusers eva-clip stablevideodiffusion qwen-vl

Updated May 15, 2024
Python

alanqrwang / keymorph

Robust multimodal brain registration via keypoints

deep-learning neural-network pytorch affine registration robust keypoints brain interpretability multimodal

Updated May 15, 2024
Jupyter Notebook

bentoml / BentoML

The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated May 15, 2024
Python

isLinXu / paper-list

Auto-updating paper list

reinforcement-learning classification image-generation object-detection transfer-learning optical-flow object-tracking semantic-segmentation action-recognition audio-processing pose-estimation depth-estimation anomaly-detection multimodal scene-understanding graph-neural-networks llm

Updated May 15, 2024
Python

enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated May 14, 2024
TypeScript

Here are 657 public repositories matching this topic...