Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Personal Project: MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on a 24GB RTX 3090/4090.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
A collection of visual instruction tuning datasets.