Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Grounded Multimodal Large Language Model with Localized Visual Tokenization
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Personal Project: MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on a 24GB RTX 3090/4090.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundation models, and more. Stay updated with the latest advancements.
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
A collection of visual instruction tuning datasets.