Composition of Multimodal Language Models From Scratch
Updated May 7, 2024 - Jupyter Notebook
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Large Chinese Language-and-Vision Assistant for BioMedicine (a Chinese medical multimodal large model)
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
Awesome list for attacks on large language models.
Demo code for fine-tuning multimodal large language models with LLaMA-Factory
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
A Video Chat Agent with Temporal Prior
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Unified Multi-modal IAA Baseline and Benchmark
A collection of visual instruction tuning datasets.
Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection