✨✨Latest Advances on Multimodal Large Language Models
Updated Jun 7, 2024
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
- InternLM-XComposer2: a vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
- Mixture-of-Experts for Large Vision-Language Models
- Evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
- CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models