microsoft/Phi-3-vision-128k-instruct for Apple MLX
Official implementation of the paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models - 🔥 ICLR 2024 Spotlight - 🏆 Best Paper Award SoCal NLP 2023
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
A reading list for large models safety, security, and privacy.
[Tobig's Conference] An interactive outfit recommendation system built on a VLM
MICCAI 2024 - Disease-informed Adaptation of Vision-Language Models
Part of a GSoC '24 project: video annotation by combining a multimodal vision-language model with spatiotemporal analysis.
Famous Vision Language Models and Their Architectures
Python scripts for captioning images with VLMs (see the captioning sketch after this list)
DSPy with Ollama and llama.cpp on Google Colab (a minimal DSPy + Ollama sketch also follows this list)
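
The captioning entry above points to repo-specific scripts, but the underlying pattern is small. Below is a minimal sketch, not the code from that repository: it assumes the Hugging Face transformers library and uses the off-the-shelf BLIP captioning checkpoint; the model name and image path are placeholder choices.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a small off-the-shelf captioning model (placeholder choice;
# any image-to-text VLM checkpoint follows the same load/generate pattern).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# "photo.jpg" is a placeholder path to any local image.
image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate a short caption and decode it back to text.
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```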
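Similarly, the DSPy + Ollama entry boils down to pointing DSPy's LM client at a local server. A minimal sketch, assuming DSPy >= 2.5 (which routes providers through LiteLLM) and an Ollama server on its default port; the model name "llama3.2" is a placeholder:

```python
import dspy

# Connect to a locally running Ollama server (default port 11434).
# "llama3.2" is a placeholder; use any model fetched via `ollama pull`.
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

# A minimal signature-driven module: question in, answer out.
qa = dspy.Predict("question -> answer")
print(qa(question="What does VLM stand for?").answer)
```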