My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
Training and inference code for a model that extracts license plate numbers.
Targeted semantic multimodal input ablation. Official implementation of the ablation method introduced in the paper: "What Vision-Language Models 'See' when they See Scenes"
A (hopefully) relatively straightforward, easy to modify code base for running a variety of multi-task optimization setups, with a focus on gradient aggregation methods.
alt text for lazy people
A comparative study of two of the best-performing open-source Vision-Language Models: Google Gemini Vision and CogVLM.
A multimodal model for language-guided socially compliant robot navigation.
An open-source API built on FastAPI for visual question answering.
Code and models for the paper 'Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge' published at AAAI 2022 DSTC10 Workshop
Reading group for Vision and Language research
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"
An end-to-end multimodal framework incorporating explicit knowledge graphs and OOD-detection. (NeurIPS23)
🖼️Latest Papers on Visually(Imagination)-Augmented NLP
Vision-Controllable Natural Language Generation
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.