The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
An open-source API for visual question answering, built on FastAPI.
Related papers about Referring Image Segmentation (RIS)
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
Vision-Controllable Natural Language Generation
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"