The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"
An open-source API built on FastAPI for visual question answering.
Related papers about Referring Image Segmentation (RIS)
An end-to-end masked contrastive video-and-language pre-training framework
A comprehensive hub for updates on generative AI research, including interviews, notebooks, and additional resources.
Vision-Controllable Natural Language Generation
Reading group for Vision and Language research
A list of research papers on knowledge-enhanced multimodal learning
ACM Multimedia 2023 - Temporal Sentence Grounding in Streaming Videos
A PyTorch implementation of TVC
Under review. [IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.
An end-to-end vision-and-language model incorporating explicit knowledge graphs and OOD detection.
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding