My solutions to CS231N CNN assignments
PyTorch code for the Findings of NAACL 2022 paper "Probing the Role of Positional Information in Vision-Language Models".
Arabic WordNet matches for synsets in ImageNet
Source code and documentation for the LREC-COLING'24 paper "Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies"
Training and inference code for a model that extracts license plate numbers.
Targeted semantic multimodal input ablation. Official implementation of the ablation method introduced in the paper: "What Vision-Language Models 'See' when they See Scenes"
A (hopefully) relatively straightforward, easy to modify code base for running a variety of multi-task optimization setups, with a focus on gradient aggregation methods.
alt text for lazy people
A comparative study of two of the best-performing open-source Vision-Language Models: Google Gemini Vision and CogVLM.
A multimodal model for language-guided socially compliant robot navigation.
An open-source API built on FastAPI for visual question answering.
Code and models for the paper 'Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge' published at AAAI 2022 DSTC10 Workshop
Reading group for Vision and Language research
[Frontiers in AI Journal] Implementation of the paper "Interpreting Vision and Language Generative Models with Semantic Visual Priors"
An end-to-end multimodal framework incorporating explicit knowledge graphs and OOD-detection. (NeurIPS23)
🖼️Latest Papers on Visually(Imagination)-Augmented NLP
Vision-Controllable Natural Language Generation
VinVL+L: Enriching Visual Representation with Location Context in Visual Question Answering (VQA)
[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource aligning object-centric descriptions from COCO with high-level descriptions crowdsourced along 3 axes: scene, action, rationale.