[INLG2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource that aligns object-centric descriptions from COCO with high-level descriptions crowdsourced along three axes: scene, action, and rationale.
dataset
image-captioning
image2text
vision-and-language
multimodal-data
huggingface-datasets
multimodal-grounding
Updated Nov 13, 2023