multimodal-grounding

Here is 1 public repository matching this topic...

michelecafagna26 / HL-dataset

[INLG 2023] The High-Level (HL) dataset is a Vision and Language (V&L) resource that aligns object-centric descriptions from COCO with high-level descriptions crowdsourced along three axes: scene, action, and rationale.

dataset image-captioning image2text vision-and-language multimodal-data huggingface-datasets multimodal-grounding

Updated Nov 13, 2023
