CommonPool

CommonPool is a dataset of 12.8 billion image-text pairs collected from Common Crawl. It is part of DataComp, a benchmark for designing multimodal datasets; see http://datacomp.ai/ and https://arxiv.org/abs/2304.14108 for details.

In addition to the largest pool of 12.8B samples, CommonPool comes in three smaller versions containing 12.8M, 128M, and 1.28B samples.
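The four pool sizes form a simple progression in which each pool is ten times larger than the one before it. The short Python sketch below lays them out. The scale names (small, medium, large, xlarge) follow the DataComp benchmark's naming. The `POOL_SIZES` dict and the `growth_factors` helper are hypothetical and serve only as illustration.

```python
from __future__ import annotations

# CommonPool sample counts per scale; names follow DataComp's scale naming.
# POOL_SIZES is an illustrative, hypothetical constant, not part of any library.
POOL_SIZES = {
    "small": 12_800_000,
    "medium": 128_000_000,
    "large": 1_280_000_000,
    "xlarge": 12_800_000_000,
}


def growth_factors(sizes: dict[str, int]) -> list[float]:
    """Return the ratio between each pool and the next smaller one."""
    counts = list(sizes.values())  # dicts preserve insertion order (Python 3.7+)
    return [bigger / smaller for smaller, bigger in zip(counts, counts[1:])]


if __name__ == "__main__":
    for name, n in POOL_SIZES.items():
        print(f"{name:>6}: {n:,} samples")
    print(growth_factors(POOL_SIZES))  # each step is a 10x increase
```

Because every ratio is exactly 10, moving up one scale multiplies the download size and the compute needed to filter the pool by roughly an order of magnitude.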

Downloading CommonPool

CommonPool can be downloaded with img2dataset by running the download_upstream.py script from the DataComp repository: https://github.com/mlfoundations/datacomp/blob/main/download_upstream.py
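The sketch below builds the command line for fetching one CommonPool scale. It assumes that download_upstream.py accepts `--scale` (one of small, medium, large, xlarge) and `--data_dir`, as described in the DataComp README. It also assumes you run it from a checkout of that repository with its dependencies, including img2dataset, installed. Check the script's `--help` for the current options. The `download_command` helper and the `/data/commonpool_small` path are hypothetical.

```python
from __future__ import annotations

import shlex

# Scale names assumed to be accepted by DataComp's download_upstream.py --scale.
SCALES = ("small", "medium", "large", "xlarge")


def download_command(scale: str, data_dir: str) -> list[str]:
    """Build (but do not run) the argv for downloading one CommonPool scale.

    Hypothetical helper: assumes download_upstream.py takes --scale and
    --data_dir and is invoked from the root of a DataComp checkout.
    """
    if scale not in SCALES:
        raise ValueError(f"unknown scale {scale!r}; expected one of {SCALES}")
    return ["python", "download_upstream.py", "--scale", scale, "--data_dir", data_dir]


if __name__ == "__main__":
    cmd = download_command("small", "/data/commonpool_small")  # example path only
    print(shlex.join(cmd))
    # To actually start the download: subprocess.run(cmd, check=True)
```

The helper only builds the argument list, so you can inspect or log the command before starting a download. Even the smallest pool involves fetching millions of images.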