TheoCoombes/crawlingathome-server

NOTE: This repo has now been rewritten into a general purpose distributed compute job manager, see below:

LAION-5B Tracker Server

A server powering Crawling@Home's effort to filter CommonCrawl with CLIP, building a large-scale image-text dataset.

Installation

  1. Install requirements
git clone https://github.com/TheoCoombes/crawlingathome-server
cd crawlingathome-server
pip install -r requirements.txt
  2. Set up Redis
    • Redis Guide
    • Configure your Redis connection URL in config.py.
  3. Set up the SQL database
    • PostgreSQL Guide - follow steps 1-4, naming your database crawlingathome.
    • Install the Python library required for the database you are using (see the link above).
    • Configure your SQL connection URL in config.py.
    • In the crawlingathome-server folder, create a new folder named 'jobs', and download this file there.
    • In the same folder, create two files named closed.json and open_gpu.json, each containing the text [].
    • Create another file there named leaderboard.json, containing the text {}.
    • Finally, create a file there named shard_info.json, containing the text {"directory": "https://commoncrawl.s3.amazonaws.com/", "format": ".gz", "total_shards": 8569338}.
    • You can then run update_db.py to set up the jobs database (this may take a while).
  4. Install an ASGI server
    • From v3.0.0 onward, you must start the server with a console command through an ASGI server.
    • You can use either gunicorn or uvicorn. Currently, the main production server uses uvicorn with 12 worker processes.
    • e.g. uvicorn main:app --host 0.0.0.0 --port 80 --workers 12
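
The JSON files described in the database step can be created by hand, but a short script avoids typos. The following is a minimal sketch, not part of the repository: the helper name create_job_files is hypothetical, and the file names and contents are copied from the steps above. It does not download the extra file mentioned in that step, and it assumes you run it from the crawlingathome-server folder.

```python
import json
from pathlib import Path

# File names and contents as listed in the installation steps.
JOB_FILES = {
    "closed.json": [],
    "open_gpu.json": [],
    "leaderboard.json": {},
    "shard_info.json": {
        "directory": "https://commoncrawl.s3.amazonaws.com/",
        "format": ".gz",
        "total_shards": 8569338,
    },
}


def create_job_files(server_root="."):
    """Create the 'jobs' folder and its JSON files.

    Hypothetical helper (not shipped with the server). Existing files are
    left untouched so that re-running it cannot wipe tracker state.
    Returns the names of the files that were newly created.
    """
    jobs_dir = Path(server_root) / "jobs"
    jobs_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for name, content in JOB_FILES.items():
        path = jobs_dir / name
        if path.exists():
            continue  # keep whatever is already there
        path.write_text(json.dumps(content), encoding="utf-8")
        created.append(name)
    return created


if __name__ == "__main__":
    print("created:", create_job_files())
```

After the files exist (and the extra file from the same step has been downloaded into jobs), update_db.py can be run as described above.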

Usage

As noted in the last installation step, you need to start the server from the command line through your ASGI server:

uvicorn main:app --host 0.0.0.0 --port 80 --workers 12
  • Runs the server through Uvicorn, using 12 processes.
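
The installation notes say gunicorn can be used instead of uvicorn, but only show the uvicorn command. The sketch below builds the argument list for either option. The helper name asgi_command is hypothetical, and the gunicorn form assumes the uvicorn.workers.UvicornWorker worker class that ships with uvicorn; the project itself does not document a gunicorn command, so treat that branch as an assumption to verify against your installed versions.

```python
import shlex


def asgi_command(server="uvicorn", app="main:app", host="0.0.0.0", port=80, workers=12):
    """Return the argv list for starting the app with the chosen ASGI server.

    Hypothetical helper; defaults mirror the command shown in this README.
    """
    if server == "uvicorn":
        return [
            "uvicorn", app,
            "--host", host,
            "--port", str(port),
            "--workers", str(workers),
        ]
    if server == "gunicorn":
        # Assumption: gunicorn is paired with uvicorn's ASGI worker class.
        return [
            "gunicorn", app,
            "--worker-class", "uvicorn.workers.UvicornWorker",
            "--bind", f"{host}:{port}",
            "--workers", str(workers),
        ]
    raise ValueError(f"unsupported server: {server!r}")


if __name__ == "__main__":
    # Print both commands; pass the list to subprocess.run(...) to launch one.
    print(shlex.join(asgi_command("uvicorn")))
    print(shlex.join(asgi_command("gunicorn")))
```

Building the command as a list rather than a string keeps it safe to hand to subprocess.run without shell quoting issues.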
