LAION-5B Tracker Server

NOTE: This repo has since been rewritten as a general-purpose distributed compute job manager. See:

DistCompute Client: TheoCoombes/distcompute-client
DistCompute Tracker Server: TheoCoombes/distcompute-tracker

A server powering Crawling@Home's effort to filter CommonCrawl with CLIP, building a large-scale image-text dataset.

Client Repo: TheoCoombes/crawlingathome
Worker Repo: ARKSeal/crawlingathome-worker
Live Server: http://crawlingathome.duckdns.org/

Installation

1. Install requirements

git clone https://github.com/TheoCoombes/crawlingathome-server
cd crawlingathome-server
pip install -r requirements.txt

2. Set up Redis
- Redis Guide
- Configure your Redis connection URL in config.py.

3. Set up SQL database
- PostgreSQL Guide - follow steps 1-4, naming your database crawlingathome.
- Install the Python library required for the database you are using (see link above).
- Configure your SQL connection URL in config.py.
- In the crawlingathome-server folder, create a new folder named 'jobs', and download this file there.
- In that folder, create two files named closed.json and open_gpu.json, each containing the text [].
- Create another file there named leaderboard.json, containing the text {}.
- Finally, create a file there named shard_info.json, containing the text {"directory": "https://commoncrawl.s3.amazonaws.com/", "format": ".gz", "total_shards": 8569338}.
- You can then run update_db.py to set up the jobs database (this may take a while).

4. Install ASGI server
- From v3.0.0, you must start the server with a console command, run directly through the ASGI server backend.
- You can use either gunicorn or uvicorn. The main production server currently uses uvicorn with 12 worker processes.
- e.g. uvicorn main:app --host 0.0.0.0 --port 80 --workers 12
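
The JSON files listed under "Setup SQL database" can also be created with a short script instead of by hand. The following is a minimal sketch, not part of the repository: the function name create_job_files and the constant INITIAL_JOB_FILES are hypothetical, and only the file names and contents come from the steps above. It does not fetch the file that has to be downloaded into the jobs folder, so that step remains manual.

```python
import json
from pathlib import Path

# File names and initial contents exactly as listed in the installation steps.
# INITIAL_JOB_FILES is a hypothetical name used only for this sketch.
INITIAL_JOB_FILES = {
    "closed.json": [],
    "open_gpu.json": [],
    "leaderboard.json": {},
    "shard_info.json": {
        "directory": "https://commoncrawl.s3.amazonaws.com/",
        "format": ".gz",
        "total_shards": 8569338,
    },
}


def create_job_files(server_dir="."):
    """Create the jobs folder and its seed files inside server_dir.

    Files that already exist are left untouched, so re-running is safe.
    Returns the names of the files that were newly created.
    """
    jobs_dir = Path(server_dir) / "jobs"
    jobs_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for name, content in INITIAL_JOB_FILES.items():
        path = jobs_dir / name
        if not path.exists():
            path.write_text(json.dumps(content))
            created.append(name)
    return created
```

Calling create_job_files() from inside the crawlingathome-server folder creates jobs/ there. Run update_db.py afterwards, as described in the step above.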

Usage

As stated in step 4 of the installation, you need to start the server with a console command, run directly through the ASGI server:

uvicorn main:app --host 0.0.0.0 --port 80 --workers 12

This runs the server through Uvicorn, using 12 worker processes.
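
If the launch command needs to be assembled from a script (for example, to change the worker count per machine), it could be built as below. This is a sketch under assumptions: build_server_command and start_server are hypothetical helpers, not part of this repo. The uvicorn arguments mirror the command above. The gunicorn variant is not given in this README; it assumes gunicorn is paired with Uvicorn's worker class so that it can serve the ASGI app.

```python
import subprocess


def build_server_command(server="uvicorn", host="0.0.0.0", port=80, workers=12):
    """Return the argument list that starts main:app with the chosen server."""
    if server == "uvicorn":
        # Same command as shown in the Usage section.
        return [
            "uvicorn", "main:app",
            "--host", host,
            "--port", str(port),
            "--workers", str(workers),
        ]
    if server == "gunicorn":
        # Assumption: gunicorn needs an ASGI-capable worker class for this app.
        return [
            "gunicorn", "main:app",
            "--worker-class", "uvicorn.workers.UvicornWorker",
            "--bind", f"{host}:{port}",
            "--workers", str(workers),
        ]
    raise ValueError(f"unsupported server: {server!r}")


def start_server(**kwargs):
    """Run the server in the foreground; blocks until the process exits."""
    return subprocess.run(build_server_command(**kwargs), check=True)
```

Binding to port 80 usually requires elevated privileges, so a higher port may be more convenient for local testing, e.g. start_server(port=8000, workers=2).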
