Dual-Encoder-Based Zero-Shot Entity Linking

Zero-shot entity linking with a blitz start in 3 minutes. Hard negative mining and an encoder for all entities are also included in this implementation.

Quick Start in 3 Minutes

git clone https://github.com/izuna385/Zero-Shot-Entity-Linking.git
cd Zero-Shot-Entity-Linking
python -m spacy download en_core_web_sm

# Note: multiprocessing sentence boundary detection takes about 2 hours on an 8-core CPU.
sh preprocessing.sh
python3 ./src/train.py -num_epochs 1

To quickly check that the entire pipeline runs, use debug mode:

python3 ./src/train.py -num_epochs 1 -debug True

Multi-GPU training is also supported:

CUDA_VISIBLE_DEVICES=0,1 python3 ./src/train.py -num_epochs 1 -cuda_devices 0,1

Descriptions

  • These experiments aim to confirm whether fine-tuning pretrained BERT (more specifically, the mention and entity encoders) is effective even in unseen domains (see the sketch below).
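
The core idea can be sketched as follows: a mention encoder and an entity encoder map text to fixed-size vectors, and the linking score is their dot product. This is a minimal illustration, not the repository's exact code; the base model name, the [CLS] pooling choice, and the example texts are assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed base model
mention_encoder = AutoModel.from_pretrained("bert-base-uncased")
entity_encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(encoder, texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Use the [CLS] vector as a fixed-size representation (one common choice).
        return encoder(**batch).last_hidden_state[:, 0, :]

mention_vecs = encode(mention_encoder, ["He moved to Paris in 1998."])
entity_vecs = encode(entity_encoder, [
    "Paris : the capital and largest city of France.",
    "Paris, Texas : a city in Lamar County, Texas.",
])
scores = mention_vecs @ entity_vecs.T    # higher dot product = better match
best_entity = scores.argmax(dim=-1)      # index of the highest-scoring entity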

Requirements

  • torch, allennlp, transformers, and faiss are required. See also requirements.txt.

  • About 3 GB of CPU memory and about 1.1 GB of GPU memory are needed to run the scripts.

How to run experiments

1. Preprocessing
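
The preprocessing step (sh preprocessing.sh, see Quick Start) includes sentence boundary detection with the spaCy en_core_web_sm model downloaded above. The snippet below is only a minimal illustration of that splitting step; the actual preprocessing pipeline does considerably more.

import spacy

nlp = spacy.load("en_core_web_sm")   # model downloaded in the Quick Start
doc = nlp("Michael Jordan played for the Bulls. He later joined the Wizards.")
sentences = [sent.text for sent in doc.sents]
# ['Michael Jordan played for the Bulls.', 'He later joined the Wizards.']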

2. Training and Evaluating the Bi-Encoder Model

  • python3 ./src/train.py

    • This script trains the mention and entity encoders; all entities are then encoded, which enables candidate retrieval and hard negative mining (see the sketch below).
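
Because every entity is encoded, candidates for a mention can be found by nearest-neighbor search over the entity vectors, which is also how hard negatives (high-scoring non-gold entities) can be mined. The sketch below uses faiss from the requirements with random vectors as stand-ins; the array shapes and variable names are illustrative assumptions, not the repository's API.

import numpy as np
import faiss

dim = 768                                                   # BERT hidden size
entity_vecs = np.random.rand(10000, dim).astype("float32")  # stand-in for all encoded entities
mention_vecs = np.random.rand(32, dim).astype("float32")    # stand-in for encoded mentions
gold_ids = np.random.randint(0, 10000, size=32)             # gold entity index per mention

index = faiss.IndexFlatIP(dim)           # exact inner-product (dot-product) search
index.add(entity_vecs)
scores, candidate_ids = index.search(mention_vecs, 10)      # top-10 entities per mention

# Hard negatives: top-ranked candidates that are not the gold entity.
hard_negatives = [
    [int(eid) for eid in cands if eid != gold][:5]
    for cands, gold in zip(candidate_ids, gold_ids)
]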

3. Logging Each Experiment

  • See ./src/experiment_logdir/.

    • Each log directory is named after the time the experiment started (see the sketch below).
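
One plausible way such a time-stamped directory name is generated; the exact path layout and timestamp format used by the repository may differ.

import os
from datetime import datetime

logdir = os.path.join("src", "experiment_logdir",
                      datetime.now().strftime("%Y%m%d_%H%M%S"))   # assumed format
os.makedirs(logdir, exist_ok=True)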

TODO

  • Preprocess with stricter sentence boundary detection.

LICENSE

  • MIT
