Skip to content

attilanagy234/neural-punctuator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

79 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

neural-punctuator

Complimentary code for our paper Automatic punctuation restoration with BERT models submitted to the XVII. Conference on Hungarian Computational Linguistics.

Abstract

We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on Ted Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged F1-score of 79.8 in English and 82.2 in Hungarian.

Repository Structure

.
|-- docs
|   └── paper       # The submitted paper
|-- notebooks       # Notebooks for data preparation/preprocessing
|-- src
    └── neural_punctuator 
        ├── base            # Base classes for training Torch models
        ├── configs         # YAMl files defining the parameters of each model
        ├── models          # Torch model definitions
        ├── preprocessors   # Preprocessor class
        ├── trainers        # Train logic
        ├── utils           # Utility scripts (logging, metrics, tensorboard etc.)
        └── wrappers        # Wrapper classes for the models containing all the components needed for training/prediction

Dataset

Ted Talk dataset (English) - http://hltc.cs.ust.hk/iwslt/index.php/evaluation-campaign/ted-task.html

Szeged Treebank (Hungarian) - https://rgai.inf.u-szeged.hu/node/113

Citation

If you use our work, please cite the following paper:

@article{nagy2021automatic,
  title={Automatic punctuation restoration with bert models},
  author={Nagy, Attila and Bial, Bence and {\'A}cs, Judit},
  journal={arXiv preprint arXiv:2101.07343},
  year={2021}
}

Authors

Attila Nagy, Bence Bial, Judit Ács

Budapest University of Technology and Economics - Department of Automation and Applied Informatics

About

Complimentary code for our paper Automatic punctuation restoration with BERT models

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •