This repository is dedicated to the "reborn-uasr" project, an initiative focused on enhancing Unsupervised Automatic Speech Recognition (ASR) through the implementation of Reinforcement Learning (RL) techniques for segmenter training.
The simplest way to access the REBORN models is through Hugging Face. We have wrapped each model, including the PCA dimension reduction matrix, the REBORN segmenter, and the REBORN generator, into a Hugging Face-compatible format. We have also uploaded the corresponding datasets to Hugging Face (LibriSpeech 100 hours and Multilingual LibriSpeech across 6 languages). For a quick start, please check out our demo on Google Colab.
To replicate the REBORN end-to-end unsupervised phoneme recognition result, one would need:
- The upstream model (wav2vec 2.0) as the feature extractor.
- The REBORN model (including the PCA dimension reduction matrix, the segmenter, and the generator).
- The corresponding dataset.
Since all of the components are available on Hugging Face, users can follow our demo on Google Colab to generate the results across different datasets by simply replacing the card names of the models and datasets. For convenience, the table below summarizes all the available pairings of card names:
Description | upstream_model_card | reborn_model_card | dataset_card | dataset_name | split |
---|---|---|---|---|---|
LibriSpeech 100 hour @ iter2-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter2-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
LibriSpeech 100 hour @ iter5-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter5-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
Multilingual LibriSpeech 100 hour German @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | german | {train.100hr, dev, test, dev.small} |
Multilingual LibriSpeech 100 hour Dutch @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | dutch | {train.100hr, dev, test, dev.small} |
Multilingual LibriSpeech 100 hour French @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | french | {train.100hr, dev, test, dev.small} |
Multilingual LibriSpeech 100 hour Spanish @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | spanish | {train.100hr, dev, test, dev.small} |
Multilingual LibriSpeech 100 hour Italian @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | italian | {train.100hr, dev, test, dev.small} |
Multilingual LibriSpeech 100 hour Portuguese @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | portuguese | {train.100hr, dev, test, dev.small} |
By replacing the card names, users can try our pre-trained REBORN models directly with little effort.
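As a rough illustration of how the pairings in the table might be used, the sketch below stores two of the rows in a plain dictionary and looks them up by a short key. The keys (`ls100h_iter2`, `mls_de_iter2`), the `PAIRINGS` dictionary, and the `get_cards` helper are hypothetical names introduced here for illustration; they are not part of the REBORN code base. The sketch also assumes that the LibriSpeech dataset has no configuration name and that the values listed for it are its splits.

```python
# Hypothetical lookup table built from two rows of the table above.
PAIRINGS = {
    "ls100h_iter2": {
        "upstream_model_card": "facebook/wav2vec2-large-lv60",
        "reborn_model_card": "andybi7676/reborn-uasr_ls100h_iter2-stage1",
        "dataset_card": "andybi7676/reborn-uasr_librispeech-no-silence-100hr",
        "dataset_name": None,  # assumption: no configuration name for LibriSpeech
        "splits": ["train.clean.100", "dev.clean", "dev.other",
                   "test.clean", "test.other", "dev.clean.small"],
    },
    "mls_de_iter2": {
        "upstream_model_card": "facebook/wav2vec2-large-xlsr-53",
        "reborn_model_card": "andybi7676/reborn-uasr_mls-de_iter2-stage1",
        "dataset_card": "andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr",
        "dataset_name": "german",
        "splits": ["train.100hr", "dev", "test", "dev.small"],
    },
}


def get_cards(key, split):
    """Return the card names for one pairing after validating the split."""
    entry = PAIRINGS[key]  # raises KeyError for an unknown key
    if split not in entry["splits"]:
        raise ValueError(f"unknown split {split!r} for {key!r}")
    # Copy everything except the list of allowed splits, then add the chosen one.
    cards = {name: value for name, value in entry.items() if name != "splits"}
    cards["split"] = split
    return cards


if __name__ == "__main__":
    print(get_cards("mls_de_iter2", "dev.small"))
```

The returned values could then be passed to whatever loading calls the Colab demo uses. With the `datasets` library this would typically look like `load_dataset(dataset_card, dataset_name, split=split)`, but the exact calls in the demo are not reproduced here.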
If you want to set up the environment and train the REBORN model on your own, please follow the instructions below first to meet the requirements.
We provide a pre-built Docker image on Docker Hub. The image contains all the dependencies for training REBORN. This is likely the simplest way to set up the whole environment if you are familiar with Docker. Run the following command to pull the image and start a container:
docker run -it --rm --gpus all andybi7676/reborn-uasr:latest
Note that this is just an example of running the image in interactive mode with all the GPUs on your machine. Feel free to use it in your own way. If the GPUs are not available inside the container, please verify that nvidia-docker is installed.
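One way to check whether the GPUs are visible from inside the container is to query `nvidia-smi` from Python. The following is a minimal sketch that assumes `nvidia-smi` is on the `PATH` inside the container, which is typically the case when the NVIDIA container runtime is set up. The `visible_gpus` helper is a hypothetical name and is not shipped with the image.

```python
import shutil
import subprocess


def visible_gpus(nvidia_smi="nvidia-smi"):
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which(nvidia_smi)
    if exe is None:
        # The tool is not installed or not mounted into the container.
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but could not talk to a driver or GPU.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"{len(gpus)} GPU(s) visible:", gpus)
```

An empty list suggests that the container was started without `--gpus all` or that the NVIDIA runtime is missing on the host.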
In this section, we give step-by-step instructions on how to set up the REBORN environment. If you are using the reborn-uasr Docker image, you can skip this section.
We have attached the fairseq version we use in the folder reborn-uasr/fairseq. You can use it by cloning our repo to avoid version mismatches that may lead to unexpected errors.
git clone https://github.com/andybi7676/reborn-uasr.git
cd reborn-uasr/fairseq
pip install -e .
Please follow the instructions in the official kenlm repo. Please make sure that the Python bindings are also installed (pip install https://github.com/kpu/kenlm/archive/master.zip).
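After installing fairseq and kenlm, a quick sanity check is to confirm that Python can locate both packages. The sketch below uses only the standard library. The `missing_modules` helper is a hypothetical name, and the module names `fairseq` and `kenlm` are assumed to be the import names of the two packages.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["fairseq", "kenlm"])
    print("missing:", missing if missing else "none")
```

This only checks that the modules can be located; it does not import them, so it will not catch a broken build of the kenlm bindings.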
cd /your/path/to/reborn-uasr
pip install -r requirements.txt
Modify and run path.sh to export fairseq and reborn-uasr to PYTHONPATH.
- Modify /path/to/fairseq in path.sh so that the correct fairseq path is exported into the environment.
- Run `source path.sh` to append fairseq and reborn-uasr to PYTHONPATH. The result should be as follows:

    (base) username@desktop:/your/path/to/reborn-uasr$ source path.sh
    Added /your/path/to/fairseq to PYTHONPATH
    Appended /your/path/to/reborn-uasr to PYTHONPATH
    =======================================================================================
    FAIRSEQ_ROOT: /your/path/to/fairseq
    REBORN_WORK_DIR: /your/path/to/reborn-uasr
    PYTHONPATH: /your/path/to/fairseq:/your/path/to/reborn-uasr
    Please make sure that FAIRSEQ_ROOT and REBORN_WORK_DIR are in PYTHONPATH
    During each runtime, please make sure to run `source path.sh` to set up the environment.
    =======================================================================================
    Testing the required import functionality...
    SUCCESS
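The same condition can be checked from Python at any later point in a session. The sketch below assumes that path.sh exports `FAIRSEQ_ROOT` and `REBORN_WORK_DIR` as environment variables, as the printed output suggests. The `missing_from_pythonpath` helper is a hypothetical name and is not part of the repository.

```python
import os


def missing_from_pythonpath(env=None):
    """Return the root variables that are unset or absent from PYTHONPATH."""
    env = os.environ if env is None else env
    # Normalize entries so that trailing slashes do not cause false negatives.
    entries = [
        os.path.normpath(path)
        for path in env.get("PYTHONPATH", "").split(os.pathsep)
        if path
    ]
    missing = []
    for var in ("FAIRSEQ_ROOT", "REBORN_WORK_DIR"):
        root = env.get(var)
        if not root or os.path.normpath(root) not in entries:
            missing.append(var)
    return missing


if __name__ == "__main__":
    missing = missing_from_pythonpath()
    if missing:
        print("Not in PYTHONPATH:", ", ".join(missing))
    else:
        print("PYTHONPATH looks fine")
```

If either variable is reported as missing, running `source path.sh` again in the current shell should restore the expected state.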
TBA
TBA
In this section, we describe how to train your own REBORN model from scratch. Before diving into training, we recommend that users go through the Prerequisite section and make sure that all the requirements are satisfied.
We divide the training process into the following three main stages: wav2vec-U initialization, segmenter training, and generator (phoneme prediction model) training.
Please cite this work as:
@article{tseng2024reborn,
title={REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR},
author={Tseng, Liang-Hsuan and Hu, En-Pei and Chiang, Cheng-Han and Tseng, Yuan and Lee, Hung-yi and Lee, Lin-shan and Sun, Shao-Hua},
journal={arXiv preprint arXiv:2402.03988},
year={2024}
}