REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR


Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

National Taiwan University

This repository hosts the "reborn-uasr" project, which improves unsupervised automatic speech recognition (ASR) by training the segmenter with reinforcement learning (RL).

Using REBORN Models through Hugging Face 🤗

The simplest way to access the REBORN models is through Hugging Face. We have wrapped our model, including the PCA dimension reduction matrix, the REBORN segmenter, and the REBORN generator, into a Hugging Face compatible form. We have also uploaded the datasets corresponding to the models to Hugging Face (LibriSpeech 100 hours, Multilingual LibriSpeech across 6 languages). For a quick start, please check out our demo on Google Colab.

Summarizing the Card Names

To replicate the REBORN end-to-end unsupervised phoneme recognition result, one would need:

  • The upstream model (wav2vec 2.0) as feature extractor.
  • The REBORN model (including the PCA dimension reduction matrix, the segmenter, and the generator).
  • The corresponding dataset.

Since all of the components are available on Hugging Face, users can follow our demo on Google Colab to generate the results across different datasets by simply replacing the card names of the models and datasets. For convenience, all available pairings of card names are summarized here:

| Description | upstream_model_card | reborn_model_card | dataset_card | dataset_name | split |
| --- | --- | --- | --- | --- | --- |
| LibriSpeech 100 hour @ iter2-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter2-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
| LibriSpeech 100 hour @ iter5-stage1 | facebook/wav2vec2-large-lv60 | andybi7676/reborn-uasr_ls100h_iter5-stage1 | andybi7676/reborn-uasr_librispeech-no-silence-100hr | | {train.clean.100, dev.clean, dev.other, test.clean, test.other, dev.clean.small} |
| Multilingual LibriSpeech 100 hour German @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | german | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Dutch @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | dutch | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour French @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | french | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Spanish @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | spanish | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Italian @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | italian | {train.100hr, dev, test, dev.small} |
| Multilingual LibriSpeech 100 hour Portuguese @ iter2-stage1 | facebook/wav2vec2-large-xlsr-53 | andybi7676/reborn-uasr_mls-de_iter2-stage1 | andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr | portuguese | {train.100hr, dev, test, dev.small} |

By replacing the card names, users can try our pre-trained REBORN models with little effort.
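
As a rough illustration of how the pairings could be kept in code, the sketch below stores three of the rows in a small lookup table and builds keyword arguments for a dataset loader. The short keys (`ls100h_iter2`, ...), the `RebornPairing` class, and the `dataset_kwargs` helper are hypothetical names introduced here, not part of this repository; the remaining rows would follow the same pattern. How the REBORN model itself is loaded is shown in the Colab demo and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RebornPairing:
    """One row of the card-name table (hypothetical helper class)."""
    upstream_model_card: str
    reborn_model_card: str
    dataset_card: str
    dataset_name: Optional[str]  # None when the table lists no dataset_name
    splits: Tuple[str, ...]


LS_SPLITS = ("train.clean.100", "dev.clean", "dev.other",
             "test.clean", "test.other", "dev.clean.small")
MLS_SPLITS = ("train.100hr", "dev", "test", "dev.small")

LS_DATASET = "andybi7676/reborn-uasr_librispeech-no-silence-100hr"
MLS_DATASET = "andybi7676/reborn-uasr_multilingual-librispeech-no-silence-100hr"

# Three rows copied from the table; the keys are made up for this sketch.
PAIRINGS: Dict[str, RebornPairing] = {
    "ls100h_iter2": RebornPairing(
        "facebook/wav2vec2-large-lv60",
        "andybi7676/reborn-uasr_ls100h_iter2-stage1",
        LS_DATASET, None, LS_SPLITS),
    "ls100h_iter5": RebornPairing(
        "facebook/wav2vec2-large-lv60",
        "andybi7676/reborn-uasr_ls100h_iter5-stage1",
        LS_DATASET, None, LS_SPLITS),
    "mls_german_iter2": RebornPairing(
        "facebook/wav2vec2-large-xlsr-53",
        "andybi7676/reborn-uasr_mls-de_iter2-stage1",
        MLS_DATASET, "german", MLS_SPLITS),
}


def dataset_kwargs(key: str, split: str) -> Dict[str, str]:
    """Build path/name/split keyword arguments for a dataset loader."""
    pairing = PAIRINGS[key]
    if split not in pairing.splits:
        raise ValueError(f"{split!r} is not a listed split for {key!r}")
    kwargs = {"path": pairing.dataset_card, "split": split}
    if pairing.dataset_name is not None:
        kwargs["name"] = pairing.dataset_name
    return kwargs
```

If the `datasets` library is installed, `datasets.load_dataset(**dataset_kwargs("mls_german_iter2", "dev.small"))` would be the natural call, assuming the dataset cards follow the usual `path` / `name` / `split` convention.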

Prerequisite

If you want to set up the environment and train the REBORN model on your own, please complete the steps in this section first.

Docker Image (Recommended)

We provide a pre-built Docker image on Docker Hub. The image contains all the dependencies for training REBORN. If you are familiar with Docker, this is probably the simplest way to set up the whole environment. Run the following command to pull the image and start a container:

docker run -it --rm --gpus all andybi7676/reborn-uasr:latest

Note that this is just an example of running the image in interactive mode with all the GPUs on your machine; adapt the command to your needs. If the GPUs are not available inside the container, please verify that nvidia-docker is installed.
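
One common adaptation is to mount a host directory with your data into the container. The sketch below assembles such a command as an argument list; the `build_docker_cmd` helper and the example paths are hypothetical, while `--gpus` and `-v host:container` are standard `docker run` options.

```python
import shlex
from typing import Dict, List, Optional


def build_docker_cmd(
    image: str = "andybi7676/reborn-uasr:latest",
    mounts: Optional[Dict[str, str]] = None,
    gpus: Optional[str] = "all",
) -> List[str]:
    """Assemble a `docker run` command line (hypothetical helper)."""
    cmd = ["docker", "run", "-it", "--rm"]
    if gpus:
        # Expose GPUs to the container, e.g. "all".
        cmd += ["--gpus", gpus]
    for host_dir, container_dir in (mounts or {}).items():
        # Bind-mount a host directory into the container.
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Example paths are placeholders; replace them with your own.
    print(shlex.join(build_docker_cmd(mounts={"/data/librispeech": "/data"})))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.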

Building up the Environment from Source

This section explains how to build the REBORN environment step by step. If you are using the reborn-uasr Docker image, you can skip this section.

Fairseq

We have attached the fairseq version we use in the folder reborn-uasr/fairseq. Cloning our repo and installing fairseq from it ensures there is no version mismatch, which could otherwise lead to unexpected errors.

git clone https://github.com/andybi7676/reborn-uasr.git
cd reborn-uasr/fairseq
pip install -e .

Kenlm

Please follow the instructions in the official kenlm repo. Make sure that the Python bindings are also installed (pip install https://github.com/kpu/kenlm/archive/master.zip).

Other requirements (python packages)

cd /your/path/to/reborn-uasr
pip install -r requirements.txt

Modify and run path.sh to export fairseq and reborn-uasr to PYTHONPATH.

  1. Modify /path/to/fairseq in path.sh so that the correct fairseq path is exported into the environment.
  2. Run source path.sh to append fairseq and reborn-uasr to PYTHONPATH. The result should be as follows:
    (base) username@desktop:/your/path/to/reborn-uasr$ source path.sh 
    Added /your/path/to/fairseq to PYTHONPATH
    Appended /your/path/to/reborn-uasr to PYTHONPATH
    =======================================================================================
    FAIRSEQ_ROOT: /your/path/to/fairseq
    REBORN_WORK_DIR: /your/path/to/reborn-uasr
    PYTHONPATH: /your/path/to/fairseq:/your/path/to/reborn-uasr
    Please make sure that FAIRSEQ_ROOT and REBORN_WORK_DIR are in PYTHONPATH
    During each runtime, please make sure to run `source path.sh` to set up the environment.
    =======================================================================================
    Testing the required import functionality...
    SUCCESS
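
If you want to repeat a similar check from Python later in a session, a minimal sketch could look like the following. It is not the test that path.sh runs. It assumes that `FAIRSEQ_ROOT` and `REBORN_WORK_DIR` are exported as environment variables (the output above only shows them being printed), and `check_environment` is a hypothetical helper name.

```python
import importlib.util
import os
from typing import Dict, Sequence


def check_environment(
    modules: Sequence[str] = ("fairseq", "kenlm"),
    env_vars: Sequence[str] = ("FAIRSEQ_ROOT", "REBORN_WORK_DIR"),
) -> Dict[str, bool]:
    """Report which modules are importable and which roots are on PYTHONPATH."""
    python_path = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    report: Dict[str, bool] = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[f"import:{name}"] = importlib.util.find_spec(name) is not None
    for var in env_vars:
        value = os.environ.get(var)
        # True only if the variable is set and listed in PYTHONPATH.
        report[f"pythonpath:{var}"] = bool(value) and value in python_path
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {item}")
```

Any line reported as missing usually means `source path.sh` has not been run in the current shell, or a package was installed into a different Python environment.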
    

Flashlight python bindings (optional)

TBA

Pykaldi and Kaldi (optional)

TBA

Training REBORN

This section describes how to train your own REBORN model from scratch. Before starting, we recommend going through the Prerequisite section and making sure that all the requirements are satisfied.

We divide the training process into the following three main stages: wav2vec-U initialization, segmenter training, and generator (phoneme prediction model) training.

Data Preparation

Audio preparation

Text preparation

Stage 0: Training wav2vec-U as Initialization

Stage 1: REBORN segmenter training

Behavior Cloning

Reinforcement Learning

Stage 2: REBORN generator training

Boundary post-processing

GAN-training

Reference Repositories

Citation

Please cite this work as:

@article{tseng2024reborn,
  title={REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR},
  author={Tseng, Liang-Hsuan and Hu, En-Pei and Chiang, Cheng-Han and Tseng, Yuan and Lee, Hung-yi and Lee, Lin-shan and Sun, Shao-Hua},
  journal={arXiv preprint arXiv:2402.03988},
  year={2024}
}
