Skip to content

Generative voice cloning model using TTS synthesis with state-of-the-art Zero-Shot Multi-Speaker functionality. An web api built with the YourTTS TTS model to clone and generate realistic audio waves

License

Notifications You must be signed in to change notification settings

MartinMashalov/VoiceCloning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Voice Cloning Model with Zero-Shot Attention-Based TTS

The AI used in this API is the YourTTS Zero-Shot Multispeaker TTS implementation of generative audio modeling.

The paper that proposed the YourTTS model was used as a central building block of the API. YourTTS for a multilingual approach for zero-shot multi-speaker TTS which can be utilized on multilingual audio data while building on older VITS approaches.

Reference Implementations used to study TTS concepts can be found here

The Models Researched under open source as provided from Coqui

Model URL
Speaker Encoder link
Exp 1. YourTTS-EN(VCTK) link
Exp 1. YourTTS-EN(VCTK) + SCL link
Exp 2. YourTTS-EN(VCTK)-PT link
Exp 2. YourTTS-EN(VCTK)-PT + SCL link
Exp 3. YourTTS-EN(VCTK)-PT-FR link
Exp 3. YourTTS-EN(VCTK)-PT-FR SCL link
Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR SCL link

TTS Retraining Data

The audios for the MOS are available here. Also, the MOS the audios are here.

Default TTS Audio Sources:

LibriTTS (test clean): 1188, 1995, 260, 1284, 2300, 237, 908, 1580, 121 and 1089

VCTK: p261, p225, p294, p347, p238, p234, p248, p335, p245, p326 and p302

MLS Portuguese: 12710, 5677, 12249, 12287, 9351, 11995, 7925, 3050, 4367 and 1306

Citation


@ARTICLE{2021arXiv211202418C,
  author = {{Casanova}, Edresson and {Weber}, Julian and {Shulby}, Christopher and {Junior}, Arnaldo Candido and {G{\"o}lge}, Eren and {Antonelli Ponti}, Moacir},
  title = "{YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone}",
  journal = {arXiv e-prints},
  keywords = {Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing},
  year = 2021,
  month = dec,
  eid = {arXiv:2112.02418},
  pages = {arXiv:2112.02418},
  archivePrefix = {arXiv},
  eprint = {2112.02418},
  primaryClass = {cs.SD},
  adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211202418C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

About

Generative voice cloning model using TTS synthesis with state-of-the-art Zero-Shot Multi-Speaker functionality. An web api built with the YourTTS TTS model to clone and generate realistic audio waves

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages