Voice Cloning Model with Zero-Shot Attention-Based TTS

This API uses YourTTS, a zero-shot multi-speaker TTS model, for generative audio modeling.

The API is built around the model proposed in the YourTTS paper. YourTTS is a multilingual approach to zero-shot multi-speaker TTS that can be used with multilingual audio data and builds on the earlier VITS model.
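As a rough illustration of zero-shot voice cloning with YourTTS, the sketch below uses the Coqui `TTS` Python package. It is not necessarily how this API's `backend.py` is written. The helper names `build_request` and `clone_voice` are hypothetical, and the model name and language codes are assumed to match the YourTTS model released in the Coqui model zoo.

```python
from pathlib import Path

# Model name as listed in the Coqui TTS model zoo (assumed to be the
# checkpoint of interest here).
YOURTTS_MODEL = "tts_models/multilingual/multi-dataset/your_tts"

# Language codes assumed to be accepted by that released model.
SUPPORTED_LANGUAGES = ("en", "fr-fr", "pt-br")


def build_request(text, speaker_wav, language="en", out_path="output.wav"):
    """Validate inputs and return keyword arguments for TTS.tts_to_file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "text": text,
        "speaker_wav": str(Path(speaker_wav)),  # reference recording to clone
        "language": language,
        "file_path": str(Path(out_path)),
    }


def clone_voice(text, speaker_wav, language="en", out_path="output.wav"):
    """Synthesize `text` in the voice of the reference recording."""
    # Imported lazily so build_request works without the TTS package installed.
    from TTS.api import TTS

    tts = TTS(model_name=YOURTTS_MODEL)
    tts.tts_to_file(**build_request(text, speaker_wav, language, out_path))
    return out_path
```

Calling `clone_voice("Hello there.", "reference.wav")` would download the model on first use and write `output.wav`. No speaker-specific training is involved, which is what "zero-shot" refers to: the target voice is supplied only as a short reference recording.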

Reference implementations used to study TTS concepts can be found here.

The models studied are open source and provided by Coqui:

Model	URL
Speaker Encoder	link
Exp 1. YourTTS-EN(VCTK)	link
Exp 1. YourTTS-EN(VCTK) + SCL	link
Exp 2. YourTTS-EN(VCTK)-PT	link
Exp 2. YourTTS-EN(VCTK)-PT + SCL	link
Exp 3. YourTTS-EN(VCTK)-PT-FR	link
Exp 3. YourTTS-EN(VCTK)-PT-FR + SCL	link
Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR + SCL	link
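Assuming "SCL" in the table stands for the Speaker Consistency Loss described in the YourTTS paper, the idea is to reward generated audio whose speaker embedding stays close to the embedding of the ground-truth audio, measured by cosine similarity. The sketch below shows that computation on plain Python lists. The function names and the `alpha` default are illustrative, and the embeddings are assumed to be non-zero vectors already produced by a speaker encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_consistency_loss(ref_embeddings, gen_embeddings, alpha=1.0):
    """Negative mean cosine similarity between paired speaker embeddings.

    ref_embeddings: embeddings of ground-truth audio, one per sample.
    gen_embeddings: embeddings of the corresponding generated audio.
    alpha: weighting factor (illustrative default, not the paper's value).
    """
    if len(ref_embeddings) != len(gen_embeddings) or not ref_embeddings:
        raise ValueError("need two non-empty batches of equal size")
    total = sum(
        cosine_similarity(r, g) for r, g in zip(ref_embeddings, gen_embeddings)
    )
    return -alpha * total / len(ref_embeddings)
```

The loss is lowest when every generated embedding points in the same direction as its reference, so minimizing it pushes the synthesized voice toward the target speaker.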

TTS Retraining Data

The audio samples used for the MOS (mean opinion score) evaluation are available here. The MOS results for those samples are available here.

Default TTS Audio Sources:

LibriTTS (test clean): 1188, 1995, 260, 1284, 2300, 237, 908, 1580, 121 and 1089

VCTK: p261, p225, p294, p347, p238, p234, p248, p335, p245, p326 and p302

MLS Portuguese: 12710, 5677, 12249, 12287, 9351, 11995, 7925, 3050, 4367 and 1306

Citation


@ARTICLE{2021arXiv211202418C,
  author = {{Casanova}, Edresson and {Weber}, Julian and {Shulby}, Christopher and {Junior}, Arnaldo Candido and {G{\"o}lge}, Eren and {Antonelli Ponti}, Moacir},
  title = "{YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone}",
  journal = {arXiv e-prints},
  keywords = {Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing},
  year = 2021,
  month = dec,
  eid = {arXiv:2112.02418},
  pages = {arXiv:2112.02418},
  archivePrefix = {arXiv},
  eprint = {2112.02418},
  primaryClass = {cs.SD},
  adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211202418C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

MartinMashalov/VoiceCloning
