
Inconsistency Issue with Text-to-Speech Model Output #124

Open
Prajval108 opened this issue Apr 3, 2024 · 3 comments


@Prajval108

Problem:

The text-to-speech (TTS) model produces inconsistent output: every time it is invoked with the same input and settings, it generates a different result, which makes the system unreliable and unpredictable.

Query:

Is there a way to make the output of the TTS model repeatable?

Additional Context:

The inconsistent output was reproduced on the demo at https://ttsdemo.themetavoice.xyz/.

Testing was conducted with the following settings:

Speech Stability: 10
Speaker Similarity: 5
Preset Voices: Bria

Attached audio samples: speech_1, speech_2

@vatsalaggarwal
Contributor

vatsalaggarwal commented Apr 3, 2024

Yes, you can "seed" the synthesis, but we don't provide this functionality on ttsdemo.themetavoice.xyz at the moment.
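To illustrate what "seeding" the synthesis means, here is a minimal sketch of temperature sampling from a toy token distribution. It assumes, without confirmation from this thread, that the nondeterminism comes from random sampling of output tokens; `toy_logits` and `sample_tokens` are hypothetical and are not MetaVoice functions. Because the sampler draws from a generator created from the seed on every call, the same seed and settings reproduce the same sequence.

```python
import math
import random

def toy_logits(prefix: list, vocab_size: int = 8) -> list:
    # Hypothetical stand-in for a TTS model's next-token scores;
    # a real model would compute these from the text and speaker prompt.
    return [math.sin(i + len(prefix)) for i in range(vocab_size)]

def sample_tokens(num_tokens: int, seed: int, temperature: float = 1.0) -> list:
    """Temperature sampling driven by an RNG that is seeded on every call."""
    rng = random.Random(seed)  # private generator: same seed -> same draws
    tokens = []
    for _ in range(num_tokens):
        scaled = [l / temperature for l in toy_logits(tokens)]
        top = max(scaled)
        # Softmax weights (shifted by the max for numerical stability).
        weights = [math.exp(s - top) for s in scaled]
        tokens.append(rng.choices(range(len(weights)), weights=weights, k=1)[0])
    return tokens

# Same seed and settings reproduce the same token sequence.
run_a = sample_tokens(20, seed=42)
run_b = sample_tokens(20, seed=42)
print(run_a == run_b)  # True
```

Using a private `random.Random` instance, rather than the module-level functions, keeps other code that consumes random numbers from shifting the sampler's draws.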

@Prajval108
Author

I hosted the Gradio UI myself with Docker and tried the seeds 0, 42, and 100, but the output is still inconsistent. The output quality also seems to get worse.

Could you please advise which seed value will provide consistent responses?
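One possible explanation, offered as a hedged sketch rather than a diagnosis of this deployment: if the seed is applied once at startup instead of immediately before each generation, successive calls continue from a shared RNG state and still differ. Any seed value is repeatable when it is re-applied per call; 0, 42, and 100 should each give their own repeatable output, just not the same output as one another. The helpers `draw_tokens` and `generate` below are hypothetical. If the model runs on PyTorch (not stated in this thread), `torch.manual_seed` and, on GPU, `torch.use_deterministic_algorithms` would be the analogous settings, and some GPU kernels can remain nondeterministic even with a fixed seed.

```python
import random
from typing import List, Optional

def draw_tokens(n: int = 10) -> List[int]:
    # Hypothetical sampler that draws from Python's global RNG state.
    return [random.randrange(1000) for _ in range(n)]

def generate(seed: Optional[int] = None) -> List[int]:
    if seed is not None:
        random.seed(seed)  # reset the global RNG immediately before sampling
    return draw_tokens()

# Pitfall: seeding once at startup only fixes the *sequence* of runs.
random.seed(0)
first = generate()
second = generate()  # continues from where `first` left off, so it differs

# Re-seeding per call makes every call with the same seed identical,
# whichever seed value is chosen.
assert generate(seed=0) == generate(seed=0)
assert generate(seed=100) == generate(seed=100)
```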

@G-force78

Try saving the latents from your best samples and loading them again, with a fixed seed and the same settings.
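This suggestion amounts to keeping the intermediate state of a run you like instead of relying on regenerating it. A minimal sketch, assuming the latents can be serialized as a flat list of floats; `make_latents`, `save_latents`, `load_latents`, the JSON layout, and the settings keys are all hypothetical and not part of MetaVoice's code.

```python
import json
import random
import tempfile
from pathlib import Path

def make_latents(seed: int, dim: int = 16) -> list:
    # Hypothetical stand-in for latents captured from a good synthesis run.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def save_latents(path: Path, latents: list, seed: int, settings: dict) -> None:
    # Keep the latents together with the seed and settings that produced them.
    record = {"seed": seed, "settings": settings, "latents": latents}
    path.write_text(json.dumps(record))

def load_latents(path: Path) -> dict:
    return json.loads(path.read_text())

settings = {"speech_stability": 10, "speaker_similarity": 5, "voice": "Bria"}
latents = make_latents(seed=42)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "best_sample.json"
    save_latents(path, latents, seed=42, settings=settings)
    record = load_latents(path)

# JSON round-trips Python floats exactly, so the reloaded latents match.
assert record["latents"] == latents
assert record["settings"] == settings
```

Storing the seed and settings next to the latents keeps the record self-describing, so a good sample can be traced back to the configuration that produced it.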
