I split my audio into 5-10 second chunks. Is this normal for fine-tuning, or is there a recommended range for chunk length? I fine-tuned on my Uzbek-language audio (approximately 30 hours), and my loss is not decreasing.
Could both of you provide more information about your fine-tuning configurations and the datasets you're using? As @vatsalaggarwal mentioned, 5-10 s chunks should be fine if that length is appropriate at inference time. Has either of you been able to get fine-tuning working with a non-custom dataset (e.g., LibriTTS, VCTK)?
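For anyone preparing chunks in the 5-10 s range discussed above, here is a minimal sketch of one way to compute chunk boundaries for a single recording. The helper `chunk_boundaries` is hypothetical (it is not part of this repo or any library): it splits a clip into the fewest equal-length pieces that keep each piece at or below `max_s`, and drops clips shorter than `min_s`. It only works on sample counts, so it assumes you already know each file's length and sample rate. It also cuts at fixed positions, which can split words; in practice you would likely want to cut at pauses so that each chunk's transcript still matches its audio.

```python
import math


def chunk_boundaries(num_samples, sample_rate, min_s=5.0, max_s=10.0):
    """Hypothetical helper: return (start, end) sample indices for contiguous
    chunks whose durations fall in [min_s, max_s].

    Uses the fewest equal-length chunks that keep each one <= max_s.
    Requires min_s <= max_s / 2, which guarantees every chunk is >= min_s.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if not 0 < min_s <= max_s / 2:
        raise ValueError("need 0 < min_s <= max_s / 2")

    total_s = num_samples / sample_rate
    if total_s < min_s:
        return []  # clip too short to keep as a training example

    n_chunks = math.ceil(total_s / max_s)
    # Integer division keeps the chunks contiguous and covering every sample.
    return [
        (i * num_samples // n_chunks, (i + 1) * num_samples // n_chunks)
        for i in range(n_chunks)
    ]
```

For example, a 23-second recording at 16 kHz (`chunk_boundaries(368000, 16000)`) yields three chunks of roughly 7.7 s each, while a 3-second recording yields no chunks.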