What is the required audio length for fine-tuning? #115

risqaliyevds · 2024-03-28T07:20:54Z

I split my audio into 5-10 second chunks. Is this normal for fine-tuning, or is there a specific range for audio chunk lengths? I fine-tuned on my Uzbek-language audio (approximately 30 hours), and my loss is not decreasing.
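For reference, here is a minimal sketch of splitting one recording into equal-length chunks that stay inside the 5-10 second range discussed in this thread. The function name `chunk_boundaries` and its defaults are hypothetical, not part of any fine-tuning script in this repo, and it works on durations only. A real pipeline would more likely cut at pauses so words aren't split across clips, and each clip would still need a transcript that matches its audio. At 5-10 s per clip, ~30 hours of audio (108,000 s) works out to roughly 10,800-21,600 clips.

```python
import math


def chunk_boundaries(total_s: float, min_s: float = 5.0, max_s: float = 10.0) -> list[tuple[float, float]]:
    """Split a recording of total_s seconds into equal-length (start, end) spans.

    Every span is at most max_s long; recordings shorter than min_s are skipped.
    Requires min_s <= max_s / 2 so that equal splitting never produces spans below min_s.
    """
    if min_s <= 0 or 2 * min_s > max_s:
        raise ValueError("need 0 < min_s <= max_s / 2")
    if total_s < min_s:
        return []  # too short to be a useful training clip
    n = math.ceil(total_s / max_s)  # fewest chunks that keep each one <= max_s
    length = total_s / n
    spans = []
    for i in range(n):
        start = i * length
        # Pin the last end to total_s so float rounding never drops audio.
        end = total_s if i == n - 1 else (i + 1) * length
        spans.append((start, end))
    return spans


# Example: a 25-second recording becomes three ~8.33 s chunks.
for start, end in chunk_boundaries(25.0):
    print(f"{start:.2f}-{end:.2f}")  # 0.00-8.33, 8.33-16.67, 16.67-25.00
```

Equal-length splitting is used here only because it guarantees every chunk lands in range; it is not a claim about how this project prepares data.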

vatsalaggarwal · 2024-03-30T15:00:15Z

Hey! 5-10 seconds should be enough, but note that, because of this, you'll struggle to generate more than 5-10 seconds of audio at a time during synthesis...

It's hard to debug the loss not decreasing without more info!

lucapericlp · 2024-04-03T09:25:32Z

Hey @risqaliyevds, let us know if you have any more info, or we'll look to close this issue in the next few days.

eshoyuan · 2024-05-01T22:07:02Z

I ran into a similar problem: neither the training loss nor the validation loss is decreasing.

lucapericlp · 2024-05-14T21:12:35Z

Could both of you provide more information w.r.t. the fine-tuning configurations and datasets you're using? As @vatsalaggarwal mentioned, 5-10 s should be fine if that's appropriate for your use at inference time. Are either of you able to get fine-tuning working with a non-custom dataset (e.g. LibriTTS, VCTK)?
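When reporting dataset details as asked above, a quick summary of clip durations can help. This is a hypothetical sketch using only the Python standard library; the helper names `wav_duration_s` and `summarize_durations` are illustrative, not part of the project, and `wav_duration_s` assumes the clips are uncompressed PCM WAV files (the format the `wave` module reads).

```python
import statistics
import wave


def wav_duration_s(path: str) -> float:
    """Duration of an uncompressed PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()


def summarize_durations(durations_s: list[float], min_s: float = 5.0, max_s: float = 10.0) -> dict[str, float]:
    """Summarize clip durations: count, total hours, range, mean, and share inside [min_s, max_s]."""
    if not durations_s:
        raise ValueError("no clips to summarize")
    in_range = sum(1 for d in durations_s if min_s <= d <= max_s)
    return {
        "clips": len(durations_s),
        "total_hours": sum(durations_s) / 3600,
        "min_s": min(durations_s),
        "max_s": max(durations_s),
        "mean_s": statistics.mean(durations_s),
        "frac_in_range": in_range / len(durations_s),
    }
```

Usage would look something like `summarize_durations([wav_duration_s(p) for p in paths])`, where `paths` is your own list of clip files.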
