The more I pretrain (SSL), the worse the fine-tuned model gets? #9175
Unanswered
riqiang-dp asked this question in Q&A
Replies: 4 comments 8 replies
-
Hello, could you try lowering your learning rate or using another optimizer like AdamW?
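To see why a lower learning rate interacts with AdamW, here is a minimal single-parameter sketch of the AdamW update rule (illustrative only; real training would use `torch.optim.AdamW`, and the hyperparameter values below are placeholders, not from this thread):

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter. Note the decoupled
    weight decay: it is scaled by lr directly, so lowering lr also
    softens the decay, not just the gradient step."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v

# A smaller lr shrinks every step, which can keep a fine-tuning run
# from drifting away from a late-stage SSL checkpoint too quickly.
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 11):
    p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=t)
```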
3 replies
-
@nithinraok can you comment?
2 replies
-
Another problem I ran into is that, with the same data sampling and settings, the training phase runs fine but validation fails at the contrastive loss calculation:
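The traceback did not survive the page scrape, but one plausible cause (an assumption, not confirmed by the thread) is that short validation utterances yield fewer candidate frames than the configured number of negatives. A minimal sketch of that failure mode, using a hypothetical `sample_negatives` helper rather than NeMo's actual implementation:

```python
import random

def sample_negatives(frames, num_negatives):
    """Sketch of negative sampling in a wav2vec-style contrastive
    loss. `frames` are candidate frame indices (hypothetical helper)."""
    if len(frames) <= num_negatives:
        # random.sample raises ValueError when the population is
        # smaller than the request -- the kind of crash that can
        # surface only on short validation utterances.
        raise ValueError(
            f"need {num_negatives} negatives but only "
            f"{len(frames)} candidate frames are available"
        )
    return random.sample(frames, num_negatives)

# Training batch: plenty of frames, sampling succeeds.
train_negs = sample_negatives(list(range(200)), num_negatives=100)

# Validation batch: a short utterance can have too few frames.
failed = False
try:
    sample_negatives(list(range(40)), num_negatives=100)
except ValueError:
    failed = True
```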
1 reply
-
Hi, I'm trying out SSL pretraining for ASR. I have about 50k hours of unlabeled data and 2k hours of transcribed data. With 100k hours of data, first of all, I couldn't get the default dataloader to work because the CPU runs out of memory within one epoch. I then trained with only the contrastive loss, but after about 50k steps the SSL contrastive loss starts to stagnate.
I took two checkpoints, one from around 40k steps and another from the end of 80k steps, and fine-tuned each with the labeled ASR data. The checkpoint from 80k steps converges much more slowly.
It seems that the more I pretrain, the worse the pretrained model I get. Is this expected? What could I be doing wrong? Furthermore, this is not an isolated case: I've tried other combinations of parameters and models (e.g. Conformer, Fast Conformer) and a smaller dataset for pretraining, and with less data the pretrained model also seems to be better.
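Regarding the CPU memory issue: one generic workaround is to stream the manifest in small batches instead of materializing everything up front. The sketch below shows only the streaming idea with a stand-in file; it is not NeMo's tarred/streaming dataset API, and all names and sizes are illustrative:

```python
import tempfile

def iter_manifest(path, batch_size=8):
    """Yield small batches of manifest lines lazily, so the whole
    manifest is never held in CPU memory at once (generic sketch)."""
    batch = []
    with open(path) as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the final partial batch
        yield batch

# Demo with a tiny stand-in manifest (a real manifest would list
# tens of thousands of hours of audio entries).
with tempfile.NamedTemporaryFile("w", suffix=".json",
                                 delete=False) as f:
    for i in range(20):
        f.write(f'{{"audio_filepath": "utt_{i}.wav"}}\n')
    manifest_path = f.name

batches = list(iter_manifest(manifest_path, batch_size=8))
```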