Yes, that's what I'm doing.

In fact, I've been able to edit the pretrained model's tokenizer and change the tokens inside it. What I found is that merely reloading the pretrained tokenizer with the `change_vocabulary` method breaks the whole decoding process.
Describe the bug
So I picked nvidia/parakeet-ctc-0.6b and untarred the `.nemo` file. After that, I loaded the model and changed the vocab this way:
Steps/Code to reproduce bug
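The vocabulary-change snippet itself was not captured in this report. The untar step it refers to, however, is ordinary archive handling: a `.nemo` checkpoint is a plain tar file. A minimal, self-contained sketch of that step, building a stand-in archive instead of downloading the real nvidia/parakeet-ctc-0.6b checkpoint (all file names below are illustrative):

```python
import tarfile
from pathlib import Path

# A .nemo checkpoint is an ordinary tar archive. Build a stand-in one
# (instead of downloading the real checkpoint) and extract it the same
# way the real checkpoint would be untarred.
src = Path("stand_in")
src.mkdir(exist_ok=True)
(src / "tokenizer.model").write_text("dummy tokenizer payload")

with tarfile.open("stand_in.nemo", "w") as tar:
    tar.add(src / "tokenizer.model", arcname="tokenizer.model")

# The extraction step described above: untar the .nemo file to reach
# the tokenizer files packed next to the model weights.
out = Path("extracted_nemo")
out.mkdir(exist_ok=True)
with tarfile.open("stand_in.nemo") as tar:
    tar.extractall(out)

print(sorted(p.name for p in out.iterdir()))  # → ['tokenizer.model']
```

In a real checkpoint the extracted directory also contains the model weights and config; the tokenizer files are what matter for the vocabulary change described here.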
where `vocab_extension_path` is the path of the pretrained model.

Expected behavior
The model's tokenizer is supposed to remain intact and not start generating gibberish, because I am just reloading the exact tokenizer that was used to pretrain the model.
Why I need this
I need to replace some tokens in the model's vocab while keeping the order of the other tokens intact. If I can't keep the other parts of the tokenizer intact, then my token replacement cannot work.
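The constraint stated here — swap selected token strings while every token keeps its position, and therefore its integer ID — can be sketched without NeMo at all. The vocab list and replacement map below are illustrative stand-ins, not values from this report:

```python
# Replace selected token strings in a vocab while preserving order,
# so every token keeps its original integer ID (index == token ID).
# Both the vocab and the replacement map are illustrative stand-ins.
vocab = ["<unk>", "▁the", "▁cat", "▁dog", "s"]
replacements = {"▁cat": "▁chat", "▁dog": "▁chien"}  # old token -> new token

new_vocab = [replacements.get(tok, tok) for tok in vocab]

# Order, and therefore every token ID, is unchanged:
assert len(new_vocab) == len(vocab)
assert new_vocab.index("▁chat") == vocab.index("▁cat")
print(new_vocab)
```

The point of the in-place mapping is exactly the requirement above: untouched tokens keep their IDs, so decoding stays consistent everywhere except the deliberately replaced entries.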