@woohwan thanks for the feature request. Just note that the e2e recipe is geared more toward showing the process. I wonder if you would be interested in contributing a Llama 3 case study?
@HamidShojanazeri I am also interested in merging the Llama 3 tokenizer with a new custom tokenizer that I trained from scratch. I understand that the Llama 1 and 2 tokenizers are based on sentencepiece, and the current llama recipes also provide code to merge two sentencepiece tokenizers. However, the Llama 3 tokenizer is based on tiktoken, and there is no official script available to train a tiktoken tokenizer, let alone merge two of them. Can you help with the code, or point me in the right direction on how to merge two tiktoken-based tokenizers? Thanks in advance.
🚀 The feature, motivation and pitch
The current multilingual recipes are for Llama 2.
I would like to see Llama 3 multilingual recipes added.
Thank you.
Alternatives
No response
Additional context
Adding multilingual tokens via the Hugging Face tokenizer does not work.
I followed the documentation below.
https://huggingface.co/learn/nlp-course/chapter6/2