llama 3 multilingual recipe #509

Open
woohwan opened this issue May 13, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@woohwan

woohwan commented May 13, 2024

🚀 The feature, motivation and pitch

The current multilingual recipes are for LLAMA 2.
I would like to see LLAMA 3 multilingual recipes added.

Thank you.

Alternatives

No response

Additional context

Adding multilingual tokens via the Hugging Face tokenizer does not work.

I followed the documentation below.
https://huggingface.co/learn/nlp-course/chapter6/2

@mreso added the enhancement (New feature or request) label on May 13, 2024
@HamidShojanazeri
Contributor

@woohwan thanks for the feature request. Just note that the e2e recipe is geared more toward showing the process. I wonder if you would be interested in contributing a Llama 3 case study?

@woohwan
Author

woohwan commented May 15, 2024

Sorry, I'm a newbie in the LLM field.

@savanth14

@HamidShojanazeri I am also interested in merging the Llama 3 tokenizer with a new custom tokenizer that I trained from scratch. I understand that the Llama 1 and 2 tokenizers are based on sentencepiece, and the current llama recipes also provide code to merge two sentencepiece tokenizers. However, the Llama 3 tokenizer is based on tiktoken, and there is no official training script available to train a tiktoken tokenizer, let alone merge two of them. Can you help with the code, or point me in the right direction on how to merge two tiktoken-based tokenizers? Thanks in advance.
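For illustration only, here is a minimal sketch of one way such a merge could be approached. It is not an official recipe and nothing in this thread confirms it works for Llama 3. It assumes each tokenizer's vocabulary is available as a `bytes -> rank` mapping, which is the shape tiktoken's `Encoding` constructor accepts as `mergeable_ranks`. The helper name `merge_mergeable_ranks` and the toy vocabularies are hypothetical.

```python
from typing import Dict, Iterable


def merge_mergeable_ranks(
    base_ranks: Dict[bytes, int],
    extra_tokens: Iterable[bytes],
) -> Dict[bytes, int]:
    """Return a copy of base_ranks with unseen tokens appended at the next free ranks.

    Hypothetical helper: existing base ranks are never changed, so token IDs
    already known to the base model keep their meaning.
    """
    merged = dict(base_ranks)
    next_rank = max(merged.values(), default=-1) + 1
    for token in extra_tokens:
        if token not in merged:
            merged[token] = next_rank
            next_rank += 1
    return merged


# Toy vocabularies (made-up data, not real Llama 3 ranks).
base = {b"a": 0, b"b": 1, b"ab": 2}
custom = {b"a": 0, b"\xea\xb0\x80": 1, b"ab": 2, b"abb": 3}

# Walk the custom vocabulary in its own rank order so that the relative
# priority among the newly added tokens is preserved.
custom_in_rank_order = sorted(custom, key=custom.get)
merged = merge_mergeable_ranks(base, custom_in_rank_order)

for token, rank in sorted(merged.items(), key=lambda item: item[1]):
    print(rank, token)
```

Some caveats under these assumptions: appended tokens get the highest ranks, so they have the lowest merge priority; a new token is only reachable if BPE can build it from pairs already in the vocabulary; special-token IDs that sit directly after the base ranks would have to be renumbered; and the model's embedding matrix would need to be resized and the new rows trained.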

4 participants