Converting checkpoints #551

Are there any scripts available for converting trained Gemma/Llama/Mistral MaxText checkpoints to HuggingFace?

Comments
Hi @peregilk, there isn't one yet, but we will add one very soon! Thanks for your patience.
@A9isha Thanks a lot for the answer. Really looking forward to this.
Sorry for bothering you again with this, @A9isha. Do you have a rough estimate of when the HF conversion will be ready?
Awesome. I'll give it a shot tomorrow and report back.
Hi @A9isha, I have a checkpoint saved in:

This is a continual training of a Mistral-7b model on a Norwegian dataset. By default it has saved checkpoints every 10k steps; I am targeting the last checkpoint. Your comments refer to running

I am starting by creating and cloning an HF repo (where I plan to place the finished files) and a tmp directory called

I made two minor changes from the documentation here:

I am not really sure what the purpose of run_name is, but I set it to "test". My final command looks like this (a hedged reconstruction follows below):

This now runs for a couple of minutes. I see a couple of warnings that might indicate errors:
and:
A while after that, however, the conversion crashes with this message:
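A hedged reconstruction of the conversion command above: the script name, config keys, and all paths are assumptions based on MaxText's layout around that time, not the values from the original report.

```bash
# Placeholder reconstruction -- verify the script name and keys against your MaxText checkout.
# MaxText config overrides are passed as key=value pairs after the base config file.
python3 MaxText/llama_or_mistral_orbax_to_huggingface.py MaxText/configs/base.yml \
  base_output_directory=gs://my-bucket/hf-conversion \
  load_parameters_path=gs://my-bucket/runs/my-run/checkpoints/10000/items \
  run_name=test \
  model_name=mistral-7b \
  hf_model_path=/tmp/hf-export
```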
Right, I think this is caused by a recent breaking change in the way we are generating MaxText's Orbax checkpoints. Could you please regenerate your MaxText checkpoint with the latest code (i.e., after including PR #568) and try out the script
@A9isha I did as you said: deleted MaxText, recloned, and reinstalled the requirements. Then I tried training with exactly the same commands, under a new run name. I keep getting this, both when initialising Gemma and Mistral:
Is this related, or should I report it as a separate issue?
But it has a default value in
My bad. I tried replicating the experiment with my custom .yml file and did not realise that there were updates in base.yml. The model is now training, and at the first checkpoint I'll be able to test the export again. I will report my results here. Thanks @A9isha
A quick update, @A9isha: I tried converting the 0-step checkpoints that were generated at the start of training. That ran without any warnings or issues, and seems to have produced PyTorch model files! Thanks! I will push to HF and test.
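As a quick sanity check before pushing, the converted directory can be loaded back with transformers; the local path below is a placeholder for wherever the conversion wrote its output.

```python
# Load the converted checkpoint and confirm the parameter count looks right.
# "/tmp/hf-export" is a placeholder output directory.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/tmp/hf-export", torch_dtype=torch.bfloat16)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # roughly 7.2B for Mistral-7b
```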
@A9isha What about the tokenizers here? Is there any path for converting SentencePiece .model files to Hugging Face?

Update:
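One route that generally works for Llama/Mistral-style SentencePiece models is wrapping the .model file directly in a transformers tokenizer class; a sketch, with the file and output paths as placeholders:

```python
# Wrap a SentencePiece .model file as a Hugging Face tokenizer.
# "tokenizer.model" and "hf-tokenizer/" are placeholder paths.
from transformers import LlamaTokenizer  # slow, sentencepiece-backed tokenizer

tok = LlamaTokenizer(vocab_file="tokenizer.model")
tok.save_pretrained("hf-tokenizer")  # writes tokenizer_config.json, special_tokens_map.json, etc.

# A fast tokenizer can then be produced on load:
# from transformers import AutoTokenizer
# AutoTokenizer.from_pretrained("hf-tokenizer", use_fast=True)
```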
Hi @A9isha. I am trying to recreate my experiments here so that I am able to convert my models to HF. My first models were trained 3 weeks ago. If I understand correctly, there are also some updates to the conversion script here, so to restart the models I also need to run

My main sanity check here is whether I am able to do a warm restart of the Mistral-7b model using the same tokenizer and a Norwegian c4 corpus from tfds. I am trying to use the exact same settings as earlier, though I see there are some changes to

However, the result really puzzles me: the graphs should be self-explanatory. I am training on a v5e-128 with these parameters (a hedged sketch of the restart shape follows below):

This might not be related to the checkpointing at all. Tell me if you want me to open a separate issue on it.
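For context, a warm restart of this kind would be launched with standard MaxText config overrides along these lines; every path, bucket, and value below is a placeholder rather than the original settings.

```bash
# Hypothetical warm-restart shape -- all paths and values are placeholders.
python3 MaxText/train.py MaxText/configs/base.yml \
  run_name=mistral-7b-norwegian-restart \
  model_name=mistral-7b \
  base_output_directory=gs://my-bucket/runs \
  load_parameters_path=gs://my-bucket/mistral-7b-maxtext/0/items \
  dataset_path=gs://my-bucket/tfds \
  tokenizer_path=assets/tokenizer.mistral \
  per_device_batch_size=4 \
  steps=100000
```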
Hi @A9isha, does MaxText support the other way round now? That is, converting HF's Llama or Mistral weights to MaxText checkpoints. Thanks
@peregilk Apologies for the delayed response, I was OOO for some time.
Let me know if you were able to investigate this further.
We have the script llama_or_mistral_ckpt.py to convert the original PyTorch Llama2 checkpoint that Meta provides into a MaxText checkpoint. You can see the usage for Llama2-7b here, for example.
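For reference, its usage has roughly the shape below; the flag names are a best-effort recollection, so check the script's argument parser before relying on them.

```bash
# Approximate usage -- <meta-llama2-7b-dir> holds Meta's consolidated *.pth files,
# and the GCS destination is a placeholder.
python3 MaxText/llama_or_mistral_ckpt.py \
  --base-model-path <meta-llama2-7b-dir> \
  --maxtext-model-path gs://my-bucket/llama2-7b-maxtext \
  --model-size llama2-7b
```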
Thanks for the pointer, @A9isha! I'm still wondering if there's a direct script for converting HF's LLaMA2-like weights to MaxText weights, since I might want to use another version of LLaMA2 trained by others and hosted on HuggingFace. Thanks!
I see; unfortunately, no, there isn't such a conversion script at the moment. It should be a modification of llama_or_mistral_ckpt. If you are interested, please feel free to send across a PR.
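The core of such a modification would be loading the HF state dict and renaming keys back to Meta's layout so the existing script logic can take over. A sketch under those assumptions: the model id is a placeholder, and note that HF permutes the q/k projections for its rotary embedding, which must be undone.

```python
# Sketch: map a Hugging Face Llama2 state dict back to Meta's key layout.
# Illustrative only -- a full conversion must also shard the result the way
# llama_or_mistral_ckpt.py expects. The model id is a placeholder.
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float32)
hf_sd = model.state_dict()
cfg = model.config
n_heads, hidden = cfg.num_attention_heads, cfg.hidden_size

def unpermute(w):
    # HF interleaves q/k rows for its rotary implementation; undo that here.
    return w.view(n_heads, 2, hidden // n_heads // 2, hidden).transpose(1, 2).reshape(hidden, hidden)

meta_sd = {
    "tok_embeddings.weight": hf_sd["model.embed_tokens.weight"],
    "norm.weight": hf_sd["model.norm.weight"],
    "output.weight": hf_sd["lm_head.weight"],
}
for i in range(cfg.num_hidden_layers):
    h, m = f"model.layers.{i}", f"layers.{i}"
    meta_sd[f"{m}.attention.wq.weight"] = unpermute(hf_sd[f"{h}.self_attn.q_proj.weight"])
    meta_sd[f"{m}.attention.wk.weight"] = unpermute(hf_sd[f"{h}.self_attn.k_proj.weight"])
    meta_sd[f"{m}.attention.wv.weight"] = hf_sd[f"{h}.self_attn.v_proj.weight"]
    meta_sd[f"{m}.attention.wo.weight"] = hf_sd[f"{h}.self_attn.o_proj.weight"]
    meta_sd[f"{m}.feed_forward.w1.weight"] = hf_sd[f"{h}.mlp.gate_proj.weight"]
    meta_sd[f"{m}.feed_forward.w2.weight"] = hf_sd[f"{h}.mlp.down_proj.weight"]
    meta_sd[f"{m}.feed_forward.w3.weight"] = hf_sd[f"{h}.mlp.up_proj.weight"]
    meta_sd[f"{m}.attention_norm.weight"] = hf_sd[f"{h}.input_layernorm.weight"]
    meta_sd[f"{m}.ffn_norm.weight"] = hf_sd[f"{h}.post_attention_layernorm.weight"]

torch.save(meta_sd, "consolidated.00.pth")  # then point llama_or_mistral_ckpt.py at this
```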
Thanks @A9isha, I'm working on it and will try to open a PR for it soon :)