[Question] DeepSpeed Zero3 save_checkpoint() got empty model_states files #132

Open
5 tasks done
mynewstart opened this issue Sep 11, 2023 · 3 comments
Labels
question Further information is requested


@mynewstart


Questions

Hi,
I used this code to continue pretraining the model, with ZeRO-3 for training. However, my checkpoint files zero_pp_rank_*_mp_rank_00_model_states.pt appear to be empty: they contain only the parameter names and shapes, not the weights. Has anyone run into this problem, and how can it be fixed?

Thanks!

Checklist

  • I have provided all relevant and necessary information above.
  • I have chosen a suitable title for this issue.
@hmtbgc

hmtbgc commented Sep 16, 2023

I ran into the same problem. My workaround is to use DeepSpeed ZeRO-2 instead of ZeRO-3.
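For anyone who wants to try this workaround, the ZeRO stage is selected in the DeepSpeed config. Below is a minimal sketch, assuming the config is passed as a Python dict. Every value other than the stage is a placeholder for illustration and is not taken from this thread. With stage 2 the parameters themselves are not partitioned across ranks, which is presumably why the model_states file then holds the weights.

```python
# Hypothetical minimal DeepSpeed config selecting ZeRO stage 2.
# All values other than "stage" are placeholders for illustration.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,  # stage 3 would additionally partition the parameters
    },
}

# The dict would typically be handed to deepspeed.initialize(..., config=ds_config).
```

The trade-off is memory: stage 2 keeps a full copy of the parameters on every GPU, so it may not fit models that needed stage 3 in the first place.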

@mynewstart
Author

My solution is to save the checkpoints myself. Alternatively, you can use zero_to_fp32.
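A rough sketch of the zero_to_fp32 route, assuming a DeepSpeed version that exposes get_fp32_state_dict_from_zero_checkpoint in deepspeed.utils.zero_to_fp32. The wrapper name consolidate_zero_checkpoint and the example paths are made up for illustration. It merges the per-rank shards into one fp32 state_dict and saves it as a regular PyTorch file.

```python
import os


def consolidate_zero_checkpoint(checkpoint_dir, output_file, tag=None):
    """Merge ZeRO shards under checkpoint_dir into one fp32 state_dict file.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    if not os.path.isdir(checkpoint_dir):
        raise FileNotFoundError(f"checkpoint directory not found: {checkpoint_dir}")

    # Imported lazily so the path check above works without DeepSpeed installed.
    import torch
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # With tag=None, DeepSpeed reads the tag from the "latest" file in checkpoint_dir.
    state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir, tag=tag)
    torch.save(state_dict, output_file)
    return output_file
```

Usage would look like consolidate_zero_checkpoint("path/to/checkpoints", "pytorch_model.bin"), where the first argument is the folder that contains the global_step* subfolders. The whole model is rebuilt in CPU memory, so this needs enough RAM for the full fp32 weights.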

@haorannlp

> My solution is to save checkpoints by myself or you can use zero_to_fp32

@mynewstart I found that my converted checkpoint folder global_step_xxx contains meaningful *optim_states.pt files but only empty *model_states.pt files. Any clues on this? Thanks.
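As far as I understand the ZeRO-3 layout, the partitioned fp32 weights are stored together with the optimizer state, and zero_to_fp32 rebuilds the full tensors from the *optim_states.pt shards using the shapes recorded in *model_states.pt. If that holds, small model_states files are expected rather than a sign of a failed save. A quick way to check what a step folder holds is to list the shard files by kind and size. This is a stdlib-only sketch; summarize_shards is a made-up helper name.

```python
import os
import re
from collections import defaultdict

# File-name suffixes taken from the names mentioned in this thread.
_SHARD_RE = re.compile(r"_(model_states|optim_states)\.pt$")


def summarize_shards(step_dir):
    """Return {kind: [(file_name, size_in_bytes), ...]} for ZeRO shard files."""
    summary = defaultdict(list)
    for name in sorted(os.listdir(step_dir)):
        match = _SHARD_RE.search(name)
        if match:
            path = os.path.join(step_dir, name)
            summary[match.group(1)].append((name, os.path.getsize(path)))
    return dict(summary)
```

Calling summarize_shards("path/to/global_step_xxx") and comparing the sizes under "model_states" and "optim_states" shows where the bulk of the data sits. The path here is a placeholder.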
