Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

freqs_cis in llama model should be a non-persistent buffer #316

Open
tianyu-l opened this issue May 8, 2024 · 0 comments
Open

freqs_cis in llama model should be a non-persistent buffer #316

tianyu-l opened this issue May 8, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@tianyu-l
Copy link
Contributor

tianyu-l commented May 8, 2024

Currently it is registered as a persistent buffer, because of two reasons, copied from https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/model.py#L355

# TODO persistent should be set to false, since this buffer can be recomputed.
# however, we set it to true for 2 reasons.  (1) due to pytorch/pytorch#123411,
# compile or pipeline-tracer will not correctly handle non-persistent buffers,
# so we need to fix that.  (2) if we initialize pipeline-parallel models from
# a seed checkpoint rather than calling init_weights, we need freqs_cis to be
# initialized by the checkpoint, or we need to add a separate initializer for
# just the non-persistent buffers that is called after loading checkpoints.

This issue is to track the progress on it. If (1) is fixed, and (2) seems the best solution, we can close this issue.

@tianyu-l tianyu-l added the bug Something isn't working label May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant