
Memory-saving weight loading for non-quant models #56

Open · wants to merge 2 commits into main
Conversation

@KaneGreen commented Apr 5, 2024

Trying to fix #51 (How to save memory when loading weights?).
This also speeds up weight loading (on my machine, about 1 minute vs. 2 minutes).
Tested on the 1.1-7b-it and 7b-it models.

However:

  1. This method is not suitable for the int8 data type, so the original loading method is still used for the quantized models.
  2. The new loading method automatically resets requires_grad of the nn.Parameters in Linear and Embedding to True once loading completes. (I don't know why some nn.Parameters in model.py have requires_grad set to False while others keep the default True.) This shouldn't matter, though, since the forward function of GemmaForCausalLM is decorated with @torch.no_grad(). A minimal sketch of the approach follows this list.
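For context, here is a minimal sketch of what such a loading path could look like. It assumes the technique is torch.load with mmap=True combined with load_state_dict(..., assign=True) (both available in recent PyTorch releases), and that the checkpoint is a dict with a 'model_state_dict' entry as in this repo's load_weights; the function signature and the is_quant flag here are illustrative, and the actual diff may differ.

```python
import torch

def load_weights(model: torch.nn.Module, ckpt_path: str, is_quant: bool) -> None:
    # Sketch only: assumes mmap-ed torch.load + assign=True; the real PR may differ.
    if is_quant:
        # int8 checkpoints keep the original loading path.
        state_dict = torch.load(ckpt_path, map_location="cpu")["model_state_dict"]
        model.load_state_dict(state_dict, strict=False)
        return

    # mmap=True maps the checkpoint file instead of reading it fully into RAM,
    # and assign=True re-uses the loaded tensors directly rather than copying
    # them into the pre-allocated parameters, so only one full copy of the
    # weights is ever materialized.
    state_dict = torch.load(ckpt_path, map_location="cpu", mmap=True)["model_state_dict"]
    model.load_state_dict(state_dict, strict=False, assign=True)

    # Side effect noted above: assign=True replaces the module's nn.Parameters,
    # so their requires_grad can revert to the default (True).
```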

google-cla bot commented Apr 5, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@KaneGreen (Author)
@pengchongjin @michaelmoynihan Any idea on this PR?

@pengchongjin (Collaborator)
Thanks, @KaneGreen! Could you please sign the CLA in order to pass the pre-check?

@KaneGreen (Author)
@pengchongjin I've signed it, but the check hasn't updated. Is there a way to re-run this check?

@KaneGreen (Author)
@pengchongjin CLA has been signed
