
How to save memory when loading weights? #51

Open
KaneGreen opened this issue Mar 23, 2024 · 1 comment · May be fixed by #56
Labels
bug Something isn't working

Comments

KaneGreen commented Mar 23, 2024

OS: Windows 11 22631.3296
Python: 3.11.8
PyTorch: 2.2.1 (installed in conda env)
CUDA: 12.1 (installed in conda env)
NV Driver: 551.76
Gemma Model: 7b-it

I was trying to run inference. Before I started, about 6 GB of memory was in use and 26 GB was free.

I observed that when the code reached the load_weights function, memory usage rose to 98% of my total 32 GB of RAM, stayed there for about a minute, and then dropped back to normal. During that time, the to(device) call on the next line had not yet been executed.

From Task Manager, at the time of peak usage, python.exe had a working set of about 28 GB, while its active private working set was about 14 GB. At that point, Windows had to fall back on the page file to keep the system running.

[Screenshot: Task Manager during the memory spike]

However, the 7b-it model in 16-bit floats should not exceed 16 GB (roughly 7 billion parameters × 2 bytes per value ≈ 14 GB of weights), so allocating about 28 GB during this step seems unnecessary.
As I noted above, the memory usage eventually dropped back to normal without to(device) ever being called, which shows that loading does not actually require that much memory.

Sorry, I don't know the details of how Python or PyTorch manage memory, but I'm wondering whether this line could be improved to smooth out the memory usage spike.

@pengchongjin (Collaborator)

My guess is that torch.load creates temporary copies of the weights, which doubles the memory usage until they eventually get garbage-collected.

self.load_state_dict(
    torch.load(
        model_path, mmap=True, weights_only=True,
    )['model_state_dict'],
    strict=False,
)
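
One possible mitigation (a minimal, untested sketch, assuming PyTorch 2.1+ where load_state_dict accepts assign=True) would be to pass assign=True so the module adopts the mmapped tensors directly instead of copying them into freshly allocated parameters:

self.load_state_dict(
    torch.load(
        model_path, mmap=True, weights_only=True,
    )['model_state_dict'],
    strict=False,
    assign=True,  # sketch: reuse the loaded (mmapped) tensors rather than copying them
)

A caveat is that with assign=True the parameters take on the dtype and device of the checkpoint tensors, so the subsequent to(device) call is still needed.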

Maybe another workaround could be loading the weights layer by layer and releasing each layer's tensors immediately after they have been copied into the model. I think that way the peak memory usage would be lower; see the sketch below.
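
A rough sketch of that idea (untested; the load_weights_incrementally helper and the 'model_state_dict' key here are illustrative assumptions, not the actual gemma_pytorch code):

import gc
import torch

def load_weights_incrementally(model, model_path):
    # Memory-map the checkpoint so tensor data is paged in only when touched.
    state_dict = torch.load(model_path, mmap=True, weights_only=True)['model_state_dict']
    own_state = model.state_dict()
    for name in list(state_dict.keys()):
        tensor = state_dict.pop(name)  # drop the checkpoint's reference as we go
        if name in own_state:
            own_state[name].copy_(tensor)  # copy one weight at a time into the model
        del tensor  # let this layer's tensor be reclaimed before loading the next one
    del state_dict
    gc.collect()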

@michaelmoynihan you have investigated this before; do you have any insights?

KaneGreen linked a pull request Apr 5, 2024 that will close this issue
tilakrayal added the bug (Something isn't working) label Apr 24, 2024