
Only 12 GB of 24 GB VRAM is used and CUDA utilization is below 10%, but CPU usage is 100% and RAM usage is 35 GB #159

Open
NerounCstate opened this issue Mar 3, 2024 · 1 comment
Labels
question Further information is requested

Comments

@NerounCstate

.\build\bin\Release\main.exe -m .\ReluLLaMA-70B-PowerInfer-GGUF\llama-70b-relu.q4.powerinfer.gguf -n 128 -t 32 -p "Once upon a time"
I tried this command, but generation is very slow and CPU and memory usage are very high. Checking the load output:
llm_load_sparse_model_tensors: offloaded layers from VRAM budget(-2147483648 bytes): 81/80
llm_load_sparse_model_tensors: mem required = 40226.35 MB
llm_load_sparse_model_tensors: VRAM used: 9842.91 MB
Clearly, only about half of my 4090's 24 GB of VRAM is being used.
llama_new_context_with_model: compute buffer total size = 14.50 MB
llama_new_context_with_model: VRAM scratch buffer: 12.94 MB
llama_new_context_with_model: total VRAM used: 10015.84 MB (model: 9842.91 MB, context: 172.94 MB)
This also shows only about 10 GB of VRAM in use.
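
One detail that stands out: the load log reports a negative VRAM budget (-2147483648 bytes), which suggests the budget was left at its default rather than matched to the 4090. A hedged suggestion, assuming your build exposes the --vram-budget option (in GiB) described in the PowerInfer README; the value 22 here is illustrative:

.\build\bin\Release\main.exe -m .\ReluLLaMA-70B-PowerInfer-GGUF\llama-70b-relu.q4.powerinfer.gguf -n 128 -t 32 --vram-budget 22 -p "Once upon a time"

If the flag is supported, the "VRAM used" line in the load log should rise toward the stated budget instead of stopping at ~10 GB.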

@czq693497091

So has this question been solved?
