So has this issue been solved?
```
.\build\bin\Release\main.exe -m .\ReluLLaMA-70B-PowerInfer-GGUF\llama-70b-relu.q4.powerinfer.gguf -n 128 -t 32 -p "Once upon a time"
```

I tried this command, but generation was very slow, with high CPU and memory usage. Checking the output:

```
llm_load_sparse_model_tensors: offloaded layers from VRAM budget(-2147483648 bytes): 81/80
llm_load_sparse_model_tensors: mem required = 40226.35 MB
llm_load_sparse_model_tensors: VRAM used: 9842.91 MB
```

Only about half of my 4090's 24 GB of VRAM is being used.

```
llama_new_context_with_model: compute buffer total size = 14.50 MB
llama_new_context_with_model: VRAM scratch buffer: 12.94 MB
llama_new_context_with_model: total VRAM used: 10015.84 MB (model: 9842.91 MB, context: 172.94 MB)
```

This also shows only about 10 GB of VRAM in use.