
Add GPU usage #4153

Merged
merged 1 commit into ollama:main from gpu_verbose_response on May 8, 2024

Conversation

@dhiltgen (Collaborator) commented May 4, 2024

Help users understand how much of the model fits into their GPU without having to inspect the server log.

A few examples from different systems and models:

eval rate:            4.40 tokens/s
gpu usage:            1 GPU (14/27 layers) 3.2 GB (2.0 GB GPU)

eval rate:            6.64 tokens/s
gpu usage:            1 GPU (27/27 layers) 3.2 GB

eval rate:            18.44 tokens/s
gpu usage:            2 GPUs (27/33 layers) 27 GB (24 GB GPU)

eval rate:            19.58 tokens/s
gpu usage:            CPU (0/27 layers) 3.2 GB
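The usage lines above could be produced by a small formatter along these lines. This is a hypothetical sketch, not the actual code from this PR; the function name `formatGPUUsage` and its parameters are invented for illustration:

```go
package main

import "fmt"

// formatGPUUsage builds a human-readable summary like the examples above.
// gpuCount is the number of GPUs used (0 means CPU-only inference),
// offloaded/total are layer counts, totalGB is the model's full size, and
// gpuGB is the portion resident in VRAM.
func formatGPUUsage(gpuCount, offloaded, total int, totalGB, gpuGB float64) string {
	var where string
	switch gpuCount {
	case 0:
		where = "CPU"
	case 1:
		where = "1 GPU"
	default:
		where = fmt.Sprintf("%d GPUs", gpuCount)
	}
	s := fmt.Sprintf("%s (%d/%d layers) %.1f GB", where, offloaded, total, totalGB)
	// Show the GPU-resident size only when the model is partially offloaded;
	// fully-offloaded and CPU-only cases omit the parenthetical.
	if gpuCount > 0 && offloaded < total {
		s += fmt.Sprintf(" (%.1f GB GPU)", gpuGB)
	}
	return s
}

func main() {
	fmt.Println(formatGPUUsage(1, 14, 27, 3.2, 2.0)) // partial offload
	fmt.Println(formatGPUUsage(1, 27, 27, 3.2, 0))   // fully on GPU
	fmt.Println(formatGPUUsage(0, 0, 27, 3.2, 0))    // CPU only
}
```

The partial-offload branch mirrors the first example's `(2.0 GB GPU)` suffix, while the fully-offloaded and CPU cases match the shorter lines.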

@dhiltgen dhiltgen force-pushed the gpu_verbose_response branch 5 times, most recently from 284a45f to 68fd3eb, on May 7, 2024 16:28
Review comment on llm/server.go (outdated, resolved)
@mxyng (Contributor) commented May 7, 2024

#4190 broke lint on Windows; gofmt is still a problem.

This records more GPU usage information for eventual UX inclusion.
@dhiltgen (Collaborator, Author) commented May 8, 2024

Still chewing on the optimal UX, so I've removed the user-facing output from this PR; it lays the groundwork for a follow-up PR that will expose the data in the UX.

@dhiltgen dhiltgen changed the title from "Add GPU usage to verbose metrics" to "Add GPU usage" on May 8, 2024
@dhiltgen dhiltgen merged commit ee49844 into ollama:main May 8, 2024
15 checks passed
@dhiltgen dhiltgen deleted the gpu_verbose_response branch May 8, 2024 23:39
3 participants