h2ogpt on Ubuntu server #1591
Comments
Hi, can you give your exact run line? For CPU it can be slow when inputting a large context, so you can reduce top_k_docs etc.
Hello, thank you for answering! This is my exact run line: python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF --prompt_type=mistral --max_seq_len=4096 (note that it's too slow even when I don't add any context (source = LLM), e.g. for simple requests like "hello" or "hi").
Hi, the same command line yields very fast results for me on CPU, but I added the top_k_docs limit. I also added the other settings mentioned, though those wouldn't matter for pure LLM chat mode. Then when I go to http://127.0.0.1:7860, I see about 2-3 tokens per second, maybe 2 words per second, on my CPU system with an i9.
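For concreteness, a run line with that limit added might look like the following (a sketch based on the flags already shown in this thread; top_k_docs controls how many retrieved document chunks are sent to the model, and 3 is just an illustrative value, not a recommendation from this thread):

python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF --prompt_type=mistral --max_seq_len=4096 --top_k_docs=3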
I'm running h2ogpt on an Ubuntu server; you'll find the server specifications attached. However, model execution is too slow (TheBloke/Mistral-7B-Instruct-v0.2-GGUF), and sometimes it doesn't generate a response at all. Any recommendations? What exactly could be the problem?
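One way to separate raw model speed from h2ogpt overhead is to time the same GGUF file directly with llama-cpp-python (the backend h2ogpt uses for GGUF models). A minimal sketch, assuming the model file has already been downloaded; the local filename and thread count below are hypothetical and should be adjusted to the server:

import time
from llama_cpp import Llama

# Load the GGUF model directly; n_ctx mirrors --max_seq_len=4096 from the run line above.
llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=4096,
    n_threads=8,  # set to the number of physical cores on the server
)

# Time a short completion and report tokens per second.
start = time.time()
out = llm("[INST] Say hello. [/INST]", max_tokens=64)
elapsed = time.time() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")

If this standalone check is also slow (a few tokens per second), the bottleneck is CPU inference itself rather than anything in h2ogpt, and reducing context (top_k_docs, max_seq_len) or moving to a GPU are the usual remedies.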