
docs: llama.cpp/GGUF CPU offloading no longer present? #2859

Closed
mr-september opened this issue May 2, 2024 · 5 comments
Assignees
Labels
type: documentation Improvements or additions to documentation

Comments

mr-september commented May 2, 2024

Pages

Success Criteria

  1. My install of jan.ai doesn't have a ~/jan/engines/nitro.json at all; it only has groq.json and openai.json.
  2. I have already run multiple local GGUF models on this install. If the file were meant to be auto-generated, it should already exist.
  3. Looking into model.json, it does not contain an ngl: 100 line at all.
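
The checks described above can be sketched as shell commands (a sketch only: the paths assume a default Jan install, and the JAN_DIR override is a hypothetical convenience, not an official Jan variable):

```shell
# Sketch of the checks above; paths assume a default Jan install.
# JAN_DIR is a hypothetical override, not an official Jan variable.
JAN_DIR="${JAN_DIR:-$HOME/jan}"

# 1. Is nitro.json present in the engines folder?
ls "$JAN_DIR/engines" 2>/dev/null || echo "no engines folder"

# 2 + 3. Do any downloaded models carry an ngl setting in model.json?
grep -l '"ngl"' "$JAN_DIR/models"/*/model.json 2>/dev/null || echo "no ngl setting found"
```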

Additional context
Is this feature still in jan.ai? I am trying to run models bigger than my GPU's VRAM limits.

@mr-september mr-september added the type: documentation Improvements or additions to documentation label May 2, 2024
Van-QA commented May 3, 2024

hi @mr-september,

  1. For model.json, here is how to modify the settings to include ngl:
    [screenshot]

  2. For nitro.json: the engines folder no longer exists; we refactored it in the Jan app and will update the docs shortly.

cc: @cahyosubroto @aindrajaya @irfanpena to reflect the two points above in our docs.
Note that for No. 2, every mention of nitro.json or the engines folder in our docs will need correcting, as it no longer exists.

@irfanpena (Contributor)

@Van-QA Can the parameters in the https://jan.ai/docs/built-in/llama-cpp for nitro.json be used for the settings parameters in model.json?

Van-QA commented May 3, 2024

hi @irfanpena, all 5 parameters here can be applied to model.json:

  "ctx_len": 2048, 
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false

where "ctx_len" and "ngl" are the most impactful of the five.
Thank you
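
Combined, the resulting model.json settings might look like the following (a sketch only: the surrounding "settings" key follows the thread above, other fields in the file are omitted, and the right values depend on your model and hardware):

```json
{
  "settings": {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": false,
    "embedding": false
  }
}
```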

@mr-september (Author)

Thanks @Van-QA, that seems to be working on my end. Feel free to close this issue any time the team deems the docs updated.

If I may suggest, could this be added to the GUI as well? Ideally with some kind of general estimate (e.g. jan.ai detects that my system has 8 GB VRAM and 32 GB RAM and that the model is 12 GB, and suggests a default 50% offload, etc.).
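
The estimate the comment asks for could be sketched as a simple heuristic (hypothetical code, not Jan's actual implementation; estimate_offload, the 10% VRAM headroom, and the layer count are all illustrative assumptions):

```python
def estimate_offload(vram_gb: float, model_gb: float, total_layers: int) -> int:
    """Suggest how many layers (ngl) to offload to the GPU.

    Hypothetical heuristic: if the model fits in VRAM, offload every layer;
    otherwise offload the fraction of layers that roughly fits, keeping
    some VRAM headroom for the KV cache and other buffers.
    """
    if model_gb <= 0 or total_layers <= 0:
        raise ValueError("model size and layer count must be positive")
    usable_vram = vram_gb * 0.9  # assume ~10% headroom (illustrative)
    fraction = min(1.0, usable_vram / model_gb)
    return int(total_layers * fraction)


# The scenario from the comment: 8 GB VRAM, 12 GB model.
# With 32 layers this suggests offloading roughly 60% of them.
print(estimate_offload(8, 12, 32))
```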

Van-QA commented May 7, 2024

Linking the issue to #2208, related to RAM/VRAM utilization.


@Van-QA Van-QA closed this as completed May 7, 2024