
docs: llama.cpp/GGUF CPU offloading no longer present? #2859

Closed
mr-september opened this issue May 2, 2024 · 5 comments
Assignees
Labels
type: documentation Improvements or additions to documentation

Comments

mr-september commented May 2, 2024

Pages

Success Criteria

  1. My install of jan.ai doesn't have a ~/jan/engines/nitro.json at all; it only has groq.json and openai.json.
  2. I have already run multiple local GGUF models on this install. If the file were meant to be auto-generated, it should already exist.
  3. Looking into model.json, it does not contain an ngl: 100 line at all.
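
The checks described above can be sketched as shell commands (a sketch only: the paths assume a default Jan install, and the JAN_DIR override is a hypothetical convenience, not an official Jan variable):

```shell
# Sketch of the checks above; paths assume a default Jan install.
# JAN_DIR is a hypothetical override, not an official Jan variable.
JAN_DIR="${JAN_DIR:-$HOME/jan}"

# 1. Is nitro.json present in the engines folder?
ls "$JAN_DIR/engines" 2>/dev/null || echo "no engines folder"

# 2 + 3. Do any downloaded models carry an ngl setting in model.json?
grep -l '"ngl"' "$JAN_DIR/models"/*/model.json 2>/dev/null || echo "no ngl setting found"
```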

Additional context
Is this feature still in jan.ai? I am trying to run models bigger than my GPU's VRAM limits.

@mr-september mr-september added the type: documentation Improvements or additions to documentation label May 2, 2024
Van-QA commented May 3, 2024

hi @mr-september,

  1. For model.json, here is how to modify the settings to include ngl:
    [screenshot]

  2. For nitro.json: the engines folder no longer exists; we refactored it in the Jan app and will update the docs shortly.

cc: @cahyosubroto @aindrajaya @irfanpena to reflect the two points above in our docs.
Note that for No. 2, every mention of nitro.json or the engines folder in our docs will need correcting, as it no longer exists.

@irfanpena (Contributor)

@Van-QA Can the parameters in the https://jan.ai/docs/built-in/llama-cpp for nitro.json be used for the settings parameters in model.json?

Van-QA commented May 3, 2024

hi @irfanpena, all 5 parameters here can be applied to model.json:

  "ctx_len": 2048, 
  "ngl": 100,
  "cpu_threads": 1,
  "cont_batching": false,
  "embedding": false

where "ctx_len" and "ngl" are the most impactful of the five.
Thank you
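
Combined, the resulting model.json settings might look like the following (a sketch only: the surrounding "settings" key follows the thread above, other fields in the file are omitted, and the right values depend on your model and hardware):

```json
{
  "settings": {
    "ctx_len": 2048,
    "ngl": 100,
    "cpu_threads": 1,
    "cont_batching": false,
    "embedding": false
  }
}
```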

@mr-september (Author)

Thanks @Van-QA, that seems to be working on my end. Feel free to close this issue any time the team deems the docs updated.

If I may suggest, could this be added to the GUI as well? Ideally with some kind of general estimate (e.g. jan.ai detects that my system has 8 GB VRAM and 32 GB RAM and that the model is 12 GB, and suggests a default 50% offload, etc.).
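
The estimate the comment asks for could be sketched as a simple heuristic (hypothetical code, not Jan's actual implementation; estimate_offload, the 10% VRAM headroom, and the layer count are all illustrative assumptions):

```python
def estimate_offload(vram_gb: float, model_gb: float, total_layers: int) -> int:
    """Suggest how many layers (ngl) to offload to the GPU.

    Hypothetical heuristic: if the model fits in VRAM, offload every layer;
    otherwise offload the fraction of layers that roughly fits, keeping
    some VRAM headroom for the KV cache and other buffers.
    """
    if model_gb <= 0 or total_layers <= 0:
        raise ValueError("model size and layer count must be positive")
    usable_vram = vram_gb * 0.9  # assume ~10% headroom (illustrative)
    fraction = min(1.0, usable_vram / model_gb)
    return int(total_layers * fraction)


# The scenario from the comment: 8 GB VRAM, 12 GB model.
# With 32 layers this suggests offloading roughly 60% of them.
print(estimate_offload(8, 12, 32))
```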

Van-QA commented May 7, 2024

Linking the issue to #2208, related to RAM/VRAM utilization.


@Van-QA Van-QA closed this as completed May 7, 2024