
How to quantize a fine-tuned LLM into GGUF format #7299

Open
dibyendubiswas1998 opened this issue May 15, 2024 · 2 comments

Comments

@dibyendubiswas1998

Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.


yentur commented May 15, 2024

You can use convert.py.

ngxson (Collaborator) commented May 15, 2024

You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files).
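For illustration, here is a minimal sketch of the merge step using the transformers and peft libraries. The model ID, adapter directory, and output directory are placeholders, and the exact loading arguments depend on how the fine-tune was done:

```python
# Minimal sketch: fold a (Q)LoRA adapter back into its base model.
# Paths, model ID, and dtype are assumptions; adjust to your own fine-tune.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"   # base model that was fine-tuned
adapter_dir = "./my-qlora-adapter"            # directory containing the LoRA adapter weights
merged_dir = "./mistral-7b-qa-merged"         # output directory for the merged model

# Load the base model in half precision (not 4-bit) so the merge is clean.
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_dir)

# Merge the LoRA weights into the base weights and save as .safetensors.
merged = model.merge_and_unload()
merged.save_pretrained(merged_dir, safe_serialization=True)

# Save the tokenizer alongside the merged weights so the converter can find it.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.save_pretrained(merged_dir)
```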

Then use either convert.py or convert-hf-to-gguf.py to convert the safetensors model into GGUF.
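As a rough sketch of that step (exact flags can differ between llama.cpp versions, and the output filename here is a placeholder), the conversion script can be invoked on the merged model directory, for example via a small Python wrapper run from the llama.cpp repository root:

```python
# Minimal sketch: run llama.cpp's convert-hf-to-gguf.py on the merged model.
# Assumes this is executed from the root of a llama.cpp checkout.
import subprocess

subprocess.run(
    [
        "python", "convert-hf-to-gguf.py",
        "./mistral-7b-qa-merged",               # merged .safetensors model from the previous step
        "--outfile", "mistral-7b-qa-f16.gguf",  # resulting GGUF file
        "--outtype", "f16",                     # keep f16 here; further quantization is a separate step
    ],
    check=True,
)
```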

P.S.: convert-lora-to-ggml.py was removed a while ago, so currently the only way to run a QLoRA fine-tune is to merge & convert.
