
How to quantize a fine-tuned LLM into GGUF format #7299

Open
dibyendubiswas1998 opened this issue May 15, 2024 · 2 comments

Comments

@dibyendubiswas1998

Hi, I fine-tuned the mistral-7b model for my question-answering task (after quantizing it to 4-bit, using LoRA/QLoRA).
Now I want to convert the fine-tuned model into GGUF format for CPU inference.


yentur commented May 15, 2024

You can use convert.py.

ngxson (Collaborator) commented May 15, 2024

You can first merge the QLoRA adapter into the base model (that will produce a new set of .safetensors files).
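For illustration, here is a minimal sketch of the merge step using the transformers and peft libraries. The model ID, adapter directory, and output directory are placeholders, and the exact loading arguments depend on how the fine-tune was done:

```python
# Minimal sketch: fold a (Q)LoRA adapter back into its base model.
# Paths, model ID, and dtype are assumptions; adjust to your own fine-tune.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"   # base model that was fine-tuned
adapter_dir = "./my-qlora-adapter"            # directory containing the LoRA adapter weights
merged_dir = "./mistral-7b-qa-merged"         # output directory for the merged model

# Load the base model in half precision (not 4-bit) so the merge is clean.
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_dir)

# Merge the LoRA weights into the base weights and save as .safetensors.
merged = model.merge_and_unload()
merged.save_pretrained(merged_dir, safe_serialization=True)

# Save the tokenizer alongside the merged weights so the converter can find it.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.save_pretrained(merged_dir)
```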

Then use either convert.py or convert-hf-to-gguf.py to convert the safetensors model into GGUF.
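As a rough sketch of that step (exact flags can differ between llama.cpp versions, and the output filename here is a placeholder), the conversion script can be invoked on the merged model directory, for example via a small Python wrapper run from the llama.cpp repository root:

```python
# Minimal sketch: run llama.cpp's convert-hf-to-gguf.py on the merged model.
# Assumes this is executed from the root of a llama.cpp checkout.
import subprocess

subprocess.run(
    [
        "python", "convert-hf-to-gguf.py",
        "./mistral-7b-qa-merged",               # merged .safetensors model from the previous step
        "--outfile", "mistral-7b-qa-f16.gguf",  # resulting GGUF file
        "--outtype", "f16",                     # keep f16 here; further quantization is a separate step
    ],
    check=True,
)
```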

P.S.: convert-lora-to-ggml.py was removed a while ago, so currently the only way to run a QLoRA fine-tune is to merge & convert.
