Version: llama.cpp@f679349, tag: b1708
Model: Baichuan2-13B-Chat
GPU: T4
Summary
I want to deploy Baichuan2 with an appropriate quantization, but the perplexity tests give a confusing result. Advice would be appreciated.
Steps to reproduce
```sh
# llama.cpp@f679349 compiled with -DLLAMA_CUBLAS=1 -DLLAMA_CUDA_MMV_Y=4
/app/convert-hf-to-gguf.py --outfile f16 /models/Baichuan2-13B-Chat

for quant in f16 Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_0 Q5_1 Q5_K_S Q5_K_M Q6_K Q8_0
do
    [ $quant = f16 ] || /app/quantize f16 $quant $quant
    /app/perplexity -m $quant -f /path/to/wikitext-2-raw/wiki.test.raw -ngl 64
done
```
Result
Surprisingly, Qx_0 is better than Qx_1 and even Qx_K*, and Q6 seems worse than Q5.
Did I miss anything?
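To make the comparison across formats easier to read, one could tabulate each quant's perplexity against the f16 baseline. Below is a minimal sketch, assuming each run's output was redirected to a file named `<quant>.log` and that the log contains a line with `PPL = <number>`; the exact wording of that line varies between llama.cpp versions, so the regex may need adjusting.

```python
#!/usr/bin/env python3
"""Sketch: tabulate perplexity per quantization relative to the f16 baseline.

Assumes each perplexity run was redirected to <quant>.log (e.g. Q4_0.log)
and that the log contains at least one line matching 'PPL = <float>'.
"""
import re
import sys

QUANTS = ["f16", "Q4_0", "Q4_1", "Q4_K_S", "Q4_K_M",
          "Q5_0", "Q5_1", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

def read_ppl(path):
    # Take the last 'PPL = <float>' match in the log, i.e. the final estimate.
    with open(path, encoding="utf-8", errors="ignore") as fh:
        matches = re.findall(r"PPL\s*=\s*([0-9]+\.[0-9]+)", fh.read())
    return float(matches[-1]) if matches else None

ppl = {}
for q in QUANTS:
    try:
        ppl[q] = read_ppl(f"{q}.log")
    except FileNotFoundError:
        ppl[q] = None

base = ppl.get("f16")
if base is None:
    sys.exit("f16.log has no PPL value; cannot compute a baseline")

print(f"{'quant':<8}{'PPL':>10}{'vs f16':>10}")
for q in QUANTS:
    if ppl[q] is None:
        print(f"{q:<8}{'missing':>10}")
        continue
    print(f"{q:<8}{ppl[q]:>10.4f}{ppl[q] - base:>+10.4f}")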
Wikitext perplexity tests do not make sense for fine-tuned models - only base models
"Wikitext perplexity tests do not make sense for fine-tuned models - only base models"
Can you explain why? Is it because fine-tuning moves the model away from being able to generate wikitext-style text?
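For context, the perplexity reported here is just the exponentiated average negative log-likelihood the model assigns to the evaluation text, so it directly reflects how well the model predicts that particular corpus. A minimal illustrative sketch of the computation (not llama.cpp's implementation):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood of the tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    token of the evaluation text (illustrative; llama.cpp evaluates the
    wikitext file in fixed-size chunks).
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: a model that assigns probability 0.25 to every token has PPL 4.
print(perplexity([math.log(0.25)] * 100))  # -> 4.0
```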
Related: #7066