Version: llama.cpp@f679349, tag: b1708
Model: Baichuan2-13B-Chat
GPU: T4
Summary
I want to deploy Baichuan2 with an appropriate quantization, but the perplexity tests give a confusing result. Advice would be appreciated.
Steps to reproduce
```sh
# llama.cpp@f679349 compiled with -DLLAMA_CUBLAS=1 -DLLAMA_CUDA_MMV_Y=4
/app/convert-hf-to-gguf.py --outfile f16 /models/Baichuan2-13B-Chat

for quant in f16 Q4_0 Q4_1 Q4_K_S Q4_K_M Q5_0 Q5_1 Q5_K_S Q5_K_M Q6_K Q8_0
do
    [ $quant = f16 ] || /app/quantize f16 $quant $quant
    /app/perplexity -m $quant -f /path/to/wikitext-2-raw/wiki.test.raw -ngl 64
done
```
Result
Surprisingly, Qx_0 is better than Qx_1 and even Qx_K*, and Q6 seems worse than Q5.
Did I miss anything?
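To make the comparison across formats easier to read, one could tabulate each quant's perplexity against the f16 baseline. Below is a minimal sketch, assuming each run's output was redirected to a file named `<quant>.log` and that the log contains a line with `PPL = <number>`; the exact wording of that line varies between llama.cpp versions, so the regex may need adjusting.

```python
#!/usr/bin/env python3
"""Sketch: tabulate perplexity per quantization relative to the f16 baseline.

Assumes each perplexity run was redirected to <quant>.log (e.g. Q4_0.log)
and that the log contains at least one line matching 'PPL = <float>'.
"""
import re
import sys

QUANTS = ["f16", "Q4_0", "Q4_1", "Q4_K_S", "Q4_K_M",
          "Q5_0", "Q5_1", "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0"]

def read_ppl(path):
    # Take the last 'PPL = <float>' match in the log, i.e. the final estimate.
    with open(path, encoding="utf-8", errors="ignore") as fh:
        matches = re.findall(r"PPL\s*=\s*([0-9]+\.[0-9]+)", fh.read())
    return float(matches[-1]) if matches else None

ppl = {}
for q in QUANTS:
    try:
        ppl[q] = read_ppl(f"{q}.log")
    except FileNotFoundError:
        ppl[q] = None

base = ppl.get("f16")
if base is None:
    sys.exit("f16.log has no PPL value; cannot compute a baseline")

print(f"{'quant':<8}{'PPL':>10}{'vs f16':>10}")
for q in QUANTS:
    if ppl[q] is None:
        print(f"{q:<8}{'missing':>10}")
        continue
    print(f"{q:<8}{ppl[q]:>10.4f}{ppl[q] - base:>+10.4f}")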
Wikitext perplexity tests do not make sense for fine-tuned models - only base models
"Wikitext perplexity tests do not make sense for fine-tuned models - only base models"
Can you explain why? Is it because fine-tuning moves the model away from being able to generate wikitext-style text?
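For context, the perplexity reported here is just the exponentiated average negative log-likelihood the model assigns to the evaluation text, so it directly reflects how well the model predicts that particular corpus. A minimal illustrative sketch of the computation (not llama.cpp's implementation):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood of the tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    token of the evaluation text (illustrative; llama.cpp evaluates the
    wikitext file in fixed-size chunks).
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: a model that assigns probability 0.25 to every token has PPL 4.
print(perplexity([math.log(0.25)] * 100))  # -> 4.0
```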
Related: #7066