gptq 4bit avg loss is large #643

Open
moseshu opened this issue Apr 18, 2024 · 3 comments
moseshu commented Apr 18, 2024

I used AutoGPTQ to convert a bfloat16 model to int4, and the avg loss is much larger than with int8.
Model is Mixtral-8x7B.
The int8 avg loss is almost 0.0004.

Does this mean the int4 quantization error is more serious?

INT4 
INFO - Quantizing block_sparse_moe.experts.6.w3 in layer 29/32...
2024-04-18 09:50:18 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.6.w3 in layer 29/32...
2024-04-18 09:50:19 INFO [auto_gptq.quantization.gptq] duration: 1.210331678390503
2024-04-18 09:50:19 INFO [auto_gptq.quantization.gptq] avg loss: 211.3699188232422
INFO - Quantizing block_sparse_moe.experts.7.w3 in layer 29/32...
2024-04-18 09:50:19 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.7.w3 in layer 29/32...
2024-04-18 09:50:20 INFO [auto_gptq.quantization.gptq] duration: 1.2222824096679688
2024-04-18 09:50:20 INFO [auto_gptq.quantization.gptq] avg loss: 83.82467651367188
INFO - Quantizing block_sparse_moe.experts.0.w2 in layer 29/32...
2024-04-18 09:50:38 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.0.w2 in layer 29/32...
2024-04-18 09:50:43 INFO [auto_gptq.quantization.gptq] duration: 4.365283012390137
2024-04-18 09:50:43 INFO [auto_gptq.quantization.gptq] avg loss: 19.91876983642578
INFO - Quantizing block_sparse_moe.experts.1.w2 in layer 29/32...
2024-04-18 09:50:43 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.1.w2 in layer 29/32...
2024-04-18 09:50:47 INFO [auto_gptq.quantization.gptq] duration: 4.401262521743774
2024-04-18 09:50:47 INFO [auto_gptq.quantization.gptq] avg loss: 6.792891025543213
INFO - Quantizing block_sparse_moe.experts.2.w2 in layer 29/32...
2024-04-18 09:50:47 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.2.w2 in layer 29/32...
2024-04-18 09:50:52 INFO [auto_gptq.quantization.gptq] duration: 4.381655931472778
2024-04-18 09:50:52 INFO [auto_gptq.quantization.gptq] avg loss: 36.049583435058594
INFO - Quantizing block_sparse_moe.experts.3.w2 in layer 29/32...
2024-04-18 09:50:52 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.3.w2 in layer 29/32...
2024-04-18 09:50:56 INFO [auto_gptq.quantization.gptq] duration: 4.5015199184417725
2024-04-18 09:50:56 INFO [auto_gptq.quantization.gptq] avg loss: 13.600162506103516
INFO - Quantizing block_sparse_moe.experts.4.w2 in layer 29/32...
2024-04-18 09:50:56 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.4.w2 in layer 29/32...
2024-04-18 09:51:00 INFO [auto_gptq.quantization.gptq] duration: 4.375776290893555
2024-04-18 09:51:00 INFO [auto_gptq.quantization.gptq] avg loss: 2.8602569103240967
INFO - Quantizing block_sparse_moe.experts.5.w2 in layer 29/32...
2024-04-18 09:51:00 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.5.w2 in layer 29/32...
2024-04-18 09:51:05 INFO [auto_gptq.quantization.gptq] duration: 4.481191635131836
2024-04-18 09:51:05 INFO [auto_gptq.quantization.gptq] avg loss: 53.12783432006836
INFO - Quantizing block_sparse_moe.experts.6.w2 in layer 29/32...
2024-04-18 09:51:05 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.6.w2 in layer 29/32...
2024-04-18 09:51:10 INFO [auto_gptq.quantization.gptq] duration: 4.568590402603149
2024-04-18 09:51:10 INFO [auto_gptq.quantization.gptq] avg loss: 73.41600036621094
INFO - Quantizing block_sparse_moe.experts.7.w2 in layer 29/32...
2024-04-18 09:51:10 INFO [auto_gptq.modeling._base] Quantizing block_sparse_moe.experts.7.w2 in layer 29/32...
2024-04-18 09:51:14 INFO [auto_gptq.quantization.gptq] duration: 4.3780763149261475
2024-04-18 09:51:14 INFO [auto_gptq.quantization.gptq] avg loss: 27.395843505859375
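For context, here is a minimal sketch of the kind of AutoGPTQ 4-bit quantization run that produces per-layer logs like the above. The checkpoint path, output directory, `bits=4`, `group_size=128`, and the single calibration sentence are illustrative assumptions, not details taken from this issue.

```python
# Hedged sketch of a 4-bit GPTQ quantization with auto_gptq (not the exact script from this issue).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"   # assumed checkpoint path
out_dir = "Mixtral-8x7B-gptq-4bit"           # hypothetical output directory

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights; the int8 run compared above would use bits=8
    group_size=128,  # common default; smaller groups usually reduce avg loss at some size cost
    desc_act=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoGPTQForCausalLM.from_pretrained(model_name, quantize_config)

# Calibration examples: in practice use many samples drawn from data close to the
# model's training distribution (see the replies below in this thread).
examples = [
    tokenizer("Mixtral is a sparse mixture-of-experts language model.")
]

model.quantize(examples)       # this step prints the per-layer duration / avg loss lines
model.save_quantized(out_dir)
```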
LaaZa (Contributor) commented Apr 19, 2024

Make sure you have a proper calibration dataset that is similar to the training dataset.
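A rough sketch of what that might look like. The `wikitext` corpus and the sample count here are placeholders, not a recommendation from this thread; for a chat/instruct model you would sample from a similarly formatted corpus instead.

```python
# Hedged sketch: build calibration examples from text close to the training data.
from datasets import load_dataset
from transformers import AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"   # assumed checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder corpus; swap in data that actually matches your model's training mix.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
texts = [t for t in raw["text"] if len(t) > 500][:128]   # keep longer passages, take 128 samples

examples = [tokenizer(t, truncation=True, max_length=2048) for t in texts]
# Pass `examples` to AutoGPTQForCausalLM.quantize(...) as in the sketch above.
```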

ehuaa commented Apr 24, 2024

@moseshu Have you figured this problem out?

Qubitium (Contributor) commented Apr 24, 2024

  1. From my experience, MoE models are much harder to quantize due to the gates/routers.

  2. You need a good dataset, as close to the original training data as possible, with a high enough number of samples. 128 samples for every 7B parameters is my rule of thumb.

  3. Later layers have always had much higher losses than earlier layers.

Based on what you posted, I can't tell whether the quant is good or not. Run a perplexity (PPL) check after quantization and compare the running avg loss of early vs. later layers; see the sketch below.
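A hedged sketch of such a post-quantization perplexity check. The evaluation corpus, directory name, and context length are assumptions, and it is assumed that the quantized wrapper forwards `labels` to the underlying Hugging Face causal-LM model; run the same loop on the bf16 baseline to have a reference number.

```python
# Hedged sketch: non-overlapping-chunk perplexity of a GPTQ-quantized model.
import torch
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quant_dir = "Mixtral-8x7B-gptq-4bit"                 # hypothetical quantized output dir
tokenizer = AutoTokenizer.from_pretrained(quant_dir)
model = AutoGPTQForCausalLM.from_quantized(quant_dir, device="cuda:0")

# Placeholder eval text; use the same held-out data for the bf16 baseline and the quant.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to("cuda:0")

max_len = 2048
nlls = []
for i in range(0, enc.size(1), max_len):
    ids = enc[:, i : i + max_len]
    if ids.size(1) < 2:          # skip a trailing chunk too short to yield a loss
        break
    with torch.no_grad():
        out = model(ids, labels=ids)                 # HF-style causal-LM loss on the chunk
    nlls.append(out.loss.float() * ids.size(1))

ppl = torch.exp(torch.stack(nlls).sum() / enc.size(1))
print(f"perplexity: {ppl.item():.2f}")
```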
