[Question] Differences in quantization logic compared to AWQ #663

Open

wenhuach21 opened this issue May 6, 2024 · 0 comments

wenhuach21 commented May 6, 2024 (edited)

Comparing the asymmetric quantization logic here with AWQ's, there are some differences. A major one is whether the min-max range is forced to include zero.
In AWQ, zero is not forced into the range, as shown in https://github.com/mit-han-lab/llm-awq/blob/main/awq/quantize/quantizer.py#L74,
whereas GPTQ does include zero, as shown in https://github.com/AutoGPTQ/AutoGPTQ/blob/main/auto_gptq/quantization/quantizer.py#L64.
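
To make the distinction concrete, here is a minimal pure-Python sketch of the two range choices for a single weight group. It is not the code of either library: the names `asym_params` and `fake_quantize`, the `include_zero` flag, and the sample group are all hypothetical, and the zero point is deliberately left unclamped, which is an assumption that real kernels may not share.

```python
def asym_params(weights, n_bits=4, include_zero=True):
    """Return (scale, zero_point) for one group of weights (illustrative only)."""
    w_min, w_max = min(weights), max(weights)
    if include_zero:
        # Widen the range so that it always contains 0.
        w_min, w_max = min(w_min, 0.0), max(w_max, 0.0)
    max_int = 2 ** n_bits - 1
    scale = max(w_max - w_min, 1e-5) / max_int
    # Assumption: the zero point is NOT clamped to [0, max_int] in this sketch.
    zero_point = -round(w_min / scale)
    return scale, zero_point


def fake_quantize(weights, n_bits=4, include_zero=True):
    """Quantize then dequantize, so the rounding error can be inspected."""
    scale, zero_point = asym_params(weights, n_bits, include_zero)
    max_int = 2 ** n_bits - 1
    out = []
    for w in weights:
        q = min(max(round(w / scale) + zero_point, 0), max_int)
        out.append((q - zero_point) * scale)
    return out


def max_abs_error(weights, **kwargs):
    """Largest absolute difference between original and dequantized values."""
    return max(abs(a - b) for a, b in zip(weights, fake_quantize(weights, **kwargs)))


# Hypothetical group in which every weight is positive.
group = [0.5, 0.6, 0.75, 0.8]

scale_with_zero, zp_with_zero = asym_params(group, include_zero=True)
scale_without_zero, zp_without_zero = asym_params(group, include_zero=False)

err_with_zero = max_abs_error(group, include_zero=True)
err_without_zero = max_abs_error(group, include_zero=False)
```

For a group whose values all share one sign, excluding zero gives a narrower range and therefore a smaller scale. In this sketch it also produces a negative zero point, which does not fit in an unsigned 4-bit field.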

intel/auto-round also follows AutoGPTQ here, but I think the approach used by AWQ is better.

So my question is: does the AutoGPTQ kernel support a range that does not include zero, or will AutoGPTQ support this in the future?

wenhuach21 added the enhancement label
