Error when trying to quantize the JAIS model. #632
Without modifying AutoGPTQ code you can try this:

```python
import logging

import torch, auto_gptq
from transformers import AutoTokenizer
from auto_gptq.modeling._base import BaseGPTQForCausalLM, BaseQuantizeConfig

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)

# Register JAIS as a supported architecture and describe its module layout
auto_gptq.modeling._base.SUPPORTED_MODELS = ["jais"]

class JAISLMHeadModelGPTQ(BaseGPTQForCausalLM):
    layer_type = "JAISBlock"
    layers_block_name = "transformer.h"
    outside_layer_modules = ["transformer.ln_f", "transformer.relative_pe", "transformer.wte"]
    inside_layer_modules = [
        ["attn.c_attn"],
        ["attn.c_proj"],
        ["mlp.c_fc", "mlp.c_fc2"],
        ["mlp.c_proj"],
    ]

#############
pretrained_model_dir = "/sdb-disk/LlmsModels/jais-30b-chat-v3"
quantized_model_dir = "/sdb-disk/LlmsModels/jais-30b-chat-v3-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, trust_remote_code=True)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize model to 4-bit
    group_size=128,  # 128 is the recommended value
    desc_act=False,  # False speeds up inference significantly, at a slight cost in perplexity
)

model = JAISLMHeadModelGPTQ.from_pretrained(pretrained_model_dir, quantize_config, trust_remote_code=True)
model.quantize(examples)
model.save_quantized(quantized_model_dir, use_safetensors=True)
```

I have not tested this. To run the quantized model you need the model definition code in that script too. For serious quantization you should use a proper calibration dataset that aligns better with the model's training data.
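As a follow-up to the script above, here is an untested sketch of how the quantized model could be loaded back for inference; it assumes the same JAISLMHeadModelGPTQ class is defined in the file, and the paths and prompt are placeholders.

```python
# Untested sketch: load the quantized model for inference. Assumes the
# JAISLMHeadModelGPTQ class from the script above is defined in this file;
# paths and the prompt are placeholders.
from transformers import AutoTokenizer

pretrained_model_dir = "/sdb-disk/LlmsModels/jais-30b-chat-v3"
quantized_model_dir = "/sdb-disk/LlmsModels/jais-30b-chat-v3-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, trust_remote_code=True)
model = JAISLMHeadModelGPTQ.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)

inputs = tokenizer("auto-gptq is", return_tensors="pt").to("cuda:0")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0]))
```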
Thank you @LaaZa for the help. I followed your code and it finally starts quantizing, but at the final step, "Packing model...", I got this error: AssertionError.
Okay, I can't see the module shapes because the model is not in safetensors format. Try updating to the very latest auto_gptq from git; it should have a fix for the padding, which might fix the issue for you. However, that avg loss looks really bad, though that might just be down to your limited examples.
@LaaZa I built auto_gptq directly from source using `pip install -vvv --no-build-isolation -e .`. The auto_gptq version is 0.8.0.dev0 and transformers is 4.39.3. These are all the packages installed:
Regarding the average loss: yes, I used only a few examples, which is why it's very bad. However, I don't think this is the main issue. I will switch to real data once I'm sure the quantization is working fine.
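As a hedged illustration of what "real data" could look like here, the sketch below builds calibration examples from a public text corpus; the dataset name, length filter, sample count, and max_length are arbitrary assumptions, not recommendations from this thread.

```python
# Rough, untested sketch: build calibration examples from a real corpus
# instead of one hard-coded sentence. The dataset ("wikitext"), length
# filter, sample count and max_length are placeholder choices.
import random
from datasets import load_dataset
from transformers import AutoTokenizer

pretrained_model_dir = "/sdb-disk/LlmsModels/jais-30b-chat-v3"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True, trust_remote_code=True)

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
texts = [t for t in raw["text"] if len(t.strip()) > 200]

random.seed(0)
examples = [
    tokenizer(t, truncation=True, max_length=512)
    for t in random.sample(texts, 128)
]

# examples can then be passed to model.quantize(examples) as in the script above.
```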
Do you have commit b4b801c?
@LaaZa Yes, this snippet is from my qlinear_exllama.py.
Oh, I think it only affected outfeatures. Some of the modules do not have infeatures divisible by 32; fc and fc2, it seems. That's pretty bad, because they are huge and should be quantized. You can try removing them.
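A possible way to see which modules are affected is to inspect the feature dimensions directly; the untested sketch below assumes the JAIS remote code uses standard Linear/Conv1D modules and simply prints those whose shapes are not divisible by 32.

```python
# Untested diagnostic sketch: print every linear-style module whose input or
# output dimension is not divisible by 32. Assumes the JAIS remote code uses
# torch.nn.Linear and/or transformers' Conv1D; the path is a placeholder.
import torch
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model = AutoModelForCausalLM.from_pretrained(
    "/sdb-disk/LlmsModels/jais-30b-chat-v3",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)

for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        shape = (module.in_features, module.out_features)
    elif isinstance(module, Conv1D):
        shape = tuple(module.weight.shape)  # Conv1D stores (in_features, out_features)
    else:
        continue
    if any(dim % 32 for dim in shape):
        print(name, shape)
```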
Thank you, @LaaZa. I tried to remove them, but it's not working. When I set this, it works fine.
However, I still have an issue using it with vLLM. vLLM already supports the JAIS model, and when I use the original model it works fine, but when I use the new quantized model it fails. This is the code:
And this is the error:
ValueError: The output size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
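For context, a minimal vLLM invocation of the kind that can hit this check is sketched below; the path and arguments are placeholders rather than the code from this thread, and `tensor_parallel_size=1` is only the first thing the error message suggests verifying.

```python
# Hypothetical minimal vLLM load of the quantized model (not the original
# poster's code; the path is a placeholder). The error above is raised while
# vLLM prepares the GPTQ weights; the message suggests checking that
# tensor_parallel_size is not too large.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/sdb-disk/LlmsModels/jais-30b-chat-v3-4bit",
    quantization="gptq",
    trust_remote_code=True,
    tensor_parallel_size=1,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```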
You are now quantizing only a very small portion of the model, making it almost pointless. The vLLM issue happens in their code, so I can't help with that; clearly the model isn't very compatible with quantization. I have to give up. The model is ultimately quite niche, and the devs have not worked to solve these issues or get it implemented in transformers yet.
Thank you, @LaaZa, for your effort. I really appreciate it.
I'm trying to quantize the JAIS model, but I received the message TypeError: JAIS isn't supported yet. This is my code:
Is there a way to quantize it? I followed the README's "Customize Model" section, but it's also not working.