
Fails to load saved model : Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect. #1407

Open
kranipa opened this issue Mar 21, 2024 · 7 comments

@kranipa

kranipa commented Mar 21, 2024

Loading the saved model runs into the following error.
It also takes a very long time to run and save quantized models.

2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.safetensors
2024-03-21 08:48:58 [ERROR] Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect.
2024-03-21 08:48:58 [ERROR] Saved low bit model loading failed, please check your model.

Tried the following example.

import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig

model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path  = "Intel/neural-chat-7b-v3-3" 
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4, compute_dtype="int8", scale_dtype='fp32', group_size=32)
model = AutoModelForCausalLM.from_pretrained(model_path, 
                                            device_map='cpu',
                                            torch_dtype=torch.float16,
                                            quantization_config=woq_config, 
                                            trust_remote_code=True,
                                            use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
# load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir,trust_remote_code = True)
intel-extension-for-transformers==1.4rc2.dev8+g494a5712fa2
neural-compressor==2.4.1
neural-speed==0.4.dev21+g0ec1a6e

@intellinjun
Collaborator

model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cpu', torch_dtype=torch.float16, quantization_config=woq_config, trust_remote_code=True, use_neural_speed=False)
Do you want to use neural_speed? If so, try setting use_neural_speed=True.
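
For reference, a minimal sketch of what that suggestion looks like, reusing the snippet from the issue description with only the flag flipped (whether the returned object then supports save_pretrained is not confirmed here; see the follow-up below):

import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig

model_path = "meta-llama/Llama-2-7b-chat-hf"
woq_config = RtnConfig(bits=4, compute_dtype="int8", scale_dtype='fp32', group_size=32)
# Same call as in the issue description, but with neural_speed enabled as suggested.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             device_map='cpu',
                                             quantization_config=woq_config,
                                             trust_remote_code=True,
                                             use_neural_speed=True)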

@kranipa
Author

kranipa commented Mar 26, 2024

Thank you for the response.

With use_neural_speed=True, the save function doesn't work.

I get the following error:

AttributeError: 'Model' object has no attribute 'save_pretrained'

Can you share an example of how to save a quantized model (a Model object) with neural_speed?

@kevinintel
Contributor

It looks like a load/save mismatch. Can you try using the latest commit instead of g494a5712fa2 and set use_neural_speed=False?

@kranipa
Author

kranipa commented Mar 28, 2024

Hi, thank you. Saving works; however, loading the saved model leads to the following error:


    raise ValueError(
ValueError: Unknown quantization type, got rtn - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm']

The following is the code snippet:

import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig


model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path  = "Intel/neural-chat-7b-v3-3" 
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4)
model = AutoModelForCausalLM.from_pretrained(model_path, 
                                            device_map='cpu',
                                            #torch_dtype=torch.float16,
                                            quantization_config=woq_config, 
                                            trust_remote_code=True,
                                            use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
# load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir,trust_remote_code = True)
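
One way to see what the loader is rejecting is to inspect the saved directory directly. A hedged diagnostic sketch (it assumes save_pretrained wrote a standard config.json with a quantization_config entry, which this thread does not confirm):

import json
import os

saved_dir = "models/4_bit_llama2-rtn"
with open(os.path.join(saved_dir, "config.json")) as f:
    config = json.load(f)

# The loader validates this value against its list of supported quantization
# types; the ValueError above suggests it is reading "rtn" here.
print(config.get("quantization_config", {}).get("quant_method"))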

@PenghuiCheng
Collaborator

@kranipa, this issue is caused by a version mismatch between ITREX and neural-compressor. You can use neural-compressor version 2.5.1 and try it again. ITREX 1.4 is released now; please try it. Thanks very much.
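
A quick way to confirm the environment matches the suggested versions before retrying; a minimal sketch, assuming the packages are installed under their PyPI names:

from importlib.metadata import version

# Expected after upgrading per the comment above: 2.5.1 and a 1.4.x release.
print(version("neural-compressor"))
print(version("intel-extension-for-transformers"))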

@kranipa
Author

kranipa commented Apr 15, 2024

Okay, thank you.

@PhzCode

PhzCode commented May 31, 2024

@kranipa Did you get it to run? I'm having the same problem.
