Using the command `optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/` to perform the ONNX conversion, I found that the tensor type of the model becomes int64. How can this be solved? #30827
Comments
cc @fxmarty @michaelbenayoun for optimum
@younesbelkada thanks for your help.
@fxmarty @michaelbenayoun can you help confirm the issue? Thanks.
@younesbelkada can you ask @fxmarty @michaelbenayoun to help with the issue? Thanks.
Hi @JameslaoA, what's the issue exactly? When conversion is done, the model seems to run nicely and has logits matching the original model. The inputs should be
Hi @michaelbenayoun, thank you for your response. Opening the model.onnx produced by the optimum-cli conversion with the netron.app tool, I see the input is int64 and the output is fp32; please see the screenshot below. I used the following script to convert the ONNX model to a QNN lib, but the compiler failed because int64 is not supported. Could you please help me confirm whether the problem is that the converted input format is not supported, or what the reason is? I have also raised the question with Qualcomm and requested their help. Thanks.
Hello @michaelbenayoun, I expect the converted ONNX model inputs to be int8/int16/int32 rather than int64. Can you help me resolve this? Thanks.
So it seems that everything is fine on the export side. But you need to have
Hi @michaelbenayoun, thanks for your help. I'll try to verify that. Thanks.
System Info
transformers version : 4.38.1
platform: ubuntu 22.04
python version : 3.10.14
optimum version : 1.19.2
Who can help?
@ArthurZucker and @younesbelkada
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
1. Reference conversion command link: https://huggingface.co/docs/transformers/v4.40.1/zh/serialization
2. Download the model files offline (https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat/tree/main)
3. Execute the conversion command: optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
The conversion results are as follows:
(mypy3.10_qnn) zhengjr@ubuntu-ThinkStation-P3-Tower:~$ optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
2024-05-15 19:42:07.726433: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-15 19:42:07.916257: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable
TF_ENABLE_ONEDNN_OPTS=0
2024-05-15 19:42:07.997974: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-15 19:42:08.545959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546100: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Framework not specified. Using pt to export the model.
The task text-generation was manually specified, and past key values will not be reused in the decoding. If needed, please pass --task text-generation-with-past to export using the past key values.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
***** Exporting submodel 1/1: Qwen2ForCausalLM *****
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)
- use_cache -> False
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py:300: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:126: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:309: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Post-processing the exported models...
Deduplicating shared (tied) weights...
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
lm_head.weight: {'onnx::MatMul_5535'}
model.embed_tokens.weight: {'model.embed_tokens.weight'}
Removing duplicate initializer onnx::MatMul_5535...
Validating ONNX model Qwen1.5-0.5B-Chat_onnx/model.onnx...
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 151936) matches (2, 16, 151936)
-[x] values not close enough, max diff: 5.143880844116211e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:
The exported model was saved at: Qwen1.5-0.5B-Chat_onnx
Expected behavior
I expect the input and output tensor types of the converted ONNX model to be fp16.