
Using this command (optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/) to perform the ONNX conversion, the tensor type of the model becomes int64. How can this be solved? #30827

Open
2 of 4 tasks
JameslaoA opened this issue May 15, 2024 · 9 comments


@JameslaoA

System Info

transformers version: 4.38.1
platform: Ubuntu 22.04
python version: 3.10.14
optimum version: 1.19.2

Who can help?

@ArthurZucker and @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

1. Reference conversion command docs: https://huggingface.co/docs/transformers/v4.40.1/zh/serialization
2. Download the model files offline (https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat/tree/main)
3. Execute the conversion command: optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/

The conversion results are as follows:
(mypy3.10_qnn) zhengjr@ubuntu-ThinkStation-P3-Tower:~$ optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
2024-05-15 19:42:07.726433: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-15 19:42:07.916257: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-15 19:42:07.997974: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-15 19:42:08.545959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546100: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Framework not specified. Using pt to export the model.
The task text-generation was manually specified, and past key values will not be reused in the decoding. if needed, please pass --task text-generation-with-past to export using the past key values.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/1: Qwen2ForCausalLM *****
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)
- use_cache -> False
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py:300: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:126: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:309: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Post-processing the exported models...
Deduplicating shared (tied) weights...
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
lm_head.weight: {'onnx::MatMul_5535'}
model.embed_tokens.weight: {'model.embed_tokens.weight'}
Removing duplicate initializer onnx::MatMul_5535...

Validating ONNX model Qwen1.5-0.5B-Chat_onnx/model.onnx...
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 151936) matches (2, 16, 151936)
-[x] values not close enough, max diff: 5.143880844116211e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:

  • logits: max diff = 5.143880844116211e-05.
    The exported model was saved at: Qwen1.5-0.5B-Chat_onnx

Expected behavior

I expect the input and output tensor types of the converted ONNX model to be fp16.

@younesbelkada
Contributor

cc @fxmarty @michaelbenayoun for optimum

@JameslaoA
Author

@younesbelkada thanks for your help.
@fxmarty @michaelbenayoun please help confirm the issue, thanks.

@JameslaoA
Author

@fxmarty @michaelbenayoun can you help confirm the issue?

Thanks.

@JameslaoA
Author

@younesbelkada could you ask @fxmarty @michaelbenayoun to help with this issue?

Thanks.

@michaelbenayoun
Member

Hi @JameslaoA, what is the issue exactly? Once the conversion is done, the model seems to run fine, and its logits match the original model's. The inputs should be int64, no? And are you sure the outputs are int64? The logits seem to be computed correctly.
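
To double-check, here is a minimal sketch that prints the element type of every graph input and output (it assumes the onnx Python package and the export path used in this issue):

    # Print the element type of each graph input and output of the exported model.
    # Assumes the `onnx` package and the export directory used above.
    import onnx

    model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")

    for value_info in list(model.graph.input) + list(model.graph.output):
        elem_type = value_info.type.tensor_type.elem_type
        # Map the enum value (e.g. 7) back to its name (e.g. INT64).
        print(value_info.name, onnx.TensorProto.DataType.Name(elem_type))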

@JameslaoA
Author

JameslaoA commented May 23, 2024

Hi @michaelbenayoun, thank you for your response.

Opening the model.onnx produced by the optimum-cli conversion in the netron.app tool, I see that the input is int64 and the output is fp32; please see the screenshot below.
[screenshot: model.onnx inputs and outputs in Netron]

I used the following script to convert the ONNX model to a QNN library, but the compiler failed because int64 is not supported.
[screenshots: QNN conversion script and compiler error]

Could you please help me confirm whether the problem is that the converted output format is unsupported, or something else? I have also raised the question with Qualcomm and requested their help.

Thanks.

@JameslaoA
Author

Hello @michaelbenayoun, I expect the converted ONNX model inputs to be int8/int16/int32 rather than int64.

Can you help me resolve this?

Thanks.

@michaelbenayoun
Member

So it seems that everything is fine on the export side. But you need to have int32 inputs instead of int64. I think this script could help you.
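
In case that script is not a drop-in fit, here is a minimal sketch of the general idea (assuming the onnx Python package; the file names are illustrative, and this is not necessarily what the linked script does): retype each int64 graph input as int32 and insert a Cast back to int64 under the original name, so the rest of the graph is untouched:

    # Sketch: expose int32 graph inputs, casting back to int64 internally so
    # that downstream nodes are unchanged. File names are illustrative.
    import onnx
    from onnx import TensorProto, helper

    model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")

    for graph_input in model.graph.input:
        if graph_input.type.tensor_type.elem_type == TensorProto.INT64:
            original_name = graph_input.name
            # Rename the graph input and mark it as int32.
            graph_input.name = original_name + "_int32"
            graph_input.type.tensor_type.elem_type = TensorProto.INT32
            # Cast the int32 input back to int64 under the original name.
            cast = helper.make_node(
                "Cast",
                inputs=[graph_input.name],
                outputs=[original_name],
                to=TensorProto.INT64,
            )
            # The Cast only depends on a graph input, so it can go first.
            model.graph.node.insert(0, cast)

    # The exported weights were stored as external data, so save them the same way.
    onnx.save(
        model,
        "Qwen1.5-0.5B-Chat_onnx/model_int32_inputs.onnx",
        save_as_external_data=True,
    )

After this, the saved model exposes int32 inputs while the graph body still computes on int64.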

@JameslaoA
Author

Hi @michaelbenayoun, thanks for your help. I'll try to verify that.

Thanks.
