
Using this command (optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/) to perform the ONNX conversion, the tensor type of the model becomes int64. How can this be solved? #30827

Open
2 of 4 tasks
JameslaoA opened this issue May 15, 2024 · 9 comments


@JameslaoA

System Info

transformers version: 4.38.1
platform: Ubuntu 22.04
python version: 3.10.14
optimum version: 1.19.2

Who can help?

@ArthurZucker and @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

1. Reference conversion command docs: https://huggingface.co/docs/transformers/v4.40.1/zh/serialization
2. Download the model files offline (https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat/tree/main)
3. Execute the conversion command: optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/

The conversion results are as follows:
(mypy3.10_qnn) zhengjr@ubuntu-ThinkStation-P3-Tower:~$ optimum-cli export onnx --model Qwen1.5-0.5B-Chat --task text-generation Qwen1.5-0.5B-Chat_onnx/
2024-05-15 19:42:07.726433: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-15 19:42:07.916257: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-15 19:42:07.997974: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-15 19:42:08.545959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546100: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-05-15 19:42:08.546104: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Framework not specified. Using pt to export the model.
The task text-generation was manually specified, and past key values will not be reused in the decoding. if needed, please pass --task text-generation-with-past to export using the past key values.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using the export variant default. Available variants are:
- default: The default ONNX variant.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

***** Exporting submodel 1/1: Qwen2ForCausalLM *****
Using framework PyTorch: 1.13.1
Overriding 1 configuration item(s)
- use_cache -> False
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/optimum/exporters/onnx/model_patcher.py:300: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:126: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:290: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:297: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/home/zhengjr/anaconda3/envs/mypy3.10_qnn/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py:309: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Saving external data to one file...
Post-processing the exported models...
Deduplicating shared (tied) weights...
Found different candidate ONNX initializers (likely duplicate) for the tied weights:
lm_head.weight: {'onnx::MatMul_5535'}
model.embed_tokens.weight: {'model.embed_tokens.weight'}
Removing duplicate initializer onnx::MatMul_5535...

Validating ONNX model Qwen1.5-0.5B-Chat_onnx/model.onnx...
-[✓] ONNX model output names match reference model (logits)
- Validating ONNX Model output "logits":
-[✓] (2, 16, 151936) matches (2, 16, 151936)
-[x] values not close enough, max diff: 5.143880844116211e-05 (atol: 1e-05)
The ONNX export succeeded with the warning: The maximum absolute difference between the output of the reference model and the ONNX exported model is not within the set tolerance 1e-05:

  • logits: max diff = 5.143880844116211e-05.
    The exported model was saved at: Qwen1.5-0.5B-Chat_onnx

Expected behavior

I expect the input and output tensor types of the converted ONNX model to be fp16.

@younesbelkada
Contributor

cc @fxmarty @michaelbenayoun for optimum

@JameslaoA
Author

@younesbelkada thanks for your help.
@fxmarty @michaelbenayoun please help confirm the issue, thanks.

@JameslaoA
Author

@fxmarty @michaelbenayoun can you help confirm the issue?

Thanks.

@JameslaoA
Author

@younesbelkada could you ask @fxmarty @michaelbenayoun to help with this issue?

Thanks.

@michaelbenayoun
Member

Hi @JameslaoA, what is the issue exactly? Once the conversion is done, the model seems to run fine, and its logits match the original model's. The inputs should be int64, no? And are you sure the outputs are int64? The logits seem to be computed correctly.
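
To double-check, here is a minimal sketch that prints the element type of every graph input and output (it assumes the onnx Python package and the export path used in this issue):

    # Print the element type of each graph input and output of the exported model.
    # Assumes the `onnx` package and the export directory used above.
    import onnx

    model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")

    for value_info in list(model.graph.input) + list(model.graph.output):
        elem_type = value_info.type.tensor_type.elem_type
        # Map the enum value (e.g. 7) back to its name (e.g. INT64).
        print(value_info.name, onnx.TensorProto.DataType.Name(elem_type))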

@JameslaoA
Author

JameslaoA commented May 23, 2024

Hi @michaelbenayoun, thank you for your response.

Opening the model.onnx produced by the optimum-cli conversion in the netron.app tool, I see that the input is int64 and the output is fp32; please see the screenshot below.
[screenshot: model.onnx inputs and outputs in Netron]

I used the following script to convert the ONNX model to a QNN library, but the compiler failed because int64 is not supported.
[screenshots: QNN conversion script and compiler error]

Could you please help me confirm whether the problem is that the converted output format is unsupported, or something else? I have also raised the question with Qualcomm and requested their help.

Thanks.

@JameslaoA
Author

Hello @michaelbenayoun, I expect the converted ONNX model inputs to be int8/int16/int32 rather than int64.

Can you help me resolve this?

Thanks.

@michaelbenayoun
Member

So it seems that everything is fine on the export side. But you need to have int32 inputs instead of int64. I think this script could help you.
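
In case that script is not a drop-in fit, here is a minimal sketch of the general idea (assuming the onnx Python package; the file names are illustrative, and this is not necessarily what the linked script does): retype each int64 graph input as int32 and insert a Cast back to int64 under the original name, so the rest of the graph is untouched:

    # Sketch: expose int32 graph inputs, casting back to int64 internally so
    # that downstream nodes are unchanged. File names are illustrative.
    import onnx
    from onnx import TensorProto, helper

    model = onnx.load("Qwen1.5-0.5B-Chat_onnx/model.onnx")

    for graph_input in model.graph.input:
        if graph_input.type.tensor_type.elem_type == TensorProto.INT64:
            original_name = graph_input.name
            # Rename the graph input and mark it as int32.
            graph_input.name = original_name + "_int32"
            graph_input.type.tensor_type.elem_type = TensorProto.INT32
            # Cast the int32 input back to int64 under the original name.
            cast = helper.make_node(
                "Cast",
                inputs=[graph_input.name],
                outputs=[original_name],
                to=TensorProto.INT64,
            )
            # The Cast only depends on a graph input, so it can go first.
            model.graph.node.insert(0, cast)

    # The exported weights were stored as external data, so save them the same way.
    onnx.save(
        model,
        "Qwen1.5-0.5B-Chat_onnx/model_int32_inputs.onnx",
        save_as_external_data=True,
    )

After this, the saved model exposes int32 inputs while the graph body still computes on int64.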

@JameslaoA
Author

Hi @michaelbenayoun, thanks for your help. I'll try to verify that.

Thanks.
