
exir "missing out vars" #3443

Open
antmikinka opened this issue May 1, 2024 · 7 comments
Open

exir "missing out vars" #3443

antmikinka opened this issue May 1, 2024 · 7 comments
Labels
module: coreml Issues related to Apple's Core ML delegation

Comments

@antmikinka
Copy link

The command I ran that causes this error:
python -m examples.models.llama2.export_llama --checkpoint /Users/anthonymikinka/executorch/llama-2-7b-chat/consolidated.00.pth --params /Users/anthonymikinka/executorch/llama-2-7b-chat/params.json -kv --use_sdpa_with_kv_cache --coreml --group_size 128 -qmode 8da4w -d fp32 --verbose --max_seq_length 512 -o "/Volumes/NVME 3/ExecuTorch Models"

Above this point there is a lot of this EdgeOpOverload output, but otherwise the MIL backend and default pipelines built.
Lots of ops were removed earlier on, before the MIL building.
Below is some of the terminal output (the same few "Failed converting" messages repeat many times) and the traceback error.

INFO:root:Failed converting '<EdgeOpOverload: quantized_decomposed.dequantize_per_token.default>: schema = quantized_decomposed::dequantize_per_token(Tensor input, Tensor scales, Tensor zero_points, int quant_min, int quant_max, ScalarType dtype, ScalarType output_dtype) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::dequantize_per_token can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::dequantize_per_token(Tensor input, Tensor scales, Tensor zero_points, int quant_min, int quant_max, ScalarType dtype, ScalarType output_dtype) -> Tensor']'

INFO:root:Failed converting '<EdgeOpOverload: quantized_decomposed.choose_qparams_per_token_asymmetric.default>: schema = quantized_decomposed::choose_qparams_per_token_asymmetric(Tensor input, ScalarType dtype) -> (Tensor, Tensor)' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::choose_qparams_per_token_asymmetric can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::choose_qparams_per_token_asymmetric(Tensor input, ScalarType dtype) -> (Tensor, Tensor)']'

INFO:root:Failed converting '<EdgeOpOverload: quantized_decomposed.quantize_per_token.default>: schema = quantized_decomposed::quantize_per_token(Tensor input, Tensor scales, Tensor zero_points, int quant_min, int quant_max, ScalarType dtype) -> Tensor' to its out variant with error: 'SchemaKind.out variant of operator quantized_decomposed::quantize_per_token can't be found. We've found the schemas of all the overloads: ['quantized_decomposed::quantize_per_token(Tensor input, Tensor scales, Tensor zero_points, int quant_min, int quant_max, ScalarType dtype) -> Tensor']'

Traceback (most recent call last):
  File "/opt/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/envs/executorch/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/anthonymikinka/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
  File "/Users/anthonymikinka/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/anthonymikinka/executorch/examples/models/llama2/export_llama_lib.py", line 545, in export_llama
    return _export_llama(modelname, args)
  File "/Users/anthonymikinka/executorch/examples/models/llama2/export_llama_lib.py", line 869, in _export_llama
    builder = builder_exported_to_edge.to_backend(partitioners).to_executorch()
  File "/Users/anthonymikinka/executorch/examples/models/llama2/builder.py", line 319, in to_executorch
    self.export_program = self.edge_manager.to_executorch(
  File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 842, in to_executorch
    new_gm_res = p(new_gm)
  File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/fx/passes/infra/pass_base.py", line 40, in __call__
    res = self.call(graph_module)
  File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/executorch/exir/passes/__init__.py", line 422, in call
    raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::dequantize_per_channel_group', 'quantized_decomposed::quantize_per_token'}
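
For context on what the error means: exir's to_executorch step tries to convert each operator to its "out variant", i.e. an overload that writes into a caller-provided buffer instead of allocating and returning a new tensor, and the log above says no such overload is registered for these quantized_decomposed ops. A minimal illustration of the two schema kinds using a stock ATen op (torch.add is only an example here, not one of the missing ops):

import torch

x = torch.ones(3)
y = torch.ones(3)

z = torch.add(x, y)        # functional variant (aten::add.Tensor): allocates and returns a new tensor
buf = torch.empty(3)
torch.add(x, y, out=buf)   # out variant (aten::add.out): writes the result into the preallocated buffer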
@cccclai
Contributor

cccclai commented May 1, 2024

Are you trying to lower the model to CoreML by passing --coreml? We're still actively working on enabling llama2 7b with CoreML. The xnnpack backend is ready for the llama2 7b model.
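
For reference, a sketch of what the same export might look like targeting XNNPACK instead of CoreML, assuming the llama2 example exposes an -X/--xnnpack flag (the flag name is an assumption here; please check the script's --help):

python -m examples.models.llama2.export_llama --checkpoint /Users/anthonymikinka/executorch/llama-2-7b-chat/consolidated.00.pth --params /Users/anthonymikinka/executorch/llama-2-7b-chat/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32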

@antmikinka
Author

Are you trying to lower the model to CoreML by passing --coreml? We're still actively working on enabling llama2 7b with CoreML. The xnnpack backend is ready for the llama2 7b model.

@cccclai ah ok, sweet, thank you for letting me know!! I would have still been trying haha

Is the XNNPACK output a .mlpackage? I still have to build the xnnpack stuff; I only did mps and coreml so far.

Do you have more info on which models are ready for CoreML?

Are there any .mlmodel/.mlpackage model configs (or any end products of converting) in executorch?

@cccclai
Contributor

cccclai commented May 2, 2024

xnnpack (https://github.com/google/XNNPACK) is a software library with a set of highly optimized operators for CPU. It can work on iOS too.

Regarding CoreML questions, I'd defer to @cymbalrush and @YifanShenSZ to answer.

@cccclai added the module: coreml (Issues related to Apple's Core ML delegation) label on May 2, 2024
@SS-JIA
Contributor

SS-JIA commented May 2, 2024

Will also cc @shoumikhin for iOS/macOS-related inquiries.

@YifanShenSZ
Collaborator

Hey @antmikinka, would this simpler export work for you?

python -m examples.models.llama2.export_llama --checkpoint /Users/anthonymikinka/executorch/llama-2-7b-chat/consolidated.00.pth --params /Users/anthonymikinka/executorch/llama-2-7b-chat/params.json -kv --coreml

Concretely, this is a good starting point that we have tested and made sure works. For all the other arguments, could you please add them one by one until the issue pops up? (So we can have more clarity on what went wrong.)

@antmikinka
Author

@YifanShenSZ I kept running into disk space issues on my MacBook Pro, even with up to 30GB of free space.
I added
--group_size 128 -qmode 8da4w -d fp32 --verbose --max_seq_length 512 -o "/Volumes/NVME 3/ExecuTorch Models"
and got the error above once again.

I started working through the arguments.

  • I ran -kv --coreml --group_size 128 -d fp32 --verbose --max_seq_length 512 -o "/Volumes/NVME 3/ExecuTorch Models"
    and ran out of space once again.

  • I ran -kv --coreml -qmode 8da4w -d fp32 --verbose -o "/Volumes/NVME 3/ExecuTorch Models"
    and ran into the quantized issue. I'm thinking the -qmode 8da4w argument may be the problem.
    To help narrow this down, I took the last couple hundred lines of my terminal and created a gist:
    PyTorch-executorch-issue 3443 terminal.txt

  • I ran -kv --coreml -qmode 8da4w
    I made a log file for this one; here is the gist: executorch.log
    I got the quantized error as well. Looks like I was right about -qmode 8da4w.

@antmikinka
Author

Just tried -kv --coreml --verbose --group_size 128 --max_seq_length 128 and
ran out of storage; it used 29GB trying to convert.

I may try again later today after freeing up some more storage. Let me know if that log file has helped.
