
Got error when exporting model with quantization #3516

Open
1 task done
dickens88 opened this issue Apr 29, 2024 · 0 comments
Labels
pending This problem is yet to be addressed.

Comments


dickens88 commented Apr 29, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Hi,

I'm using the latest code from master, and I deploy my environment with docker-compose. When I try to export a model with quantization, the backend raises the following error:

llama_factory  | importlib.metadata.PackageNotFoundError: No package metadata was found for The 'optimum>=1.16.0' distribution was not found and is required by this application. 
llama_factory  | To fix: pip install optimum>=1.16.0

After installing the necessary packages with the following commands:

pip install auto_gptq
pip install optimum
pip install -U accelerate bitsandbytes datasets peft transformers

I got a different error:

llama_factory  | Traceback (most recent call last):
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 566, in process_events
llama_factory  |     response = await route_utils.call_process_api(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
llama_factory  |     output = await app.get_blocks().process_api(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1788, in process_api
llama_factory  |     result = await self.call_function(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1352, in call_function
llama_factory  |     prediction = await utils.async_iteration(iterator)
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
llama_factory  |     return await iterator.__anext__()
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in __anext__
llama_factory  |     return await anyio.to_thread.run_sync(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
llama_factory  |     return await get_async_backend().run_sync_in_worker_thread(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
llama_factory  |     return await future
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 851, in run
llama_factory  |     result = context.run(func, *args)
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
llama_factory  |     return next(iterator)
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 742, in gen_wrapper
llama_factory  |     response = next(iterator)
llama_factory  |   File "/app/src/llmtuner/webui/components/export.py", line 79, in save_model
llama_factory  |     export_model(args)
llama_factory  |   File "/app/src/llmtuner/train/tuner.py", line 57, in export_model
llama_factory  |     model = load_model(tokenizer, model_args, finetuning_args)  # must after fixing tokenizer to resize vocab
llama_factory  |   File "/app/src/llmtuner/model/loader.py", line 113, in load_model
llama_factory  |     model = AutoModelForCausalLM.from_pretrained(**init_kwargs)
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
llama_factory  |     return model_class.from_pretrained(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3165, in from_pretrained
llama_factory  |     hf_quantizer.validate_environment(
llama_factory  |   File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_gptq.py", line 56, in validate_environment
llama_factory  |     raise ImportError(
llama_factory  | ImportError: Loading a GPTQ quantized model requires optimum (`pip install optimum`) and auto-gptq library (`pip install auto-gptq`)
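Since the ImportError persists even after installing the packages, one possible explanation (an assumption, not confirmed by the logs) is that they were installed on the host rather than inside the container that actually runs LLaMA-Factory. A quick way to check which packages are visible to the interpreter serving the app is a short probe like this, run inside the container (e.g. via `docker compose exec`):

```python
# Diagnostic sketch: list the versions of the packages that the GPTQ
# export path requires, as seen by THIS interpreter. Run it inside the
# container, e.g.:
#   docker compose exec llama_factory python probe.py
# (the service name "llama_factory" is taken from the log prefix above).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("optimum", "auto-gptq", "transformers", "accelerate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT FOUND in this environment")
```

If `optimum` or `auto-gptq` shows as not found here, installing them from inside the container (`docker compose exec llama_factory pip install optimum auto-gptq`) or rebuilding the image with those packages would be the next thing to try.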

My settings are as follows:
(screenshot of the export settings)

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed. label May 1, 2024