
[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

Open
tattrongvu opened this issue Apr 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

tattrongvu commented Apr 3, 2024

Describe the bug
Hi all, recently while experimenting with different quantization methods, I came across TheBloke's repo on Hugging Face but couldn't find my model in his quantized list, so I decided to quantize it myself.
I also looked at #291.

I used GPTQ to quantize the Llama-2-70B model with the following setup:

Hardware details
4 x A100 (80GB each) = 320GB VRAM
32 CPU cores
240 GB RAM

Software version
CUDA 12.2

pip install auto-gptq==0.7.1 --no-build-isolation
pip install transformers==4.38.2

To Reproduce
Steps to reproduce the behavior:
Use the following config:

import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize model to 4-bit
    group_size=32,  # alternatives: 64, 128
    damp_percent=0.01,
    desc_act=True,  # setting this to False can significantly speed up inference, at a slight cost in perplexity
)
max_memory = {0: "40GIB", 1: "40GIB", 2: "40GIB", 3: "40GIB"}
num_samples = 2048
pretrained_model_dir = "/models/llama70b"
quantized_model_dir = "/models/gptq/llama70b_4bit_32_act"
quantize_dataset_path = "/Dataset/alpaca_data_cleaned.json"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir,
                                          # use_fast=True,
                                          trust_remote_code=True,
                                          )
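
examples_for_quant (used below) is built from the Alpaca JSON; a minimal sketch of that step, since it isn't pasted above (the prompt formatting and sampling here are assumptions, not my exact code):

import json
import random

with open(quantize_dataset_path) as f:
    raw_data = json.load(f)

# draw num_samples calibration examples from the cleaned Alpaca dataset
random.seed(0)
samples = random.sample(raw_data, num_samples)

examples_for_quant = []
for sample in samples:
    # Alpaca entries have "instruction", "input" and "output" fields.
    text = sample["instruction"] + "\n" + sample.get("input", "") + "\n" + sample["output"]
    enc = tokenizer(text, truncation=True, max_length=2048, return_tensors="pt")
    examples_for_quant.append({"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})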

And then:

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, 
                                            quantize_config,
                                            trust_remote_code=True,
                                            torch_dtype=torch.float16,
                                            max_memory=max_memory,
                                           )

model.quantize(examples_for_quant, 
               cache_examples_on_gpu=False,
               batch_size=8
              )

The error happens when

model.save_quantized(quantized_model_dir)

When I monitored the process with the top and nvidia-smi commands on Linux:

  1. In the quantization phase, everything seems normal:
    At the start, because Llama-2-70B is about 140 GB in fp16, each of the 4 GPUs receives roughly 35 GB, which looks correct.
    Then each GPU is used in turn, depending on which layers it holds: when quantization reaches a layer on a particular GPU, that GPU's utilization goes up.
    This takes about 4 hours.

  2. But after it finishes with layer 80, I see it copy all of the weights down to the CPU; RAM usage slowly climbs to 220 GB. It takes about 1 hour just to copy from GPU to CPU.

  3. In the end my process is killed with:

...
INFO - Quantizing mlp.down_proj in layer 80/80...
You shouldn't move a model that is dispatched using accelerate hooks.
Killed
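
One thing I may try (untested, just a sketch based on the accelerate API) is to strip the dispatch hooks from the wrapped model before saving, so that save_quantized can move it without tripping that warning:

from accelerate.hooks import remove_hook_from_module

# Untested sketch: drop the hooks accelerate attached when the model was
# dispatched across the 4 GPUs, then save. model.model is the wrapped HF module.
remove_hook_from_module(model.model, recurse=True)
model.save_quantized(quantized_model_dir)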

Lastly, I would like to know how AutoAWQ's loading method differs from this.
For reference, when I quantized the same model with AutoAWQ (roughly the flow sketched below), it took about 1 hour and could run on an instance with 90 GB RAM + 80 GB VRAM.
With GPTQ it is much slower (5 hours vs. 1 hour) and needs roughly 4 times the VRAM/RAM.
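
The AutoAWQ run followed roughly the standard AutoAWQ flow; a minimal sketch (the output path and quant_config values here are assumptions, not my exact settings):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/models/llama70b"
quant_path = "/models/awq/llama70b_4bit"  # hypothetical output dir
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}  # assumed values

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AWQ runs its own calibration pass, then the quantized model saves to disk without the accelerate-hooks issue.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)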

tattrongvu added the bug label on Apr 3, 2024
@murtaza-nasir

Were you able to fix this? I am getting the same error.

I am using the example script quant_with_alpaca.py, and it looks like the model gets quantized. It even tests the quantized model for inference. However, the model never gets saved and I see this line in the output: "You shouldn't move a model that is dispatched using accelerate hooks."
