[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

tattrongvu · 2024-04-03T13:39:46Z

Describe the bug
Hi all, recently when I do experiments with different quantization method, I come through TheBloke repo in Huggingface but can't find my model on his quantized list. That why I decide to quantize myself.
I also looked at #291.

I was use GPTQ to quantize Llama-2-70B model with following setup:

Hardware details
4 x A100 (80GB each) = 320GB VRAM
32 core of CPU.
240 GB RAM

Software version
Cuda 12.2

pip install auto-gptq==0.7.1 --no-build-isolation

pip install transformers==4.38.2

To Reproduce
Steps to reproduce the behavior:
Use following config:

quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize model to 4-bit
    group_size=32,  # 64, 128
    damp_percent=0.01,
    desc_act=True,  # set to False can significantly speed up inference but the perplexity may slightly bad
)

max_memory = {0: "40GIB", 1: "40GIB", 2: "40GIB", 3: "40GIB"}

num_samples = 2048
pretrained_model_dir = "/models/llama70b"
quantized_model_dir = "/models/gptq/llama70b_4bit_32_act"
quantize_dataset_path = "/Dataset/alpaca_data_cleaned.json"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, 
                                          #use_fast=True,
                                          trust_remote_code=True
                                         )

And then:

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, 
                                            quantize_config,
                                            trust_remote_code=True,
                                            torch_dtype=torch.float16,
                                            max_memory=max_memory,
                                           )

model.quantize(examples_for_quant, 
               cache_examples_on_gpu=False,
               batch_size=8
              )

The error happen when

model.save_quantized(quantized_model_dir)

When I monitoring the process with top and nvidia-smi command in Linux:

In the quantization phase, every thing seem normal:
In the beginning, because Llama-2-70b is 140GB in fp16, therefore 4 gpu each recieved 35GB. It seem correct and normal.
Then, each GPU is used consequently depend on which layer is in it. e.g: When it come to the layer in the particular GPU, the usage of this GPU will be high.
This take 4 hours long.
But after it finish with layer 80. I see that it copy all of the weight down to the CPU, I see the RAM usage is increasing slowly to 220GB RAM. It take about 1 hours just for copy from GPU to CPU.
In the end my process is killed with the reason:

...
INFO - Quantizing mlp.down_proj in layer 80/80...
You shouldn't move a model that is dispatched using accelerate hooks.
Killed

Lastly I would like to know the differences with the loading method of autoawq.
For reference, when I quantized with AutoAWQ, it take about 1 hours and its able to run on a instance with 90RAM + 80GB VRAM.
With GPTQ it is much slower (5 hours compare to 1 hours) and 4 time resource requirement VRAM or RAM.

Now Im trying to use the low_cpu_mem_usage=True and ignore the max_memory as @TheBloke mentioned OutOfMemoryError: CUDA out of memory. #291 (comment)

The text was updated successfully, but these errors were encountered:

murtaza-nasir · 2024-05-13T07:11:19Z

Were you able to fix this? I am getting the same error.

I am using the example script quant_with_alpaca.py, and it looks like the model gets quantized. It even tests the quantized model for inference. However, the model never gets saved and I see this line in the output: "You shouldn't move a model that is dispatched using accelerate hooks."

tattrongvu added the bug Something isn't working label Apr 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

tattrongvu commented Apr 3, 2024 •

edited

murtaza-nasir commented May 13, 2024

[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

[BUG] Can not save quantized model to disk: "you shouldn't move a model that is dispatched using accelerate hooks." #630

Comments

tattrongvu commented Apr 3, 2024 • edited

murtaza-nasir commented May 13, 2024

tattrongvu commented Apr 3, 2024 •

edited