I have downloaded a model. Now, on my 4-GPU instance, I am attempting to quantize it using AutoAWQ.
Whenever I run the script below, GPU utilization stays at 0%.
Can anyone help me figure out why this is happening?
```python
import json
import os

from huggingface_hub import snapshot_download
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# some other code here
# ////////////////
# some code here

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    args.model_path, device_map="auto", **{"low_cpu_mem_usage": True}
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)

# Load quantization config from file
if args.quant_config:
    quant_config = json.loads(args.quant_config)  # note: was `args.config`, which looks like a typo
else:
    # Default quantization config
    print("Using default quantization config")
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize
print("Quantizing the model")
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
if args.quant_path:
    print("Saving the model")
    model.save_quantized(args.quant_path)
    tokenizer.save_pretrained(args.quant_path)
else:
    print("No quantized model path provided, not saving quantized model.")
```
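For context, the script references an `args` object defined in the elided "some code here" section. Below is a minimal, hypothetical sketch of the argument parsing the script appears to assume; the flag names, the example model path, and the `argparse` wiring are my assumptions, not code from the original issue. It also shows the default `quant_config` fallback in isolation.

```python
import argparse
import json

# Hypothetical reconstruction of the elided argument parsing:
# the script above uses args.model_path, args.quant_config, and args.quant_path.
parser = argparse.ArgumentParser(description="AWQ quantization script (sketch)")
parser.add_argument("--model-path", dest="model_path", required=True,
                    help="Local path or hub id of the model to quantize")
parser.add_argument("--quant-config", dest="quant_config", default=None,
                    help="JSON string with AWQ quantization settings")
parser.add_argument("--quant-path", dest="quant_path", default=None,
                    help="Where to save the quantized model and tokenizer")

# Example invocation; the model path here is a placeholder.
args = parser.parse_args(["--model-path", "/models/my-model"])

# Same fallback logic as in the script: parse the JSON config if given,
# otherwise use the AutoAWQ defaults.
if args.quant_config:
    quant_config = json.loads(args.quant_config)
else:
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

print(quant_config)
```

With no `--quant-config` supplied, the script quantizes to 4-bit weights with a group size of 128 using the GEMM kernel version.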