
[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize GPU #4744

Open
danielstankw opened this issue May 10, 2024 · 0 comments
Labels: usage (How to use vllm)

Your current environment

...

How would you like to use vllm

I have downloaded a model, and on my 4-GPU instance I am attempting to quantize it with AutoAWQ.
Whenever I run the script below, GPU utilization stays at 0%.
Can anyone help me understand why this is happening?

import json
from huggingface_hub import snapshot_download
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import os

# some other code here
# ////////////////
# some code here

# Load the full-precision model and tokenizer.
# device_map="auto" lets accelerate spread the weights across the
# available GPUs, spilling over to CPU if they do not fit.
model = AutoAWQForCausalLM.from_pretrained(
    args.model_path, device_map="auto", low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)

# Load quantization config if one was provided
if args.quant_config:
    quant_config = json.loads(args.quant_config)
else:
    # Default quantization config: 4-bit weights, zero-point quantization,
    # group size 128, GEMM kernel
    print("Using default quantization config")
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize: the AWQ calibration forward passes run here, so this is
# where GPU activity is expected
print("Quantizing the model")
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
if args.quant_path:
    print("Saving the model")
    model.save_quantized(args.quant_path)
    tokenizer.save_pretrained(args.quant_path)
else:
    print("No quantized model path provided, not saving quantized model.")

[screenshots attached]
