I have downloaded a model. Now, on my 4-GPU instance, I am attempting to quantize it using AutoAWQ.
Whenever I run the script below, GPU utilization stays at 0%.
Can anyone help me figure out why this is happening?
```python
import json
import os

from huggingface_hub import snapshot_download
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# some other code here
# ////////////////
# some code here

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    args.model_path, device_map="auto", **{"low_cpu_mem_usage": True}
)
tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True)

# Load quantization config from file
if args.quant_config:
    quant_config = json.loads(args.quant_config)  # note: was `args.config`, which looks like a typo
else:
    # Default quantization config
    print("Using default quantization config")
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize
print("Quantizing the model")
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
if args.quant_path:
    print("Saving the model")
    model.save_quantized(args.quant_path)
    tokenizer.save_pretrained(args.quant_path)
else:
    print("No quantized model path provided, not saving quantized model.")
```
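For context, the script references an `args` object defined in the elided "some code here" section. Below is a minimal, hypothetical sketch of the argument parsing the script appears to assume; the flag names, the example model path, and the `argparse` wiring are my assumptions, not code from the original issue. It also shows the default `quant_config` fallback in isolation.

```python
import argparse
import json

# Hypothetical reconstruction of the elided argument parsing:
# the script above uses args.model_path, args.quant_config, and args.quant_path.
parser = argparse.ArgumentParser(description="AWQ quantization script (sketch)")
parser.add_argument("--model-path", dest="model_path", required=True,
                    help="Local path or hub id of the model to quantize")
parser.add_argument("--quant-config", dest="quant_config", default=None,
                    help="JSON string with AWQ quantization settings")
parser.add_argument("--quant-path", dest="quant_path", default=None,
                    help="Where to save the quantized model and tokenizer")

# Example invocation; the model path here is a placeholder.
args = parser.parse_args(["--model-path", "/models/my-model"])

# Same fallback logic as in the script: parse the JSON config if given,
# otherwise use the AutoAWQ defaults.
if args.quant_config:
    quant_config = json.loads(args.quant_config)
else:
    quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

print(quant_config)
```

With no `--quant-config` supplied, the script quantizes to 4-bit weights with a group size of 128 using the GEMM kernel version.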