I am trying to use AirLLM on my PC (Windows 11, 32 GB RAM, RTX 3080 with 10 GB VRAM) to run Llama 3 70B.
After downloading Llama 3 quantized at 4 bit from here, I tried to load the model with the provided sample code, including compression:
model = AutoModel.from_pretrained(r"(MY WINDOWS PATH)\Meta-Llama-3-70B-Instruct-GGUF\Meta-Llama-3-70B-Instruct-Q4_K_M.gguf", compression='4bit' )
It quickly allocated all of my memory until the computer became completely unresponsive and I had to do a hard reset.
So, does AirLLM support quantized models in GGUF/GPTQ/AWQ format, or does it only work with the original, unquantized model?
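For reference, the full snippet I was following (the AirLLM README sample, lightly adapted; the tokenizer/generate arguments are from memory and may differ slightly from what I actually ran) looks roughly like this:

from airllm import AutoModel

MAX_LENGTH = 128

# Load the local GGUF file with AirLLM's 4-bit compression option (path elided).
model = AutoModel.from_pretrained(
    r"(MY WINDOWS PATH)\Meta-Llama-3-70B-Instruct-GGUF\Meta-Llama-3-70B-Instruct-Q4_K_M.gguf",
    compression='4bit',
)

# Tokenize a short prompt and generate, as in the README sample.
input_text = ['What is the capital of the United States?']
input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(input_tokens['input_ids'].cuda(),
                                   max_new_tokens=20,
                                   use_cache=True,
                                   return_dict_in_generate=True)

print(model.tokenizer.decode(generation_output.sequences[0]))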