I am trying to use AirLLM on my PC (Windows 11, 32 GB RAM, RTX 3080 with 10 GB VRAM) to run Llama 3 70B.
After downloading Llama 3 quantized at 4 bit from here, I tried to load the model with the provided sample code, including compression:
model = AutoModel.from_pretrained(r"(MY WINDOWS PATH)\Meta-Llama-3-70B-Instruct-GGUF\Meta-Llama-3-70B-Instruct-Q4_K_M.gguf", compression='4bit' )
It quickly allocated all of my memory until the computer became completely unresponsive and I had to do a hard reset.
So, does AirLLM support quantized models in GGUF/GPTQ/AWQ format, or does it only work with the original, unquantized model?
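For reference, the full snippet I was following (the AirLLM README sample, lightly adapted; the tokenizer/generate arguments are from memory and may differ slightly from what I actually ran) looks roughly like this:

from airllm import AutoModel

MAX_LENGTH = 128

# Load the local GGUF file with AirLLM's 4-bit compression option (path elided).
model = AutoModel.from_pretrained(
    r"(MY WINDOWS PATH)\Meta-Llama-3-70B-Instruct-GGUF\Meta-Llama-3-70B-Instruct-Q4_K_M.gguf",
    compression='4bit',
)

# Tokenize a short prompt and generate, as in the README sample.
input_text = ['What is the capital of the United States?']
input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(input_tokens['input_ids'].cuda(),
                                   max_new_tokens=20,
                                   use_cache=True,
                                   return_dict_in_generate=True)

print(model.tokenizer.decode(generation_output.sequences[0]))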