
Error when quantizing Mixtral 8x7B model: "ZeroDivisionError: float division by zero" #634

Open
arceus-jia opened this issue Apr 7, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@arceus-jia

I am getting a "float division by zero" error whenever I try to quantize Mixtral-related models with AutoGPTQ. Here is my code:

from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)

# pretrained_model_dir = "mistralai/Mistral-7B-Instruct-v0.2"
# quantized_model_dir = "Mistral-7B-Instruct-v0.2-gptq"

pretrained_model_dir = "hfl/Chinese-Mixtral-Instruct"
quantized_model_dir = "Chinese-Mixtral-Instruct-gptq"

# pretrained_model_dir = 'cognitivecomputations/dolphin-2.6-mixtral-8x7b'
# quantized_model_dir = "dolphin-2.6-mixtral-8x7b-gptq"


tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]

quantize_config = BaseQuantizeConfig(
    bits=4,  # quantize model to 4-bit
    # group_size=128,  # it is recommended to set the value to 128
    desc_act=True,  # setting this to False can significantly speed up inference, but perplexity may be slightly worse
    damp_percent=0.1
)

# load the un-quantized model; by default, the model is loaded into CPU memory
model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)

# quantize the model; examples should be a list of dicts whose keys can only be "input_ids" and "attention_mask"
model.quantize(examples)

# save quantized model
model.save_quantized(quantized_model_dir)

It looks like nsamples becomes 0, but I don't know how that happens. Has anyone had a similar problem?
[screenshot: ZeroDivisionError traceback]
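For context, here is a minimal illustrative sketch (hypothetical names, not the actual auto_gptq source) of the per-layer statistics accumulation that GPTQ-style quantizers use; it shows how nsamples can stay 0 when a Mixtral expert is never routed any tokens by the single short calibration example above:

# Illustrative sketch only -- names are hypothetical, not auto_gptq internals.
class LayerQuantStats:
    def __init__(self):
        self.nsamples = 0  # calibration rows actually seen by this layer

    def add_batch(self, inp):
        # called from a forward hook each time the layer runs;
        # an expert the router never selects never gets this call
        self.nsamples += inp.shape[0]

    def avg_loss(self, total_loss):
        # nsamples == 0 here reproduces
        # "ZeroDivisionError: float division by zero"
        return total_loss / self.nsamples

If that is the cause, using more (and longer) calibration examples so that every expert sees at least some tokens may also help, in addition to the version fixes suggested below.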

@arceus-jia arceus-jia added the bug Something isn't working label Apr 7, 2024
@Qubitium
Contributor

@arceus-jia Try: 1) use transformers 4.39.3 (latest) or 4.38.2 (stable); 2) install auto-gptq from the repo using git clone.
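For reference, installing from the repository looks roughly like this (the repo URL and editable install are assumptions based on the usual AutoGPTQ workflow, not quoted from this thread):

git clone https://github.com/AutoGPTQ/AutoGPTQ.git
cd AutoGPTQ
pip install -e .                      # build and install auto-gptq from source
pip install "transformers==4.39.3"    # or 4.38.2 (stable), per the suggestion above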
