[Bug] mpt-30b-chat answering questions based on langchain does not work #269

gzusgw opened this issue Aug 30, 2023 · 0 comments
Labels: bug Something isn't working

gzusgw commented Aug 30, 2023

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running the code in examples/learn/generation/llm-field-guide/mpt/mpt-30b-chatbot.ipynb:

```python
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```

only prints the prompt back:

```
Explain to me the difference between nuclear fission and fusion.
```

without the model's answer.
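
Note that since the pipeline is built with `return_full_text=True`, the prompt is always echoed at the start of `generated_text`, so the empty continuation is easiest to see by stripping the prompt off. A minimal check (using the `generate_text` pipeline defined below):

```python
prompt = "Explain to me the difference between nuclear fission and fusion."
res = generate_text(prompt)
full = res[0]["generated_text"]

# With return_full_text=True the output starts with the prompt itself,
# so the model's contribution is whatever follows it. Here it comes back empty.
continuation = full[len(prompt):]
print(repr(continuation))
```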

Expected Behavior

The model's answer to the question should be returned.

Steps To Reproduce

```python
import torch
import transformers
from transformers import StoppingCriteria, StoppingCriteriaList
from torch import cuda, bfloat16

device = f'cuda:0' if cuda.is_available() else 'cpu'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True,
    load_in_8bit=True,  # this requires the bitsandbytes library
    max_seq_len=8192,
    init_device=device,
    device_map="auto"
)
model.eval()
# model.to(device)
print(f"Model loaded on {device}")

tokenizer = transformers.AutoTokenizer.from_pretrained("mosaicml/mpt-30b-chat")

stop_token_ids = [
    tokenizer.convert_tokens_to_ids(x) for x in [
        ['Human', ':'], ['AI', ':']
    ]
]

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # select from top tokens whose probability add up to 15%
    top_k=0,  # select from top 0 tokens (because zero, relies on top_p)
    max_new_tokens=128,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```
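
One way to narrow this down (a debugging sketch, not part of the notebook): rebuild the pipeline without the custom `stopping_criteria`, and with `do_sample=True` so that `temperature`/`top_p`/`top_k` are actually applied instead of being ignored under greedy decoding. If text comes back in this configuration, the `StopOnTokens` criterion is the likely culprit.

```python
# Debugging sketch (assumes the model and tokenizer loaded above).
debug_pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task='text-generation',
    return_full_text=True,
    do_sample=True,          # enable sampling so temperature/top_p take effect
    temperature=0.1,
    top_p=0.15,
    max_new_tokens=128,
    repetition_penalty=1.1
    # stopping_criteria deliberately omitted to isolate its effect
)

out = debug_pipe("Explain to me the difference between nuclear fission and fusion.")
print(out[0]["generated_text"])
```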

Relevant log output

```
[2023-08-30 17:16:12,664] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-08-30 17:16:13.203919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Instantiating an MPTForCausalLM model from /root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/54f33278a04aa4e612bca482b82f801ab658e890/modeling_mpt.py
You are using config.init_device='cuda:0', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████| 7/7 [01:07<00:00,  9.62s/it]
Model loaded on cuda:0
The model 'MPTForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Explain to me the difference between nuclear fission and fusion.
```
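
For context, the "MPTForCausalLM is not supported for text-generation" line appears to be only a registry warning from the pipeline factory: the remote-code model class is not in the pipeline's built-in model list, but it should still run. A sketch (not from the notebook) to confirm the model itself generates new tokens, independent of the pipeline and the stopping criteria:

```python
# Call generate() directly on the model, bypassing the pipeline entirely.
inputs = tokenizer(
    "Explain to me the difference between nuclear fission and fusion.",
    return_tensors="pt"
).to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```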

Environment

- **OS**: ubuntu20.04
- **Language version**:  Python 3.8.16
- **Pinecone client version**: not used

Additional Context

No response

gzusgw added the bug label on Aug 30, 2023