[Bug] mpt-30b-chat answering questions based on langchain does not work #269

gzusgw opened this issue Aug 30, 2023 · 0 comments
Labels: bug Something isn't working

gzusgw commented Aug 30, 2023

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Running the code in examples/learn/generation/llm-field-guide/mpt/mpt-30b-chatbot.ipynb:

```python
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```

only prints the prompt back:

```
Explain to me the difference between nuclear fission and fusion.
```

without the model's answer.
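
Note that since the pipeline is built with `return_full_text=True`, the prompt is always echoed at the start of `generated_text`, so the empty continuation is easiest to see by stripping the prompt off. A minimal check (using the `generate_text` pipeline defined below):

```python
prompt = "Explain to me the difference between nuclear fission and fusion."
res = generate_text(prompt)
full = res[0]["generated_text"]

# With return_full_text=True the output starts with the prompt itself,
# so the model's contribution is whatever follows it. Here it comes back empty.
continuation = full[len(prompt):]
print(repr(continuation))
```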

Expected Behavior

The model's answer to the question should be returned.

Steps To Reproduce

```python
import torch
import transformers
from transformers import StoppingCriteria, StoppingCriteriaList
from torch import cuda, bfloat16

device = f'cuda:0' if cuda.is_available() else 'cpu'

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True,
    load_in_8bit=True,  # this requires the bitsandbytes library
    max_seq_len=8192,
    init_device=device,
    device_map="auto"
)
model.eval()
# model.to(device)
print(f"Model loaded on {device}")

tokenizer = transformers.AutoTokenizer.from_pretrained("mosaicml/mpt-30b-chat")

stop_token_ids = [
    tokenizer.convert_tokens_to_ids(x) for x in [
        ['Human', ':'], ['AI', ':']
    ]
]

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])

stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this model rambles during chat
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # select from top tokens whose probability add up to 15%
    top_k=0,  # select from top 0 tokens (because zero, relies on top_p)
    max_new_tokens=128,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
```
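
One way to narrow this down (a debugging sketch, not part of the notebook): rebuild the pipeline without the custom `stopping_criteria`, and with `do_sample=True` so that `temperature`/`top_p`/`top_k` are actually applied instead of being ignored under greedy decoding. If text comes back in this configuration, the `StopOnTokens` criterion is the likely culprit.

```python
# Debugging sketch (assumes the model and tokenizer loaded above).
debug_pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task='text-generation',
    return_full_text=True,
    do_sample=True,          # enable sampling so temperature/top_p take effect
    temperature=0.1,
    top_p=0.15,
    max_new_tokens=128,
    repetition_penalty=1.1
    # stopping_criteria deliberately omitted to isolate its effect
)

out = debug_pipe("Explain to me the difference between nuclear fission and fusion.")
print(out[0]["generated_text"])
```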

Relevant log output

```
[2023-08-30 17:16:12,664] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-08-30 17:16:13.203919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Instantiating an MPTForCausalLM model from /root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/54f33278a04aa4e612bca482b82f801ab658e890/modeling_mpt.py
You are using config.init_device='cuda:0', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████| 7/7 [01:07<00:00,  9.62s/it]
Model loaded on cuda:0
The model 'MPTForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Explain to me the difference between nuclear fission and fusion.
```
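
For context, the "MPTForCausalLM is not supported for text-generation" line appears to be only a registry warning from the pipeline factory: the remote-code model class is not in the pipeline's built-in model list, but it should still run. A sketch (not from the notebook) to confirm the model itself generates new tokens, independent of the pipeline and the stopping criteria:

```python
# Call generate() directly on the model, bypassing the pipeline entirely.
inputs = tokenizer(
    "Explain to me the difference between nuclear fission and fusion.",
    return_tensors="pt"
).to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```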

Environment

- **OS**: ubuntu20.04
- **Language version**:  Python 3.8.16
- **Pinecone client version**: not used

Additional Context

No response

gzusgw added the bug label on Aug 30, 2023