
The absence or presence of a system token results in different outputs. #203

Closed
Sneakr opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments


Sneakr commented May 10, 2024

Describe the bug

As per the official documentation:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

It is stated:

A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followed by the assistant header.

However, in the follow-up examples given in the documentation, the system header is only present when a system message is present:

1: Single message example

<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

2: System prompt message added to a single user message

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

However, including the system header tokens with no system message string present results in a completely different output compared to having no system tokens at all.

This can be seen here in my findings:
ggerganov/llama.cpp#7062 (comment)
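
For concreteness, a minimal sketch of the two prompt variants being compared (the user message and the way the strings are assembled here are illustrative, not taken from any official helper):

# Hypothetical illustration of the two prompts under discussion.
user_message = "What is the capital of France?"

# Variant 1: system header included, but with no system message text.
prompt_with_empty_system = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

# Variant 2: no system header at all.
prompt_without_system = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)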

Fine-tuning the instruct model:
Fine-tuning the instruct models with the system tokens present, and then running inference without the system tokens present, breaks the fine-tuning.

Inference on the original instruct model:
Since the outputs differ depending on the presence of the system tokens, the question arises: is the output better or worse for the instruct models? Which method produces the expected output given the instruct tuning that Meta has done internally?


Sneakr commented May 11, 2024

So, did Meta just change the model card page after my GitHub issue, completely ignoring this issue? :)

https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/


subramen commented May 14, 2024

However, including the system header tokens with no system message string present results in a completely different output compared to having no system tokens at all.

Are you referring to a case where you pass the system header but no system_prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|>

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string. If you don't have a system message, it is better not to include the system header. This is how we encode dialogs:

class ChatFormat:
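
For readers following along, the referenced encoding logic amounts to something like the following abridged, string-level sketch (the real ChatFormat in llama/tokenizer.py works on token IDs; this is illustrative only): a header is emitted only for messages that are actually in the dialog, so no system header appears unless a system message is passed in.

# Abridged, string-level sketch of the dialog encoding referenced above.
def encode_header(role: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

def encode_message(message: dict) -> str:
    return encode_header(message["role"]) + message["content"].strip() + "<|eot_id|>"

def encode_dialog_prompt(dialog: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    # Only messages present in the dialog are encoded, so no system
    # header appears unless a system message is passed in.
    for message in dialog:
        prompt += encode_message(message)
    # End with the assistant header so the model completes the reply.
    prompt += encode_header("assistant")
    return prompt

print(encode_dialog_prompt([{"role": "user", "content": "Hello"}]))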

I don't think the changes to the model card are related to this issue, but we'd appreciate your suggestions to improve its clarity :) cc @carljparker

subramen added the bug label on May 14, 2024

Sneakr commented May 14, 2024

@subramen

Thanks for your response. Yes, that's what I'm referring to.

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string.

It is indeed expected behavior: since the input is different, the output will be different. However, the question is which output is the one expected by the author of the model and the training process.

As per my findings, if the model has been trained with the system header present (in my case fine-tuned):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

and inference is later run as per the tokenizer.py you referenced (i.e. without the system header, since there is no system message):

Conclusion:
It produces a different output, which breaks the behaviour learned from the training process and the training data, because the system headers are no longer present as they were during training.

If you don't have a system message, it is better not to include the system header. This is how we encode dialogs

1: Why would it not be included if the model was trained with a system header? Wouldn't it be logical to assume that the outputs seen during training are the ones we should expect during inference, and therefore to keep the system headers as-is, regardless of whether the system message is empty?

2: What makes you conclude that it is better to leave out the system header? We have two different outputs; how do we conclude that one output (without system headers) is better than the other (with system headers)?

In my tests, the opposite is true: especially during tuning and training, leaving out tokens that were present during training breaks the expected output.
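
A rough sketch of how such a comparison can be reproduced (assuming the Hugging Face transformers API and access to the instruct checkpoint; the prompt text and decoding settings here are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

prompts = {
    "empty system header": (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\nName three colors.<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    "no system header": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Name three colors.<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
}

for name, prompt in prompts.items():
    # The special tokens are already spelled out in the prompt string,
    # so do not let the tokenizer add another <|begin_of_text|>.
    inputs = tokenizer(prompt, return_tensors="pt",
                       add_special_tokens=False).to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    print(f"--- {name} ---\n{completion}\n")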

I'm grateful for clarification and your response! :)

Regarding the model card page, one can only speculate; only the author of the page knows the reason for the changes. It is peculiar, however, that the wording I quoted was completely removed just a day after I opened this issue, with no clarification posted in this thread. But let's leave that aside and focus on the issue at hand.

subramen commented

My response is based on the assumption that the model was NOT fine-tuned with a system header and a null system prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|><|start_header_id|>user<|end_header_id|>
{user_msg}<|eot_id|>

So I would not expect it to give good results. If you are getting better results with a null system prompt, that's interesting; if you can share it, please DM me on Twitter (same handle as my GitHub username).


Sneakr commented May 15, 2024

My response is based on the assumption that the model was NOT fine-tuned with a system header and a null system prompt, i.e.

No no, you are correct: the better result comes from training with system headers and then also running inference with the system headers present, regardless of a null system message.

My second question, though, concerns the official Meta instruct model:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

Should the system headers be present or not, regardless of null system prompt?


Sneakr commented May 21, 2024

Just leaving this here: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/sample_finetune.py

def apply_chat_template(
    example,
    tokenizer,
):
    messages = example["messages"]
    # Add an empty system message if there is none
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example

Edit:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/commit/bbd531db4632bb631b0c44d98172894a0c594dd0
After I raised a separate issue about Phi-3 missing the system tokens in its tokenizer config, they removed the system tokens from the fine-tuning script because the model does not support them. However, this is not the case for Llama 3 Instruct, as the system token does appear to be supported by the model.
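
If one wanted to mirror that approach for Llama 3 Instruct, i.e. keep the (possibly empty) system message consistent between fine-tuning and inference, a sketch along the same lines, assuming the Hugging Face tokenizer and its bundled chat template (whether this is the intended usage is exactly the open question in this thread):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def apply_chat_template(example, tokenizer):
    messages = example["messages"]
    # Insert an empty system message if there is none, so the system
    # header is always present, matching what was used during fine-tuning.
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example

example = {"messages": [{"role": "user", "content": "Hi"},
                        {"role": "assistant", "content": "Hello!"}]}
print(apply_chat_template(example, tokenizer)["text"])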
