
The absence or presence of a system token results in different outputs. #203

Closed
Sneakr opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments


Sneakr commented May 10, 2024

Describe the bug

As per the official documentation:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

It is stated:

A prompt should contain a single system message, can contain multiple alternating user and assistant messages, and always ends with the last user message followed by the assistant header.

However, in the follow-up examples given in the documentation, the system header is only present when a system message is present:

1: Single message example

<|begin_of_text|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

2: System prompt message added to a single user message

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

However, including the system header tokens with no system message string present results in a completely different output compared to having no system tokens at all.

This can be seen here in my findings:
ggerganov/llama.cpp#7062 (comment)
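
For concreteness, a minimal sketch of the two prompt variants being compared (the user message and the way the strings are assembled here are illustrative, not taken from any official helper):

# Hypothetical illustration of the two prompts under discussion.
user_message = "What is the capital of France?"

# Variant 1: system header included, but with no system message text.
prompt_with_empty_system = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

# Variant 2: no system header at all.
prompt_without_system = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)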

Fine-tuning the instruct model:
Fine-tuning the instruct models with the system tokens present, and then running inference without the system tokens present, breaks the fine-tuning.

Inference on the original instruct model:
Since the outputs differ depending on the presence of the system tokens, the question arises: is the output better or worse for the instruct models? Which method produces the expected output given the instruct tuning that Meta has done internally?


Sneakr commented May 11, 2024

So, did Meta just change the model card page after my GitHub issue, completely ignoring this issue? :)

https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/


subramen commented May 14, 2024

However, including the system header tokens with no system message string present results in a completely different output compared to having no system tokens at all.

Are you referring to a case where you pass the system header but no system_prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|>

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string. If you don't have a system message, it is better not to include the system header. This is how we encode dialogs:

class ChatFormat:
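
For readers following along, the referenced encoding logic amounts to something like the following abridged, string-level sketch (the real ChatFormat in llama/tokenizer.py works on token IDs; this is illustrative only): a header is emitted only for messages that are actually in the dialog, so no system header appears unless a system message is passed in.

# Abridged, string-level sketch of the dialog encoding referenced above.
def encode_header(role: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

def encode_message(message: dict) -> str:
    return encode_header(message["role"]) + message["content"].strip() + "<|eot_id|>"

def encode_dialog_prompt(dialog: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    # Only messages present in the dialog are encoded, so no system
    # header appears unless a system message is passed in.
    for message in dialog:
        prompt += encode_message(message)
    # End with the assistant header so the model completes the reply.
    prompt += encode_header("assistant")
    return prompt

print(encode_dialog_prompt([{"role": "user", "content": "Hello"}]))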

I don't think the changes to the model card are related to this issue, but we'd appreciate your suggestions to improve its clarity :) cc @carljparker

subramen added the bug label on May 14, 2024

Sneakr commented May 14, 2024

@subramen

Thanks for your response. Yes, that's what I'm referring to.

Getting a different output is expected behavior because the template is sensitive to the header; the model is expecting a system message but it is getting an empty string.

It is indeed expected behavior: since the input is different, the output will be different. However, the question is which output is the one expected by the author of the model and the training process.

As per my findings, if the model has been trained with the system header present (in my case fine-tuned):

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

and inference is later run as per the tokenizer.py you referenced (i.e. without the system header, since there is no system message):

Conclusion:
It produces a different output, which breaks the behaviour learned from the training process and the training data, because the system headers are no longer present as they were during training.

If you don't have a system message, it is better not to include the system header. This is how we encode dialogs

1: Why would it not be included if the model was trained with a system header? Wouldn't it be logical to assume that the outputs seen during training are the ones we should expect during inference, and therefore to keep the system headers as-is, regardless of whether the system message is empty?

2: What makes you conclude that it is better to leave out the system header? We have two different outputs; how do we conclude that one output (without system headers) is better than the other (with system headers)?

In my tests, the opposite is true: especially during tuning and training, leaving out tokens that were present during training breaks the expected output.
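
A rough sketch of how such a comparison can be reproduced (assuming the Hugging Face transformers API and access to the instruct checkpoint; the prompt text and decoding settings here are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")

prompts = {
    "empty system header": (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\nName three colors.<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    "no system header": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Name three colors.<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
}

for name, prompt in prompts.items():
    # The special tokens are already spelled out in the prompt string,
    # so do not let the tokenizer add another <|begin_of_text|>.
    inputs = tokenizer(prompt, return_tensors="pt",
                       add_special_tokens=False).to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    print(f"--- {name} ---\n{completion}\n")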

I'm grateful for clarification and your response! :)

Regarding the model card page, one can only speculate; only the author of the page knows the reason for the changes. It is peculiar, however, that the wording I quoted was completely removed just a day after I opened this issue, with no clarification posted in this thread. But let's leave that aside and focus on the issue at hand.

subramen commented

My response is based on the assumption that the model was NOT fine-tuned with a system header and a null system prompt, i.e.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
<|eot_id|><|start_header_id|>user<|end_header_id|>
{user_msg}<|eot_id|>

So I would not expect it to give good results. If you are getting better results with a null system prompt, that's interesting; if you can share it, please DM me on Twitter (same handle as my GitHub username).


Sneakr commented May 15, 2024

My response is based on the assumption that the model was NOT fine-tuned with a system header and a null system prompt, i.e.

No no, you are correct: the better result comes from training with system headers and then also running inference with the system headers present, regardless of a null system message.

My second question, though, concerns the official Meta instruct model:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

Should the system headers be present or not, regardless of null system prompt?


Sneakr commented May 21, 2024

Just leaving this here: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/sample_finetune.py

def apply_chat_template(
    example,
    tokenizer,
):
    messages = example["messages"]
    # Add an empty system message if there is none
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example

Edit:
https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/commit/bbd531db4632bb631b0c44d98172894a0c594dd0
After I raised a separate issue about Phi-3 missing the system tokens in its tokenizer config, they removed the system tokens from the fine-tuning script because the model does not support them. However, this is not the case for Llama 3 Instruct, as the system token does appear to be supported by the model.
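
If one wanted to mirror that approach for Llama 3 Instruct, i.e. keep the (possibly empty) system message consistent between fine-tuning and inference, a sketch along the same lines, assuming the Hugging Face tokenizer and its bundled chat template (whether this is the intended usage is exactly the open question in this thread):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def apply_chat_template(example, tokenizer):
    messages = example["messages"]
    # Insert an empty system message if there is none, so the system
    # header is always present, matching what was used during fine-tuning.
    if messages[0]["role"] != "system":
        messages.insert(0, {"role": "system", "content": ""})
    example["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return example

example = {"messages": [{"role": "user", "content": "Hi"},
                        {"role": "assistant", "content": "Hello!"}]}
print(apply_chat_template(example, tokenizer)["text"])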
