cannot replicate DPO results of zephyr #124

Open
AlexiaJM opened this issue Feb 23, 2024 · 5 comments

I cannot replicate the DPO results for zephyr.

I use a modified version of config_full.yaml; the only difference is that I set gradient_accumulation_steps: 4 instead of 2, because I train on 4 GPUs. I'm using exactly the software versions pinned in setup.py. I had to resume twice during training (unavoidable on our cluster), but if resuming restores the random seeds properly, that should not be a problem.
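For reference, a minimal sketch of the batch-size bookkeeping behind that change, assuming the published run used 8 GPUs with gradient_accumulation_steps: 2 (the per-device batch size below is a placeholder, not necessarily the value in config_full.yaml):

```python
# Illustrative only: why gradient_accumulation_steps is doubled when the GPU
# count is halved -- the effective (global) batch size per optimizer step stays the same.

def effective_batch(per_device_bs: int, num_gpus: int, grad_accum: int) -> int:
    return per_device_bs * num_gpus * grad_accum

per_device_bs = 8  # placeholder; use whatever config_full.yaml actually specifies

published = effective_batch(per_device_bs, num_gpus=8, grad_accum=2)
replicated = effective_batch(per_device_bs, num_gpus=4, grad_accum=4)
assert published == replicated  # 8 * 2 == 4 * 4
```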

Code:
```
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes=4 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full4.yaml
```

The resulting model is here: https://huggingface.co/AlexiaJM/zephyr-7b-dpo-full-repnew. As you can see, the numbers are slightly off from https://huggingface.co/alignment-handbook/zephyr-7b-dpo-full, though not drastically.

These are the MT-Bench results:

########## First turn ##########

| model | turn | score |
| --- | --- | --- |
| zephyr-7b-dpo-full | 1 | 7.81250 |
| zephyr-7b-dpo-full-repnew | 1 | 7.5375 |

########## Second turn ##########

| model | turn | score |
| --- | --- | --- |
| zephyr-7b-dpo-full | 2 | 7.322785 |
| zephyr-7b-dpo-full-repnew | 2 | 7.125 |

########## Average ##########

| model | score |
| --- | --- |
| zephyr-7b-dpo-full | 7.569182 |
| zephyr-7b-dpo-full-repnew | 7.33125 |

@AlexiaJM (Author)

Related to #45

@xijiu9 commented Feb 24, 2024

I'm running into a similar issue. My models give:

########## First turn ##########

| model | turn | score |
| --- | --- | --- |
| zephyr-7b-dpo-full-self-ref | 1 | 7.79375 |
| zephyr-7b-dpo-full-self | 1 | 7.43750 |
| zephyr-7b-sft-full-self-ref | 1 | 6.63125 |
| zephyr-7b-sft-full-self | 1 | 6.39375 |

########## Second turn ##########

| model | turn | score |
| --- | --- | --- |
| zephyr-7b-dpo-full-self-ref | 2 | 7.35000 |
| zephyr-7b-dpo-full-self | 2 | 6.69375 |
| zephyr-7b-sft-full-self-ref | 2 | 5.97500 |
| zephyr-7b-sft-full-self | 2 | 5.61250 |

########## Average ##########

| model | score |
| --- | --- |
| zephyr-7b-dpo-full-self-ref | 7.571875 |
| zephyr-7b-dpo-full-self | 7.065625 |
| zephyr-7b-sft-full-self-ref | 6.303125 |
| zephyr-7b-sft-full-self | 6.003125 |

@xijiu9 commented Feb 24, 2024

Models ending with '-ref' are the official checkpoints from Hugging Face; models ending with '-self' are my own reproductions.

@EriChen0615 commented Mar 4, 2024

Experiencing similar issues here. The replicated model scores about 0.3 lower than the published zephyr-7b-dpo-full.

| # | Model | MT-Bench | Source |
| --- | --- | --- | --- |
| 1 | Zephyr-7B-sft | 6.24 | reported in the blog post / HF tutorial |
| 2 | Zephyr-7b-dpo-full | 7.50 | reported in the blog post / HF tutorial |
| 3 | Zephyr-7B-sft | 6.42 | FastChat's inference script, empty system message |
| 4 | Zephyr-7b-dpo-full | 7.48 | FastChat's inference script, empty system message |
| 5 | Zephyr-7b-dpo-beta=0.01 | 7.16 | trained with this repo |

In addition, the training statistics when training Zephyr-7B with beta=0.01 are very different from what's published. I checked against the published DPO training statistics of zephyr-7b-dpo-full at epoch 0.84. Below I list the values from my run, with the reported values in parentheses:

  • Training Loss: 0.6008 (0.4853)
  • Validation Loss: 0.6014 (0.5050)
  • Rewards/accuracies: 0.3313 (0.7539)
  • Rewards/margins: 0.3653 (1.0156)

The diff in reward/accuracies looks alarming. Any idea what could be the cause @lewtun?
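For anyone comparing these numbers, here is a rough sketch of how rewards/accuracies and rewards/margins are typically computed in DPO training (following the DPO formulation; the tensors and values below are illustrative, not the exact trainer internals):

```python
import torch

beta = 0.01  # the beta used in my run

# Per-sequence log-probabilities of the chosen/rejected completions under the
# policy and the frozen reference model (shape: [batch]); values are made up.
policy_chosen_logps = torch.tensor([-120.0, -98.0, -150.0])
policy_rejected_logps = torch.tensor([-118.0, -110.0, -149.0])
ref_chosen_logps = torch.tensor([-121.0, -100.0, -151.0])
ref_rejected_logps = torch.tensor([-117.0, -108.0, -150.0])

# Implicit DPO rewards: beta * (policy log-prob - reference log-prob)
chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

# rewards/accuracies: fraction of pairs where the chosen completion gets the higher reward
accuracies = (chosen_rewards > rejected_rewards).float().mean()
# rewards/margins: mean gap between chosen and rejected rewards
margins = (chosen_rewards - rejected_rewards).mean()
print(accuracies.item(), margins.item())
```

Note that rescaling beta rescales the margins but not the accuracies (the comparison is scale-invariant), so a different beta could partly explain smaller margins, but not a large gap in rewards/accuracies.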

@AlexiaJM @xijiu9 let me know if you have any progress in replicating!

Best,
Eric

@gxxu-ml commented Apr 21, 2024

A rewards/accuracies of 0.33 doesn't seem reasonable at epoch 0.84. And yet there's still a rewards/margins of 0.36, even though the model ranks the pair wrong more often than right?
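It is at least arithmetically possible for the two to coexist: a few pairs with large positive margins can pull the mean margin positive even when most pairs are ranked the wrong way. A toy illustration (numbers invented, not taken from the actual run):

```python
import torch

# Hypothetical per-pair reward margins (chosen - rejected): most pairs are
# slightly negative, a minority are strongly positive.
margins = torch.tensor([-0.10, -0.15, -0.20, -0.10, 1.20, 1.50])

accuracy = (margins > 0).float().mean()  # 2/6 ≈ 0.33
mean_margin = margins.mean()             # ≈ 0.36 despite accuracy well below 0.5
print(accuracy.item(), mean_margin.item())
```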
