Adjust DPO #1042

Open
wants to merge 6 commits into
base: main

Conversation

XXXXRT666 (Contributor) commented May 1, 2024

torch.randint(0, 1, size=(1, ))[0]
Changing `proportion` lets you control the ratio.
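For context, the upper bound of `torch.randint` is exclusive, so `torch.randint(0, 1, size=(1, ))[0]` can only ever yield 0. The sketch below illustrates that behavior and one possible way a `proportion` value could choose between two cases. It uses only the standard library (`random.randrange` also has an exclusive upper bound) and is not the code from this PR: the function name `pick_reject_mode` and the labels `"repeat"` and `"drop"` are hypothetical, chosen only for illustration.

```python
import random


def pick_reject_mode(proportion: float, rng: random.Random) -> str:
    """Return "repeat" with probability `proportion`, otherwise "drop".

    Hypothetical helper: the name and the two labels are assumptions,
    not identifiers taken from the pull request.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    # rng.random() is uniform on [0, 1), so this is true with
    # probability `proportion`.
    return "repeat" if rng.random() < proportion else "drop"


rng = random.Random(0)

# Stand-in for torch.randint(0, 1, ...): the upper bound is exclusive,
# so the only value that can be drawn is 0.
always_zero = {rng.randrange(0, 1) for _ in range(1000)}

# With proportion = 0.9, roughly 90% of draws select "repeat".
counts = {"repeat": 0, "drop": 0}
for _ in range(10_000):
    counts[pick_reject_mode(0.9, rng)] += 1
repeat_share = counts["repeat"] / 10_000
```

Comparing one uniform draw against a threshold makes the ratio a single tunable number, whereas an integer draw over a fixed range can only give equal odds to each case.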

fixed random error on DPO
@XXXXRT666 changed the title from "Fixed the bug where DPO had no penalty for dropped words" to "Fixed the bug where DPO might make word dropping worse" May 1, 2024
make it half half
@XXXXRT666 changed the title from "Fixed the bug where DPO might make word dropping worse" to "Fixed the DPO bug" May 2, 2024
now we can control the proportion
@SapphireLab (Contributor)

This looks reasonable now, but what about #950?

@XXXXRT666 changed the title from "Fixed the DPO bug" to "Adjust DPO" May 2, 2024
@XXXXRT666 (Contributor Author)

> This looks reasonable now, but what about #950?

For now I have changed it so that repeat accounts for 0.9 of the cases.

@XXXXRT666 (Contributor Author) commented May 2, 2024

> This looks reasonable now, but what about #950?

What he describes sounds more like the reason for taking `[0]`. This still needs further experiments.

2 participants