Describe the feature
Will the PPO training code and the reward model be open-sourced?

Would you like to implement this feature yourself?
Answered by hellock, Mar 8, 2024
Answer selected by ZwwWayne
It is in the plan for Q2.