R³: Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

Implementation of the paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" by Zhiheng Xi, Wenxiang Chen, Boyang Hong, et al.

Paper Link: https://arxiv.org/abs/2402.05808

💡 Introduction

🛠️ Set up

We recommend a Python 3.9 environment for running the experiments. Run the following commands to set up your environment:

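git clone https://github.com/xxxxx.git
# cd into the cloned repository before continuing

# Environment for the math experiments
conda create -n R3_math python=3.9 -y
conda activate R3_math
cd R3_math/
pip install -r requirements.txt
cd ..

# Environment for the other datasets
conda create -n R3_others python=3.9 -y
conda activate R3_others
cd R3_others/
pip install -r requirements.txt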

⚡️Usage

Step1: SFT Training

To train an SFT model, first set the model path and output path in the R3_others/scripts/step1_supervised_finetuning/R3_sft.sh script. Then, run the following command:

cd R3_others/scripts/step1_supervised_finetuning/
bash R3_sft.sh

Step2: R³ Training

To train a reinforced model using R³ on GSM8K (or other math datasets), first set the actor model path (it should be an SFT model checkpoint from Step1) and the output path in R3_math/scripts/R3_cot_gsm8k.sh. Then, run the following command:

cd R3_math/scripts/
bash R3_cot_gsm8k.sh

Note: If you want to try R³ on other datasets like MNLI or race@High, set the SFT model path in R3_others/scripts/step3_rlhf_finetuning/R3_mix.sh. Then, run the following command:

cd R3_others/scripts/step3_rlhf_finetuning/
bash R3_mix.sh
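
As the paper describes it, R³ starts exploration from intermediate states of a correct demonstration and progressively slides the start state from the demonstration's end back to its beginning. The sketch below illustrates only that idea. It is not the repository's implementation: the function name `reverse_curriculum_starts`, the way a demonstration is split into steps, and the prompt layout are all assumptions made for illustration.

```python
def reverse_curriculum_starts(question, steps):
    """Build start states from easiest to hardest.

    Hypothetical helper: the first stage keeps all but the last
    demonstration step in the prompt, and each later stage keeps one
    step fewer, until the model must reason from the question alone.
    """
    starts = []
    for kept in range(len(steps) - 1, -1, -1):
        prefix = steps[:kept]  # demonstration steps handed to the model
        starts.append({
            "kept_steps": kept,
            "prompt": "\n".join([question, *prefix]),
        })
    return starts


if __name__ == "__main__":
    demo_steps = ["Step 1: 3 + 4 = 7.", "Step 2: 7 * 2 = 14.", "Answer: 14"]
    for stage in reverse_curriculum_starts("Q: What is (3 + 4) * 2?", demo_steps):
        print(stage["kept_steps"], repr(stage["prompt"]))
```

Each stage only needs the final answer to be checked, which is how outcome-only rewards can still give step-level guidance in this scheme.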

Evaluation

A separate evaluation step is not required for the math datasets; their results are saved in wandb.

For the other datasets, first run the evaluation script R3_others/scripts/eval/eval_single.sh. Then, get your results by running output_{dataset_name}.py. Here's an example for the MNLI dataset:

cd R3_others/scripts/eval
bash eval_single.sh
# After evaluation, you will get a result file such as eval_mnli/R3_test.txt

python output_mnli.py
# This gives you the accuracy result
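
For readers who want to see what an accuracy computation of this kind might look like, here is a minimal sketch. The result file format is not documented in this README, so the sketch assumes one JSON object per line with hypothetical `prediction` and `label` fields; the actual output_mnli.py may parse a different format.

```python
import json


def accuracy_from_jsonl(lines):
    """Return the fraction of records whose prediction matches the label.

    Assumes (hypothetically) that each non-empty line is a JSON object
    with "prediction" and "label" fields.
    """
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        # Compare case-insensitively after trimming whitespace.
        pred = str(record["prediction"]).strip().lower()
        gold = str(record["label"]).strip().lower()
        if pred == gold:
            correct += 1
    return correct / total if total else 0.0
```

To apply it to a file, pass an open file handle, since iterating over a handle yields lines.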

Data

For the purpose of security review, we provide some examples of the data, formatted as follows:

Dataset: MNLI
	---- mnli_train_example.json # for SFT
	---- mnli_mix_example.json # for R³
	---- mnli_test.json
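
The field names inside these files are not listed here, so a quick way to check them is to load a file and look at its first record. The helper below is a small sketch for that purpose; `summarize_json` is a hypothetical name, and it assumes only that the file is valid JSON holding either a list of records or a single object.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return (record count, sorted field names of the first record)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat a single top-level object as a one-record dataset.
    records = data if isinstance(data, list) else [data]
    first = records[0] if records else {}
    keys = sorted(first) if isinstance(first, dict) else []
    return len(records), keys


if __name__ == "__main__":
    # The path is relative to wherever the example data files live.
    count, keys = summarize_json("mnli_train_example.json")
    print(count, keys)
```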

✏️ Citation

If you find R³ useful for your research and applications, please cite using this BibTeX:

@misc{xi2024training,
      title={Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning}, 
      author={Zhiheng Xi and Wenxiang Chen and Boyang Hong and Senjie Jin and Rui Zheng and Wei He and Yiwen Ding and Shichun Liu and Xin Guo and Junzhe Wang and Honglin Guo and Wei Shen and Xiaoran Fan and Yuhao Zhou and Shihan Dou and Xiao Wang and Xinbo Zhang and Peng Sun and Tao Gui and Qi Zhang and Xuanjing Huang},
      year={2024},
      eprint={2402.05808},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Contact

zhxi22@m.fudan.edu.cn

Repository: WooooDyy/LLM-Reverse-Curriculum-RL
