
Annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback, connecting the equations of PPO and GAE to the lines of code in the PyTorch implementation.


clam004/minichatgpt


minichatgpt

A focused dissection of the implementation details of a small, simplified, self-contained toy project demonstrating reinforcement learning from human feedback (RLHF). The emphasis is on connecting the equations that describe proximal policy optimization (PPO) to the lines of PyTorch code that apply PPO to sequences, for example completing sentences so that they end with a positive sentiment. The model learns not by self-supervised or supervised learning but by generating text and learning from scores assigned to that text after it is generated. This is analogous to the way ChatGPT was trained using human scores of model-generated answers to instructions.
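As a rough orientation, the two quantities this README keeps referring to, the GAE advantage estimate and the PPO clipped surrogate loss, can be sketched in a few lines. This is a minimal, dependency-free illustration in plain Python, not the code from this repository or from TRL. The function names (gae_advantages, ppo_clip_loss), the default hyperparameters, and the convention that the value after the last token is zero are all assumptions made for the example.

```python
import math


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimation over one generated sequence.

    rewards[t] and values[t] are per-token scalars. The value after the
    final token is assumed to be 0 (an assumption of this sketch).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss (negated objective) over tokens."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical 3-token completion scored only at the end
    adv = gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5])
    print(adv)
    print(ppo_clip_loss([-1.0, -0.9, -1.2], [-1.1, -1.0, -1.0], adv))
```

A PyTorch implementation would typically compute the same expressions on tensors, with masking for padded tokens.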

Building Development Environments

python virtual environment for data science

you@you chat-api % python3 -m venv venv
you@you chat-api % source venv/bin/activate
(venv) you@you chat-api % pip install --upgrade pip
(venv) you@you chat-api % pip install -r requirements.txt

install package using a setup.py and pip

To install the package for development, run the command below in a terminal from inside the top-level minichatgpt directory (the one where ls shows setup.py, requirements.txt and README.md):

pip install -e .

Leave out the -e for a production install (pip install .). For additional development packages such as Jupyter Notebook and matplotlib, run:

pip install -e ".[interactive]"

You should see something like:

Obtaining file:///Users/.../minichatgpt
  Preparing metadata (setup.py) ... done
Installing collected packages: minichatgpt
  Running setup.py develop for minichatgpt
Successfully installed minichatgpt-0....

Now, from directories other than the top-level minichatgpt directory, you can run:

import minichatgpt
from minichatgpt.example_script import example_class_function

Changes you make to example_class_function will be picked up the next time you import minichatgpt in a new Python session; there is no need to rerun pip install -e .
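One quick way to confirm that an editable install points at your working copy is to ask Python where it would load the package from. The helper below is a small sketch using only the standard library. The name package_origin is hypothetical and not part of minichatgpt.

```python
import importlib.util


def package_origin(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # package is not installed in this environment
    return spec.origin


if __name__ == "__main__":
    # With an editable install, this path should sit inside your cloned
    # minichatgpt directory rather than inside site-packages.
    print(package_origin("minichatgpt"))
```

If this prints None, the package is not visible to the active interpreter, which usually means the virtual environment is not activated or the install step was skipped.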

Tutorial

References and Credits

@misc{vonwerra2022trl,
  author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert},
  title = {TRL: Transformer Reinforcement Learning},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/lvwerra/trl}}
}
