An easy-to-use, scalable and high-performance RLHF framework (supports 70B+ full tuning, LoRA, Mixtral, and KTO)
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Shaping Language Models with Cognitive Insights
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
An annotated tutorial of the Hugging Face TRL repo for reinforcement learning from human feedback, connecting the equations from PPO and GAE to the corresponding lines of code in the PyTorch implementation
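As background for the tutorial above, here is a minimal, self-contained sketch of Generalized Advantage Estimation (GAE), one of the quantities the tutorial maps from equations to TRL's PPO code. The function name and inputs are illustrative, not taken from TRL itself.

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages.

    A_t = sum_l (gamma * lam)^l * delta_{t+l},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    Terminal state value beyond the trajectory is taken as 0.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk the trajectory backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

In PPO-based RLHF, these advantages weight the clipped policy-gradient loss for each generated token.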
LMRax is a JAX-based framework for training transformer language models with reinforcement learning, along with reward model training.
Summaries of papers related to the alignment problem in NLP
[TSMC] Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
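As a rough illustration of the best-of-N (BoN) pattern with a reward-model ensemble mentioned above, the sketch below scores each candidate with every reward model and returns the one with the highest mean score. The function and the scoring interface are hypothetical, not the repo's actual API.

```python
def best_of_n(candidates, reward_models):
    """Return the candidate with the highest mean score across the ensemble.

    candidates: list of generated texts.
    reward_models: callables mapping text -> float score (a stand-in
    for real reward-model inference).
    """
    def ensemble_score(text):
        scores = [rm(text) for rm in reward_models]
        return sum(scores) / len(scores)

    return max(candidates, key=ensemble_score)
```

Averaging over an ensemble rather than trusting a single reward model is a common guard against reward over-optimization in BoN sampling.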