
feedback_learning

Classical RL algorithms can only use the reward signal as information about how to improve, which is often very inefficient. However, the environment frequently provides useful feedback on what was done wrong, for example explicit natural-language feedback from a human. Using this richer feedback can greatly speed up training.

Learning to predict bit sequences with and without binary error feedback

experiments/train_single_step_dreamer.py was used to run this experiment. The number of runs with different random seeds, the log directory, and whether to use feedback observations can be changed in this file. The experiments/logs/make_sequence_guessing_figure.py script was used to create the training-progress figures in PDF format. It produces one PDF file per sequence-length range, e.g. one shows the success rate on guessing sequences of between 1 and 15 bits, and so on.
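
The following is a minimal sketch (not the repository's code) of what such a task could look like with a gym-style interface. The class name SequenceGuessingEnv, the use_feedback flag, and the one-bit-per-step protocol are assumptions made for illustration; with use_feedback disabled the agent only sees the reward, while with it enabled it also observes whether its last guess was wrong.

```python
import numpy as np


class SequenceGuessingEnv:
    """Hypothetical bit-sequence guessing task with optional binary error feedback."""

    def __init__(self, max_length=15, use_feedback=True, seed=0):
        self.max_length = max_length
        self.use_feedback = use_feedback
        self.rng = np.random.default_rng(seed)

    def reset(self):
        length = int(self.rng.integers(1, self.max_length + 1))
        self.target = self.rng.integers(0, 2, size=length)  # hidden bit sequence
        self.pos = 0
        self.last_error = 0.0  # no feedback before the first guess
        return self._obs()

    def step(self, guess):
        correct = int(guess == self.target[self.pos])
        self.last_error = float(1 - correct)  # binary error feedback
        reward = float(correct)
        self.pos += 1
        done = self.pos >= len(self.target)
        return self._obs(), reward, done, {}

    def _obs(self):
        # With feedback, the agent directly observes whether its last guess was wrong;
        # without it, that slot is always zero and only the reward carries information.
        feedback = self.last_error if self.use_feedback else 0.0
        return np.array([float(self.pos), feedback])
```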

Open-Loop Dreamer

experiments/train_open_loop_dreamer contains the script that trains the open-loop Dreamer algorithm on the Pendulum-v0 environment. The results can be visualized with the experiments/make_open_loop_dreamer_figure.py script.
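
To illustrate what open-loop means here, the following is a minimal sketch, not the repository's implementation: encode, world_model, and policy are placeholders for learned components, and the latent state is rolled forward by the model instead of being updated from new observations.

```python
def open_loop_rollout(env, encode, world_model, policy, horizon):
    """Act for `horizon` steps using only the first observation (illustrative)."""
    obs = env.reset()
    state = encode(obs)  # latent state inferred from the initial observation only
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)
        obs, reward, done, _ = env.step(action)
        # Open loop: advance the predicted latent state with the world model;
        # the new observation is never fed back in, so model errors accumulate.
        state = world_model(state, action)
        total_reward += reward
        if done:
            break
    return total_reward
```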

Simple Feedback Reacher

This part is not yet finished. The environment code is in learning_from_feedback/envs/simple_feedback_reacher. One training script, experiments/simple_feedback_reacher/train_single_step_dreamer.py, trains a simplified Dreamer algorithm on this environment; another, experiments/simple_feedback_reacher/train_state_dreamer.py, trains a more complete Dreamer agent.

Literature List

Hierarchical RL

Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning (one of the first papers on hierarchical RL)
Feudal Networks for Hierarchical Reinforcement Learning
Hierarchical Skills for Efficient Exploration
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Model based RL

Dream to Control: Learning Behaviors by Latent Imagination (Introduces the Dreamer algorithm)
Mastering Atari with Discrete World Models (Improved Dreamer algorithm for discrete control)
Learning Continuous Control Policies by Stochastic Value Gradients
End-to-End Differentiable Physics for Learning and Control
Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments (an early paper by Schmidhuber on using learned environment models to compute policy gradients)

Future Work

Improve performance on the simple_feedback_reacher environment

The environment model still takes a very long time to train. A specialized network architecture may be necessary instead of plain fully connected layers.

Implement a more general robotics environment with natural-language instructions and feedback

The instruction might be "pick up the bottle". After an unsuccessful attempt the feedback could be "The bottle is the green object on the right". The idea is that it is easier for the agent to learn what a green object is than to learn what a bottle is.
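
As a purely hypothetical sketch of the proposed interface, the observation could carry the instruction on every step and add the natural-language feedback only after a failed attempt; the function and field names below are illustrative assumptions, not existing code.

```python
def make_observation(image, attempt_failed):
    """Bundle camera input, instruction, and (optionally) corrective feedback."""
    obs = {
        "image": image,
        "instruction": "pick up the bottle",
    }
    if attempt_failed:
        # Feedback grounds the instruction in simpler, easier-to-learn concepts.
        obs["feedback"] = "The bottle is the green object on the right"
    return obs
```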

Develop hierarchical RL algorithm

The Dreamer algorithm has to predict at least T steps into the future when rewards are delayed by up to T steps. A hierarchical RL algorithm might be able to improve a policy in a fraction of T steps.
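
A minimal sketch of the temporal abstraction this refers to, with hypothetical high_level_policy and low_level_policy components: if the high-level policy only picks a new subgoal every k environment steps, a reward delay of T steps spans roughly T // k high-level decisions.

```python
def hierarchical_rollout(env, high_level_policy, low_level_policy, k, horizon):
    """Roll out a two-level policy; only the high level acts on the coarse timescale."""
    obs = env.reset()
    total_reward = 0.0
    subgoal = None
    for t in range(horizon):
        if t % k == 0:
            subgoal = high_level_policy(obs)  # re-plan at the coarse timescale
        action = low_level_policy(obs, subgoal)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```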
