Fix Reward Calculation in example/2022-12-10-textrl-elon-musk.ipynb #27

Open

Alanhsiu opened this issue May 8, 2024 · 1 comment

Alanhsiu (Contributor) commented May 8, 2024:

In the notebook example/2022-12-10-textrl-elon-musk.ipynb, the reward calculation in the MyRLEnv class should be updated for correct scoring. Specifically, the function get_reward needs modification.

Current Code:

class MyRLEnv(TextRLEnv):
    def get_reward(self, input_item, predicted_list, finish):
        reward = 0
        if finish or len(predicted_list) >= self.env_max_length:
            predicted_text = tokenizer.convert_tokens_to_string(predicted_list[0])
            # sentiment classifier
            reward = sentiment(input_item[0] + predicted_text)[0][0]['score'] * 10
        return reward

The current code concatenates input_item[0] with the predicted text to compute the sentiment score. However, input_item is passed in as a dict keyed by 'input' rather than a list, so indexing it with [0] does not retrieve the prompt text. It should be indexed by key:

reward = sentiment(input_item['input'] + predicted_text)[0][0]['score'] * 10
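
For reference, here is the full method with the one-line fix applied — a minimal sketch assuming tokenizer and sentiment are already defined earlier in the notebook, as in the original example:

class MyRLEnv(TextRLEnv):
    def get_reward(self, input_item, predicted_list, finish):
        reward = 0
        # only score once generation has finished or hit the length cap
        if finish or len(predicted_list) >= self.env_max_length:
            predicted_text = tokenizer.convert_tokens_to_string(predicted_list[0])
            # index input_item by its 'input' key (it is a dict, not a list),
            # then score prompt + generation with the sentiment classifier
            reward = sentiment(input_item['input'] + predicted_text)[0][0]['score'] * 10
        return reward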
voidful (Owner) commented May 8, 2024:

Feel free to submit a PR to fix this issue~
