Support return final task states for evaluation #1755

xingyaoww · 2024-05-13T08:52:06Z

This is part of the modification in #1468.

It allows main to return the final task state (with trajectory) to be saved by the eval script for further evaluation.
It allows the user to provide fake_user_response_fn, which generates a simulated user response from the current State and feeds it to the model during the main loop. (We probably need to figure out a way to extend our existing integration tests to cover this, cc @li-boxuan)
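For illustration, here is a minimal sketch of how an evaluation script might use the two additions described above. It is not the actual OpenDevin code: `State`, `run_main`, `always_continue`, and the scripted agent replies are hypothetical stand-ins, and only the `fake_user_response_fn` name and the `input(...)` fallback come from the PR.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class State:
    """Hypothetical stand-in for the controller's task state."""
    history: List[str] = field(default_factory=list)
    error: Optional[str] = None


def run_main(
    task: str,
    agent_replies: List[str],
    fake_user_response_fn: Optional[Callable[[State], str]] = None,
) -> State:
    """Toy main loop that returns the final State (with trajectory)."""
    state = State(history=[f'task: {task}'])
    for reply in agent_replies:
        state.history.append(f'agent: {reply}')
        if reply == 'exit':
            break
        # The agent is waiting for the user: ask a human, or call the stub.
        if fake_user_response_fn is None:
            message = input('Request user input >> ')
        else:
            message = fake_user_response_fn(state)
        state.history.append(f'user: {message}')
    return state


def always_continue(state: State) -> str:
    """Canned reply an eval script could use in place of a human."""
    return 'Please continue working on the task.'


# The eval script keeps the returned state for later scoring.
final_state = run_main('fix the bug', ['Should I proceed?', 'exit'], always_continue)
```

Because the response function receives the whole state, an eval script could in principle vary its reply based on the trajectory so far; the sketch ignores the argument for brevity.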

yufansong · 2024-05-13T10:31:21Z

opendevin/controller/agent_controller.py

+            except Exception as e:
                logger.error('Error in loop', exc_info=True)
+                self.finish_state = self.get_state()
+                self.finish_state.error = str(e)


I want to be clear about our logic: do we need the state after every step, or only the one after the final step?

We need the one after the final step. In this case, when an error occurs, it breaks the loop and hence ends the ._run function. But I guess you caught a bug: we shouldn't re-get the state at the end of the loop.

This should hopefully be fixed in the new commit.
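The bug discussed in this thread can be sketched as follows, assuming a simplified loop. `ToyController`, `step_fn`, and this `State` are hypothetical and not the real agent_controller.py; the point is only that a state recorded in the error path must not be overwritten by a second snapshot after the loop.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass
class State:
    """Hypothetical stand-in for the controller's task state."""
    iteration: int = 0
    error: Optional[str] = None


class ToyController:
    def __init__(self, step_fn: Callable[[int], bool], max_iterations: int = 5):
        self.step_fn = step_fn
        self.max_iterations = max_iterations
        self.state = State()
        self.finish_state: Optional[State] = None

    def run(self) -> State:
        for i in range(self.max_iterations):
            self.state.iteration = i
            try:
                finished = self.step_fn(i)
            except Exception as e:
                # Record the error on a snapshot and leave the loop.
                self.finish_state = replace(self.state, error=str(e))
                break
            if finished:
                break
        # Snapshot here only if the error path did not already set one;
        # unconditionally re-getting the state would drop the error.
        if self.finish_state is None:
            self.finish_state = replace(self.state)
        return self.finish_state


def failing_step(i: int) -> bool:
    """Step function that fails on the third iteration."""
    if i == 2:
        raise RuntimeError('boom')
    return False


failed = ToyController(failing_step).run()
succeeded = ToyController(lambda i: i == 1).run()
```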

yufansong · 2024-05-13T10:53:43Z

opendevin/core/main.py

+                if fake_user_response_fn is None:
+                    message = input('Request user input >> ')
+                else:
+                    message = fake_user_response_fn(controller.get_state())


I think adding fake_user_response_fn to the main function signature will couple test code with the normal logic. But I also couldn't find a good way to decouple it and mock it in tests. I'd prefer to merge it first and optimize it in the future. cc @li-boxuan Do you have any good ideas?

This isn't for testing purpose - I believe it's for SWE-Bench evaluation, see here.

Looking at that giant PR, it looks like we could make this prompt a built-in one?

'Please continue working on the task on whatever approach you think is suitable.\n'
'If you think you have modified the code in a way that fixes the issue, please run the following command: <execute_bash> exit </execute_bash>.\n'
'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP OR USE THE INTERNET TO SOLVE THIS TASK.\n'

The agent could have a fully autonomous mode so that it doesn't ask for user input and instead uses the above prompt. @xingyaoww what do you think? I could help with that if you need a hand.

@li-boxuan That's actually a great idea!! Fully support that -- I can help start a PR to isolate that out. The question is whether we want to implement this in the controller or inside the CodeAct agent, like the countdown?

Ooh cool! The other agents currently have only a "fully autonomous" mode and need an interactive mode; CodeAct is the other way around. Maybe an option to set on the agent when it runs?

Like agent.mode=interactive or agent.mode=autonomous. It is the agent who picks the prompt to use, isn't it?

I tend to think the prompt itself should live in the agent, but there should be an interface/API call from the controller so that any agent can leverage this feature.
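One way the mode switch proposed in this thread might look, as a rough sketch only. `AgentMode`, `AUTONOMOUS_PROMPT`, and `next_user_message` are hypothetical names; nothing here exists in OpenDevin as of this PR, and the prompt text is copied from the comment above.

```python
from enum import Enum
from typing import Callable

# Prompt text quoted from the SWE-Bench evaluation PR discussed above.
AUTONOMOUS_PROMPT = (
    'Please continue working on the task on whatever approach you think is suitable.\n'
    'If you think you have modified the code in a way that fixes the issue, '
    'please run the following command: <execute_bash> exit </execute_bash>.\n'
    'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP OR USE THE INTERNET TO SOLVE THIS TASK.\n'
)


class AgentMode(Enum):
    """Hypothetical setting, e.g. agent.mode=interactive or agent.mode=autonomous."""
    INTERACTIVE = 'interactive'
    AUTONOMOUS = 'autonomous'


def next_user_message(
    mode: AgentMode,
    ask_user: Callable[[str], str] = input,
) -> str:
    """Return the next user message for the agent.

    In autonomous mode the built-in prompt replaces the human;
    in interactive mode the human is asked through ask_user.
    """
    if mode is AgentMode.AUTONOMOUS:
        return AUTONOMOUS_PROMPT
    return ask_user('Request user input >> ')
```

Passing `ask_user` as a parameter keeps the interactive path testable without a terminal, which would also address the coupling concern raised earlier in the thread.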

Well, I decided to keep it as a separate PR after we get SWE-Bench in -- supporting this now would probably take more time :(

Added an issue to track this: #1798

Sure, implement the feature first and refactor it when we have time; we can also help. :)

yufansong

Generally looks good to me. I want to hear other maintainers' opinions about the test change.

rbren

Why can't we just call controller.get_state() once it's finished?

Double-tracking final_state and current_state seems like an anti-pattern.

xingyaoww · 2024-05-13T15:55:39Z

@rbren The major issue with that is: when the controller sets the agent to other states, it completely resets the state for a new task (all history is removed). We need to store the state before it gets reset, or we need to mess around with set agent state, which could be more complex.
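The problem described in this comment can be illustrated with a toy model. This is a sketch of the issue and of the snapshot workaround, not of the merged solution; `ToyController`, `finish`, and this `State` are hypothetical, and only the `reset_task` name comes from the thread.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class State:
    """Hypothetical stand-in for the controller's task state."""
    history: List[str] = field(default_factory=list)


class ToyController:
    def __init__(self) -> None:
        self.state = State()
        self.final_state: Optional[State] = None

    def reset_task(self) -> None:
        # Resetting for a new task discards the whole history.
        self.state = State()

    def finish(self) -> None:
        # Snapshot before the reset, otherwise the trajectory is lost.
        self.final_state = copy.deepcopy(self.state)
        self.reset_task()


controller = ToyController()
controller.state.history.append('ran tests')
controller.finish()
```

After `finish()`, the live state is empty while the snapshot still holds the trajectory. This is the double-tracking that the review calls an anti-pattern, which is why avoiding the reset is suggested instead.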

rbren · 2024-05-13T23:22:37Z

@xingyaoww I ran into this problem too recently. My recent PR refactored this a bit--I've got another one around that changes the reset_task behavior to avoid this problem.

Can you try simply removing the call to reset_task when AgentState is ERROR? I think that might do the trick

li-boxuan · 2024-05-14T03:17:27Z

> @xingyaoww I ran into this problem too recently. My recent PR refactored this a bit--I've got another one around that changes the reset_task behavior to avoid this problem.
>
> Can you try simply removing the call to reset_task when AgentState is ERROR? I think that might do the trick

I ran into this problem (sort of, but non-blocking) too, when I worked on #1735. Would be nice to see an elegant solution!

xingyaoww · 2024-05-14T10:49:15Z

> Can you try simply removing the call to reset_task when AgentState is ERROR? I think that might do the trick

@rbren But when the state is FINISHED, the state would still get removed by reset_task and hence would not be accessible as a return value? Can you share with me what you changed in that refactor?

xingyaoww

@rbren @li-boxuan @yufansong feel free to take another look! This should be the final PR before the SWE-Bench integration!

tests/integration/test_agent.py

rbren · 2024-05-15T02:01:31Z

opendevin/core/main.py

+    final_state = await controller.close()
+    return final_state


controller.close() doesn't seem to return a state, and I don't think we want it to. Maybe you just mean return controller.state?

Fixed! How about now?
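Judging from the later commit messages ("directly return get state", "add back the missing .close()"), the end of main presumably closes the controller and then returns its state. A minimal sketch under that assumption follows; `ToyController` and this `State` are hypothetical stand-ins, not the real classes.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """Hypothetical stand-in for the controller's task state."""
    history: List[str] = field(default_factory=list)


class ToyController:
    def __init__(self) -> None:
        self.state = State()
        self.closed = False

    async def close(self) -> None:
        # Cleanup only; deliberately returns nothing.
        self.closed = True

    def get_state(self) -> State:
        return self.state


async def main() -> State:
    controller = ToyController()
    controller.state.history.append('task finished')
    await controller.close()
    # Read the state from the controller instead of from close().
    return controller.get_state()


final_state = asyncio.run(main())
```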

rbren

Thanks! This is much cleaner

opendevin/core/main.py

xingyaoww added 2 commits May 13, 2024 16:33

support returning states at the end of controller

e95a3d3

remove return None

d853b32

yufansong reviewed May 13, 2024

fix issue of overriding final state

f59128d

yufansong reviewed May 13, 2024

Merge branch 'main' into xw/controller-return-final-state

e621d25

yufansong approved these changes May 13, 2024

rbren requested changes May 13, 2024

li-boxuan mentioned this pull request May 14, 2024

Integration test: Verify finish state & add auto-rerun in regenerate.sh #1773

Merged

neubig assigned xingyaoww May 14, 2024

xingyaoww added 10 commits May 14, 2024 18:53

Merge branch 'main' into xw/controller-return-final-state

ba5dbfd

Merge commit 'e4460a974d8f4a97eb5d9551d69643976156798c' into xw/contr…

1d568c2

…oller-return-final-state

return the final state on close

253ae3f

merge AgentState with State

48af8dc

fix integration test

9d43744

add ChangeAgentStateAction to history in attempt to fix integration

81f4379

Merge branch 'main' into xw/controller-return-final-state

a78d4ca

add back set agent state

f612aad

update tests

9858f75

update tests

87d4466

xingyaoww commented May 14, 2024

tests/integration/test_agent.py (resolved)

xingyaoww mentioned this pull request May 14, 2024

Support fully autonomous mode for agent #1798

Open

xingyaoww added 2 commits May 15, 2024 06:29

Merge branch 'main' into xw/controller-return-final-state

8ce7f23

Merge branch 'main' into xw/controller-return-final-state

8c5d941

rbren reviewed May 15, 2024

directly return get state

63434c4

rbren approved these changes May 15, 2024

add back the missing .close()

155fef5

li-boxuan reviewed May 15, 2024

opendevin/core/main.py (outdated, resolved)

Update typo in opendevin/core/main.py

366f2a8

li-boxuan approved these changes May 15, 2024

xingyaoww enabled auto-merge (squash) May 15, 2024 03:38

xingyaoww merged commit d1fd277 into main May 15, 2024
25 checks passed

xingyaoww deleted the xw/controller-return-final-state branch May 15, 2024 03:43

li-boxuan mentioned this pull request May 15, 2024

feat: make other agents support asking user input in MessageAction. #1777

Merged
