Optionally use Azure for webarena evaluation #2626

Merged: 5 commits into microsoft:ct_webarena on May 23, 2024

Conversation

peterychang
Collaborator

Allows use of Azure endpoint for webarena evaluation.

One remaining issue that could be addressed in another PR: startup checks for the existence of an OpenAI key, even if it's never used.
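A minimal sketch of how the remaining startup-check issue could be handled, so that an OpenAI key is only required when Azure is not configured. This is not the code in this PR: the environment variable names (`AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `OPENAI_API_KEY`) and the `EvalBackend` / `select_eval_backend` names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvalBackend:
    """Hypothetical description of the backend used for evaluation."""
    provider: str            # "azure" or "openai"
    api_key: str
    endpoint: Optional[str]  # only set for Azure


def select_eval_backend(env: Mapping[str, str]) -> EvalBackend:
    """Pick the evaluation backend from environment-style settings.

    The variable names below are assumptions for this sketch.
    """
    azure_endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    azure_key = env.get("AZURE_OPENAI_API_KEY")
    if azure_endpoint and azure_key:
        # Azure is fully configured, so no OpenAI key is required.
        return EvalBackend("azure", azure_key, azure_endpoint)

    openai_key = env.get("OPENAI_API_KEY")
    if openai_key:
        return EvalBackend("openai", openai_key, None)

    raise RuntimeError(
        "Set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY, or OPENAI_API_KEY."
    )
```

At startup this could be called as `select_eval_backend(os.environ)`, which fails only when neither backend is usable rather than whenever the OpenAI key is absent.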

@peterychang peterychang requested a review from afourney May 8, 2024 18:50
@@ -292,12 +292,17 @@ def response_preparer(inner_messages):
cdp_session = context.new_cdp_session(page)
config_file = "full_task.json"

import nltk

nltk.download("punkt")
Contributor

Is this used?

Member

It is. I believe we've got code in the Dockerfile to pre-fetch this. As long as it doesn't download again, it should be ok.
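One way to make sure the call does not download again is to look for the resource locally first. The sketch below is an illustration, not the code in this PR: `ensure_resource` and `ensure_punkt` are hypothetical helpers, and the sketch assumes that `nltk.data.find` raises `LookupError` when the tokenizer data is missing.

```python
from typing import Callable


def ensure_resource(
    find: Callable[[str], object],
    download: Callable[[str], object],
    resource_path: str,
    package: str,
) -> bool:
    """Download `package` only if `resource_path` is missing locally.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


def ensure_punkt() -> bool:
    # Imported lazily so this module loads even where nltk is absent.
    import nltk

    return ensure_resource(
        nltk.data.find, nltk.download, "tokenizers/punkt", "punkt"
    )
```

With the data pre-fetched in the Docker image, `ensure_punkt()` would return without calling `nltk.download` at all.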

Member

@afourney afourney left a comment

In a future PR we will want to make sure the temperature and top-p of the Azure code match what's used here.
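A small sketch of how such a check might look: compare the sampling settings passed to each backend and report any that differ. The `sampling_mismatches` helper is hypothetical. `temperature` and `top_p` are the usual request parameter names, and any values passed in are placeholders, not the settings used in this repository.

```python
from typing import Any, Dict, Mapping, Tuple

# Sampling parameters that should agree across both code paths.
SAMPLING_KEYS = ("temperature", "top_p")


def sampling_mismatches(
    a: Mapping[str, Any], b: Mapping[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (a_value, b_value)} for sampling keys that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in SAMPLING_KEYS
        if a.get(key) != b.get(key)
    }
```

An empty result means the two request configurations agree on both parameters. A key missing on one side is reported as a mismatch against `None`.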

@afourney afourney merged commit 6d8b0f9 into microsoft:ct_webarena May 23, 2024
8 checks passed
3 participants