Architectural thoughts/discussion #14

Open
founderblocks-sils opened this issue Apr 17, 2023 · 4 comments

@founderblocks-sils
Contributor

I've been playing with this right now and we're definitely getting somewhere.

This blog post is a good summary, I'm guessing you know it: https://jina.ai/news/auto-gpt-unmasked-hype-hard-truths-production-pitfalls/

Now, AutoGPT allows subtasks to be executed with GPT 3.5 turbo, which is a lot faster and cheaper to run. I believe that with a bit of prompt engineering this is possible for at least some of the subtasks here. (Prompt engineering is in this case model-specific, as 3.5 sometimes doesn't want to format responses nicely and in a machine-readable way. I also believe AutoGPT has a "json autofix" thing that adds missing parentheses before parsing.)
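
For illustration, the "append missing closing brackets before parsing" idea could look roughly like the sketch below (this is not AutoGPT's actual implementation, just the general approach):

```ts
// Sketch of a JSON "autofix": try a strict parse first; if it fails, append
// whatever closing brackets/braces appear to be missing and parse again.
// Illustration of the idea only - not AutoGPT's actual implementation.
function parseJsonWithAutofix(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    const closers: string[] = [];
    let inString = false;
    for (let i = 0; i < text.length; i++) {
      const char = text[i];
      if (inString) {
        if (char === "\\") i++; // skip escaped character inside a string
        else if (char === '"') inString = false;
      } else if (char === '"') inString = true;
      else if (char === "{") closers.push("}");
      else if (char === "[") closers.push("]");
      else if (char === "}" || char === "]") closers.pop();
    }
    // Close innermost structures first, then retry the parse.
    const repaired = text + closers.reverse().join("");
    return JSON.parse(repaired); // still throws if the text is beyond repair
  }
}
```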

Apart from that, it would be interesting to explore how we could reduce the number of steps needed, so that small, autonomous tasks can be completed more quickly and cost-effectively.

@lgrammel
Owner

lgrammel commented Apr 17, 2023

First of all, thanks for all the great pull requests and feedback! This is super helpful!

Good article! This is also worth a read: https://huyenchip.com/2023/04/11/llm-engineering.html

Here are some of my thoughts on the topic:

  1. I'm trying to build for GPT-4 because it represents the power I expect in future LLMs (both local and from other providers), while at the same time trying to support as many models as possible (to keep the framework sufficiently general). I've only added GPT-3.5 because not everyone has GPT-4 access yet. GPT-4 provides vastly superior results, and I expect the GPT-4 cost to come down a lot in the next 12 months (and its speed to increase). In particular for code-related tasks, I would strongly recommend not using GPT-3.5.

  2. The main problem with agents at the moment is that they can't even reliably complete tasks. This problem is my priority (compared to efficiency, cost reduction, speed, etc., which will become important later down the road). I expect the first agents to be more like interns that get handed tasks (so speed does not matter as much), and solving those tasks even for a few dollars is still way cheaper than hiring someone. Improving the robustness of agents will require a testing and experimentation framework (a/b testing, maybe some form of non-deterministic unit tests - not sure yet what that could look like), observability (logging agent executions, monitoring failures, potentially a/b testing in production), and probably other tools.

  3. That being said, there are a few easy tricks to improve GPT-3.5 performance. E.g. the JsonActionFormatter could try to parse the first JSON object that appears in the message, since gpt-3.5-turbo tends to put it in different places. Or several results could be returned (say n=3) and the first parseable result used (this increases the cost, though). Btw, the JsonFormatter is not a good choice for coding, since code itself can contain JSON and things can get messy. I have a better formatter that uses special characters (not yet in the repo), and XML is another option.
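
As a rough illustration of the "parse the first JSON object in the message" idea, a scan for the first balanced `{ ... }` could look like the sketch below. This is only a sketch of the approach, not the repo's actual JsonActionFormatter:

```ts
// Sketch: find the first balanced JSON object in a model message and parse it.
// Illustration only - not the repo's JsonActionFormatter.
function parseFirstJsonObject(message: string): unknown | undefined {
  const start = message.indexOf("{");
  if (start === -1) return undefined;

  let depth = 0;
  let inString = false;
  for (let i = start; i < message.length; i++) {
    const char = message[i];
    if (inString) {
      if (char === "\\") i++; // skip escaped character inside a string
      else if (char === '"') inString = false;
    } else if (char === '"') inString = true;
    else if (char === "{") depth++;
    else if (char === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(message.slice(start, i + 1));
        } catch {
          return undefined; // braces balanced, but still not valid JSON
        }
      }
    }
  }
  return undefined; // no complete object in the message
}
```

Combined with n > 1 completions, the first message for which this returns a value could be used, at the price of the extra completions.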

Overall it's still early and I'm trying to get the project into a good shape before OpenAI gives GPT-4 access to everyone.

@founderblocks-sils
Contributor Author

I tend to agree with most points. What I found, however, is that the speed of GPT 3.5 turbo makes a huge difference for me as a developer and prompt engineer, because it allows for much faster experimentation. So even though GPT 4 WILL be the default and fast standard, right now it comes with a hefty speed penalty that makes debugging and iterating quite slow.

So this could have a big impact for us as developers of these things.

Also, some things are fundamental and model-independent, e.g. how we can prompt the model and run tools in a way that saves steps (showing the available files when a file is not found is one example). That would let us iterate faster on the tooling and let users get results more effectively.
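
To make the "show available files on file not found" idea concrete, a hypothetical read-file tool could return the directory listing together with the error, so the model can correct the path in its next action instead of burning a step on a bare error message (readFileTool and workspaceDir are made-up names, not something from the repo):

```ts
import * as fs from "node:fs/promises";
import * as path from "node:path";

// Hypothetical read-file tool: when the file is missing, it also returns the
// files that do exist, so the model can retry with a valid path in one step.
async function readFileTool(workspaceDir: string, relativePath: string) {
  const filePath = path.join(workspaceDir, relativePath);
  try {
    const content = await fs.readFile(filePath, "utf8");
    return { ok: true as const, content };
  } catch (error: any) {
    if (error.code !== "ENOENT") throw error;
    const availableFiles = await fs.readdir(workspaceDir);
    return {
      ok: false as const,
      error: `File not found: ${relativePath}`,
      availableFiles, // included in the observation shown to the model
    };
  }
}
```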

@lgrammel
Owner

lgrammel commented Apr 18, 2023

100% on the last part. And GPT 3.5 is supported (plus I'll add support for the OpenAI text models soon).

Personally I have the same sense of rapid feedback and fast experimentation when I use GPT-3.5, but when I reflect on it, I tend to realize that even though it might feel fast and good, it probably is a case of walking faster in the wrong direction and solving soon-to-be-obsolete problems.

Therefore I want to make GPT-4 the primary target and would like to avoid any GPT-3.5 specific optimizations.

@lgrammel
Owner

lgrammel commented Apr 18, 2023

I've added FlexibleJsonActionFormat, which should help a lot with GPT-3.5-turbo. GPT-3.5-turbo agents should work a lot better now.
