
Add example for using retriever with agents #795

Open
tom-leamon opened this issue May 2, 2024 · 4 comments
Labels: documentation (Improvements or additions to documentation)

Comments

@tom-leamon commented May 2, 2024

Currently, there are no examples in the documentation that illustrate how to use retrievers with agents to leverage expanded context through embeddings. It's not immediately clear whether this is even possible, though the types suggest it is.

If it's not currently possible, implementing this feature would be hugely beneficial for agent performance.

@marcusschiesser (Collaborator)

@tom-leamon, you can use a QueryEngineTool in your agent, see https://github.com/run-llama/LlamaIndexTS/blob/main/examples/agent/query_openai_agent.ts
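
Condensed, the pattern from that example looks roughly like this (the directory path, tool metadata, and exact reader/response shapes here are illustrative and vary a bit between llamaindex versions):

```ts
import {
  OpenAIAgent,
  QueryEngineTool,
  SimpleDirectoryReader,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  // Build an index over local documents (path is illustrative)
  const documents = await new SimpleDirectoryReader().loadData({
    directoryPath: "./data",
  });
  const index = await VectorStoreIndex.fromDocuments(documents);

  // Wrap the index's query engine as a tool the agent can call
  const queryEngineTool = new QueryEngineTool({
    queryEngine: index.asQueryEngine(),
    metadata: {
      name: "docs_query",
      description: "Answers questions about the indexed documents",
    },
  });

  const agent = new OpenAIAgent({ tools: [queryEngineTool] });
  const response = await agent.chat({
    message: "What do the documents say about X?",
  });
  // Response shape varies across versions; log the whole object
  console.log(response);
}

main().catch(console.error);
```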

@himself65 added the documentation label May 2, 2024
@tom-leamon (Author)

That does work, but it seems to take significantly longer to use the tool and respond compared to using a ContextChatEngine. Is this a limitation of function-calling APIs? Or is there a way to avoid tool use but still perform retrieval?

@tom-leamon (Author) commented May 3, 2024

@himself65 @marcusschiesser is it possible to have agents use a retriever without a QueryEngineTool?

With ContextChatEngine the model responds very quickly, using the retriever to pull relevant data into context. However, when using an agent with a QueryEngineTool, the agent only has access to that retriever when it invokes the tool, which not only adds significant latency but also forces multiple tool calls in scenarios where the agent needs to be aware of its context and, say, perform a web search.

If the agent could use the retriever without an additional tool call, as ContextChatEngine does, it could more than double the agent's performance. In my use case, all interactions with the AI are highly contextual (which space, group, channel, and thread are active, and all the data already in those organizational units). With ContextChatEngine, every single prompt takes this context into consideration. With a QueryEngineTool, the base model first has to decide whether to perform a retrieval at all, which adds latency, and it often decides a retrieval isn't needed even when retrieval would have vastly improved the contextual relevance, and therefore the quality, of the answer.
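
For reference, the ContextChatEngine setup I'm comparing against looks roughly like this (the path and message are illustrative):

```ts
import {
  ContextChatEngine,
  SimpleDirectoryReader,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  const documents = await new SimpleDirectoryReader().loadData({
    directoryPath: "./data",
  });
  const index = await VectorStoreIndex.fromDocuments(documents);

  // The retriever runs on every message, so relevant context is injected
  // into each prompt without the model first deciding to call a tool
  const chatEngine = new ContextChatEngine({
    retriever: index.asRetriever(),
  });

  const response = await chatEngine.chat({
    message: "What is happening in the active thread?",
  });
  console.log(response.response);
}

main().catch(console.error);
```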

Without this capability, I'm forced to have users manually choose whether they want an agent, depending on whether they need tool use (like searching the web) or the highest-quality answer in the shortest time. I would prefer a single paradigm that always has full context and can also use tools.

Is this something that is already possible, or perhaps could be added to the roadmap?

@marcusschiesser (Collaborator)

@tom-leamon to reduce the latency, you can use a tool that directly calls the retriever.

I added an example to the examples folder, see https://github.com/run-llama/LlamaIndexTS/blob/main/examples/agent/retriever_openai_agent.ts (compare with the same example using the query engine tool: https://github.com/run-llama/LlamaIndexTS/blob/main/examples/agent/query_openai_agent.ts)
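
Condensed, the retriever-based approach wraps the retriever in a function tool that returns the raw chunks; the tool name, description, and the exact retrieve/getContent signatures below are illustrative and may differ between versions:

```ts
import {
  FunctionTool,
  MetadataMode,
  OpenAIAgent,
  SimpleDirectoryReader,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  const documents = await new SimpleDirectoryReader().loadData({
    directoryPath: "./data",
  });
  const index = await VectorStoreIndex.fromDocuments(documents);
  const retriever = index.asRetriever();

  // Returns raw retrieved chunks as tool output; no extra LLM call is made
  // to synthesize an answer before the agent sees the context
  const retrieverTool = FunctionTool.from(
    async ({ query }: { query: string }) => {
      const nodes = await retriever.retrieve({ query });
      return nodes
        .map(({ node }) => node.getContent(MetadataMode.NONE))
        .join("\n\n");
    },
    {
      name: "retrieve_context",
      description: "Retrieves document chunks relevant to the query",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "The search query" },
        },
        required: ["query"],
      },
    },
  );

  const agent = new OpenAIAgent({ tools: [retrieverTool] });
  const response = await agent.chat({
    message: "What do the documents say about X?",
  });
  console.log(response);
}

main().catch(console.error);
```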

The differences are:

  1. QueryEngineTool uses a query engine, which makes an additional LLM call (increasing your latency) to generate a result from the retrieved context; that result is then passed to the agent.
  2. The retriever tool just retrieves the context and sends it directly to the agent as tool output.

You can see the difference at run-time by adding the `verbose: true` parameter.

In this simple use case, both approaches lead to the same result. It would be great to get some feedback from your use case!
