feat: add user toggleable web search #2004

Merged
merged 31 commits into from
May 27, 2024

Contributor

@cheahjs cheahjs commented May 6, 2024

Pull Request Checklist

  • Description: Briefly describe the changes in this pull request.
  • Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
  • Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
  • Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
  • Testing: Have you written and run sufficient tests for the changes?
  • Code Review: Have you self-reviewed your code and addressed any coding standard issues?

Description

Adds the ability to perform web searches via the RAG API (rag/api/v1/websearch) using the following search providers:

  1. SearXNG, configured with SEARXNG_QUERY_URL, e.g. SEARXNG_QUERY_URL=https://search.projectsegfau.lt/search?q=<query>
  2. Google Programmable Search Engine, configured with GOOGLE_PSE_API_KEY and GOOGLE_PSE_ENGINE_ID
  3. Brave Search, configured with BRAVE_SEARCH_API_KEY
  4. serpstack - Google proxy, configured with SERPSTACK_API_KEY (and an optional SERPSTACK_HTTPS=false, since the free tier doesn't allow for HTTPS connections)
  5. serper - Google proxy, configured with SERPER_API_KEY.
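
As a rough illustration of the SEARXNG_QUERY_URL format above, the `<query>` placeholder could be filled in as follows. This is a minimal sketch, not the code in this PR: the function name `build_searxng_url` is hypothetical, and URL-encoding the query with `quote_plus` is an assumption about how the substitution should behave.

```python
from urllib.parse import quote_plus


def build_searxng_url(query_url_template: str, query: str) -> str:
    """Fill the <query> placeholder of a SEARXNG_QUERY_URL-style template.

    Hypothetical helper for illustration; the PR may substitute the
    placeholder differently.
    """
    if "<query>" not in query_url_template:
        raise ValueError("template must contain the <query> placeholder")
    # URL-encode the query so spaces and special characters are safe.
    return query_url_template.replace("<query>", quote_plus(query))


if __name__ == "__main__":
    template = "https://search.projectsegfau.lt/search?q=<query>"
    print(build_searxng_url(template, "open webui"))
```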

Users can configure how many of the top search results to crawl with RAG_WEB_SEARCH_RESULT_COUNT, and RAG_WEB_SEARCH_CONCURRENT_REQUESTS controls the number of concurrent requests made to crawl the search results.
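
The interaction between the two settings could look roughly like the sketch below: only the top N results are crawled, and a semaphore caps how many crawls run at once. The helper names (`read_int_env`, `crawl_top_results`), the injected `fetch` coroutine, and the default values are all assumptions made for illustration, not the implementation in this PR.

```python
import asyncio
import os
from typing import Awaitable, Callable, List


def read_int_env(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


async def crawl_top_results(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    result_count: int,
    concurrent_requests: int,
) -> List[str]:
    """Crawl the first `result_count` URLs with at most `concurrent_requests` in flight."""
    semaphore = asyncio.Semaphore(concurrent_requests)

    async def fetch_one(url: str) -> str:
        async with semaphore:  # wait here if the concurrency limit is reached
            return await fetch(url)

    # gather preserves the order of the input URLs in its result list.
    return await asyncio.gather(*(fetch_one(u) for u in urls[:result_count]))


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        # Stand-in for a real page crawler.
        await asyncio.sleep(0.01)
        return f"contents of {url}"

    # The defaults below are placeholders, not the values used by the PR.
    count = read_int_env("RAG_WEB_SEARCH_RESULT_COUNT", 3)
    limit = read_int_env("RAG_WEB_SEARCH_CONCURRENT_REQUESTS", 2)
    pages = asyncio.run(
        crawl_top_results(["u1", "u2", "u3", "u4"], fake_fetch, count, limit)
    )
    print(pages)
```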

Users can toggle web search on or off in the UI. When it is on, the frontend uses a prompt to generate a search query, calls the RAG API to search for that query, and then injects the results as a RAG document.
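
The three steps of that flow can be sketched as below. In this PR the orchestration happens in the frontend, so this Python version is only a language-neutral outline: `answer_with_web_search` and the three injected callables (`generate_query`, `web_search`, `chat`) are hypothetical stand-ins, not functions from the codebase.

```python
from typing import Callable, List


def answer_with_web_search(
    user_message: str,
    generate_query: Callable[[str], str],
    web_search: Callable[[str], str],
    chat: Callable[[str, List[str]], str],
    web_search_enabled: bool,
) -> str:
    """Outline of the toggleable flow: query generation, search, RAG injection."""
    documents: List[str] = []
    if web_search_enabled:
        # 1. Ask the model to turn the user's message into a search query.
        query = generate_query(user_message)
        # 2. Search for that query (the PR does this through the RAG API).
        # 3. Attach the search results as a RAG document for the chat request.
        documents.append(web_search(query))
    return chat(user_message, documents)


if __name__ == "__main__":
    reply = answer_with_web_search(
        "What is SearXNG?",
        generate_query=lambda msg: msg.rstrip("?"),
        web_search=lambda q: f"results for: {q}",
        chat=lambda msg, docs: f"{msg} (answered with {len(docs)} document(s))",
        web_search_enabled=True,
    )
    print(reply)
```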

chrome_Cmvaz6Drqg.mp4

Implements #586


Changelog Entry

Added

  • 🔍Web Search for RAG: You can now perform web searches when chatting using providers like SearXNG, Google PSE, Brave Search, serpstack, and serper.

fix: google PSE endpoint uses GET

fix: google PSE returns link, not url

fix: serper wrong field
@que-nguyen
Contributor

Great!

"Accept-Encoding": "gzip",
"X-Subscription-Token": api_key,
}
params = {"q": query, "count": 5}

Should we use an env var to configure this?
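
One way the suggestion could look, as a sketch only: read the count from the environment and fall back to the currently hard-coded 5. Reusing RAG_WEB_SEARCH_RESULT_COUNT for this is an assumption, and `brave_search_params` is a hypothetical helper name.

```python
import os


def brave_search_params(query: str, default_count: int = 5) -> dict:
    """Build the Brave request params with a configurable result count."""
    # Assumption: the same variable that controls how many results are crawled
    # also controls how many results are requested from the provider.
    raw = os.environ.get("RAG_WEB_SEARCH_RESULT_COUNT", "")
    count = int(raw) if raw.strip() else default_count
    return {"q": query, "count": count}


if __name__ == "__main__":
    print(brave_search_params("open webui"))
```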

@tjbck tjbck marked this pull request as draft May 6, 2024 23:24
@cheahjs cheahjs changed the title feat: add web search to backend APIs feat: add user toggleable web search May 12, 2024
@cheahjs cheahjs force-pushed the feat/backend-web-search branch 2 times, most recently from fc00b32 to 724b13e Compare May 12, 2024 12:45
this is to handle when we have multiple models selected or regenerate a
response, it'll only add it to the model's response and not add dupes on
the user message
@fhasbrook

So excited for this to be working, thank you for your hard work! I got ahead of myself and grabbed your fork to play with it, but I'm having trouble figuring out the correct environment variable value to get SearXNG working properly.

@cheahjs cheahjs marked this pull request as ready for review May 20, 2024 18:53
@ProjectMoon

Pulled this down to test it out! It works very well for the most part. Here are some things I've noticed while using it with a SearXNG instance.

  • It will sometimes generate strange queries to search for. This is very noticeable when you have a markdown table in the previous messages (e.g. if you asked it to generate a table in responses prior to searching). I've seen it search for the column headers of the table, for example a query like "Name | Description | Status".
  • It will cause some models to generate nonsense responses. Not sure if this is due to the models themselves, the search function, or me running out of context space. Specifically, I was sometimes (not always) getting weird responses using this model: https://huggingface.co/TheBloke/Mixtral_11Bx2_MoE_19B-GGUF

@ProjectMoon

Here are examples of the more interesting issues with the model I mentioned above. It should be noted it worked fine with Llama 3.

Screenshot_20240525-131810
Screenshot_20240525-131855

@ProjectMoon

ProjectMoon commented May 26, 2024

So I think the garbage output in my second screenshot is probably more the model and its template than any potential issue with the search functionality. But maybe there should be some adjustments when handling follow-up queries.

Example:

  1. Turn on the search button, then ask something like "What are the opening times of grocery stores near ?"
  2. This search will work fine, and you'll likely get a coherent result.
  3. But then if you leave the search button on and ask "Which of these stores is the closest one?", things will start to get a bit funny, due to how the search queries are generated.

Maybe automatically toggling search off after a successful query would be a good first step, as otherwise Open WebUI keeps using each message to generate search queries. At least for me, the generally expected behavior would be that any follow-up questions use the existing search results.

A more advanced way of handling it would be to leave search enabled, but indicate to the user that it won't necessarily be used unless more refinement is needed. This requires identifying follow-up questions and asking the LLM whether or not it thinks another web search is necessary.

But so far the experience is very polished! I like it and hope it's merged soon.

@skyler14

I believe DuckDuckGo offers a free search API as well, so it might be worth adding, but I wouldn't delay merging this PR to do that.

@tjbck tjbck added this to the v0.2.0 milestone May 26, 2024
@tjbck tjbck changed the base branch from dev to websearch May 27, 2024 06:38
@tjbck tjbck merged commit f1a7c76 into open-webui:websearch May 27, 2024
@tjbck tjbck mentioned this pull request May 27, 2024
@spammenotinoz

Thank you, this is interesting and working quite well.

8 participants