Some problems when using the vector database #3991

Open
hepj987 opened this issue May 11, 2024 · 7 comments
Labels
bug Something isn't working

Comments


hepj987 commented May 11, 2024

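I initialized the vector database, and the initialization succeeded.
The command was: python init_database.py --recreate-vs
I used the default configuration. The only changes were in model_config.py: I set MODEL_ROOT_PATH = "/public/RAG/model" to my local model path, and pointed bge-large-zh-v1.5 at the location where I saved the model:
"bge-large-zh-v1.5": "AI-ModelScope/bge-large-zh"
The vector database initialization reported success: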
image

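The LLM question-answering service works normally, but knowledge base queries never retrieve anything from the knowledge base.
Comparing against the official screenshots, I noticed a difference: the management page does list the initialized data files, but both the "source file" and "vector store" columns show × instead of √. I don't know what went wrong.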
image
In addition, selecting one of the files raises the following error:
image

Has anyone else run into the same problem, and if so, how did you solve it?

hepj987 added the bug label May 11, 2024

dosubot bot commented May 11, 2024

It looks like you're facing two distinct issues with your vector database setup: the source files and vector library not being correctly recognized in the management interface, and a ValueError related to the paginate option in a Python script.

For the first issue regarding the source files and vector library not being recognized (marked with × instead of √), this problem might be due to incorrect file handling, configuration errors, or missing dependencies. Ensure that all required models and modules are correctly installed, including any specific sentence-transformers model and the modelscope module. Also, verify that your files are in the correct format (e.g., UTF-8 for Chinese characters) and that your system meets all requirements for handling these files properly.

Regarding the ValueError with the paginate option, this error typically indicates a mismatch in expected argument formats or values. Without specific details, it's challenging to provide a direct solution. However, you should review any database querying functions in your chatapi-master module, ensuring that pagination parameters are correctly used and passed in the expected format. If paginate is part of a library or framework, consulting the respective documentation for correct usage might also be beneficial.

If these suggestions don't resolve your issues, providing more specific error messages or details about where exactly the error occurs could help in offering more targeted advice.

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.



hepj987 commented May 13, 2024

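The apparent failure to add files to the vector database was only a display problem. After I modified the replace operation on lines 202 and 203, the results display correctly.
However, the vector database still fails to match anything. When I replace vs.similarity_search_with_score_by_vector(embeddings, k=top_k, score_threshold=score_threshold) with docs=vs.similarity_search_with_score(query) and print the results, I can see that the scores are all above 100.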
image


dosubot bot commented May 13, 2024

It seems you've made progress by addressing the display issue related to file addition in the vector database by modifying lines 202 and 203. However, the challenge of the vector database not matching queries as expected, with scores appearing unusually high, remains.

To tackle the issue of high scores and ensure your vector database matches queries correctly, consider these steps:

  1. Ensure Proper Score Calculation and Normalization: It's crucial to check that your vectors are normalized correctly. Proper normalization affects scoring directly. Make sure that the normalization process is consistent for both the indexed data and the query vectors.

  2. Adjust SCORE_THRESHOLD: If the scores are unexpectedly high, it might be beneficial to review and possibly adjust the score threshold. The threshold is meant to filter out results that do not meet a certain criterion, so ensuring it's set appropriately is key.

  3. Verify Vector Database Configuration: Double-check that your vector database setup is correctly configured and compatible with your system. An incorrect setup can lead to various issues, including scoring anomalies.

  4. Validate Data and Query Vector Generation: Ensure that the vectors for your data and queries are generated accurately. Any inconsistencies in vector generation can lead to mismatches and scoring issues.

  5. Update or Debug the Vector Database Library: If you're using an external library for vector operations, make sure it's up to date and compatible with your system. Sometimes, updating or debugging the library can resolve unexpected issues.

  6. Consult Documentation or Support: For more complex issues, referring to the documentation or seeking developer support for the vector database or related tools might provide specific solutions or insights.
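
Points 1 and 2 can be illustrated with a small, self-contained sketch. It assumes the vector store reports a distance-style score where lower means closer; squared L2 distance is used here as a stand-in, since the thread does not confirm which metric is in use. The vectors, the helper names, and the threshold value of 1.0 are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def squared_l2(a, b):
    """Squared Euclidean distance, a stand-in for the store's score (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_by_threshold(scored, score_threshold):
    """Keep (doc, score) pairs whose distance is at or below the threshold."""
    return [(doc, s) for doc, s in scored if s <= score_threshold]


# Hypothetical raw (unnormalized) embeddings; values are illustrative only.
query = [3.0, 4.0, 12.0]
docs = {
    "doc_a": [6.0, 8.0, 24.0],  # same direction as the query, larger magnitude
    "doc_b": [-4.0, 3.0, 0.0],  # orthogonal to the query
}

# Scores on raw vectors: magnitudes dominate, so distances land in the hundreds.
raw_scores = [(name, squared_l2(query, v)) for name, v in docs.items()]

# Scores on unit vectors: squared L2 distance is bounded between 0 and 4.
unit_query = l2_normalize(query)
unit_scores = [
    (name, squared_l2(unit_query, l2_normalize(v))) for name, v in docs.items()
]

print("raw:", raw_scores)
print("normalized:", unit_scores)
print("raw, threshold 1.0:", filter_by_threshold(raw_scores, 1.0))
print("normalized, threshold 1.0:", filter_by_threshold(unit_scores, 1.0))
```

Under these assumptions, a threshold tuned for normalized vectors rejects every result computed on raw vectors, which would be consistent with scores above 100 and an empty result set. Whether that is what happens in this setup depends on the metric and on how the embeddings were produced at indexing and query time.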

These steps should help you address the issue of high scores and improve the matching accuracy of your vector database. If you continue to encounter problems, providing more detailed information about your setup and the specific challenges you're facing could help in offering more targeted advice.


@Zhaitizi

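Same problem here. I can't figure out where the lines 202 and 203 that everyone mentions are, and I can't find the knowledge_base.py file. Could someone point me in the right direction?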


ZG688 commented May 22, 2024

Did you manage to solve this? I also see ×, and knowledge base Q&A cannot retrieve any documents.


ZG688 commented May 22, 2024

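"Same problem here. I can't figure out where the lines 202 and 203 that everyone mentions are, and I can't find the knowledge_base.py file. Could someone point me in the right direction?"
It is under the langchain-ChatGLM\webui_pages\knowledge_base directory. You can also find it by searching the project files.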

@zixiaotan21

I have the same problem. Which file contains the line where vs.similarity_search_with_score_by_vector(embeddings, k=top_k, score_threshold=score_threshold) is replaced with docs=vs.similarity_search_with_score(query)? Thanks!
