Some problems when using the vector database #3991

hepj987 · 2024-05-11T06:59:32Z

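I initialized the vector database successfully with the following command: python init_database.py --recreate-vs
I used the default configuration. The only changes were in model_config.py: I set MODEL_ROOT_PATH = "/public/RAG/model" to my local model path, and pointed bge-large-zh-v1.5 at the directory where I saved the model:
"bge-large-zh-v1.5": "AI-ModelScope/bge-large-zh"
The initialization reported success: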

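The LLM question-answering service works normally, but queries against the knowledge base never retrieve anything.
Comparing with the official screenshots, I noticed a problem: the management page lists the initialized data files, but both the source file and vector store columns show × instead of √. I don't know what went wrong.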

In addition, selecting one of the files raises the following error:

Has anyone run into the same problem, and if so, how did you solve it?

dosubot · 2024-05-11T07:00:44Z

It looks like you're facing two distinct issues with your vector database setup: the source files and vector library not being correctly recognized in the management interface, and a ValueError related to the paginate option in a Python script.

For the first issue regarding the source files and vector library not being recognized (marked with × instead of √), this problem might be due to incorrect file handling, configuration errors, or missing dependencies. Ensure that all required models and modules are correctly installed, including any specific sentence-transformers model and the modelscope module. Also, verify that your files are in the correct format (e.g., UTF-8 for Chinese characters) and that your system meets all requirements for handling these files properly.
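The encoding check mentioned above can be sketched as follows. This is a minimal illustration, not part of the project: the helper name `find_non_utf8_files` and the list of suffixes are assumptions, and it only tests whether each file decodes as UTF-8.

```python
from pathlib import Path


def find_non_utf8_files(folder, suffixes=(".txt", ".md", ".csv")):
    """Return the text files under `folder` that cannot be decoded as UTF-8.

    Hypothetical helper for illustration; the suffix list is an assumption.
    """
    bad = []
    root = Path(folder)
    if not root.is_dir():
        return bad  # nothing to scan
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad
```

Calling `find_non_utf8_files(...)` on the folder that holds the knowledge base source files returns the paths worth re-saving as UTF-8 before re-running the initialization. An empty result rules out encoding as the cause but says nothing about the other possible causes listed above.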

Regarding the ValueError with the paginate option, this error typically indicates a mismatch in expected argument formats or values. Without specific details, it's challenging to provide a direct solution. However, you should review any database querying functions in your chatapi-master module, ensuring that pagination parameters are correctly used and passed in the expected format. If paginate is part of a library or framework, consulting the respective documentation for correct usage might also be beneficial.

If these suggestions don't resolve your issues, providing more specific error messages or details about where exactly the error occurs could help in offering more targeted advice.

hepj987 · 2024-05-13T06:27:15Z

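The apparent failure to add files to the vector database turned out to be a display issue only. After modifying the replace operations on lines 202 and 203, the results display correctly.
However, the vector database still fails to match anything. After replacing vs.similarity_search_with_score_by_vector(embeddings, k=top_k, score_threshold=score_threshold) with docs=vs.similarity_search_with_score(query) and printing the results, I can see that all the scores are above 100.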

dosubot · 2024-05-13T06:39:01Z

It seems you've made progress by addressing the display issue related to file addition in the vector database by modifying lines 202 and 203. However, the challenge of the vector database not matching queries as expected, with scores appearing unusually high, remains.

To tackle the issue of high scores and ensure your vector database matches queries correctly, consider these steps:

Ensure Proper Score Calculation and Normalization: It's crucial to check that your vectors are normalized correctly. Proper normalization affects scoring directly. Make sure that the normalization process is consistent for both the indexed data and the query vectors.
Adjust SCORE_THRESHOLD: If the scores are unexpectedly high, it might be beneficial to review and possibly adjust the score threshold. The threshold is meant to filter out results that do not meet a certain criterion, so ensuring it's set appropriately is key.
Verify Vector Database Configuration: Double-check that your vector database setup is correctly configured and compatible with your system. An incorrect setup can lead to various issues, including scoring anomalies.
Validate Data and Query Vector Generation: Ensure that the vectors for your data and queries are generated accurately. Any inconsistencies in vector generation can lead to mismatches and scoring issues.
Update or Debug the Vector Database Library: If you're using an external library for vector operations, make sure it's up to date and compatible with your system. Sometimes, updating or debugging the library can resolve unexpected issues.
Consult Documentation or Support: For more complex issues, referring to the documentation or seeking developer support for the vector database or related tools might provide specific solutions or insights.

These steps should help you address the issue of high scores and improve the matching accuracy of your vector database. If you continue to encounter problems, providing more detailed information about your setup and the specific challenges you're facing could help in offering more targeted advice.
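To make the normalization and threshold points concrete, here is a small pure-Python sketch. It assumes the index reports an L2-style distance where a lower score means a closer match, which this thread does not confirm. The vectors and the `SCORE_THRESHOLD` value of 1.0 are made up for illustration. For unit-length vectors the squared L2 distance always lies between 0 and 4, so scores above 100 suggest that at least one side was not normalized.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy embeddings (made up): same direction, very different magnitude.
query = [3.0, 4.0]
doc = [30.0, 40.0]

SCORE_THRESHOLD = 1.0  # assumed value; lower score = closer match

raw_score = squared_l2(query, doc)
norm_score = squared_l2(l2_normalize(query), l2_normalize(doc))

# Without normalization the score is far above the threshold, so the hit is filtered out.
print("raw score:", raw_score, "kept:", raw_score <= SCORE_THRESHOLD)
# After normalizing both sides, the same pair passes the threshold.
print("normalized score:", norm_score, "kept:", norm_score <= SCORE_THRESHOLD)
```

If the real scores look like the raw case here, checking whether the embedding step normalizes its output, for both the indexed documents and the query, is a more direct fix than raising the threshold.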

Zhaitizi · 2024-05-21T02:02:26Z

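Same problem here. I can't figure out where the lines 202 and 203 that people mention are, and I can't find the knowledge_base.py file. Any pointers would be appreciated.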

ZG688 · 2024-05-22T15:10:14Z

Did you manage to solve this? I also see ×, and knowledge base Q&A cannot retrieve any documents.

ZG688 · 2024-05-22T15:11:21Z

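> Same problem here. I can't figure out where the lines 202 and 203 that people mention are, and I can't find the knowledge_base.py file. Any pointers would be appreciated.

It is under the langchain-ChatGLM\webui_pages\knowledge_base directory. Searching the project files will also find it.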

zixiaotan21 · 2024-05-26T08:35:06Z

I have the same problem. Which file contains the line vs.similarity_search_with_score_by_vector(embeddings, k=top_k, score_threshold=score_threshold) that was replaced with docs=vs.similarity_search_with_score(query)? Thanks!

hepj987 added the "bug" label (Something isn't working) on May 11, 2024
