
ERROR: RPC error: [batch_insert], <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 131239, max length: 65535)>, <Time:{'RPC start': '2024-05-11 14:17:48.677254', 'RPC error': '2024-05-11 14:17:48.678019'}> #3995

Open
zmwstu opened this issue May 12, 2024 · 3 comments
Labels
bug Something isn't working

zmwstu commented May 12, 2024

2024-05-11 14:17:48,678 - decorators.py[line:146] - ERROR: RPC error: [batch_insert], <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 131239, max length: 65535)>, <Time:{'RPC start': '2024-05-11 14:17:48.677254', 'RPC error': '2024-05-11 14:17:48.678019'}>
2024-05-11 14:17:48,678 - milvus.py[line:595] - ERROR: Failed to insert batch starting at entity: 4000/11304
Traceback (most recent call last):
File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/init_database.py", line 113, in <module>
folder2db(kb_names=args.kb_name, mode="increment", embed_model=args.embed_model)
File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/migrate.py", line 150, in folder2db
files2vs(kb_name, kb_files)
File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
doc_infos = self.do_add_doc(docs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/Code_Program/Chatchat/milvus-Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 84, in do_add_doc
ids = self.milvus.add_documents(docs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
return self.add_texts(texts, metadatas, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 598, in add_texts
raise e
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 592, in add_texts
res = self.col.insert(insert_list, timeout=timeout, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/orm/collection.py", line 500, in insert
return conn.batch_insert(
^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/decorators.py", line 147, in handler
raise e from e
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/decorators.py", line 143, in handler
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/decorators.py", line 182, in handler
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/decorators.py", line 122, in handler
raise e from e
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/decorators.py", line 87, in handler
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 575, in batch_insert
raise err from err
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 558, in batch_insert
request = self._prepare_batch_insert_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py", line 542, in _prepare_batch_insert_request
else Prepare.batch_insert_param(collection_name, entities, partition_name, fields_info)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/prepare.py", line 531, in batch_insert_param
return cls._parse_batch_request(request, entities, fields_info, location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/prepare.py", line 507, in _parse_batch_request
field_data = entity_helper.entity_to_field_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/entity_helper.py", line 374, in entity_to_field_data
entity_to_str_arr(entity, field_info, CHECK_STR_ARRAY)
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/entity_helper.py", line 236, in entity_to_str_arr
return convert_to_str_array(entity.get("values", []), field_info, check=check)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/entity_helper.py", line 231, in convert_to_str_array
check_str_arr(arr, max_len)
File "/home/zwm/miniconda3/envs/Langchain-Chatchat2/lib/python3.11/site-packages/pymilvus/client/entity_helper.py", line 216, in check_str_arr
raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=invalid input, length of string exceeds max length. length: 131239, max length: 65535)>
My jsonl file is only 15 MB, and this happens when I initialize the database.

dosubot bot commented May 12, 2024

The error occurs because one string in your documents is 131239 characters long, which exceeds the maximum length of 65535 allowed for the Milvus string field it is being inserted into. As the traceback shows, pymilvus raises the ParamError while preparing the batch insert request, so the whole batch is rejected as soon as a single string is over the limit.

To resolve this, you can implement a check and truncate strategy before insertion. This involves modifying your document processing logic to ensure no string field exceeds the maximum length allowed by Milvus. Here's a concise example of how you might adjust your document processing to truncate longer strings:

MAX_LENGTH = 65535  # Maximum length allowed for string fields in Milvus

for doc in docs:
    for k, v in doc.metadata.items():
        str_v = str(v)
        # Truncate strings to MAX_LENGTH if they exceed it
        if len(str_v) > MAX_LENGTH:
            doc.metadata[k] = str_v[:MAX_LENGTH]

Integrate this snippet into your document preparation step so that it runs before the documents are inserted into Milvus. Note that it only covers metadata values; if the oversized string is the document text itself, that field needs the same check. Truncation also discards everything past the limit, so use it only where losing that content is acceptable.
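Before truncating anything, it may help to find out which field is actually too long. The following is a minimal sketch, not code from the project: `Doc` is a hypothetical stand-in for a LangChain document (a text attribute plus a metadata dict), and `find_oversized` is an illustrative helper name. It assumes the limit is compared against the Python string length, as the client-side check in the traceback appears to do.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Tuple

MAX_LENGTH = 65535  # limit reported in the error message


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def find_oversized(
    docs: Iterable[Doc], max_length: int = MAX_LENGTH
) -> Iterator[Tuple[int, str, int]]:
    """Yield (doc_index, field_name, length) for every string over max_length."""
    for i, doc in enumerate(docs):
        # Check the document text itself, not only the metadata.
        if len(doc.page_content) > max_length:
            yield i, "page_content", len(doc.page_content)
        for key, value in doc.metadata.items():
            length = len(str(value))
            if length > max_length:
                yield i, f"metadata[{key!r}]", length


# Illustrative data only: one short document and one with an oversized text.
docs = [
    Doc("short text", {"source": "a.jsonl"}),
    Doc("x" * 131239, {"source": "b.jsonl"}),
]
oversized = list(find_oversized(docs))
for index, name, length in oversized:
    print(f"doc {index}: {name} has length {length}")
```

Running a check like this over the real documents should show whether the 131239-character string sits in the text or in a metadata value, which determines where the fix belongs.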

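Sources

* [server/knowledge_base/kb_service/milvus_kb_service.py](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/server/knowledge_base/kb_service/milvus_kb_service.py)
* [knowledge_base/samples/content/test_files/langchain-ChatGLM_open.jsonl](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/knowledge_base/samples/content/test_files/langchain-ChatGLM_open.jsonl)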

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

zmwstu commented May 12, 2024

Where in the project should I change the code? @dosu

dosubot bot commented May 12, 2024

To address the issue of string lengths exceeding Milvus's maximum allowed length in your project, you should modify the do_add_doc method within the MilvusKBService class, located in your server/knowledge_base/kb_service/milvus_kb_service.py file. This method is where documents are prepared for insertion into the Milvus collection. By incorporating string truncation logic here, you ensure compliance with Milvus's string field length constraints before insertion.

Here's a concise example of how to integrate string truncation into the do_add_doc method:

def do_add_doc(self, docs: List[Document], **kwargs) -> List[Dict]:
    max_length = 65535  # Max length for string fields in Milvus
    for doc in docs:
        for k, v in doc.metadata.items():
            # Ensure string fields do not exceed max_length
            doc.metadata[k] = str(v)[:max_length]
        # Additional processing for document metadata before insertion
        # (e.g., setting default values, removing unnecessary fields)

    ids = self.milvus.add_documents(docs)
    doc_infos = [{"id": id, "metadata": doc.metadata} for id, doc in zip(ids, docs)]
    return doc_infos

This modification converts every metadata value to a string and truncates it to the maximum length before insertion. It does not touch the document text, so if the text is the field that exceeds the limit, it has to be shortened or split as well before the error goes away.
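If the oversized value is the document text, splitting it into several pieces keeps the content that truncation would discard. The sketch below is a plain-Python illustration under stated assumptions: `split_text` is a hypothetical helper, not part of the project or of Milvus, and it counts Python characters. If the server counts bytes instead, multi-byte text such as Chinese would need a smaller `max_length`; that should be verified against your Milvus version.

```python
from typing import List

MAX_LENGTH = 65535  # limit reported in the error message


def split_text(text: str, max_length: int = MAX_LENGTH) -> List[str]:
    """Split text into consecutive pieces of at most max_length characters."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    pieces = [text[i:i + max_length] for i in range(0, len(text), max_length)]
    # An empty input still yields one (empty) piece so callers get a document.
    return pieces or [""]


# Illustrative input with the length reported in the error message.
long_text = "x" * 131239
pieces = split_text(long_text)
print([len(p) for p in pieces])
```

Each piece could then be inserted as its own document carrying a copy of the original metadata. A fixed-width split can cut in the middle of a sentence, so a splitter that respects sentence or paragraph boundaries may give better retrieval results.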
