Bug Description
NodeParser.get_nodes_from_documents assigns previous and next relationships to nodes without checking whether a node is the first or last node of its source document. As a result, a node can end up with a previous and/or next relationship to a node from a different source document.
The way it's written assumes that the documents passed into NodeParser.get_nodes_from_documents are related and arrive in some meaningful order, which does not seem to be the intuitive use case.
Proposed Solution:
Check that previous and next nodes share the same source node before creating the relationship. I would like to implement the change myself if that's ok.
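For illustration only, here is a minimal sketch of the proposed check. It does not use llama_index: SketchNode, its fields, and link_within_source are hypothetical stand-ins made up for this example, and the real change would need to go wherever the parser currently builds the relationships.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a parsed chunk; not a llama_index class."""
    node_id: str
    source_id: str  # id of the document this chunk was split from
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def link_within_source(nodes: List[SketchNode]) -> List[SketchNode]:
    """Link adjacent nodes only when both come from the same source document."""
    for left, right in zip(nodes, nodes[1:]):
        if left.source_id == right.source_id:
            left.next_id = right.node_id
            right.prev_id = left.node_id
    return nodes


# Two documents, two chunks each, passed in as one flat list.
nodes = link_within_source([
    SketchNode("a0", "doc-a"),
    SketchNode("a1", "doc-a"),
    SketchNode("b0", "doc-b"),
    SketchNode("b1", "doc-b"),
])
```

With this check, the last chunk of doc-a keeps no next link and the first chunk of doc-b keeps no previous link, while links inside each document are unchanged.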
Version
0.10.31
Steps to Reproduce
This basic example demonstrates the issue.
from llama_index.core import Document
from llama_index.core.node_parser import TokenTextSplitter

docs = [Document(text=f'{i} the quick brown fox jumped over the lazy dog {i}') for i in range(10)]
id_to_docs = {d.id_: d for d in docs}
parser = TokenTextSplitter(
    chunk_size=10,
    chunk_overlap=0,
    separator=" ",
)
nodes = parser.get_nodes_from_documents(docs)
id_to_node = {n.id_: n for n in nodes}
test_node = nodes[1]
print('node text:', test_node.text)
print('node source node id:', test_node.source_node.node_id)
next_node = id_to_node[test_node.next_node.node_id]
print('next node:', next_node.text)
print('next node source node id:', next_node.source_node.node_id)
Output:
node text: lazy dog 0
node source node id: ec209db0-109f-49c2-8e34-ea4833c228e2
next node: 1 the quick brown fox jumped over the
next node source node id: 897ca3fd-c74a-44e6-af94-44ba47b3f79f
Note that the node and its next node do not share the same source node, even though they are linked by a next relationship.
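To check a whole list of nodes rather than a single one, a small helper along these lines could work. It is only a sketch: find_cross_document_links is a hypothetical name, it relies solely on the node_id, next_node, and source_node attributes used in the example above, and it assumes every node has a source node. The demo data uses SimpleNamespace stand-ins so the snippet runs without llama_index.

```python
from types import SimpleNamespace


def find_cross_document_links(nodes):
    """Return (node_id, next_node_id) pairs whose ends have different source nodes."""
    by_id = {n.node_id: n for n in nodes}
    bad = []
    for node in nodes:
        next_ref = node.next_node
        # Skip nodes with no next link, or a link pointing outside this list.
        if next_ref is None or next_ref.node_id not in by_id:
            continue
        nxt = by_id[next_ref.node_id]
        if node.source_node.node_id != nxt.source_node.node_id:
            bad.append((node.node_id, nxt.node_id))
    return bad


def ref(node_id):
    """Stand-in for a relationship reference that only carries a node_id."""
    return SimpleNamespace(node_id=node_id)


# Stand-in nodes: a1 is the last chunk of doc-a but links to b0 from doc-b.
sample = [
    SimpleNamespace(node_id="a0", source_node=ref("doc-a"), next_node=ref("a1")),
    SimpleNamespace(node_id="a1", source_node=ref("doc-a"), next_node=ref("b0")),
    SimpleNamespace(node_id="b0", source_node=ref("doc-b"), next_node=None),
]
cross_links = find_cross_document_links(sample)
```

An empty result would mean every next relationship stays inside a single source document.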
Relevant Logs/Tracebacks
No response
Hey there, @alineberry! 👋 I'm Dosu, a bot here to lend a hand with bugs, answer your questions, or help you dive into contributing while we wait for a human maintainer to swing by. Excited to assist you today! I'll take a closer look at the issue you've reported and get back to you shortly with some insights. Stay tuned!
Great! If you need any guidance or have questions as you work on your PR, feel free to reach out. Happy coding!