KnowledgeGraph building using KnowledgeGraphIndex is very slow #8433
Replies: 5 comments
-
The vector index calls the embedding model during index construction (very fast, since calls can be batched). The knowledge graph index calls the LLM by default (very slow, sequential). Hence the speed difference. Batching isn't always supported by LLMs, and even when it is, rate limits are also an issue. It would take a careful PR to fix this.
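Since the per-chunk extraction calls are independent, one workaround while waiting for such a PR is to issue them concurrently rather than strictly sequentially. A minimal sketch, where `call_llm_extract` is a hypothetical stand-in for the real per-chunk LLM call (keep `max_workers` small to stay under provider rate limits):

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm_extract(chunk: str):
    # Hypothetical stand-in for the real per-chunk LLM triplet-extraction call.
    return [(chunk[:20], "mentions", "something")]

def parallel_extract(chunks, max_workers=4):
    # Fan the independent per-chunk calls out over a small thread pool;
    # a low max_workers keeps request volume under provider rate limits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm_extract, chunks))
```

`ThreadPoolExecutor.map` preserves input order, so the triplet lists line up with their source chunks.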
-
🤖 Hi there,
Thank you for reaching out and providing detailed information about your issue. To improve the performance of knowledge graph building with KnowledgeGraphIndex, you might want to consider the following strategies:
Here is an example of how you might adjust these parameters when initializing the index:
kg_index = KnowledgeGraphIndex(
service_context=service_context,
storage_context=storage_context,
max_triplets_per_chunk=5,
include_embeddings=False,
max_object_length=100,
kg_triplet_extract_fn=my_custom_extract_fn,
)

Please note that while these changes can improve performance, they may also affect the quality of the results. You should therefore test these changes thoroughly to ensure they meet your needs.

In the context shared, it's also important to note that not all cases will benefit from the use of a knowledge graph. For example, if the question does not contain small-grained pieces of knowledge in chunks, the extra knowledge graph retriever may not be helpful. Therefore, it's crucial to evaluate the nature of your data and the type of queries you're dealing with before deciding on the best indexing strategy.

As for the specific performance differences between the KnowledgeGraphIndex and the vector index, I wasn't able to find an answer within the repository. Performance can vary greatly depending on the specific use case, data, and configuration.

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Best,
Sources
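On the `kg_triplet_extract_fn` option: the hook accepts a callable that takes a text chunk and returns a list of (subject, relation, object) tuples, which bypasses the LLM call entirely. A toy rule-based sketch — the regex pattern is an assumption for illustration only; substitute your own NER or relation-extraction logic for real data:

```python
import re
from typing import List, Tuple

def my_custom_extract_fn(text: str) -> List[Tuple[str, str, str]]:
    # Toy heuristic: capture "X is a/an/the Y" patterns with a regex.
    # No LLM call is made, so extraction is effectively instantaneous.
    triplets = []
    for subj, obj in re.findall(r"\b([A-Z][a-z]+)\b is (?:a|an|the) (\w+)", text):
        triplets.append((subj, "is", obj))
    return triplets

print(my_custom_extract_fn("Alice is an engineer."))  # [('Alice', 'is', 'engineer')]
```

A heuristic like this trades recall for speed; measure graph quality before adopting it.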
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
🤖 Hello,
Thank you for reaching out with your concern about the performance of knowledge graph building using KnowledgeGraphIndex. To improve performance, you can consider adjusting some parameters in your KnowledgeGraphIndex configuration. Here are some suggestions:
Here is an example of how you might adjust these parameters when initializing the index:
index = KnowledgeGraphIndex(
service_context=service_context,
storage_context=storage_context,
max_triplets_per_chunk=5,
include_embeddings=False,
max_object_length=64,
)

Please note that while these changes can improve performance, they may also affect the quality of the results. You should therefore test these changes thoroughly to ensure they meet your needs.

In terms of the specific performance differences between the KnowledgeGraphIndex and the vector index, it's important to note that performance can be influenced by the complexity of the data and the relationships between entities. If the data contains many complex relationships, extracting them and building the knowledge graph can be slower. Therefore, another strategy to improve performance could be to simplify the data or the relationships between entities, if possible.

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Best,
Sources
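On the "simplify the data" point: for an email corpus, stripping quoted replies and signature blocks before indexing cuts the number of chunks the LLM has to process. A sketch with assumed heuristics — tune the patterns to your own mail format:

```python
import re

def simplify_email(text: str) -> str:
    # Drop quoted reply lines (starting with ">") and everything after the
    # conventional "-- " signature delimiter; both rules are heuristics.
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue
        if re.match(r"^--\s*$", line):
            break
        kept.append(line)
    return "\n".join(kept).strip()

print(simplify_email("Update attached.\n> earlier thread\n-- \nJohn"))  # Update attached.
```

Run this over documents before handing them to the index so fewer, cleaner chunks reach the triplet extractor.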
-
I have the same issue here. I experience token timeouts because KnowledgeGraphIndex construction takes more than two hours. Can you recommend a file-size limit when using an LLM for triplet extraction? Thank you.
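The practical limit is less about file size than about chunk count, since the default extraction makes roughly one sequential LLM call per chunk. A back-of-envelope estimator — the per-call latency and words-per-chunk figures are assumptions; measure your own deployment:

```python
import math

def estimate_kg_build_hours(total_words: int, words_per_chunk: int = 400,
                            seconds_per_llm_call: float = 4.0) -> float:
    # One LLM call per chunk, issued sequentially by default.
    num_chunks = math.ceil(total_words / words_per_chunk)
    return num_chunks * seconds_per_llm_call / 3600

# e.g. 500 emails of ~500 words each:
print(round(estimate_kg_build_hours(500 * 500), 2))  # 0.69
```

Inverting the formula tells you how many words fit in your time budget before a timeout becomes likely.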
-
In fact, this process takes rather long. Is it possible to save checkpoints for KnowledgeGraphIndex? Thank you once again!
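As far as I know there is no built-in checkpoint for KnowledgeGraphIndex construction, but you can approximate one by inserting documents incrementally and recording progress yourself. A generic resume pattern — `process_doc` is a hypothetical callback where you would call `index.insert(...)` and persist the storage context:

```python
import json
import pathlib

def build_with_checkpoints(doc_ids, process_doc, ckpt_path="kg_ckpt.json"):
    # Record which documents are already indexed; on restart, skip them.
    ckpt = pathlib.Path(ckpt_path)
    done = set(json.loads(ckpt.read_text())) if ckpt.exists() else set()
    for doc_id in doc_ids:
        if doc_id in done:
            continue
        process_doc(doc_id)  # e.g. insert the document and persist storage here
        done.add(doc_id)
        ckpt.write_text(json.dumps(sorted(done)))  # checkpoint after each doc
```

If the job dies mid-run, rerunning with the same checkpoint file resumes from the first unprocessed document instead of re-extracting everything.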
-
Question Validation
Question
Hi Team,
We have observed very slow performance while building the knowledge graph for email data (extracted in txt format) compared with building a vector index. We have tried to run this as K8s Jobs, but saw no improvement in performance.
The current throughput when building the KG with KnowledgeGraphIndex is 500 emails per hour.
Please note that building a vector index with LlamaIndex on the same set of email text data runs at about 2,500 emails per hour.
(The emails are already deduplicated and curated, and contain about 500 words on average.)
Below are the details of the configuration:
LLM - Azure OpenAI (gpt-35-turbo for the LLM and text-embedding-ada-002 for embeddings)
Graph DB - Nebula v3.6 installed on K8
llama-index==0.8.30
pymilvus==2.3.1
Please suggest if there is a way to improve the performance of KG building.