[Question]: Inference speedup #8444
Replies: 1 comment
-
🤖 Hello! To increase inference speed with LlamaIndex, you can try the following:

1. Run the model on a GPU. Moving the model (and its inputs) to a CUDA device is usually the single biggest speedup:
import torch

# Run inference on a GPU when one is available; fall back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

You can also set the device or device_map on the pipeline; more information on this can be found in the Hugging Face documentation.
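For example, a minimal sketch with a Hugging Face text-generation pipeline, where device_map="auto" lets Accelerate place the model on the available GPU(s) (requires the accelerate package; the model name is only an illustration):

from transformers import pipeline

# Load a causal LM and let Accelerate decide device placement and dtype.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # example model; substitute your own
    device_map="auto",
    torch_dtype="auto",
)

print(pipe("Hello, my name is", max_new_tokens=20)[0]["generated_text"])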
2. If you are using the Azure Cognitive Search vector store, you can tune the HNSW parameters that govern the speed/recall trade-off of the vector index (note that these classes come from the azure-search-documents SDK, not from llama_index):

from azure.search.documents.indexes.models import HnswParameters

# Lower m / ef_search values favor query speed; higher values favor recall.
hnsw_param = HnswParameters(
    m=4,
    ef_construction=500,
    ef_search=1000,
    metric="cosine",
)
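As a rough sketch of how these parameters plug into an index definition (class and field names vary across azure-search-documents versions; this follows the 11.4 beta SDK and is an assumption about your setup, not the exact LlamaIndex internals):

from azure.search.documents.indexes.models import (
    HnswVectorSearchAlgorithmConfiguration,
    VectorSearch,
)

# Attach the HNSW parameters to a named vector search algorithm configuration.
vector_search = VectorSearch(
    algorithm_configurations=[
        HnswVectorSearchAlgorithmConfiguration(
            name="default",
            kind="hnsw",
            parameters=hnsw_param,
        )
    ]
)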
3. You can also adjust the semantic search configuration. Note that SemanticField must be imported as well, and self._field_mapping["chunk"] refers to the field mapping used internally by the LlamaIndex Cognitive Search vector store:

from azure.search.documents.indexes.models import (
    SemanticSettings,
    SemanticConfiguration,
    PrioritizedFields,
    SemanticField,
)

semantic_settings = SemanticSettings(
    configurations=[
        SemanticConfiguration(
            name="default",
            prioritized_fields=PrioritizedFields(
                title_field=None,
                prioritized_content_fields=[
                    # "chunk" is the content field the vector store writes to.
                    SemanticField(field_name=self._field_mapping["chunk"])
                ],
            ),
        )
    ]
)
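If you manage the index yourself, both configurations are passed when the index is created. A hedged sketch, assuming the same 11.4 beta SDK, with "my-index" and fields as placeholders for your own index definition:

from azure.search.documents.indexes.models import SearchIndex

# Combine the vector search and semantic settings into one index definition.
index = SearchIndex(
    name="my-index",
    fields=fields,
    vector_search=vector_search,
    semantic_settings=semantic_settings,
)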
4. Batching uploads to the index also helps throughput. The LlamaIndex Cognitive Search vector store flushes documents in batches, along these lines:

if len(documents) >= 10:
    logger.info(
        f"Uploading batch of size {len(documents)}, "
        f"current progress {len(ids)} of {len(nodes)}"
    )
    # Send the accumulated batch to the search index, then start a new batch.
    self._search_client.merge_or_upload_documents(documents)
    documents = []

Please note that these settings should be adjusted carefully, as they can significantly affect both the performance and the accuracy of the search. It's recommended to test thoroughly to find the optimal settings for your specific use case.

Regarding Xorbits Inference: it's true that quantization involves a trade-off between speed and accuracy. If the accuracy drop is not acceptable for your use case, you may need to consider other ways to speed up inference.

I hope this helps! If you have any other questions, feel free to ask.
-
Question Validation
Question
Is there any way to increase inference speed using LlamaIndex? I tried Xorbits Inference, but accuracy is not great with the Llama 2 7B model at 4-bit quantization. Please help if there are alternatives to increase the inference speed.