[Question]: Anyone else encounter Ollama Starter Example working once, then failing thereafter? #13188
Comments
It sounds like you're encountering a complex issue with the Ollama x LlamaIndex integration, where resource consumption and timeouts are a significant concern. Here are some targeted insights and potential solutions:

- A similar report of kernel crashes and CUDA OutOfMemory errors when running embedding models locally suggests that local models cannot run asynchronously because they are compute-bound.
- Lowering the embedding batch size reduces peak memory use (see the sketch after this list).

Understanding these principles and adjusting your setup accordingly could help mitigate the issues with the Ollama x LlamaIndex integration. If you continue to face challenges, providing more specific error messages or configurations would help diagnose the problem more effectively.
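A minimal sketch of the batch-size suggestion, assuming the `llama-index-embeddings-ollama` package; the model name here is a placeholder, not something from this thread:

```python
# Minimal sketch, assuming llama-index-embeddings-ollama is installed.
# embed_batch_size is a field inherited from LlamaIndex's BaseEmbedding;
# "nomic-embed-text" is a placeholder for whatever model you pulled.
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",  # placeholder embedding model
    embed_batch_size=1,             # smallest batch, to limit memory spikes
)
```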
I've never had or really seen this issue 🤔 You could decrease the context window on the LLM to try to reduce memory usage, if that feels like the issue. LlamaIndex always tries to use the full context window, to reduce the number of LLM calls.
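A minimal sketch of that suggestion, assuming the `llama-index-llms-ollama` package; the model name and the specific values are placeholders:

```python
# Minimal sketch, assuming llama-index-llms-ollama is installed.
# context_window and request_timeout are constructor parameters on
# llama_index.llms.ollama.Ollama; "llama3" is a placeholder model name.
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(
    model="llama3",         # placeholder model name
    request_timeout=600.0,  # seconds; generous timeout for slow local loads
    context_window=2048,    # cap below the model max to cut memory use
)
```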
Question
I've previously built a RAG system with Ollama x LangChain, but decided to try Ollama x llama-index to see if I'd get better results. I started with the Ollama Starter Example, got it to work once, and thereafter it would always time out (despite setting a 10+ minute timeout). A sketch of the setup I'm describing follows.
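A minimal sketch of that starter-example flow with the timeout raised; model names and the data directory are placeholders, not my exact configuration:

```python
# Minimal sketch of the Ollama Starter Example flow; model names and
# the "data" directory are placeholders.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="llama3", request_timeout=600.0)  # 10-minute timeout
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("data").load_data()
# After the first successful run, execution stalls before the embedding
# progress bars ever appear, then the request times out.
index = VectorStoreIndex.from_documents(documents, show_progress=True)

response = index.as_query_engine().query("What did the author do?")
print(response)
```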
More details:

- In these failures, it wouldn't even progress to the embedding phase (where the progress bars load). Looking in Task Manager, `ollama_llama_server.exe` was consuming a ton of resources, but nothing happened until the timeout.
- When it timed out, the error message would mention `map_httpcore_exceptions` (similar to the logs here).
- Others have reported problems with the `llama-index-llms-ollama` library; some say using the Ollama library directly works?

Does any of the above give any clues that help diagnose what's going on? I'm not just looking for a fix; I'm trying to understand, in principle, why llama-index may not be working for me, and whether there's some fundamental problem with Ollama x llama-index (that isn't a problem with Ollama x LangChain), such that if I want to use llama-index I need to ditch Ollama for OpenAI.
Thanks in advance.