
Feat: Add OLLAMA_LOAD_TIMEOUT env variable #4123

Open
wants to merge 3 commits into main

Conversation

dcfidalgo

Closes #3940

For certain hardware setups and models, offloading to the GPU can take a long time and the user can hit a timeout. This PR makes the timeout configurable via the OLLAMA_LOAD_TIMEOUT environment variable, given in seconds.

@dhiltgen I added a subsection in the FAQ, since I was not sure where to document the env variable. Let me know if this is the right place.

llm/server.go
	}
}
// Be generous with the timeout; large models can take a while to load.
expiresAt := time.Now().Add(time.Duration(timeout) * time.Second)
ticker := time.NewTicker(50 * time.Millisecond)
Contributor

Should we print a "loading the model" message on each tick?
Without a message or a spinner, the process looks stuck for 10 minutes.

@dcfidalgo dcfidalgo requested a review from dhiltgen May 6, 2024 07:00
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GPU offloading with little CPU RAM
3 participants