Integrate with a locally hosted LLM instead of using API #5

Open · sarutobiumon opened this issue Jul 17, 2023 · 5 comments

Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@sarutobiumon

Great work!

@AkshitIreddy (Owner) commented Jul 17, 2023

Cohere's trial API key is free.

@AkshitIreddy (Owner) commented Jul 17, 2023

The model needs at least a 4000-token context length, and it has to run in real time on consumer hardware; I don't think there are any models that can do that.

@JimiVex commented Jul 31, 2023

Just to say, you can get 4000 tokens of context length when running models through exllama. I've been doing that with the Chronos 30B model, with just enough room left over to still run Stable Diffusion on the side. Worth bearing in mind, though, that I've got a pretty beefy system with a 4090 - so that would probably be a struggle for most people. Plus, even on this PC, I probably couldn't run that alongside games that require particularly high specs. Still, any LLaMA model should be able to run 4000 tokens through exllama, so smaller models should be doable. All the same, I'd dig testing this out with a local LLM - would it be a struggle to alter the code to work with a local LLM through Ooga Booga?
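
For reference, here is a minimal sketch of what calling a locally hosted model could look like, assuming text-generation-webui ("Ooga Booga") is running with its OpenAI-compatible API enabled; the URL, port, prompt, and generation parameters below are assumptions and will depend on the local setup, not code from this repo.

```python
# Minimal sketch (not part of the repo): query a locally hosted LLM served by
# text-generation-webui / "Ooga Booga", assuming its OpenAI-compatible API is
# exposed at http://localhost:5000/v1 -- the URL and parameters are assumptions.
import requests

LOCAL_COMPLETIONS_URL = "http://localhost:5000/v1/completions"  # assumed local endpoint

payload = {
    "prompt": "Write one short line of dialogue for a village blacksmith greeting a traveller:",
    "max_tokens": 80,
    "temperature": 0.7,
}

# Send the request to the local server and print the generated text.
response = requests.post(LOCAL_COMPLETIONS_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["text"].strip())
```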

@JimiVex commented Jul 31, 2023

Here's a Reddit post discussing the larger context lengths you can get with exllama: https://www.reddit.com/r/LocalLLaMA/comments/14j4l7h/6000_tokens_context_with_exllama/

The repo itself: https://github.com/turboderp/exllama

@AkshitIreddy (Owner) commented
Oh cool, I haven't tried exllama or Ooga Booga before. The code currently uses LangChain and Cohere to generate the responses, so it should be possible to use a local LLM: the prompts will be more or less the same, and the parameters will need to be modified a bit depending on the model. I'm not sure I can add this feature myself, though, because I'm not using such a powerful machine 😅
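
As a rough sketch of that swap, assuming a local server exposing an OpenAI-compatible endpoint (for example text-generation-webui's openai extension) and LangChain's OpenAI wrapper: the URL, model name, prompt, and parameters below are placeholders, not the repo's actual code, and would need tuning per model.

```python
# Sketch only: replacing the Cohere LLM with a locally hosted model in LangChain.
# Assumes a local OpenAI-compatible server at http://localhost:5000/v1; the URL,
# model name, and generation parameters are assumptions.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

local_llm = OpenAI(
    openai_api_key="not-needed-for-local-server",  # local servers typically ignore the key
    openai_api_base="http://localhost:5000/v1",    # assumed local endpoint
    model_name="local-model",                      # placeholder for whatever model is loaded
    temperature=0.7,
    max_tokens=150,
)

# The existing prompts can stay largely the same; only the LLM object changes.
prompt = PromptTemplate(
    input_variables=["character", "situation"],
    template="You are {character}. Reply in one short line to: {situation}",
)
chain = LLMChain(llm=local_llm, prompt=prompt)
print(chain.run(character="a tavern keeper", situation="an adventurer walks in"))
```

In principle the only change is constructing the local LLM object where the Cohere one is currently created; everything downstream of the chain stays the same.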

@AkshitIreddy added the good first issue, enhancement, and help wanted labels on Aug 28, 2023
@AkshitIreddy changed the title from "Would be nice to integrate with a locally hosted LLM instead of paying for an API" to "Integrate with a locally hosted LLM instead of using API" on Aug 28, 2023
3 participants