ShelbyJenkins/llm_client

Table of Contents
  1. About The Project
  2. Getting Started
  3. Roadmap
  4. Contributing
  5. License
  6. Contact

A Rust interface for the OpenAI API and the llama.cpp ./server API

  • A unified API for testing and integrating OpenAI and HuggingFace LLM models.
  • Load models from HuggingFace with just a URL.
  • Uses the llama.cpp server API rather than bindings, so this project remains usable as long as that API stays stable.
  • Prebuilt agents - not chatbots - for getting deterministic, structured outputs from LLMs.

Easily switch between models and APIs

// Use an OpenAI model
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiDef::Gpt35Turbo);
// Or use a model from hugging face
let llm_definition: LlmDefinition = LlmDefinition::LlamaLlm(LlamaDef::new(
    MISTRAL7BCHAT_MODEL_URL,
    LlamaPromptFormat::Mistral7BChat,
    Some(9001),  // Max tokens for model AKA context size
    Some(2),     // Number of threads to use for server
    Some(22),    // Layers to load to GPU. Dependent on VRAM
    Some(false), // This starts the llama.cpp server with embedding flag disabled
    Some(true),  // Logging enabled
));

let response = basic_text_gen::generate(
        &llm_definition,
        Some("Howdy!"),
    )
    .await?;
eprintln!("{}", response);

Get deterministic responses from LLMs

if !boolean_classifier::classify(
        llm_definition,
        Some(hopefully_a_list),
        Some("Is the attached feature a list of content split into discrete entries?"),
    )
    .await?
    {
        panic!("{} was not properly split into a list!", hopefully_a_list)
    }

Create embeddings*

let client_openai: ProviderClient =
    ProviderClient::new(&LlmDefinition::OpenAiLlm(OpenAiDef::EmbeddingAda002), None).await;

let _: Vec<Vec<f32>> = client_openai
    .generate_embeddings(
        &vec![
            "Hello, my dog is cute".to_string(),
            "Hello, my cat is cute".to_string(),
        ],
        Some(EmbeddingExceedsMaxTokensBehavior::Panic),
    )
    .await
    .unwrap();

  * Currently with limited support for llama.cpp.
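
The embeddings come back as plain Vec<Vec<f32>> rows, so they can be compared directly. Below is a minimal sketch of cosine similarity over two returned vectors (plain Rust; nothing here is crate-specific beyond the generate_embeddings call above):

// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

// `embeddings` would be the Vec<Vec<f32>> returned by generate_embeddings above:
// let score = cosine_similarity(&embeddings[0], &embeddings[1]);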

Start Llama.cpp via CLI

cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"

The server is ready once it prints llama server listening at http://localhost:8080. Stop it with:

cargo run -p llm_client --bin server_runner stop

Download HF models via CLI

cargo run -p llm_client --bin model_loader_cli --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"

Dependencies

async-openai is used to interact with the OpenAI API. A modified version of the async-openai crate is used for the llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate.

Hugging Face's Rust client is used for model downloads from the Hugging Face Hub.
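
For reference, a direct download with Hugging Face's Rust client looks roughly like the sketch below. This assumes the dependency is the hf-hub crate and uses its sync API; the repo and file names are taken from the Mistral example above, and the model_loader_cli shown earlier already handles this for you.

use hf_hub::api::sync::Api;

// Illustrative only: fetch the GGUF file used in the examples above.
let api = Api::new()?;
let repo = api.model("TheBloke/Mistral-7B-Instruct-v0.2-GGUF".to_string());
let local_path = repo.get("mistral-7b-instruct-v0.2.Q8_0.gguf")?;
println!("model downloaded to {}", local_path.display());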

(back to top)

Getting Started

Step-by-step guide

  1. Clone repo:
git clone https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
  2. Optional: Build the dev container from llm_client/.devcontainer/devcontainer.json. This will build out a dev container with the Nvidia dependencies installed.

  3. Add llama.cpp:

git submodule init 
git submodule update
  4. Build llama.cpp (this is dependent on your hardware; please see the full instructions here):
# Example build for Nvidia GPUs
cd llm_client/src/providers/llama_cpp/llama_cpp
make LLAMA_CUDA=1
  5. Test the llama.cpp ./server:
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"

This will download and load the given model, and then start the server.

When you see llama server listening at http://localhost:8080, you can load the llama.cpp UI in your browser.

Stop the server with cargo run -p llm_client --bin server_runner stop.

  6. Using OpenAI: Add a .env file in the llm_client dir with the variable OPENAI_API_KEY=<key>. A quick sanity check follows below.
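
Once the key is in place, the OpenAI-backed examples should pick it up from the environment. A minimal sanity check, reusing only calls already shown in this README:

// Assumes OPENAI_API_KEY is set via the .env file from step 6.
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiDef::Gpt35Turbo);
let response = basic_text_gen::generate(&llm_definition, Some("Howdy!")).await?;
eprintln!("{}", response);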

Examples

Roadmap

  • Handle the various prompt formats of LLM models more gracefully
  • Unit tests
  • Add additional classifier agents:
    • many from many
    • one from many
  • Implement all OpenAI functionality with llama.cpp
  • More external APIs (Claude, etc.)

(back to top)

Contributing

This is my first Rust crate. All contributions and feedback are more than welcome!

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Shelby Jenkins - Here or LinkedIn

(back to top)
