llm_client

Rust library for integrating local LLMs (with llama.cpp) and external LLM APIs.
Structured text, decision making, and benchmarks. A user-friendly interface to write once and run on any local or API model.

LLMs aren't chatbots; they're information arbitrage machines, and prompts are database queries.

  • Structure the outputs of generated text, make decisions from novel inputs, and classify data.
  • The easiest interface possible to deploy and test the same logic to various LLM backends.
  • A local-first and embedded model, meant to be built and run in-process with your business logic. No standalone servers.

LLMs as decision makers 🚦

  • What previously took dozens, hundreds, or thousands of if statements for a specific requirement can now be done with a few lines of code across novel inputs.

  • llm_client uses what might be a novel process for LLM decision making. First, we get the LLM to 'justify' an answer in plain English. This allows the LLM to 'think' by outputting the stream of tokens required to come to an answer. Then we take that 'justification' and prompt the LLM to parse it for the answer. We repeat this N times, where N is best_of_n_votes, dynamically altering the temperature to ensure an accurate consensus. (A sketch of tuning the vote count follows the examples below.)

    let res: bool = llm_client.decider().boolean()
        .system_content("Does this email subject indicate that the email is spam?")
        .user_content("You'll never believe these low, low prices πŸ’²πŸ’²πŸ’²!!!")
        .run().await?;
    assert_eq!(res, true);

    let res: u16 = llm_client.decider().integer()
        .system_content("How many times is the word 'llm' mentioned in these comments?")
        .user_content(hacker_news_comment_section)
        .run().await?;
    assert!(res > 1);

    let res: String = llm_client.decider().custom()
        .system_content("Based on this resume, what is the users first name?")
        .user_content(shelby_resume)
        .add_choice("shelby")
        .add_choice("jack")
        .add_choice("camacho")
        .add_choice("john")
        .run().await?;
    assert_eq!(res, "shelby");
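
  • The number of voting rounds is tunable. A minimal sketch, assuming the decider builder exposes a setter named after the best_of_n_votes parameter described above (the exact method name and its position in the chain are assumptions):

    let res: bool = llm_client.decider().boolean()
        // Hypothetical setter: run five justify-then-parse voting rounds.
        .best_of_n_votes(5)
        .system_content("Does this email subject indicate that the email is spam?")
        .user_content("You'll never believe these low, low prices 💲💲💲!!!")
        .run().await?;
    assert_eq!(res, true);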

Structured text 📝

  • 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' Using Regex to parse and structure the output of LLMs puts an exponent over this old joke.

  • llm_client implements text structuring through logit_bias and grammars. Of the two, grammars are the more powerful, allowing very granular control over text generation. Logit bias, while more widely supported, is less useful because it only adjusts the probabilities of individual tokens. (A token-level sketch follows these examples.)

    let res: Vec<String> = llm_client.text().grammar_list()
        .system_content("ELI5 each topic in this text.")
        .user_content(wikipedia_article)
        .max_items(5)
        .min_items(3)
        .run().await?;
    assert!(res.len() >= 3);

    let res: String = llm_client.text().grammar_text()
        .system_content("Summarize this mathematical funtion in plain english. Do not use notation.")
        .user_content(wikipedia_article)
        .restrict_extended_punctuation()
        .run().await?;
    assert!(!res.contains('('));
    assert!(!res.contains('['));

    let res: String = llm_client.text().logit_bias_text()
        .system_content("Summarize this article")
        .user_content(wikipedia_article)
        .add_logit_bias_from_word("delve", -100.0)
        .run().await?;
    assert!(!res.contains("delve"));
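
  • To make the token-level mechanism concrete, here is an illustrative sketch (plain Rust, not llm_client's API) of what a word-level bias becomes before it reaches the backend. The token ids are made up for the example; a real tokenizer (e.g. via llm_utils) would produce the actual ids for "delve" and its variants.

    use std::collections::HashMap;

    // Hypothetical token ids for "delve", " delve", and "Delve".
    let mut logit_bias: HashMap<u32, f32> = HashMap::new();
    for token_id in [3070_u32, 3071, 3072] {
        logit_bias.insert(token_id, -100.0); // -100 effectively bans the token
    }
    // At every decoding step the backend adds these values to the matching
    // logits, so the biased tokens are (practically) never sampled.
    assert_eq!(logit_bias.len(), 3);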

LLM -> LLMs 🤹

  • The same code across multiple LLMs.

  • This makes benchmarking multiple LLMs easy. Check out src/benchmark for an example, or see the benchmark sketch after the snippets below.

    pub async fn chatbot(llm_client: &LlmClient, user_input: &str) -> Result<String> {
        llm_client.text().basic_text()
            .system_content("You're a kind robot.")
            .user_content(user_input)
            .temperature(0.5)
            .max_tokens(2)
            .run().await
    }

    let llm_client = LlmClient::llama_backend()
        .mistral_7b_instruct()
        .init()
        .await?;
    assert_eq!(chatbot(&llm_client, "What is the meaning of life?").await?, "42");

    let llm_client = LlmClient::llama_backend()
        .model_url("https://huggingface.co/your_cool_model_Q5_K.gguf")
        .init()
        .await?;
    assert_eq!(chatbot(&llm_client, "What is the meaning of life?").await?, "42");

    let llm_client = LlmClient::openai_backend().gpt_4_o().init()?;
    assert_eq!(chatbot(&llm_client, "What is the meaning of life?").await?, "42");

    let llm_client = LlmClient::anthropic_backend().claude_3_opus().init()?;
    assert_eq!(chatbot(&llm_client, "What is the meaning of life?").await?, "42");
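
  • A sketch of a tiny benchmark built from the snippets above: the same chatbot function runs against several backends and the answers are printed side by side. It assumes each builder initializes to the same LlmClient type, as in the examples.

    let clients = vec![
        ("mistral_7b_instruct", LlmClient::llama_backend().mistral_7b_instruct().init().await?),
        ("gpt_4_o", LlmClient::openai_backend().gpt_4_o().init()?),
        ("claude_3_opus", LlmClient::anthropic_backend().claude_3_opus().init()?),
    ];
    for (name, llm_client) in &clients {
        let answer = chatbot(llm_client, "What is the meaning of life?").await?;
        println!("{name}: {answer}");
    }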

Minimal Example

use llm_client::LlmClient;

// Setting available_vram will load the largest quantized model that can fit the given vram.
let llm_client = LlmClient::llama_backend().available_vram(16).llama_3_8b_instruct().init().await?;

let res = llm_client.text().basic_text().user_content("Hello world?").run().await?;

assert_eq!(res, "Hello world!");

Examples

Guides

Installation

llm_client currently relies on llama.cpp. As it's a C++ project, it's not bundled in the crate. In the near future, llm_client will support mistral-rs, an inference backend built on Candle that supports great features like ISQ. Once that integration is complete, llm_client will be pure Rust and can be installed as just a crate.

If only using OpenAI and/or Anthropic

  • Add to Cargo.toml:
[dependencies]
llm_client = "*"
  • Add an API key
    • Add OPENAI_API_KEY=<key> and/or ANTHROPIC_API_KEY=<key> to your .env file
    • Or use the api_key function in the backend builder functions, as sketched below
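
For example, a sketch of passing a key explicitly instead of through .env, assuming the api_key builder method mentioned above takes the key as a string (its exact placement in the chain is an assumption):

let llm_client = LlmClient::openai_backend()
    .api_key("your-api-key") // replace with your real key
    .gpt_4_o()
    .init()?;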

If using Llama.cpp and/or external APIs

  • Clone repo:
git clone --recursive https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
  • Add to Cargo.toml:
[dependencies]
llm_client = {path="../llm_client"}

Roadmap

  • Migrate from llama.cpp to mistral-rs. This would greatly simplify consuming llm_client as an embedded crate. It's currently a WIP; llama.cpp may also end up behind a feature flag as a fallback.
  • Additional deciders: multiple-response deciders.
  • Classifier, summarizer, and map-reduce agents.
  • Extend grammar support: custom grammars, JSON support.
  • More external APIs such as Google, AWS, Groq, and LLM aggregators and routers.
  • Dream roadmap item: a web UI for streaming the output of multiple LLMs for a single prompt. We already do this with Claude and ChatGPT anyway, don't we?

Dependencies

async-openai is used to interact with the OpenAI API. A modified version of the async-openai crate is used for the Llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate.

clust is used to interact with the Anthropic API. If you just need an Anthropic API interface, I suggest using the clust crate.

llm_utils is a sibling crate that was split from llm_client. If you just need prompting, tokenization, model loading, etc., I suggest using the llm_utils crate on its own.

Contributing

This is my first Rust crate. All contributions and feedback are more than welcome!

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Shelby Jenkins - Here or LinkedIn
