Carbon Limiting Auto Tuning for Kubernetes (Python; updated May 18, 2024)
A Framework For Intelligence Farming
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data, including streaming inference, scalable model training, and vector search.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Modal deployment of a llama.cpp-based LLM, part of a series on Model as a Service (MaaS)
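As an illustration of the pattern such a deployment follows (not this project's exact code), a minimal Modal function running llama.cpp inference might look like the sketch below; the app name, model path, and GPU choice are assumptions.

    import modal

    app = modal.App("llama-cpp-maas")  # hypothetical app name
    image = modal.Image.debian_slim().pip_install("llama-cpp-python")

    @app.function(image=image, gpu="any")
    def generate(prompt: str) -> str:
        from llama_cpp import Llama
        # assumes a GGUF model is baked into the image or mounted at this path
        llm = Llama(model_path="/models/model.gguf")
        return llm(prompt, max_tokens=64)["choices"][0]["text"]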
On-device LLM inference powered by x-bit quantization
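For readers unfamiliar with the idea, a generic sketch of symmetric n-bit weight quantization follows; this is an illustration of the general technique, not this project's implementation.

    import numpy as np

    def quantize(w, bits=4):
        # map float weights onto signed `bits`-bit integers with one shared scale
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # recover an approximation of the original weights at inference time
        return q.astype(np.float32) * scale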
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
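A minimal two-agent example in the style of AutoGen's quickstart; the model choice and config values are placeholders.

    from autogen import AssistantAgent, UserProxyAgent

    llm_config = {"model": "gpt-4", "api_key": "YOUR_KEY"}  # placeholder config
    assistant = AssistantAgent("assistant", llm_config=llm_config)
    user = UserProxyAgent("user", code_execution_config=False)
    user.initiate_chat(assistant, message="Summarize LLM inference in one sentence.")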
LLMs as Copilots for Theorem Proving in Lean
A high-performance inference system for large language models, designed for production environments.
Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
A Tk-based graphical user interface for gpt4all. It uses the Python bindings. Run LLMs in a very slim environment and leave maximum resources for inference
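Using the gpt4all Python bindings directly looks roughly like this; the model filename below is an example and is downloaded on first use.

    from gpt4all import GPT4All

    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # example model name
    with model.chat_session():
        print(model.generate("Why run LLMs locally?", max_tokens=128))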
Pretrain, finetune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
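Assuming the project's Python quickstart pattern (the checkpoint ID below is an example), loading and prompting a model is roughly:

    from litgpt import LLM

    llm = LLM.load("microsoft/phi-2")  # example checkpoint; weights download on first use
    print(llm.generate("What is flash attention?"))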
The open-source serverless GPU container runtime.
Local webui for Large Language Models. Supports the GGUF format. Run LLM inference with support for STT/TTS and function calling.
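GGUF inference of this kind is typically driven by llama.cpp bindings; a generic sketch with llama-cpp-python (not this project's own code, and the model path is assumed) is:

    from llama_cpp import Llama

    llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}], max_tokens=64
    )
    print(reply["choices"][0]["message"]["content"])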
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
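The toolkit's core Python flow compiles a model for a target device and runs it; the IR filename, device, and input shape below are examples.

    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")         # example IR file
    compiled = core.compile_model(model, "CPU")  # pick a target device
    result = compiled(np.zeros((1, 3, 224, 224), dtype=np.float32))[compiled.output(0)]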
∇ Valyu LLManager simplifies and scales LLM application deployment, reducing infrastructure complexity and costs.
The Arcee client for executing domain-adapted language model routines
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).