Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
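Ray's core primitive is the remote task: a plain Python function decorated with @ray.remote runs asynchronously on the cluster. A minimal sketch (local mode; the function here is illustrative):

```python
# Minimal Ray sketch: run a plain Python function in parallel as remote tasks.
import ray

ray.init()  # starts a local Ray runtime, or connects to an existing cluster

@ray.remote
def square(x):
    return x * x

# .remote() launches tasks asynchronously and returns futures;
# ray.get() blocks until the results are available.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```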
A universal, scalable machine learning model deployment solution
AI + Data, online. https://vespa.ai
A REST API for vLLM, production ready
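Since the server wraps vLLM, a deployment like this typically exposes an OpenAI-compatible REST endpoint. A hedged sketch, assuming a server listening on localhost:8000 with a /v1/completions route (URL and model name are placeholders):

```python
# Query an OpenAI-compatible completions endpoint served by vLLM.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "your-model-name",  # placeholder; use the model the server loaded
        "prompt": "Explain model serving in one sentence.",
        "max_tokens": 64,
    },
    timeout=30,
)
print(resp.json()["choices"][0]["text"])
```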
An alternative to Triton Inference Server that boosts DL service throughput 1.5-4x via ensemble pipeline serving with concurrent CUDA streams, supporting PyTorch/LibTorch frontends and backends such as TensorRT and CVCUDA
A scalable inference server for models optimized with OpenVINO™
A multi-modal vector database that supports upserts and vector queries through unified, MySQL-compatible SQL over structured and unstructured data, while delivering high concurrency and ultra-low latency.
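As an illustration only, a unified query against such a system might combine a structured filter with vector-similarity ranking in one MySQL-compatible statement; the table, columns, and distance function below are hypothetical, not this project's actual API:

```python
# Hypothetical unified SQL: a structured predicate plus vector ranking together.
# Would be executed through any MySQL-compatible client,
# e.g. cursor.execute(query, (query_vector,)).
query = """
SELECT id, title
FROM products
WHERE category = 'shoes'
ORDER BY cosine_distance(embedding, %s)  -- hypothetical distance function
LIMIT 10;
"""
query_vector = [0.12, -0.03, 0.88]  # toy embedding for illustration
```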
Serve, optimize and scale PyTorch models in production
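For TorchServe specifically, inference is exposed over REST once a model archive is registered; a minimal sketch assuming a model named "mnist" is already served on the default inference port (model name and input file are placeholders):

```python
# Send raw input bytes to TorchServe's inference API (default port 8080).
import requests

with open("sample.png", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/predictions/mnist",  # /predictions/<model_name>
        data=f.read(),
        timeout=30,
    )
print(resp.json())
```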
A high-performance inference system for large language models, designed for production environments.
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
A flexible, high-performance serving system for machine learning models
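TensorFlow Serving's REST predict API follows a fixed URL scheme; a sketch using the half_plus_two demo model from its documentation, assuming the server runs on the default REST port 8501:

```python
# Call TensorFlow Serving's REST predict endpoint: /v1/models/<name>:predict
import requests

resp = requests.post(
    "http://localhost:8501/v1/models/half_plus_two:predict",
    json={"instances": [1.0, 2.0, 5.0]},
    timeout=30,
)
print(resp.json())  # expected: {"predictions": [2.5, 3.0, 4.5]}
```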
RayLLM - LLMs on Ray
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Friendli: the fastest serving engine for generative AI
Docs for torchpipe: https://github.com/torchpipe/torchpipe
Useful notes and references on deploying deep-learning-based models in production.
Database system for AI-powered apps
A flexible, high-performance serving framework for machine learning models (PaddlePaddle's serving deployment framework)