chat-flame-backend

License: MIT

ChatFlameBackend is a backend for chat applications, built on the Candle machine-learning framework with a focus on the Mistral model.

Quickstart

Installation

Build the project with Cargo (requires a recent Rust toolchain):

cargo build --release

Running

Run the server

cargo run --release

Or run one of the supported models directly with a prompt:

cargo run --release -- --model phi-v2 --prompt 'write me fibonacci in rust'

Docker

docker-compose up --build

Visit http://localhost:8080/swagger-ui for the Swagger UI.

Testing

Test using the shell

cargo test

or with curl

curl -X POST http://localhost:8080/generate \
     -H "Content-Type: application/json" \
     -d '{"inputs": "Your text prompt here"}'

or against the streaming endpoint

curl -X POST http://localhost:8080/generate_stream \
     -H "Content-Type: application/json" \
     -d '{"inputs": "Your input text"}'

Test using Python

Detailed documentation on how to use the Python client is available on Hugging Face.

virtualenv .venv
source .venv/bin/activate
pip install huggingface-hub
python test.py
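
The contents of test.py are not shown here; the following is a minimal sketch of what such a script can look like, assuming the server exposes the text-generation-inference-compatible /generate and /generate_stream endpoints used above (the URL and the max_new_tokens value are illustrative):

# test.py - a minimal sketch, not the repository's actual script.
# Assumes the server speaks the text-generation-inference API that
# huggingface_hub's InferenceClient expects.
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://localhost:8080")

# One-shot generation against the /generate endpoint.
print(client.text_generation("Your text prompt here", max_new_tokens=50))

# Token-by-token streaming against the /generate_stream endpoint.
for token in client.text_generation("Your text prompt here", max_new_tokens=50, stream=True):
    print(token, end="", flush=True)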

Architecture

The backend is written in Rust. Models are loaded with the Candle framework, HTTP serving is handled by axum, and Utoipa generates the Swagger UI for the API.

Supported Models

  • Mistral
  • Zephyr
  • OpenChat
  • Starling
  • Phi (Phi-1, Phi-1.5, Phi-2)
  • GPT-Neo
  • GPT-J
  • Llama

Mistral

Weights are loaded from the Hugging Face Hub repository "lmz/candle-mistral".

Phi

Weights are loaded from the Hugging Face Hub repository "microsoft/phi-2".

Performance

The following table shows throughput for several models on different systems:

Model              System                           Tokens per second
7b-open-chat-3.5   AMD 7900X3D (12 cores), 64 GB    9.4
7b-open-chat-3.5   AMD 5600G (8-core VM), 16 GB     2.8
llama2 13b         AMD 7900X3D (12 cores), 64 GB    5.2
phi-2              AMD 7900X3D (12 cores), 64 GB    20.6
phi-2              AMD 5600G (8-core VM), 16 GB     5.3
phi-2              Apple M2 (10 cores), 16 GB       24.0

Hint

Model throughput depends heavily on the system's memory bandwidth. On an AMD 7900X3D with 64 GB of DDR5-4800 memory, Phi-2 reached 20.6 tokens/s; overclocking the memory to DDR5-5600 raised this to 21.8 tokens/s.
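
As a rough check on those numbers: the memory clock increase is 5600 / 4800 ≈ 1.17x, while the throughput gain is 21.8 / 20.6 ≈ 1.06x, so generation speed tracks memory bandwidth but does not scale with it proportionally.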

Todo
