API Server #47

DifferentialityDevelopment · 2024-05-11T22:49:58Z

This pull request introduces API functionality to the distributed llama project. The main addition is the implementation of the chat completion endpoint, following the specifications outlined by OpenAI for chat completions.

Key features of this implementation include streaming support, the capability to terminate generation upon detecting a stop word or end-of-sequence token (EOS), and the ability to dynamically adjust parameters such as temperature, seed, and top probability (top-p) for each request.
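To illustrate the stop-word handling in a streaming setting, here is a minimal sketch; it is not the code from this PR. The names `StopWordFilter` and `formatSseChunk` are hypothetical. The sketch assumes each generated token is decoded to a text piece before being sent, and that chunks are framed as `data: ...` lines in the style of OpenAI's streaming responses.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the PR): buffers streamed text so that a stop
// word split across several token pieces is never sent to the client.
class StopWordFilter {
public:
    explicit StopWordFilter(std::vector<std::string> stops) : stops_(std::move(stops)) {
        for (const auto& s : stops_) maxLen_ = std::max(maxLen_, s.size());
    }

    // Feeds one decoded piece and returns the text that is safe to emit now.
    // Once a stop word is found, stopped() is true and nothing more is emitted.
    std::string push(const std::string& piece) {
        if (stopped_) return "";
        pending_ += piece;

        // Find the earliest stop word in the buffered text.
        std::size_t best = std::string::npos;
        for (const auto& s : stops_) {
            if (s.empty()) continue;
            best = std::min(best, pending_.find(s));
        }
        if (best != std::string::npos) {
            stopped_ = true;
            std::string out = pending_.substr(0, best);
            pending_.clear();
            return out;
        }

        // Hold back the tail that could still be the start of a stop word.
        const std::size_t keep = maxLen_ > 0 ? maxLen_ - 1 : 0;
        if (pending_.size() <= keep) return "";
        std::string out = pending_.substr(0, pending_.size() - keep);
        pending_.erase(0, pending_.size() - keep);
        return out;
    }

    // Call when the model emits EOS: returns whatever was held back.
    std::string flush() {
        std::string out;
        if (!stopped_) out.swap(pending_);
        pending_.clear();
        return out;
    }

    bool stopped() const { return stopped_; }

private:
    std::vector<std::string> stops_;
    std::size_t maxLen_ = 0;
    std::string pending_;
    bool stopped_ = false;
};

// Frames one JSON payload as a server-sent-events data line (assumed framing).
std::string formatSseChunk(const std::string& json) {
    return "data: " + json + "\n\n";
}
```

The filter holds back at most one byte less than the longest stop word, so output lags only slightly behind generation. It counts bytes, so a real implementation would also need to avoid cutting a multi-byte UTF-8 character between two chunks.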

The code has undergone significant refactoring to enhance clarity and maintainability. I have tested the changes locally to ensure functionality. Your feedback on the implementation is highly appreciated.

server

src/server.cpp

Makefile

DifferentialityDevelopment · 2024-05-12T08:53:27Z

I noticed that I accidentally renamed ProgramArgs to ServerArgs in main.cpp.
This wasn't intentional and will be reverted.

DifferentialityDevelopment · 2024-05-12T11:56:02Z

@b4rtaz
I've switched to using SocketServer, and I've added a function on Socket that is meant only for reading a full HTTP request. Using read caused the request to never finish, because I don't know ahead of time how much data is being sent.
I don't want to change the current read function, since the workers and other code use it and I don't want to break anything there.
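One common way to know when an HTTP request is complete is to read until the blank line that ends the headers, then read exactly `Content-Length` more bytes. The sketch below shows that idea under stated assumptions; it is not the function added in this PR. `RecvFn` stands in for a socket receive call, and `readHttpRequest` and `parseContentLength` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Stand-in for a socket receive call (assumption): fills buf with up to len
// bytes and returns the number of bytes read, or 0 if the peer closed.
using RecvFn = std::function<std::size_t(char* buf, std::size_t len)>;

// Returns the Content-Length value from a header block, or 0 if it is absent.
static std::size_t parseContentLength(const std::string& headers) {
    std::string lower(headers);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string key = "\r\ncontent-length:";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return 0;
    // stoul skips leading whitespace and stops at the first non-digit.
    return static_cast<std::size_t>(std::stoul(lower.substr(pos + key.size())));
}

// Reads one request: headers up to the blank line, then the declared body.
std::string readHttpRequest(const RecvFn& recv) {
    std::string data;
    char buf[4096];
    std::size_t headerEnd = std::string::npos;

    // Phase 1: keep reading until the end-of-headers marker shows up.
    while ((headerEnd = data.find("\r\n\r\n")) == std::string::npos) {
        const std::size_t n = recv(buf, sizeof(buf));
        if (n == 0) throw std::runtime_error("connection closed before headers were complete");
        data.append(buf, n);
    }

    // Phase 2: the headers say how many body bytes are still expected.
    const std::size_t total = headerEnd + 4 + parseContentLength(data.substr(0, headerEnd));
    while (data.size() < total) {
        const std::size_t n = recv(buf, sizeof(buf));
        if (n == 0) throw std::runtime_error("connection closed before body was complete");
        data.append(buf, n);
    }
    return data;
}
```

This sketch ignores chunked transfer encoding and pipelined requests, which a production server would have to handle. Keeping it separate from the existing read function matches the concern above about not changing behaviour for the workers.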

b4rtaz · 2024-05-12T20:04:24Z

@DifferentialityDevelopment I need a bit of time to test it. After I release the refactored multihead layers, I'll switch to this (maybe 2-5 days).

Wouter Tichelaar added 4 commits May 11, 2024 20:30

First draft, still cleaning up to do

db3997b

First draft of streaming chat completion

db9b294

Fixed formatting of data chunks

2dae6af

Refactored code a little

6656f4b

b4rtaz requested changes May 12, 2024


Wouter Tichelaar added 3 commits May 12, 2024 11:54

Reverted main changes and removed server binary

2f44c40

Moved server to src/apps

90f6d01

Using SocketServer from socket.cpp instead in server.cpp

b01f57a

Fixed issue with reading http request from socket never completing

0614a85

DifferentialityDevelopment requested a review from b4rtaz May 12, 2024 13:56

Wouter Tichelaar and others added 9 commits May 13, 2024 00:16

Refactored a bit more code in server.cpp

11d05ea

Merge branch 'main' of https://github.com/b4rtaz/distributed-llama

3b1ccad

refactored code a bit more

653c524

Merge branch 'main' into pr/47

7bcd827

refactor.

4e8fefb

refactor.

6a960e9

stdexcept.

3a99f3a

refactor.

373e2db

stdexcept.

846af1e

b4rtaz merged commit 0590ece into b4rtaz:main May 19, 2024
2 checks passed
