
Add /api/infill for fill-in-the-middle #3907

Open · wants to merge 1 commit into base: main

Conversation

@PhilKes commented Apr 25, 2024

This PR closes #3869

Adds /api/infill to leverage llama.cpp's POST /infill API for infilling / fill-in-the-middle (FIM) / code completion.

An example request looks like this:

POST /api/infill HTTP/1.1
Host: localhost:11434
Content-Type: application/json
Content-Length: 199
{
    "stream": false,
    "model": "codellama:7b-instruct-q3_K_M",
    "input_prefix": "public int gcd(int x, int y) {",
    "input_suffix": "\n}",
    "options": {
        "num_predict": 10
    }
}

Response:

{
    "model": "codellama:7b-instruct-q3_K_M",
    "created_at": "2024-04-25T11:09:06.691622744Z",
    "response": "\n    return y == 0 ? x :",
    "done": true,
    "total_duration": 3385921466,
    "load_duration": 948913941,
    "prompt_eval_count": 18,
    "prompt_eval_duration": 1175793000,
    "eval_count": 10,
    "eval_duration": 1219642000
}

(Streaming is also available)
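For illustration, here is a minimal Go sketch that builds the same request body. The struct and helper names are hypothetical; only the JSON field names and values come from the example above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InfillRequest mirrors the JSON body of the example request above.
// The struct name is illustrative, not necessarily the one in the PR.
type InfillRequest struct {
	Stream      bool           `json:"stream"`
	Model       string         `json:"model"`
	InputPrefix string         `json:"input_prefix"`
	InputSuffix string         `json:"input_suffix"`
	Options     map[string]any `json:"options,omitempty"`
}

// buildInfillRequest assembles the same payload as the example request.
func buildInfillRequest() InfillRequest {
	return InfillRequest{
		Stream:      false,
		Model:       "codellama:7b-instruct-q3_K_M",
		InputPrefix: "public int gcd(int x, int y) {",
		InputSuffix: "\n}",
		Options:     map[string]any{"num_predict": 10},
	}
}

func main() {
	body, err := json.MarshalIndent(buildInfillRequest(), "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// To actually call the endpoint (assuming a local Ollama with this PR applied):
	//   http.Post("http://localhost:11434/api/infill", "application/json", bytes.NewReader(body))
}
```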

Note: this could probably use more refactoring to clean up duplicate code. I am very new to Go programming, so feedback would be appreciated.

@@ -83,10 +83,10 @@ type GenerateRequest struct {
type InfillRequest struct {

	// InputPrefix is the text before the infilling position to send to the model.
	InputPrefix string `json:"input_prefix"`
Contributor commented:
why not merge with the previous CL?

@PhilKes (Author) commented Apr 26, 2024
I have now squashed the two commits; I hope that's what you meant.

@NightMachinery commented:
Where is FIM actually implemented for each model?

@PhilKes force-pushed the infill-api branch 2 times, most recently from 39c65f3 to 4aaa4f8 on Apr 26, 2024 08:31
@PhilKes (Author) commented Apr 26, 2024

Where is FIM actually implemented for each model?

The actual infilling/FIM prompt is already implemented in llama.cpp in ggerganov/llama.cpp#3296, so we can simply leverage the /infill API.
Obviously FIM is not available for all models, but e.g. CodeLlama has been trained with the appropriate prompts/FIM tokens.
llama.cpp looks for the FIM tokens in the model's vocabulary (see e.g. tokenizer.json for CodeLlama).

It might be a good idea to check whether the used model actually supports FIM and, if not, immediately return an error:

// TODO: check if model supports FIM/Infill -> Add SupportFIM to type ConfigV2?
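One possible shape for that check, sketched as a hypothetical helper: look for the special infill tokens in the model's vocabulary. The token spellings below are the ones in CodeLlama's tokenizer; other FIM-trained models use different spellings, so a real implementation would read them from the model metadata rather than hard-coding them.

```go
package main

import "fmt"

// supportsFIM is a hypothetical check: a model can serve /api/infill only
// if its vocabulary contains the special FIM tokens. The spellings below
// are CodeLlama's sentencepiece tokens; this is an illustrative assumption,
// not the PR's actual implementation.
func supportsFIM(vocab map[string]bool) bool {
	required := []string{"▁<PRE>", "▁<SUF>", "▁<MID>"}
	for _, tok := range required {
		if !vocab[tok] {
			return false
		}
	}
	return true
}

func main() {
	codellamaVocab := map[string]bool{"▁<PRE>": true, "▁<SUF>": true, "▁<MID>": true, "▁<EOT>": true}
	plainVocab := map[string]bool{"hello": true, "world": true}
	fmt.Println(supportsFIM(codellamaVocab), supportsFIM(plainVocab))
}
```

A request against a model failing this check could then return an error immediately instead of producing garbage completions.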

EDIT: I just stumbled upon ggerganov/llama.cpp#6689, which enables FIM-trained models other than just CodeLlama to be used with llama.cpp.

@NightMachinery commented:
@PhilKes It'd be good if we could list all supported models as well.

How does llama.cpp know the FIM prompt template? Is this info contained in the GGUF files? Or does it just assume the template FIM_START prefix FIM_SUFFIX suffix FIM_COMPLETE?

@PhilKes (Author) commented Apr 26, 2024

@PhilKes It'd be good if we could list all supported models, as well.

How does llama.cpp know the FIM prompt template? Is this info contained in the GGUF files? Or does it just assume the template FIM_START prefix FIM_SUFFIX suffix FIM_COMPLETE?

The prompt for infilling is created at:
https://github.com/ggerganov/llama.cpp/blob/e2764cd7ca1112d9303eba9e81c9935ee67352ff/examples/server/server.cpp#L1920-L1943
I believe all models supporting FIM use this kind of prompt template, just with different FIM tokens.
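Simplified to plain strings (the linked server code works on token ids), that template amounts to something like the following sketch. The function name is hypothetical, and the `<PRE>`/`<SUF>`/`<MID>` spellings used in the demo are CodeLlama's:

```go
package main

import "fmt"

// buildInfillPrompt sketches the prompt layout llama.cpp's /infill handler
// assembles: FIM prefix token, then the prefix text, then the FIM suffix
// token, the suffix text, and finally the FIM middle token. The model then
// generates the "middle" part that goes between prefix and suffix.
func buildInfillPrompt(prefixTok, suffixTok, middleTok, prefix, suffix string) string {
	return prefixTok + prefix + suffixTok + suffix + middleTok
}

func main() {
	prompt := buildInfillPrompt("<PRE>", "<SUF>", "<MID>",
		"public int gcd(int x, int y) {", "\n}")
	fmt.Println(prompt)
}
```

Models with differently named FIM tokens would only differ in the three token arguments, which is why the same template can serve all of them.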

@PhilKes PhilKes requested a review from bsdnet April 29, 2024 15:39
@PhilKes (Author) commented May 3, 2024

Would love to hear some opinions on this @jmorganca 😀

Successfully merging this pull request may close these issues: API for FIM tasks
3 participants