
Add /api/infill for fill-in-the-middle #3907

Open · wants to merge 1 commit into base: main

Conversation

@PhilKes commented Apr 25, 2024

This PR closes #3869

Adds /api/infill to leverage llama.cpp's POST /infill API for infilling / fill-in-the-middle (FIM) / code completion.

An example request looks like this:

POST /api/infill HTTP/1.1
Host: localhost:11434
Content-Type: application/json
Content-Length: 199
{
    "stream": false,
    "model": "codellama:7b-instruct-q3_K_M",
    "input_prefix": "public int gcd(int x, int y) {",
    "input_suffix": "\n}",
    "options": {
        "num_predict": 10
    }
}

Response:

{
    "model": "codellama:7b-instruct-q3_K_M",
    "created_at": "2024-04-25T11:09:06.691622744Z",
    "response": "\n    return y == 0 ? x :",
    "done": true,
    "total_duration": 3385921466,
    "load_duration": 948913941,
    "prompt_eval_count": 18,
    "prompt_eval_duration": 1175793000,
    "eval_count": 10,
    "eval_duration": 1219642000
}

(Streaming is also available)
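For illustration, here is a minimal Go sketch that builds the same request body. The struct and helper names are hypothetical; only the JSON field names and values come from the example above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InfillRequest mirrors the JSON body of the example request above.
// The struct name is illustrative, not necessarily the one in the PR.
type InfillRequest struct {
	Stream      bool           `json:"stream"`
	Model       string         `json:"model"`
	InputPrefix string         `json:"input_prefix"`
	InputSuffix string         `json:"input_suffix"`
	Options     map[string]any `json:"options,omitempty"`
}

// buildInfillRequest assembles the same payload as the example request.
func buildInfillRequest() InfillRequest {
	return InfillRequest{
		Stream:      false,
		Model:       "codellama:7b-instruct-q3_K_M",
		InputPrefix: "public int gcd(int x, int y) {",
		InputSuffix: "\n}",
		Options:     map[string]any{"num_predict": 10},
	}
}

func main() {
	body, err := json.MarshalIndent(buildInfillRequest(), "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// To actually call the endpoint (assuming a local Ollama with this PR applied):
	//   http.Post("http://localhost:11434/api/infill", "application/json", bytes.NewReader(body))
}
```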

Note: this could probably use more refactoring to clean up duplicate code. I am very new to Go programming, so feedback would be appreciated.

@@ -83,10 +83,10 @@ type GenerateRequest struct {
type InfillRequest struct {

	// InputPrefix is the text before the infilling position to send to the model.
	InputPrefix string `json:"input_prefix"`
Contributor commented:
why not merge with the previous CL?

@PhilKes (Author) commented Apr 26, 2024
I have now squashed the two commits; I hope that's what you meant.

@NightMachinery commented:
Where is FIM actually implemented for each model?

@PhilKes force-pushed the infill-api branch 2 times, most recently from 39c65f3 to 4aaa4f8 on Apr 26, 2024 08:31
@PhilKes (Author) commented Apr 26, 2024

Where is FIM actually implemented for each model?

The actual infilling/FIM prompt is already implemented in llama.cpp in ggerganov/llama.cpp#3296, so we can simply leverage the /infill API.
Obviously FIM is not available for all models, but e.g. CodeLlama has been trained with the appropriate prompts/FIM tokens.
llama.cpp looks for the FIM tokens in the model's vocabulary (see e.g. tokenizer.json for CodeLlama).

It might be a good idea to check whether the used model actually supports FIM and, if not, immediately return an error:

// TODO: check if model supports FIM/Infill -> Add SupportFIM to type ConfigV2?
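One possible shape for that check, sketched as a hypothetical helper: look for the special infill tokens in the model's vocabulary. The token spellings below are the ones in CodeLlama's tokenizer; other FIM-trained models use different spellings, so a real implementation would read them from the model metadata rather than hard-coding them.

```go
package main

import "fmt"

// supportsFIM is a hypothetical check: a model can serve /api/infill only
// if its vocabulary contains the special FIM tokens. The spellings below
// are CodeLlama's sentencepiece tokens; this is an illustrative assumption,
// not the PR's actual implementation.
func supportsFIM(vocab map[string]bool) bool {
	required := []string{"▁<PRE>", "▁<SUF>", "▁<MID>"}
	for _, tok := range required {
		if !vocab[tok] {
			return false
		}
	}
	return true
}

func main() {
	codellamaVocab := map[string]bool{"▁<PRE>": true, "▁<SUF>": true, "▁<MID>": true, "▁<EOT>": true}
	plainVocab := map[string]bool{"hello": true, "world": true}
	fmt.Println(supportsFIM(codellamaVocab), supportsFIM(plainVocab))
}
```

A request against a model failing this check could then return an error immediately instead of producing garbage completions.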

EDIT: I just stumbled upon ggerganov/llama.cpp#6689, which enables FIM-trained models other than just CodeLlama to be used with llama.cpp.

@NightMachinery commented:
@PhilKes It'd be good if we could list all supported models as well.

How does llama.cpp know the FIM prompt template? Is this info contained in the GGUF files? Or does it just assume the template FIM_START prefix FIM_SUFFIX suffix FIM_COMPLETE?

@PhilKes (Author) commented Apr 26, 2024

@PhilKes It'd be good if we could list all supported models, as well.

How does llama.cpp know the FIM prompt template? Is this info contained in the GGUF files? Or does it just assume the template FIM_START prefix FIM_SUFFIX suffix FIM_COMPLETE?

The prompt for infilling is created at:
https://github.com/ggerganov/llama.cpp/blob/e2764cd7ca1112d9303eba9e81c9935ee67352ff/examples/server/server.cpp#L1920-L1943
I believe all models supporting FIM use this kind of prompt template, just with different FIM tokens.
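Simplified to plain strings (the linked server code works on token ids), that template amounts to something like the following sketch. The function name is hypothetical, and the `<PRE>`/`<SUF>`/`<MID>` spellings used in the demo are CodeLlama's:

```go
package main

import "fmt"

// buildInfillPrompt sketches the prompt layout llama.cpp's /infill handler
// assembles: FIM prefix token, then the prefix text, then the FIM suffix
// token, the suffix text, and finally the FIM middle token. The model then
// generates the "middle" part that goes between prefix and suffix.
func buildInfillPrompt(prefixTok, suffixTok, middleTok, prefix, suffix string) string {
	return prefixTok + prefix + suffixTok + suffix + middleTok
}

func main() {
	prompt := buildInfillPrompt("<PRE>", "<SUF>", "<MID>",
		"public int gcd(int x, int y) {", "\n}")
	fmt.Println(prompt)
}
```

Models with differently named FIM tokens would only differ in the three token arguments, which is why the same template can serve all of them.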

@PhilKes PhilKes requested a review from bsdnet April 29, 2024 15:39
@PhilKes (Author) commented May 3, 2024

Would love to hear some opinions on this @jmorganca 😀

Successfully merging this pull request may close these issues: API for FIM tasks
3 participants