Add probability distribution to choices #479

andy-zhou · 2023-12-24T05:46:27Z

Presentation of the new feature

It would be very helpful to include the probability distribution of the different options (both log probabilities and real probabilities) present in outlines.generate.choice(). This is useful for evaluating the certainty of the model for any given classification.

We use it as a pre-filter step for deciding if we should generate more expensive reasoning (for example, CoT) to arrive at a more certain classification.
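As an illustration of that pre-filter step, here is a minimal sketch. It assumes the per-choice probabilities are already available as a dict; the names `needs_expensive_pass`, `classify`, `expensive_classifier`, and the 0.8 threshold are hypothetical and not part of outlines.

```python
def needs_expensive_pass(choice_probs, min_top_prob=0.8):
    """Return True when the most likely choice falls below the confidence bar."""
    if not choice_probs:
        raise ValueError("choice_probs must not be empty")
    return max(choice_probs.values()) < min_top_prob


def classify(choice_probs, expensive_classifier, min_top_prob=0.8):
    """Keep the cheap answer when it is confident, else call the costly fallback."""
    if needs_expensive_pass(choice_probs, min_top_prob):
        # Hypothetical fallback, e.g. a chain-of-thought prompt.
        return expensive_classifier()
    return max(choice_probs, key=choice_probs.get)


if __name__ == "__main__":
    cheap = {"Positive": 0.55, "Negative": 0.45}
    print(classify(cheap, expensive_classifier=lambda: "Negative"))
```

The threshold is a tuning knob: a higher value sends more inputs to the expensive path.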

Two areas of complexity that I'm aware of:

- If you ever implement token-healing, the probabilities need to include the healed tokens.
- OpenAI doesn't include logits in their non-deprecated models.

Are you willing to open a PR?

Yes, though I would need pointers on where to start.


wdhitchc · 2023-12-27T04:26:33Z

Great idea. Here are some quick thoughts on how we might implement this, although I'm not completely sure this would work.

On line 75 of serve.py we have:

   text_outputs = [prompt + output.text for output in request_output.outputs]

https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py

Each entry of request_output.outputs should be a vLLM CompletionOutput (I think), which has log probs as a field. If that is the case, you could just add an option to return those as well.

https://github.com/vllm-project/vllm/blob/main/vllm/outputs.py

rlouf · 2023-12-27T07:33:29Z

There's a small subtlety here. There may be several combinations of tokens that lead to either of the choices. In this case do we need to return the logprob that corresponds to all possible paths or only the path that was sampled?

If we go with "all possible paths", the basic idea is to find all paths in the FSM that lead to either choice and pass the corresponding token ids + prompt through the model to get the corresponding logprobs.

HerrIvan · 2024-01-08T14:55:27Z

Maybe off-topic: what about doing this for the OpenAI part? For that we should give (optional) access to the logprobs values (https://cookbook.openai.com/examples/using_logprobs).

Of course, there is the subtlety that a generation may be the output of "n" API calls, so there may be some decisions to be made on how to return the aggregates.

A pre-condition for this may be using the pickleserializer for the persistence, so that we can save the whole api response. But that change would affect also the "transformers" part of the module.

(Should this go to a separate issue?)

rlouf · 2024-01-18T22:33:22Z

I am still not sure what the API would look like, especially since we still want outlines.generate.choice to return one of the choices.

dnhkng · 2024-01-26T17:32:42Z

Can we have a new method, "probabilities"?

Also, can you point out where in the codebase the actual decision on which class to select is made?

lapp0 · 2024-01-26T18:20:56Z

We might consider returning a GenerationResult object. e.g.

>>> result = outlines.generate.choice(model, ["Positive", "Negative"], sampler=BeamSampler(2))
>>> result.text
"Positive"
>>> result.relative_probs  # probability relative to actual generations
{"Positive": 0.6, "Negative": 0.4}
>>> result.absolute_probs  # un-normalized probabilities, valuable for other `outlines.generate` functions
{"Positive": 0.3, "Negative": 0.2}

Note that if choices have multiple tokens, we aren't guaranteed we know the probabilities. However, with a beam search sampler we can guarantee we know the relative_probs of N samples. For generate.choice, N would have to equal the number of choices, since we want P("Choice"∣choices) as opposed to P("Choice"∣entire set of possible generations).
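To make the relationship between the two fields concrete, here is a small sketch. It assumes `relative_probs` is simply `absolute_probs` renormalized over the listed choices, which is consistent with the numbers in the example above; the function name is illustrative only.

```python
def relative_probs(absolute_probs):
    """Renormalize un-normalized choice probabilities so they sum to one."""
    total = sum(absolute_probs.values())
    if total <= 0:
        raise ValueError("at least one choice must have positive probability")
    return {choice: p / total for choice, p in absolute_probs.items()}


if __name__ == "__main__":
    # Values taken from the GenerationResult example above.
    print(relative_probs({"Positive": 0.3, "Negative": 0.2}))
```

The remaining 0.5 of absolute mass corresponds to generations outside the two choices, which is why the absolute values do not sum to one.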

@dnhkng you might run into issues implementing this before beam search is available. The actual decision on which class to select is determined by the language model, not based on post-processing. https://github.com/outlines-dev/outlines/blob/main/outlines/generate/choice.py

dnhkng · 2024-01-26T19:06:01Z

Ahh, ok. I thought the selection was done by post-processing the probabilities. Otherwise, you might select categories with high initial token probability, but with a beam search you would find the overall most likely category.

I have an interesting use case that would require the probabilities.

lapp0 · 2024-01-26T20:15:05Z

With beam search, you are guaranteed to explore all legal paths given that the number of legal paths is equal to the number of beams. This is why I suggest beam search.

Although, reconsidering: there may be multiple legal paths for each choice, e.g. ["Pos", "itive"] and ["Posit", "ive"]. We would need to guarantee all choices are generated through other means.

I agree that this is a valuable and interesting use case. Here are a few steps that would need to be done to accomplish this:

1. Ensure SequenceGenerator can return logits.
2. Create outlines.generate.logits, which returns the result and logits of a prompt and uses the Greedy sampler by default.
3. Create outlines.generate.probabilities, which calls outlines.generate.logits with each choice.

rlouf · 2024-01-26T20:40:19Z

> With beam search, you are guaranteed to explore all legal paths given that the number of legal paths is equal to the number of beams. This is why I suggest beam search.

This is overkill

> Although, reconsidering: there may be multiple legal paths for each choice, e.g. ["Pos", "itive"] and ["Posit", "ive"]. We would need to guarantee all choices are generated through other means.

You can walk the FSM created when calling RegexFSM, get all the possible token combinations, run them in one batch through the model and sum the path probabilities.
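One way to read "sum the path probabilities" is sketched below. By the chain rule, the probability of a single token path is the product of its per-token conditional probabilities (a sum in log space), and the probability of a choice is the sum over its paths. The per-token log probabilities here are made-up numbers; in practice they would come from the model, and the function names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def choice_logprob(paths):
    """paths: one list of per-token log probabilities per tokenization."""
    # Chain rule: the log probability of a path is the sum of its token logprobs.
    path_totals = [sum(token_logprobs) for token_logprobs in paths]
    # The choice probability is the sum over all of its paths.
    return logsumexp(path_totals)


if __name__ == "__main__":
    positive_paths = [
        [math.log(0.5), math.log(0.4)],  # e.g. ["Pos", "itive"], illustrative values
        [math.log(0.1), math.log(1.0)],  # e.g. ["Posit", "ive"], illustrative values
    ]
    print(math.exp(choice_logprob(positive_paths)))
```

Working in log space avoids underflow when choices span many tokens.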

dnhkng · 2024-01-26T21:21:36Z

Sum of the average probability per token of each combination?

Some care needs to be taken with the target categories. Imagine a character level LLM, and we want the probabilities of 'yes' or 'no' for some prompt question. Not only are there more letters in 'yes', but there are also many more words that start with 'no', biasing the selection.

In this case, although we want just 'yes' or 'no', we should use something like 'yes.' or 'yes ', as the probability on the ' ' or '.' will compensate for the extra letters when we average over all characters.

lapp0 · 2024-01-26T22:09:44Z

> You can walk the FSM created when calling RegexFSM, get all the possible token combinations, run them in one batch through the model and sum the path probabilities.

I'm concerned about the number of token combinations; it would grow explosively. Is there something I'm missing here?

>>> generate_substring_combinations("foo")
[['f', 'o', 'o'], ['f', 'oo'], ['fo', 'o'], ['foo']]
>>> generate_substring_combinations("foo1")
[['f', 'o', 'o', '1'], ['f', 'o', 'o1'], ['f', 'oo', '1'], ['f', 'oo1'], ['fo', 'o', '1'], ['fo', 'o1'], ['foo', '1'], ['foo1']]
>>> generate_substring_combinations("foobar")
[['f', 'o', 'o', 'b', 'a', 'r'], ['f', 'o', 'o', 'b', 'ar'], ['f', 'o', 'o', 'ba', 'r'], ['f', 'o', 'o', 'bar'], ['f', 'o', 'ob', 'a', 'r'], ['f', 'o', 'ob', 'ar'], ['f', 'o', 'oba', 'r'], ['f', 'o', 'obar'], ['f', 'oo', 'b', 'a', 'r'], ['f', 'oo', 'b', 'ar'], ['f', 'oo', 'ba', 'r'], ['f', 'oo', 'bar'], ['f', 'oob', 'a', 'r'], ['f', 'oob', 'ar'], ['f', 'ooba', 'r'], ['f', 'oobar'], ['fo', 'o', 'b', 'a', 'r'], ['fo', 'o', 'b', 'ar'], ['fo', 'o', 'ba', 'r'], ['fo', 'o', 'bar'], ['fo', 'ob', 'a', 'r'], ['fo', 'ob', 'ar'], ['fo', 'oba', 'r'], ['fo', 'obar'], ['foo', 'b', 'a', 'r'], ['foo', 'b', 'ar'], ['foo', 'ba', 'r'], ['foo', 'bar'], ['foob', 'a', 'r'], ['foob', 'ar'], ['fooba', 'r']]

>>> choices = ("She is at home", "She is at the store")
>>> len(generate_substring_combinations(choices[0]))
6930
>>> len(generate_substring_combinations(choices[1]))
203513

I don't think we can explore all tokenization paths for a given choice. It seems the best we can do is calculate the probability of the best path for each choice (via greedy for now, beam later) and compare, OR strictly limit the size of probabilistic choices.
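The helper generate_substring_combinations is not shown in the thread. Below is a minimal sketch of one way to enumerate such splits; the name `segmentations` and the optional `vocabulary` filter are assumptions for illustration, not the original code. Without a filter, a string of n characters has 2^(n-1) splits. The counts quoted above are lower than that, so the original helper presumably restricted the pieces in some way that is not shown.

```python
def segmentations(text, vocabulary=None):
    """Yield every split of `text` into contiguous, non-empty pieces.

    When `vocabulary` is given, only pieces contained in it are allowed.
    """
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if vocabulary is not None and head not in vocabulary:
            continue
        for tail in segmentations(text[end:], vocabulary):
            yield [head] + tail


if __name__ == "__main__":
    print(list(segmentations("foo")))
    print(len(list(segmentations("foobar"))))
```

Restricting pieces to a real tokenizer vocabulary prunes many splits, but the number of paths can still grow quickly with the length of the choice.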

dnhkng · 2024-01-27T08:21:31Z

Although the number of combinations feels n^2, I think the paths overlap, and it resolves to n. Feels like a dynamic programming coding interview question 😅

Break down the input string into subchunks recursively, and then do a batch on an LLM to get the logits, and fill in the graph. Finally, calculate all the paths based on the probabilities, calculate the average probability per token per path, and sum them?
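The overlap intuition does hold for counting paths: the number of tokenizations of a prefix depends only on where the prefix ends, so a dynamic program over end positions avoids enumerating every path. The sketch below assumes a toy vocabulary given as a set of strings; `count_segmentations` is a hypothetical name. Summing model probabilities is harder in general, because a language model conditions each token on the preceding token sequence, so two tokenizations of the same prefix need not share probabilities for what follows.

```python
def count_segmentations(text, vocabulary):
    """Count the ways to split `text` into pieces drawn from `vocabulary`."""
    longest = max((len(piece) for piece in vocabulary), default=0)
    # ways[i] is the number of segmentations of text[:i].
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - longest), end):
            if text[start:end] in vocabulary:
                ways[end] += ways[start]
    return ways[len(text)]


if __name__ == "__main__":
    word = "foobar"
    every_substring = {
        word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)
    }
    print(count_segmentations(word, every_substring))
```

The running time is proportional to the text length times the longest vocabulary entry, even though the number of paths being counted is exponential.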

rlouf · 2024-01-27T08:30:37Z

The simplest approach here would still be to approximate by taking multiple samples once #533 is merged. SMC, which is on the roadmap, should give better results.
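A sketch of that sampling approximation follows. It assumes some callable that returns one choice per call; here a seeded random stand-in replaces the model, and the names `estimate_choice_probs` and `fake_sampler` are hypothetical.

```python
import random
from collections import Counter


def estimate_choice_probs(sample_choice, choices, num_samples=1000):
    """Estimate choice probabilities as frequencies over repeated samples."""
    counts = Counter(sample_choice() for _ in range(num_samples))
    return {choice: counts[choice] / num_samples for choice in choices}


if __name__ == "__main__":
    rng = random.Random(0)

    def fake_sampler():
        # Stand-in for one constrained generation call on the same prompt.
        return rng.choices(["Positive", "Negative"], weights=[0.7, 0.3])[0]

    print(estimate_choice_probs(fake_sampler, ["Positive", "Negative"]))
```

The estimate gives relative probabilities over the choices only, and its error shrinks roughly with the square root of the number of samples.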

dnhkng · 2024-01-27T08:32:58Z

Yes, Monte Carlo might be fine ;)

BTW, can someone tell me what FSM stands for? Finite state machine maybe?

rlouf · 2024-01-27T09:10:01Z

Finite State Machine indeed.

LouisHernandez17 · 2024-05-13T12:50:23Z

Is anyone still actively working on this? @dnhkng? If not, I can give it a try myself; I also need it.

dnhkng · 2024-05-13T13:43:42Z

No, not working on this feature.

aaronsnoswell · 2024-05-15T23:20:43Z

+1 for this feature - this would be very useful!

LouisHernandez17 · 2024-05-16T15:42:51Z

For BeamSearch Sampler, do you think it would be a satisfying approximation to consider the weights returned by the sampler as the log probabilities?

By default, BeamSearch with choice already returns one prediction per beam, ordered by beam weight. We can then easily get the probabilities by applying an exp, and finally group the beam predictions by final output and sum the probabilities.

Pros:

- Almost no extra computation, as we use the weights already computed by beam search.
- Actually shows what's going on when sampling, since the probabilities are the same as the ones used to filter and sort the topk.

Cons:

- Only works for the BeamSearch sampler.
- The probabilities don't always sum to one, in particular when many allowed paths are thrown away by topk.

I implemented this in a PR I just submitted (#895)
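The grouping step described above can be sketched as follows. This is not the code from the PR; it assumes the sampler exposes one decoded output and one log weight per beam, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def probabilities_from_beams(beam_outputs, beam_logweights):
    """Exponentiate beam log weights and sum them per distinct final output."""
    if len(beam_outputs) != len(beam_logweights):
        raise ValueError("expected one log weight per beam output")
    totals = defaultdict(float)
    for text, logweight in zip(beam_outputs, beam_logweights):
        totals[text] += math.exp(logweight)
    return dict(totals)


if __name__ == "__main__":
    outputs = ["Positive", "Positive", "Negative"]
    logweights = [math.log(0.2), math.log(0.1), math.log(0.15)]  # illustrative values
    print(probabilities_from_beams(outputs, logweights))
```

In this example the totals add up to less than one, which illustrates the second con: mass on paths pruned by the beam is simply missing.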

brandonwillard added the enhancement and structured generation labels Dec 24, 2023

rlouf added the help wanted label Jan 5, 2024

rlouf mentioned this issue Jan 26, 2024

Output probabilities #591

Closed

LouisHernandez17 linked a pull request May 16, 2024 that will close this issue

Added generate.probabilities for `BeamSearch #895

Open

brandonwillard linked a pull request May 16, 2024 that will close this issue

Added generate.probabilities for `BeamSearch #895

Open
