For structuring JSON, I find the numbers (float/integers) are problematic with a consistent pattern of repetitive zeros #847

Open
timothylimyl opened this issue Apr 30, 2024 · 4 comments
Labels
correctness (everything related to the generation correctness), JSON, structured generation (linked to structured generation)

Comments

@timothylimyl

Describe the issue as clearly as possible:

When I constrain a JSON field to a float, I occasionally hit an error which, upon inspection, is caused by the field containing an unending run of zeros, such as:

{ "rating" : 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Steps/code to reproduce the bug:

It occurs only occasionally.

Expected result:

The expected result is 0.0

Error message:

You will not be able to load this as JSON.


Error validating JSON: Invalid JSON
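A minimal sketch of why such output fails to load, assuming (the report does not state this explicitly) that generation stops at the token limit before the closing brace is emitted. The 50-digit run and the helper name `try_load` are illustrative only.

```python
import json

# Assumption: the model is cut off while still emitting zeros, so the
# object is never closed. The 50-digit run is illustrative, not real output.
truncated = '{ "rating" : 0.' + "0" * 50


def try_load(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(try_load(truncated))        # None: the object is never closed
print(try_load(truncated + "}"))  # {'rating': 0.0}
```

The long run of zeros is itself a legal JSON number; in this sketch it is the missing closing brace that makes the text unparseable.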


Outlines/Python version information:

Python: 3.10.0
Outlines: 0.0.40

Context for the issue:

_No response_
@timothylimyl
Author

Another repetition issue, this time with whitespace and newlines:

'{  \n   "candidate_name" : "John Doe",\n   "employment_status" : "employed",\n   "certifications" : [\n      "Certified Project Management Professional",\n      "Adobe Certified Expert in Photoshop"\n   ],\n   "university_qualification" : [\n      {\n         "university_name" : "University of Make-Believe",\n         "level" : "Bachelors" ,\n         "field_of_study" : "Science in Engineering",\n         "graduation_year" :   \n                        \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n  
                  \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    
\n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n          
          \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n      \n                    \n     '

@rlouf added the correctness label and removed the bug label on May 5, 2024
@brandonwillard added the structured generation and JSON labels on May 9, 2024
@smagnan

smagnan commented May 29, 2024

I am frequently encountering the same issue with integers in JSON schemas, even with the sample JSON schema and prompt.

The schema and prompt are the same as in the README.
Example of problematic output:

{"name":"Babette","age":42,"armor":"leather","weapon":"bow","strength":280000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000400000000000000000000}

Note: it is not only 0s; there is a stray 4 in there too.

I am using a quantized 7B model based on Llama 2, and this happens maybe 70% of the time.

@smagnan

smagnan commented May 29, 2024

I am also encountering the other repetition issue (repeated \n and whitespace) with certain schemas and prompts.

Example:
{\n\n\n\n\n... until it reaches max_tokens

@smagnan

smagnan commented May 29, 2024

For the newline issue, changing the whitespace_pattern seems to help, if not solve it. See: #715

For the int/float issue, I have no good solution yet. Maybe repetition_penalty would help, but I am not sure yet where to plug it in.
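To illustrate why a tighter pattern can stop runaway repetition, here is a sketch using only Python's `re` module. All pattern names and digit limits below are hypothetical and are not taken from Outlines; whether Outlines can be given a bounded number pattern this way is not confirmed in this thread. The idea is that a pattern with an unbounded quantifier admits arbitrarily long runs, while a bounded one does not, so regex-constrained generation could not produce them.

```python
import re

# Hypothetical patterns for illustration; none are copied from Outlines.
UNBOUNDED_INT = r"(0|[1-9][0-9]*)"
BOUNDED_INT = r"(0|[1-9][0-9]{0,9})"  # at most 10 digits
BOUNDED_FLOAT = r"-?(0|[1-9][0-9]{0,9})(\.[0-9]{1,6})?"  # at most 6 decimals

UNBOUNDED_WS = r"[\n\t ]*"
BOUNDED_WS = r"[ ]?"  # at most a single space


def admits(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


runaway_int = "28" + "0" * 200    # shaped like the "strength" value above
runaway_float = "0." + "0" * 200  # shaped like the "rating" value above
runaway_ws = "\n" * 200           # shaped like the {\n\n\n... output

print(admits(UNBOUNDED_INT, runaway_int))    # True
print(admits(BOUNDED_INT, runaway_int))      # False
print(admits(BOUNDED_FLOAT, runaway_float))  # False
print(admits(BOUNDED_FLOAT, "0.0"))          # True
print(admits(UNBOUNDED_WS, runaway_ws))      # True
print(admits(BOUNDED_WS, runaway_ws))        # False
```

The trade-off is that a bounded pattern also rejects legitimate values that exceed the chosen limits, so the limits would need to fit the data.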
