
Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction #855

Open
br3no opened this issue May 2, 2024 · 3 comments
Labels
bug correctness Everything related to the generation correctness

Comments

@br3no
Contributor

br3no commented May 2, 2024

Describe the issue as clearly as possible:

See:

allowed_tokens = self.fsm.get_next_instruction(

Here the tokens field of the next instruction is treated the same way regardless of whether the instruction is of type Generate or Write.

If a Write instruction has a tokens field with length > 1, the logits processor will accept any of the ff-tokens as the next token, rather than enforcing them in order. This is incorrect.
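To make the distinction concrete, here is a minimal, hypothetical sketch of the problem (the class and function names are illustrative, not the actual Outlines API): a Generate instruction's tokens are *alternatives*, while a Write instruction's tokens are a *fixed sequence* that must be emitted in order.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Generate:
    tokens: List[int]  # alternatives: any one of these may come next

@dataclass
class Write:
    tokens: List[int]  # fixed sequence: must be emitted in this order

def allowed_next_tokens(instruction) -> List[int]:
    """Buggy behavior: treats Write like Generate, so any ff-token
    is accepted as the next token."""
    return instruction.tokens

def allowed_next_tokens_fixed(instruction) -> List[int]:
    """Corrected behavior: for a Write, only the first token of the
    fast-forward sequence is a valid next token."""
    if isinstance(instruction, Write):
        return instruction.tokens[:1]
    return instruction.tokens

write = Write(tokens=[11, 22, 33])
assert allowed_next_tokens(write) == [11, 22, 33]  # bug: 22 or 33 allowed now
assert allowed_next_tokens_fixed(write) == [11]    # only 11 is valid next
```

With the buggy version, a model could emit token 33 before 11 and still pass the mask, breaking the guided sequence.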

Steps/code to reproduce the bug:

The bug only appears once a Write instruction's ff-tokens have length > 1.

Expected result:

N.a.

Error message:

No response

Outlines/Python version information:

Version information

python -c "import sys; print('Python', sys.version)"
pip freeze
0.0.39
Python 3.8.18 (default, Oct  2 2023, 15:02:11) 
[GCC 9.4.0]
accelerate==0.27.2
ai2-olmo==0.2.5
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
awscli==1.32.83
boto3==1.34.83
botocore==1.34.83
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.2
codespell==2.2.6
colorama==0.4.4
cupy-cuda12x==12.1.0
diskcache==5.6.3
distro==1.9.0
docutils==0.16
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastrlock==0.8.2
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
h11==0.14.0
hiredis==2.3.2
httpcore==1.0.4
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
importlib-resources==6.1.1
importlib_metadata==7.0.2
iniconfig==2.0.0
interegular==0.3.3
isort==5.13.2
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
lark==1.1.9
libnacl==2.1.0
llvmlite==0.41.1
lm-format-enforcer==0.9.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
mypy==1.9.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.1
ninja==1.11.1.1
numba==0.58.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
omegaconf==2.3.0
openai==1.13.3
orjson==3.9.13
outlines==0.0.39
packaging==23.2
peft==0.9.0
pillow==10.3.0
pkgutil_resolve_name==1.3.10
pluggy==1.4.0
prometheus_client==0.20.0
proto-plus==1.23.0
protobuf==4.25.2
psutil==5.9.8
py==1.11.0
py-cpuinfo==9.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pynvml==11.5.0
pytest==8.0.2
pytest-asyncio==0.23.5
pytest-forked==1.6.0
pytest-rerunfailures==13.0
pytest-shard==0.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.9.2
redis==5.0.3
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rpds-py==0.17.1
rsa==4.7.2
ruff==0.1.5
s3transfer==0.10.1
safetensors==0.4.2
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tensorizer==2.9.0a0
tiktoken==0.6.0
tokenizers==0.19.1
toml==0.10.2
tomli==2.0.1
torch==2.1.2
tqdm==4.66.1
transformers==4.40.0
triton==2.1.0
types-PyYAML==6.0.12.12
types-requests==2.31.0.6
types-setuptools==69.1.0.20240308
types-urllib3==1.26.25.14
typing_extensions==4.9.0
urllib3==1.26.18
uvicorn==0.27.0.post1
uvloop==0.19.0
-e git+ssh://git@github.com/br3no/vllm.git@e6ffb1af2e904436473568f9d43131c998e55063#egg=vllm
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
yapf==0.32.0
yarl==1.9.4
zipp==3.17.0

Context for the issue:

Bug was discussed in a call with @rlouf.

@br3no br3no added the bug label May 2, 2024
@br3no br3no changed the title from "Logits Processors Guide integration will fail be buggy when len(tokens) > 1 in a Write instruction" to "Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction" May 2, 2024
@brandonwillard brandonwillard added the correctness Everything related to the generation correctness label May 9, 2024
@ekagra-ranjan

Hi @br3no, I have a couple of questions on this issue. Can you please share more detail on these?

  1. What are ff-tokens?
  2. I had this doubt before too, because in the codebase Write and Generate seem to be used interchangeably. What is the difference between Write and Generate?

@br3no
Contributor Author

br3no commented May 14, 2024

ff-tokens are fast-forward tokens. When you are generating guided output, e.g. a JSON object, there are moments when you don't need the LLM to generate the next tokens, because they are fully determined by the guide. Skipping the model's forward pass for these tokens reduces the load on the GPU and is generally much faster, as you only need to traverse the state machine.

Write and Generate are instructions. A Generate instruction signals that the next step in the sequence requires an LLM generation; its tokens member variable contains the set of valid next tokens according to the guide (the state machine). A Write instruction signals that the next step(s) in the sequence do not require an LLM generation; its tokens member variable then contains the exact sequence of next tokens.
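The distinction above can be sketched as a decode step that either samples from the model or fast-forwards. This is a hypothetical illustration, not the actual Outlines implementation; all names are invented for clarity.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Generate:
    tokens: List[int]  # alternatives: any one may be sampled next

@dataclass
class Write:
    tokens: List[int]  # fixed sequence: emit verbatim, in order

def decode_step(
    instruction: Union[Generate, Write],
    sample_fn: Callable[[List[int]], int] = None,
) -> List[int]:
    """Return the token(s) to append for this step.

    Generate: call the model (sample_fn) restricted to the allowed tokens.
    Write: no model call needed; emit the fast-forward sequence verbatim.
    """
    if isinstance(instruction, Write):
        return list(instruction.tokens)      # fast-forward: no LLM call
    return [sample_fn(instruction.tokens)]   # one model-sampled token

# usage, with a dummy sampler that picks the first allowed token:
assert decode_step(Write(tokens=[5, 6, 7])) == [5, 6, 7]
assert decode_step(Generate(tokens=[8, 9]), sample_fn=lambda t: t[0]) == [8]
```

The key point is that a Write's tokens must all be consumed in order, whereas a Generate's tokens merely constrain a single sampling step.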

@ekagra-ranjan
Copy link

Thank you @br3no ! Much appreciated!

Projects
Status: Todo