
Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction #855

Open
br3no opened this issue May 2, 2024 · 3 comments
Labels
bug correctness Everything related to the generation correctness

Comments

@br3no
Contributor

br3no commented May 2, 2024

Describe the issue as clearly as possible:

See:

allowed_tokens = self.fsm.get_next_instruction(

Here the tokens field of the next instruction is treated the same way regardless of whether the instruction is of type Generate or Write.

If a Write instruction has a tokens field with length > 1, the logits processor will accept any of the ff-tokens as the next token, rather than enforcing them in order. This is incorrect.
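To make the distinction concrete, here is a minimal, hypothetical sketch of the problem (the class and function names are illustrative, not the actual Outlines API): a Generate instruction's tokens are *alternatives*, while a Write instruction's tokens are a *fixed sequence* that must be emitted in order.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Generate:
    tokens: List[int]  # alternatives: any one of these may come next

@dataclass
class Write:
    tokens: List[int]  # fixed sequence: must be emitted in this order

def allowed_next_tokens(instruction) -> List[int]:
    """Buggy behavior: treats Write like Generate, so any ff-token
    is accepted as the next token."""
    return instruction.tokens

def allowed_next_tokens_fixed(instruction) -> List[int]:
    """Corrected behavior: for a Write, only the first token of the
    fast-forward sequence is a valid next token."""
    if isinstance(instruction, Write):
        return instruction.tokens[:1]
    return instruction.tokens

write = Write(tokens=[11, 22, 33])
assert allowed_next_tokens(write) == [11, 22, 33]  # bug: 22 or 33 allowed now
assert allowed_next_tokens_fixed(write) == [11]    # only 11 is valid next
```

With the buggy version, a model could emit token 33 before 11 and still pass the mask, breaking the guided sequence.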

Steps/code to reproduce the bug:

The bug only appears once a Write instruction's ff-tokens have length > 1.

Expected result:

N.a.

Error message:

No response

Outlines/Python version information:

Version information

python -c "import sys; print('Python', sys.version)"
pip freeze
0.0.39
Python 3.8.18 (default, Oct  2 2023, 15:02:11) 
[GCC 9.4.0]
accelerate==0.27.2
ai2-olmo==0.2.5
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
async-timeout==4.0.3
attrs==23.2.0
awscli==1.32.83
boto3==1.34.83
botocore==1.34.83
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
cmake==3.29.2
codespell==2.2.6
colorama==0.4.4
cupy-cuda12x==12.1.0
diskcache==5.6.3
distro==1.9.0
docutils==0.16
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastrlock==0.8.2
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
h11==0.14.0
hiredis==2.3.2
httpcore==1.0.4
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
importlib-resources==6.1.1
importlib_metadata==7.0.2
iniconfig==2.0.0
interegular==0.3.3
isort==5.13.2
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
lark==1.1.9
libnacl==2.1.0
llvmlite==0.41.1
lm-format-enforcer==0.9.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
mypy==1.9.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.1
ninja==1.11.1.1
numba==0.58.1
numpy==1.24.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
omegaconf==2.3.0
openai==1.13.3
orjson==3.9.13
outlines==0.0.39
packaging==23.2
peft==0.9.0
pillow==10.3.0
pkgutil_resolve_name==1.3.10
pluggy==1.4.0
prometheus_client==0.20.0
proto-plus==1.23.0
protobuf==4.25.2
psutil==5.9.8
py==1.11.0
py-cpuinfo==9.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pynvml==11.5.0
pytest==8.0.2
pytest-asyncio==0.23.5
pytest-forked==1.6.0
pytest-rerunfailures==13.0
pytest-shard==0.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
PyYAML==6.0.1
quantile-python==1.1
ray==2.9.2
redis==5.0.3
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.1
rpds-py==0.17.1
rsa==4.7.2
ruff==0.1.5
s3transfer==0.10.1
safetensors==0.4.2
scipy==1.10.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tensorizer==2.9.0a0
tiktoken==0.6.0
tokenizers==0.19.1
toml==0.10.2
tomli==2.0.1
torch==2.1.2
tqdm==4.66.1
transformers==4.40.0
triton==2.1.0
types-PyYAML==6.0.12.12
types-requests==2.31.0.6
types-setuptools==69.1.0.20240308
types-urllib3==1.26.25.14
typing_extensions==4.9.0
urllib3==1.26.18
uvicorn==0.27.0.post1
uvloop==0.19.0
-e git+ssh://git@github.com/br3no/vllm.git@e6ffb1af2e904436473568f9d43131c998e55063#egg=vllm
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
yapf==0.32.0
yarl==1.9.4
zipp==3.17.0

Context for the issue:

Bug was discussed in a call with @rlouf.

@br3no br3no added the bug label May 2, 2024
@br3no br3no changed the title from "Logits Processors Guide integration will fail be buggy when len(tokens) > 1 in a Write instruction" to "Logits Processors Guide integration will be buggy when len(tokens) > 1 in a Write instruction" May 2, 2024
@brandonwillard brandonwillard added the correctness Everything related to the generation correctness label May 9, 2024
@ekagra-ranjan

Hi @br3no, I have a couple of questions on this issue. Can you please share more detail on these?

  1. What are ff-tokens?
  2. I had this doubt before too, because in the codebase Write and Generate seem to be used interchangeably. What is the difference between Write and Generate?

@br3no
Contributor Author

br3no commented May 14, 2024

ff-tokens are fast-forward tokens. When you are generating guided output, e.g. a JSON object, there are moments when you don't need the LLM to generate the next tokens, because they are fully determined by the guide. Skipping the model's forward pass for these tokens reduces the load on the GPU and is generally much faster, as you only need to traverse the state machine.

Write and Generate are instructions. A Generate instruction signals that the next step in the sequence requires an LLM generation; its tokens member variable contains the set of valid next tokens according to the guide (the state machine). A Write instruction signals that the next step(s) in the sequence do not require an LLM generation; its tokens member variable then contains the exact sequence of next tokens.
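The distinction above can be sketched as a decode step that either samples from the model or fast-forwards. This is a hypothetical illustration, not the actual Outlines implementation; all names are invented for clarity.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Generate:
    tokens: List[int]  # alternatives: any one may be sampled next

@dataclass
class Write:
    tokens: List[int]  # fixed sequence: emit verbatim, in order

def decode_step(
    instruction: Union[Generate, Write],
    sample_fn: Callable[[List[int]], int] = None,
) -> List[int]:
    """Return the token(s) to append for this step.

    Generate: call the model (sample_fn) restricted to the allowed tokens.
    Write: no model call needed; emit the fast-forward sequence verbatim.
    """
    if isinstance(instruction, Write):
        return list(instruction.tokens)      # fast-forward: no LLM call
    return [sample_fn(instruction.tokens)]   # one model-sampled token

# usage, with a dummy sampler that picks the first allowed token:
assert decode_step(Write(tokens=[5, 6, 7])) == [5, 6, 7]
assert decode_step(Generate(tokens=[8, 9]), sample_fn=lambda t: t[0]) == [8]
```

The key point is that a Write's tokens must all be consumed in order, whereas a Generate's tokens merely constrain a single sampling step.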

@ekagra-ranjan
Copy link

Thank you @br3no ! Much appreciated!

Projects
Status: Todo