ERROR: stderr update_slots : failed to find free space in the KV cache #2282

netandreus · 2024-05-10T06:20:32Z

LocalAI version:

v2.14.0

Environment, CPU architecture, OS, and Version:

Apple MacBook M2 Max
MacOS Ventura 13.6.3 (22G436)

Describe the bug

Errors occur when trying to process a request.

10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32

To Reproduce

URL: /v1/chat/completions
Model: orca-2-13b-q4.gguf (https://huggingface.co/TheBloke/Orca-2-13B-GGUF/resolve/main/orca-2-13b.Q4_0.gguf)
orca-2-13b-q4.yaml

#######################
# orca-2-13b-q4 #
#######################

# @see https://huggingface.co/TheBloke/Orca-2-13B-GGUF/resolve/main/orca-2-13b.Q4_0.gguf
context_size: 2000
f16: true
# gpu_layers: 1
gpu_layers: 50
name: orca-2-13b-q4
parameters:
  model: orca-2-13b-q4.gguf
  temperature: 0.9
  top_k: 40
  top_p: 0.65
embeddings: true

orca-2-13b-q4.gguf.tmpl

<|im_start|>system
Answer the question. Be Concise.<|im_end|>
<|im_start|>user
{{.Input}}<|im_end|>
<|im_start|>assistant

Request:

{
  "model": "orca-2-13b-q4",
  "language": "",
  "n": 1,
  "top_p": 1,
  "top_k": null,
  "temperature": 0,
  "max_tokens": null,
  "echo": false,
  "batch": 0,
  "ignore_eos": false,
  "repeat_penalty": 0,
  "n_keep": 0,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "tfz": null,
  "typical_p": null,
  "seed": null,
  "negative_prompt": "",
  "rope_freq_base": 0,
  "rope_freq_scale": 0,
  "negative_prompt_scale": 0,
  "use_fast_tokenizer": false,
  "clip_skip": 0,
  "tokenizer": "",
  "file": "",
  "response_format": {},
  "size": "",
  "prompt": null,
  "instruction": "",
  "input": null,
  "stop": null,
  "messages": [
    {
      "role": "system",
      "content": "You need to correct the errors in the text after the OCR. Return only corrected text."
    },
    {
      "role": "user",
      "content": "XXX XXXXXX XXXXX XXXXXXX mollell vubgl Wguw\n\nCOMMERCIAL LICENSE ﺔﻳﺭﺎﺠﺗ ﺔﺼﺧﺭ\n\nCompany : XXXXXXXXXXX\n\n: ﺔﻛﺮﺸﻟﺍ ﻢﺳﺍ\n\nﺪﺘﻴﻤﻴﻟ ﺏﻭﺮﺟ ﺵﺮﻴﺴﻳﺭ ﺲﺘﻴﻛﺭﺎﻣ ﻞﻴﻔﻴﻟ\n\n: ﻞﻴﺠﺴﺘﻟﺍ ﻢﻗﺭ\n\nRegistered Number :\n\n1111111111\n\nType of Legal Entity : Private Company Limited by Shares\n\n: xxxxxxxxx\n\nAddress : Dedicated Desk 1111, Floor 11, XX Xxxxx Xxxxx, Xxx Xxxx Xxxx Xxxx Xxxxx, Xxx Xxxxxx Xxxx, Xxx Xxxx, \n\n ,1119 \n\nAuthorised Signatory : Xxx Xxxx\n\n: \n\n\n\nBusiness Activities : Market research and public opinion polling ; Other information service activities n.e.c ; Management consultancy activities ; Advertising\n\n: ﺔ\n\nIssue Date :\n\n14 October 2010\n\n: ﺭﺍﺪﺻﻹﺍ ﺦﻳﺭﺎﺗ\n\nExpiry Date :\n\n13 October 2010\n\n: \n\nVerify Document Code COMPANIES-12121212\n\nApproved Electronic Document issued by Xxx Xxxxx Xxxxxx Xxxxxx Xxxxxx Xxxxxx. To verify, please visit www.xxxxxx.yyyyyy.com\n"
    }
  ],
  "functions": null,
  "function_call": null,
  "stream": false,
  "mode": 0,
  "step": 0,
  "grammar": "",
  "grammar_json_functions": null,
  "backend": "",
  "model_base_name": ""
}

Expected behavior

Some text response.

Logs

10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 32
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 16
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 8
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 4
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 2
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 1
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to decode the batch, n_batch = 1, ret = 1
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 256
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 128
10:05AM DBG GRPC(orca-2-13b-q4.gguf-127.0.0.1:51965): stderr update_slots : failed to find free space in the KV cache, retrying with smaller n_batch = 64

Additional context


mudler · 2024-05-10T06:48:56Z

@netandreus this most likely happens when the prompt exhausts the context size. Can you check whether that is the cause in your case by bumping the context size?

In any case, it sounds reasonable to bail out early instead of repeatedly retrying to find free space in the KV cache. This also seems related to #2258. Can you also try setting batch to 1 in the model configuration and see if it keeps happening?

parameters:
  batch: 1
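
A rough way to check the "prompt exhausts the context size" theory before changing any settings is sketched below. This is only an illustration: `estimate_tokens` and `fits_in_context` are hypothetical helpers, not part of LocalAI or llama.cpp, and the 4-characters-per-token ratio is a crude assumption (the real count depends on the model's tokenizer and is typically much worse for non-Latin text such as the Arabic in the request above).

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate. ASSUMPTION: ~4 characters per token.

    The true count depends on the model's tokenizer; treat this as a
    lower-effort sanity check, not a measurement.
    """
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(prompt_tokens: int, max_new_tokens: int, context_size: int) -> bool:
    """True if the prompt plus the requested completion fits in the context window."""
    return prompt_tokens + max_new_tokens <= context_size


if __name__ == "__main__":
    # Hypothetical numbers: a 1900-token prompt with 200 tokens of output
    # does not fit in context_size: 2000, but does fit in 4096.
    print(fits_in_context(1900, 200, 2000))
    print(fits_in_context(1900, 200, 4096))
```

If the estimate lands close to or above `context_size`, raising `context_size` in the model YAML is the first thing to try.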

DavidGOrtega · 2024-05-10T10:38:46Z

Related to #2258.

netandreus · 2024-05-10T14:25:39Z

Thank you for the assistance, I will check.

imihic · 2024-05-18T12:24:07Z

It seems that this error also happens if we enable parallel llama.cpp processing. For example, with the context size set to 8192 and the number of parallel processes set to 20, token stream generation always stops at around 410 characters, which is roughly 8192 divided by 20.

So, instead of each process getting the full 8192 context window specified in the .env/yaml file, the backend takes this value and splits it across all the processes.

Is this a bug or expected behaviour? If it is expected, it would be a good idea to clarify this in the documentation.
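
The arithmetic behind this observation can be sketched as follows. It assumes the configured context is divided evenly between the parallel slots, which is what the numbers above suggest; `per_slot_context` is a hypothetical helper written for illustration, not a LocalAI or llama.cpp function.

```python
def per_slot_context(context_size: int, parallel_slots: int) -> int:
    """Context available to each slot, ASSUMING an even split of the total."""
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    return context_size // parallel_slots


if __name__ == "__main__":
    # Values from the comment above: context size 8192, 20 parallel processes.
    print(per_slot_context(8192, 20))  # 409
```

Under that assumption, giving each of N parallel requests a full window of C tokens would mean configuring a total context size of roughly N * C.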

netandreus added the bug and unconfirmed labels on May 10, 2024
