scripts/oai_api_demo/openai_api_server.py 使用多张gpu失败 #22

xiaoToby · 2024-05-07T02:58:38Z

提交前必须检查以下项目

请确保使用的是仓库最新代码（git pull）
已阅读项目文档和FAQ章节并且已在Issue中对问题进行了搜索，没有找到相似问题和解决方案。
第三方插件问题：例如llama.cpp、text-generation-webui等，建议优先去对应的项目中查找解决方案。

问题类型

模型推理

基础模型

Llama-3-Chinese-8B-Instruct（指令模型）

操作系统

Linux

详细描述问题

# 请在此处粘贴运行代码（请粘贴在本代码块里）

依赖情况（代码类问题务必提供）

# 请在此处粘贴依赖情况（请粘贴在本代码块里）

运行日志或截图

# 请在此处粘贴运行日志（请粘贴在本代码块里）

python openai_api_server.py --base_model models/llama-3-chinese-8b/ --gpus 1,2

指定使用多张gpu
测试：
curl http://0.0.0.0:19327/v1/completions
-H "Content-Type: application/json"
-d '{
"prompt": "请你介绍一下中国的首都"
}'

一直没有json返回体

但如果是使用单卡，就不使用--gpus，则返回正常，有json返回体

The text was updated successfully, but these errors were encountered:

xiaoToby · 2024-05-07T03:32:28Z

@ymcui

xiaoToby · 2024-05-07T03:34:24Z

xiaoToby · 2024-05-07T03:48:02Z

单卡推理速度较慢，我的gpu配置：A800(80G)

xiaoToby · 2024-05-07T03:50:55Z

调用/v1/embeddings时，报以下错：
File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2763, in _get_padding_truncation_strategies
raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

ymcui · 2024-05-07T04:53:36Z

多卡问题不太清楚，自行排查吧。
刚刚试了以下，completion/chat/embedding功能都正常。另外，你加载的是instruct模型吗？看路径名并不是。

completion

请求：

curl http://localhost:19327/v1/completions -H "Content-Type: application/json" \
  -d '{"prompt": "请你介绍一下中国的首都"}'

返回：
{"id":"cmpl-C4ULnmFXM3hrB4oVarG2fM","object":"text_completion","created":1715057211,"model":"llama-3-chinese","choices":[{"index":0,"text":"北京是中华人民共和国的首都，位于华北平原中部。它是一个历史悠久、文化底蕴深厚的大都市，也是政治、经济和文化中心之一。此外，它还是世界上最大的城市之一，有着丰富多彩的人文景观，如故宫、天安门广场等。"}]}

chat

请求：

curl http://localhost:19327/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user","content": "如何制作个人网站？"}], "repetition_penalty": 1.0}'

返回：
{"id":"chatcmpl-URtHUfRFhcmGsNvrGKPUUU","object":"chat.completion","created":1715057270,"model":"llama-3-chinese","choices":[{"index":0,"message":{"role":"user","content":"如何制作个人网站？"}},{"index":1,"message":{"role":"assistant","content":"制作个人网站需要以下步骤：\n\n1. 确定网站目的和内容：首先要明确自己制作个人网站的目的和内容，例如展示个人作品、介绍个人经历、提供个人服务等。\n\n2. 选择网站平台：选择一个适合自己需求的网站平台，例如WordPress、Wix、Squarespace等。这些平台提供了丰富的模板和功能，可以帮助你快速搭建个人网站。\n\n3. 注册域名和主机：注册一个独特的域名，并选择一个可靠的主机提供商来托管你的网站。域名和主机的选择要根据自己的预算和需求来决定。\n\n4. 选择模板和设计：根据自己的需求和喜好，选择一个适合的模板和设计风格。可以在网站平台上选择预设的模板，也可以自己设计。\n\n5. 添加内容和功能：根据自己的需求和内容，添加相应的页面和功能。例如，添加个人简介、作品展示、联系方式等。\n\n6. 优化网站：优化网站的SEO（搜索引擎优化），使其在搜索引擎中更容易被发现。可以通过关键词优化、网站结构优化、内容优化等方式来提高网站的排名。\n\n7. 发布和推广：将网站发布到互联网上，并通过社交媒体、电子邮件等方式进行推广。可以通过SEO、社交媒体广告等方式来提高网站的曝光率。\n\n以上是制作个人网站的基本步骤，具体操作可以根据不同的网站平台和需求进行调整。"}}]}

embedding

请求：

!curl http://localhost:19327/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "今天天气真不错"}'

返回：
{"object":"list","data":[{"object":"embedding","embedding":[0.00848311372101307,-0.013980869203805923,...,0.01847970299422741],"index":0}],"model":"llama-3-chinese"}

xiaoToby · 2024-05-07T08:37:47Z

2024-05-07 08:36:06,090 - INFO - 172.23.0.1:51046 - "POST /v1/chat/completions HTTP/1.1" 200
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1622, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2834, in _sample
raise ValueError("If eos_token_id is defined, make sure that pad_token_id is defined.")
ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

@ymcui 又出现了这样的问题

iMountTai · 2024-05-07T09:28:43Z

模型加载错了，使用llama-3-chinese-8b-instruct，不是llama-3-chinese-8b。 @xiaoToby

xiaoToby · 2024-05-07T09:53:59Z

模型加载错了，使用llama-3-chinese-8b-instruct，不是llama-3-chinese-8b。 @xiaoToby

是的，已修正。
现又出现了 ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

ymcui · 2024-05-07T10:01:16Z

你用的transformers版本是多少？相关依赖版本贴出来。

xiaoToby · 2024-05-07T10:06:56Z

你用的transformers版本是多少？相关依赖版本贴出来。

4.40.2 @ymcui

xiaoToby · 2024-05-07T10:09:34Z

Package Version

accelerate 0.30.0
aiohttp 3.9.5
aioprometheus 23.12.0
aiosignal 1.3.1
anyio 4.3.0
async-timeout 4.0.3
attrs 23.2.0
blobfile 2.1.1
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
dnspython 2.6.1
email_validator 2.1.1
exceptiongroup 1.2.1
fairscale 0.4.13
fastapi 0.111.0
fastapi-cli 0.0.2
filelock 3.14.0
fire 0.6.0
frozenlist 1.4.1
fschat 0.2.36
fsspec 2024.3.1
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.0
idna 3.7
Jinja2 3.1.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
lxml 4.9.4
markdown-it-py 3.0.0
markdown2 2.4.13
MarkupSafe 2.1.5
mdurl 0.1.2
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
networkx 3.3
nh3 0.2.17
ninja 1.11.1.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
orjson 3.10.3
packaging 24.0
pandas 2.2.2
peft 0.10.0
pip 22.0.2
prompt-toolkit 3.0.43
protobuf 5.26.1
psutil 5.9.8
pyarrow 16.0.0
pycryptodomex 3.20.0
pydantic 1.10.13
Pygments 2.18.0
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
quantile-python 1.1
ray 2.20.0
referencing 0.35.1
regex 2024.4.28
requests 2.31.0
rich 13.7.1
rpds-py 0.18.1
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 59.6.0
shellingham 1.5.4
shortuuid 1.0.13
six 1.16.0
sniffio 1.3.1
sse-starlette 2.1.0
starlette 0.37.2
svgwrite 1.4.3
sympy 1.12
termcolor 2.4.0
tiktoken 0.4.0
tokenizers 0.19.1
torch 2.3.0
tqdm 4.66.4
transformers 4.40.2
triton 2.3.0
typer 0.12.3
typing_extensions 4.11.0
tzdata 2024.1
ujson 5.9.0
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.2.5
watchfiles 0.21.0
wavedrom 2.0.3.post3
wcwidth 0.2.13
websockets 12.0
wheel 0.37.1
xformers 0.0.26.post1
yarl 1.9.4

iMountTai · 2024-05-07T11:34:38Z

你转换权重的时候不要只下载safetensors权重，tokenizer相关的文件也是要更新的。

xiaoToby · 2024-05-08T02:10:49Z

你转换权重的时候不要只下载safetensors权重，tokenizer相关的文件也是要更新的。

没有转换权重

lzduo · 2024-05-08T09:03:10Z

模型加载错了，使用llama-3-chinese-8b-instruct，不是llama-3-chinese-8b。 @xiaoToby

是的，已修正。现又出现了 ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

你可以试试把openai_api_server.py中predict函数和stream_predict函数中，所有的
generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, num_beams=num_beams, do_sample=do_sample, **kwargs, )
代码段中插入
eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id,
变成
generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id, num_beams=num_beams, do_sample=do_sample, **kwargs, )
前面还有加入llama3_eos_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]
我这么修改后不报这个错误了

kekehia123 · 2024-05-15T05:50:09Z

模型加载错了，使用llama-3-chinese-8b-instruct，不是llama-3-chinese-8b。 @xiaoToby

是的，已修正。现又出现了 ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

你可以试试把openai_api_server.py中predict函数和stream_predict函数中，所有的 generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, num_beams=num_beams, do_sample=do_sample, **kwargs, ) 代码段中插入 eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id, 变成 generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id, num_beams=num_beams, do_sample=do_sample, **kwargs, ) 前面还有加入llama3_eos_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")] 我这么修改后不报这个错误了

我加了这些之后，get_embedding还是会报错。是不是这里对get_embedding没有作用呢？

以下是报错信息：

2024-05-15 11:54:31,780 - ERROR - Exception in ASGI application
Traceback (most recent call last):
File "/home/xinke/.local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/xinke/.local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call
return await self.app(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in call
await self.app(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "scripts/oai_api_demo/openai_api_server.py", line 380, in create_embeddings
embedding = get_embedding(request.input)
File "scripts/oai_api_demo/openai_api_server.py", line 294, in get_embedding
encoding = tokenizer(input, padding=True, return_tensors="pt")
File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2858, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2964, in _call_one
return self.encode_plus(
File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3028, in encode_plus
padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2763, in _get_padding_truncation_strategies
raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

kekehia123 · 2024-05-15T06:46:24Z

模型加载错了，使用llama-3-chinese-8b-instruct，不是llama-3-chinese-8b。 @xiaoToby

是的，已修正。现又出现了 ValueError: If eos_token_id is defined, make sure that pad_token_id is defined.

你可以试试把openai_api_server.py中predict函数和stream_predict函数中，所有的 generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, num_beams=num_beams, do_sample=do_sample, **kwargs, ) 代码段中插入 eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id, 变成 generation_config = GenerationConfig( temperature=temperature, top_p=top_p, top_k=top_k, eos_token_id=llama3_eos_ids, pad_token_id=tokenizer.eos_token_id, num_beams=num_beams, do_sample=do_sample, **kwargs, ) 前面还有加入llama3_eos_ids = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")] 我这么修改后不报这个错误了

我加了这些之后，get_embedding还是会报错。是不是这里对get_embedding没有作用呢？

以下是报错信息：

2024-05-15 11:54:31,780 - ERROR - Exception in ASGI application Traceback (most recent call last): File "/home/xinke/.local/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi result = await app( # type: ignore[func-returns-value] File "/home/xinke/.local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in call return await self.app(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/applications.py", line 1054, in call await super().call(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/applications.py", line 123, in call await self.middleware_stack(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 186, in call raise exc File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 164, in call await self.app(scope, receive, _send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 85, in call await self.app(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 65, in call await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 756, in call await self.middleware_stack(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 776, in app await route.handle(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 297, in handle await self.app(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 77, in app await wrap_app_handling_exceptions(app, request)(scope, receive, send) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app raise exc File "/home/xinke/.local/lib/python3.8/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app await app(scope, receive, sender) File "/home/xinke/.local/lib/python3.8/site-packages/starlette/routing.py", line 72, in app response = await func(request) File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/routing.py", line 278, in app raw_response = await run_endpoint_function( File "/home/xinke/.local/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function return await dependant.call(**values) File "scripts/oai_api_demo/openai_api_server.py", line 380, in create_embeddings embedding = get_embedding(request.input) File "scripts/oai_api_demo/openai_api_server.py", line 294, in get_embedding encoding = tokenizer(input, padding=True, return_tensors="pt") File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2858, in call encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs) File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2964, in _call_one return self.encode_plus( File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 3028, in encode_plus padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies( File "/home/xinke/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2763, in _get_padding_truncation_strategies raise ValueError( ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

该问题好像解决了，就是在openai_api_server.py的get_embedding函数中encoding = tokenizer(input,padding=True, return tensors="pt")前面增加一行：
tokenizer.pad token = tokenizer.eos token

github-actions · 2024-05-29T22:05:39Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions · 2024-06-05T22:05:45Z

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions bot added the stale label May 29, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts/oai_api_demo/openai_api_server.py 使用多张gpu失败 #22

scripts/oai_api_demo/openai_api_server.py 使用多张gpu失败 #22

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

ymcui commented May 7, 2024

xiaoToby commented May 7, 2024

iMountTai commented May 7, 2024

xiaoToby commented May 7, 2024

ymcui commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

iMountTai commented May 7, 2024

xiaoToby commented May 8, 2024

lzduo commented May 8, 2024

kekehia123 commented May 15, 2024

kekehia123 commented May 15, 2024

github-actions bot commented May 29, 2024

github-actions bot commented Jun 5, 2024

scripts/oai_api_demo/openai_api_server.py 使用多张gpu失败 #22

scripts/oai_api_demo/openai_api_server.py 使用多张gpu失败 #22

Comments

xiaoToby commented May 7, 2024

提交前必须检查以下项目

问题类型

基础模型

操作系统

详细描述问题

依赖情况（代码类问题务必提供）

运行日志或截图

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

ymcui commented May 7, 2024

completion

chat

embedding

xiaoToby commented May 7, 2024

iMountTai commented May 7, 2024

xiaoToby commented May 7, 2024

ymcui commented May 7, 2024

xiaoToby commented May 7, 2024

xiaoToby commented May 7, 2024

iMountTai commented May 7, 2024

xiaoToby commented May 8, 2024

lzduo commented May 8, 2024

kekehia123 commented May 15, 2024

kekehia123 commented May 15, 2024

github-actions bot commented May 29, 2024

github-actions bot commented Jun 5, 2024