
bce-reranker-base model call fails with "CUDA-capable device(s) is/are busy or unavailable" #1505

Open
tevooli opened this issue May 16, 2024 · 0 comments
tevooli commented May 16, 2024

Server with two RTX 4090 GPUs. xinference (version v0.11.0, deployed via Docker) was started yesterday with the following models loaded:

  • GPU 0: qwen1.5-chat 7B
  • GPU 1: bce-embedding-base_v1, bce-reranker-base_v1

Today, while using dify, calls to qwen and bce-embedding worked fine, but the rerank call failed with:

[xinference] Error: Failed to rerank documents, detail: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Reloading the bce-reranker-base_v1 model fixed it; roughly what I did is sketched below.
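
A minimal sketch of the reload through the xinference client, assuming the default RESTful endpoint on port 9997; the model UID is taken from the log below and may differ from the UID the deployment actually uses:

```python
from xinference.client import Client

# Assumptions: default endpoint, and the UID seen in the log
# ("bce-reranker-base_v1-1-0"); both may differ in the actual deployment.
client = Client("http://localhost:9997")

# Terminate the stuck reranker, then launch it again on GPU 1.
client.terminate_model(model_uid="bce-reranker-base_v1-1-0")
client.launch_model(
    model_name="bce-reranker-base_v1",
    model_type="rerank",
    gpu_idx=1,  # assumed supported in this version; adjust if not
)
```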

What could be causing this?
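
As a data point, here is a minimal probe (assuming torch is importable inside the container) that forces a fresh CUDA context on GPU 1; creating a context is the step that fails with "busy or unavailable" when the device is in exclusive compute mode or still held by a defunct process:

```python
import torch

# Probe GPU 1, where the embedding and reranker models live. Allocating a
# tensor there initializes a CUDA context, the same step that fails in the
# traceback below.
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
try:
    t = torch.zeros(1, device="cuda:1")
    print("cuda:1 ok, tensor on", t.device)
except RuntimeError as exc:
    print("cuda:1 failed:", exc)
```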

Full error log:

xinference-1  | 2024-05-16 03:55:49,229 xinference.core.model 418 DEBUG    Request rerank, current serve request count: 0, request limit: None for the model bce-reranker-base_v1-1-0
xinference-1  | 2024-05-16 03:55:49,289 xinference.core.model 418 DEBUG    After request rerank, current serve request count: 0 for the model bce-reranker-base_v1-1-0
xinference-1  | 2024-05-16 03:55:49,298 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
xinference-1  | Traceback (most recent call last):
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1036, in rerank
xinference-1  |     scores = await model.rerank(
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-1  |     return self._process_result_message(result)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-1  |     raise message.as_instanceof_cause()
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-1  |     result = await self._run_coro(message.message_id, coro)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-1  |     return await coro
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
xinference-1  |     return await super().__on_receive__(message)  # type: ignore
xinference-1  |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-1  |     raise ex
xinference-1  |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     async with self._lock:
xinference-1  |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     with debug_async_timeout('actor_lock_timeout',
xinference-1  |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     result = await result
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
xinference-1  |     ret = await func(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 80, in wrapped_func
xinference-1  |     ret = await fn(self, *args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 423, in rerank
xinference-1  |     return await self._call_wrapper(
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 104, in _async_wrapper
xinference-1  |     return await fn(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 333, in _call_wrapper
xinference-1  |     ret = await asyncio.to_thread(fn, *args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference-1  |     return await loop.run_in_executor(None, func_call)
xinference-1  |   File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference-1  |     result = self.fn(*self.args, **self.kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 180, in rerank
xinference-1  |     similarity_scores = self._model.predict(sentence_combinations)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 336, in predict
xinference-1  |     self.model.to(self._target_device)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
xinference-1  |     return super().to(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
xinference-1  |     return self._apply(convert)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
xinference-1  |     param_applied = fn(param)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
xinference-1  |     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
xinference-1  | RuntimeError: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
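
The traceback shows the failure inside CrossEncoder.predict(), which moves the model to its target device on every call, so a device that became unusable after load still surfaces the error at request time. A minimal standalone reproduction of just that path (assuming sentence-transformers is installed; the Hugging Face repo id below is my assumption for where this model resolves):

```python
from sentence_transformers import CrossEncoder

# CrossEncoder.predict() calls self.model.to(self._target_device) each time,
# which is the exact call that raises in the traceback above.
# Model id and device index are assumptions matching this deployment.
model = CrossEncoder("maidalun1020/bce-reranker-base_v1", device="cuda:1")
scores = model.predict([
    ("what is xinference?",
     "Xinference is a library for serving LLM, embedding, and rerank models."),
])
print(scores)
```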

XprobeBot added the gpu label May 16, 2024
XprobeBot modified the milestones: v0.11.1 → v0.11.2 May 16, 2024
XprobeBot modified the milestones: v0.11.2 → v0.11.3 May 24, 2024
XprobeBot modified the milestones: v0.11.3 → v0.11.4 → v0.12.0 → v0.12.1 May 31, 2024