
bce-reranker-base model call fails with "CUDA-capable device(s) is/are busy or unavailable" #1505

Open
tevooli opened this issue May 16, 2024 · 0 comments
tevooli commented May 16, 2024

Server with two RTX 4090 GPUs. xinference (version v0.11.0, deployed via Docker) was started yesterday with the following models loaded:

  • GPU 0: qwen1.5-chat 7B
  • GPU 1: bce-embedding-base_v1, bce-reranker-base_v1

Today, while using dify, calls to qwen and bce-embedding worked fine, but the rerank call failed with:

[xinference] Error: Failed to rerank documents, detail: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Reloading the bce-reranker-base_v1 model fixed it; roughly what I did is sketched below.
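
A minimal sketch of the reload through the xinference client, assuming the default RESTful endpoint on port 9997; the model UID is taken from the log below and may differ from the UID the deployment actually uses:

```python
from xinference.client import Client

# Assumptions: default endpoint, and the UID seen in the log
# ("bce-reranker-base_v1-1-0"); both may differ in the actual deployment.
client = Client("http://localhost:9997")

# Terminate the stuck reranker, then launch it again on GPU 1.
client.terminate_model(model_uid="bce-reranker-base_v1-1-0")
client.launch_model(
    model_name="bce-reranker-base_v1",
    model_type="rerank",
    gpu_idx=1,  # assumed supported in this version; adjust if not
)
```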

What could be causing this?
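
As a data point, here is a minimal probe (assuming torch is importable inside the container) that forces a fresh CUDA context on GPU 1; creating a context is the step that fails with "busy or unavailable" when the device is in exclusive compute mode or still held by a defunct process:

```python
import torch

# Probe GPU 1, where the embedding and reranker models live. Allocating a
# tensor there initializes a CUDA context, the same step that fails in the
# traceback below.
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
try:
    t = torch.zeros(1, device="cuda:1")
    print("cuda:1 ok, tensor on", t.device)
except RuntimeError as exc:
    print("cuda:1 failed:", exc)
```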

Full error log:

xinference-1  | 2024-05-16 03:55:49,229 xinference.core.model 418 DEBUG    Request rerank, current serve request count: 0, request limit: None for the model bce-reranker-base_v1-1-0
xinference-1  | 2024-05-16 03:55:49,289 xinference.core.model 418 DEBUG    After request rerank, current serve request count: 0 for the model bce-reranker-base_v1-1-0
xinference-1  | 2024-05-16 03:55:49,298 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
xinference-1  | Traceback (most recent call last):
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1036, in rerank
xinference-1  |     scores = await model.rerank(
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-1  |     return self._process_result_message(result)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-1  |     raise message.as_instanceof_cause()
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-1  |     result = await self._run_coro(message.message_id, coro)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-1  |     return await coro
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
xinference-1  |     return await super().__on_receive__(message)  # type: ignore
xinference-1  |   File "xoscar/core.pyx", line 558, in __on_receive__
xinference-1  |     raise ex
xinference-1  |   File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     async with self._lock:
xinference-1  |   File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     with debug_async_timeout('actor_lock_timeout',
xinference-1  |   File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference-1  |     result = await result
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
xinference-1  |     ret = await func(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 80, in wrapped_func
xinference-1  |     ret = await fn(self, *args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 423, in rerank
xinference-1  |     return await self._call_wrapper(
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 104, in _async_wrapper
xinference-1  |     return await fn(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 333, in _call_wrapper
xinference-1  |     ret = await asyncio.to_thread(fn, *args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference-1  |     return await loop.run_in_executor(None, func_call)
xinference-1  |   File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference-1  |     result = self.fn(*self.args, **self.kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 180, in rerank
xinference-1  |     similarity_scores = self._model.predict(sentence_combinations)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 336, in predict
xinference-1  |     self.model.to(self._target_device)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
xinference-1  |     return super().to(*args, **kwargs)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
xinference-1  |     return self._apply(convert)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1  |     module._apply(fn)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
xinference-1  |     param_applied = fn(param)
xinference-1  |   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
xinference-1  |     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
xinference-1  | RuntimeError: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
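
The traceback shows the failure inside CrossEncoder.predict(), which moves the model to its target device on every call, so a device that became unusable after load still surfaces the error at request time. A minimal standalone reproduction of just that path (assuming sentence-transformers is installed; the Hugging Face repo id below is my assumption for where this model resolves):

```python
from sentence_transformers import CrossEncoder

# CrossEncoder.predict() calls self.model.to(self._target_device) each time,
# which is the exact call that raises in the traceback above.
# Model id and device index are assumptions matching this deployment.
model = CrossEncoder("maidalun1020/bce-reranker-base_v1", device="cuda:1")
scores = model.predict([
    ("what is xinference?",
     "Xinference is a library for serving LLM, embedding, and rerank models."),
])
print(scores)
```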

XprobeBot added the gpu label May 16, 2024
XprobeBot modified the milestones: v0.11.1 → v0.11.2 May 16, 2024
XprobeBot modified the milestones: v0.11.2 → v0.11.3 May 24, 2024
XprobeBot modified the milestones: v0.11.3 → v0.11.4 → v0.12.0 → v0.12.1 May 31, 2024