It seems the watchdog is not effectively killing the sentencetransformers backend process. To investigate further, we need to check whether any other part of the code or configuration is keeping the process alive, or whether other threads associated with the process are preventing the GPU memory from being freed. We should also review the logs and related code to pinpoint the root cause.
Environment, Container Image, Hardware
LocalAI version: 2.14.0
Container Image: localai/localai:v2.14.0-cublas-cuda12-ffmpeg
nvidia CUDA 12, intel x86_64, Ubuntu 22.04 LTS.
Linux makinota08 6.5.0-28-generic 29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
16G RAM, Nvidia 1660Ti 6G, Acer Nitro Laptop
Describe the bug
I have LocalAI deployed in a Docker container. I load a sentence-transformers embeddings model and test it successfully using curl. Then I wait for 5 minutes and check in the log that the watchdog tries to kill the process. The log shows the process was successfully killed, and the watchdog no longer detects the idle connection.
However, if I run nvidia-smi, I can see the python process still loaded into GPU memory:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06 Driver Version: 545.29.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 3W / 80W | 2268MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 5574 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 25660 C python 2260MiB |
+---------------------------------------------------------------------------------------+
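For what it's worth, the leak can also be confirmed programmatically. The sketch below (my own helper, not part of LocalAI) parses the CSV form of `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`, using a hardcoded sample that mirrors the table above:

```python
# Sample CSV output, as produced by:
#   nvidia-smi --query-compute-apps=pid,used_memory --format=csv
# The PID and memory figure mirror the nvidia-smi table in this report.
sample = """pid, used_gpu_memory [MiB]
25660, 2260 MiB"""

def pids_holding_gpu(csv_text):
    """Return the set of PIDs that still hold GPU memory."""
    rows = csv_text.strip().splitlines()[1:]   # skip the CSV header
    return {int(r.split(",")[0]) for r in rows}

# After the watchdog fires, this PID should be gone; here it is not.
print(25660 in pids_holding_gpu(sample))  # -> True
```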
Also, ps -fe shows the process is still there:
root 25660 24649 0 11:33 pts/0 00:00:09 python /build/backend/python/sentencetransformers/sentencetransformers.py --addr 127.0.0.1:39807
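One plausible mechanism (an illustration of Unix process behavior, not a claim about LocalAI's internals): if the kill signal reaches only the immediate parent, any child process is reparented to init and keeps running, holding its CUDA context and GPU memory. A minimal, self-contained demo on Linux:

```python
import os
import signal
import subprocess
import time

# A shell parent spawns a long-running background child (stand-in for the
# python backend). SIGTERM sent to the parent alone leaves the child alive.
parent = subprocess.Popen(
    ["/bin/sh", "-c", "sleep 300 & echo $!; wait"],
    stdout=subprocess.PIPE, text=True)
child_pid = int(parent.stdout.readline())

parent.terminate()          # SIGTERM to the shell only, not its children
parent.wait()
time.sleep(0.2)

try:
    os.kill(child_pid, 0)   # signal 0: existence check, sends no signal
    survived = True
except ProcessLookupError:
    survived = False
print("child survived parent's SIGTERM:", survived)

os.kill(child_pid, signal.SIGKILL)  # clean up the orphan
```

Killing the whole process group (e.g. `os.killpg`) instead of a single PID avoids this.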
To Reproduce
Activate Watchdog using ENV variables.
WATCHDOG_BUSY=true
WATCHDOG_BUSY_TIMEOUT=30m
WATCHDOG_IDLE=true
WATCHDOG_IDLE_TIMEOUT=5m
Test sentencetransformers model with the following yaml config:
name: e5-large
backend: sentencetransformers
embeddings: true
parameters:
  model: intfloat/multilingual-e5-large
context_size: 1024
Wait for 5 minutes.
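For reference, the idle-watchdog behavior I expect can be sketched like this (a toy model with a shortened timeout, not LocalAI's actual implementation):

```python
import time

# Each backend records its last request time; a watchdog pass reaps any
# backend idle longer than the timeout. The bug suggests the kill step
# that would follow does not fully terminate the GPU-holding process.
IDLE_TIMEOUT = 0.1  # stands in for WATCHDOG_IDLE_TIMEOUT=5m

backends = {"e5-large": time.monotonic()}  # backend name -> last activity

def watchdog_pass(now):
    """Return the names of backends that exceeded the idle timeout."""
    return [name for name, last in backends.items()
            if now - last > IDLE_TIMEOUT]

time.sleep(0.2)  # simulate the 5 idle minutes
to_kill = watchdog_pass(time.monotonic())
print(to_kill)  # -> ['e5-large']
```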
Expected behavior
The sentencetransformers backend should be fully killed by the watchdog after 5 minutes of idle time, freeing the GPU memory.
Logs
test of embeddings model.txt