
[Bug] llava, cuda out of memory #1593

Open
1 of 2 tasks
AmazDeng opened this issue May 15, 2024 · 4 comments

Comments

@AmazDeng

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

I have one A100 GPU card.
Following the instructions (https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md), I ran the HelloWorld llava program and got an error. The model is llava-v1.5-7b, which is not very big, so why does a "cuda out of memory" error occur?

error info:

Exception in thread Thread-139 (_create_weight_func):
Traceback (most recent call last):
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 199, in _create_weight_func
    model_comm.create_shared_weights(device_id, rank)
RuntimeError: [TM][ERROR] CUDA runtime error: out of memory /lmdeploy/src/turbomind/utils/memory_utils.cu:32 

Exception in thread Thread-140 (_get_params):
Traceback (most recent call last):
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 766, in run_closure
    _threading_Thread_run(self)
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/media/star/8T/20231130/eas_demo/llava-vllm/ENV/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 215, in _get_params
    out = model_comm.get_params(device_id, rank)
RuntimeError: [TM][ERROR]  Assertion fail: /lmdeploy/src/turbomind/triton_backend/llama/LlamaTritonModel.cc:417


Reproduction

from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

model_path = '/media/star/8T/model/gpt/llava/llava-v1.5-7b'
image_path = '/media/star/8T/tmp/gpt4v/1/1.png'
question = "describe the image in detail"

# build the pipeline with the vicuna chat template
pipe = pipeline(model_path,
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image(image_path)
response = pipe((question, image))
print(response)

Environment

I can't find the lmdeploy check_env file.
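(For reference, recent lmdeploy releases can generate this report with the check_env subcommand, i.e. running lmdeploy check_env from the shell, assuming the CLI is installed; the exact subcommand may differ by version.)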

Error traceback

No response

@irexyc
Collaborator

irexyc commented May 15, 2024

The loading process of a VLM is:
load vision model -> load LLM weights -> allocate kv cache.

For llava-v1.5-7b, the first two steps take up about 14.5 GB of CUDA memory. But according to your log, the out-of-memory error occurred in step 2. Are there any other programs taking up GPU memory?
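
Since the failure is in step 2 (loading the LLM weights), the first thing worth checking is how much memory is actually free before the pipeline starts; if memory only runs out at step 3, the kv-cache share can be reduced instead. A minimal sketch, assuming recent torch and lmdeploy versions (torch.cuda.mem_get_info and TurbomindEngineConfig.cache_max_entry_count both exist there, but verify against your installed versions):

import torch
from lmdeploy import pipeline, TurbomindEngineConfig

# 1. Check how much memory is actually free on device 0 before loading;
#    llava-v1.5-7b needs roughly 14.5 GB for steps 1 and 2 alone.
free, total = torch.cuda.mem_get_info(0)
print(f'free: {free / 2**30:.1f} GiB / total: {total / 2**30:.1f} GiB')

# 2. If memory only runs out when the kv cache is allocated (step 3),
#    lower the fraction of free GPU memory handed to the kv cache.
pipe = pipeline('/media/star/8T/model/gpt/llava/llava-v1.5-7b',
                backend_config=TurbomindEngineConfig(cache_max_entry_count=0.4))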

@AmazDeng
Author

> The loading process of a VLM is: load vision model -> load LLM weights -> allocate kv cache.
>
> For llava-v1.5-7b, the first two steps take up about 14.5 GB of CUDA memory. But according to your log, the out-of-memory error occurred in step 2. Are there any other programs taking up GPU memory?

No, there is only one lmdeploy program running on the GPU.

@irexyc
Collaborator

irexyc commented May 16, 2024

Can you try running the code without Jupyter or IPython?
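
For reference, the reproduction above can be saved to a plain file and run with the system Python, e.g. python run_llava.py (the file name is just an example; the paths are the ones from the report):

# run_llava.py -- same reproduction, run outside Jupyter/IPython
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('/media/star/8T/model/gpt/llava/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))
image = load_image('/media/star/8T/tmp/gpt4v/1/1.png')
response = pipe(('describe the image in detail', image))
print(response)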

@AmazDeng
Author

> Can you try running the code without Jupyter or IPython?

I only have one card and it's currently running a program, so I can't test it right now. I'll test it next week and will share the results then.
