[Feature]: Support cogvlm-chat #1502
Conversation
0e4befd to 29592f3 (Compare)
Install LMDeploy with pip (Python 3.8+). Refer to [Installation](https://lmdeploy.readthedocs.io/en/latest/get_started.html#installation) for more.

```shell
pip install lmdeploy
```
xformers should be installed
I installed xformers, and it pulled in torch 2.3.0. But lmdeploy requires torch>=2.0.0,<=2.2.2.
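The clash above can be checked before installing anything. A minimal sketch of such a check, assuming lmdeploy's constraint quoted above (the helper name is made up for illustration):

```python
def torch_version_ok(ver: str) -> bool:
    """Check `ver` against lmdeploy's constraint torch>=2.0.0,<=2.2.2."""
    # Strip any local build suffix like "+cu121" before comparing.
    parts = tuple(int(x) for x in ver.split("+")[0].split(".")[:3])
    return (2, 0, 0) <= parts <= (2, 2, 2)

print(torch_version_ok("2.2.2"))  # True: within the supported range
print(torch_version_ok("2.3.0"))  # False: the torch that xformers pulled in
```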
We should guide users to install xformers before lmdeploy.
@zhulinJulia24 please add cogvlm and cogvlm2 into the test cases.
Ok, trying now. First building a docker image from this PR:
I modified the llava-like docker setup for this cogvlm2 case:
And noticed this on startup:
Is that ok?
Quite a strange response, which the transformers usage of the model doesn't produce. It seems like the prompting is off. This is with no image: Another funny one: Another bad one: It isn't always bad, but something seems off. I never noticed such oddities with the cogvlm2 demos locally. But if I pass an image, it responds ok:
I see this in the logs. Maybe something unintended is going on? It's ok as long as nothing is doing CUDA in a fork.
@pseudotensor hi, are you using tp>1? If so, you need to include your code in
@pseudotensor hi, in cogvlm2's demo, the prompt is wrapped with a text-only template for sessions without image input, as done here. Can you try again with 7080da2?
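The fix described above can be illustrated with a toy sketch. The template strings below are hypothetical, not LMDeploy's actual ones; the point is only the branch on whether the session contains an image:

```python
def wrap_prompt(text: str, has_image: bool) -> str:
    """Illustrative only: choose a template based on image presence."""
    if has_image:
        return f"Question: {text} Answer:"   # hypothetical vision chat template
    return f"USER: {text} ASSISTANT:"        # hypothetical text-only fallback (the fix)

print(wrap_prompt("hello", has_image=False))
```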
docs/en/multi_modal/cogvlm.md
Outdated
### Prepare

Download CogVLM models using huggingface-cli.
When deploying the CogVLM model using LMDeploy, it is necessary to download the model first, as the CogVLM model repository does not include the tokenizer model.
However, this step is not required for CogVLM2.
Taking the CogVLM model cogvlm-chat-hf as an example, you can prepare it as follows:
```shell
huggingface-cli download THUDM/cogvlm-chat-hf --local-dir ./cogvlm-chat-hf --local-dir-use-symlinks False
huggingface-cli download lmsys/vicuna-7b-v1.5 special_tokens_map.json tokenizer.model tokenizer_config.json --local-dir ./cogvlm-chat-hf --local-dir-use-symlinks False
```
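After running the two commands, one can sanity-check that the vicuna tokenizer files actually landed in the model directory. A small sketch (the helper is not part of LMDeploy):

```python
from pathlib import Path

def missing_tokenizer_files(model_dir: str) -> list:
    """Return the tokenizer files that are absent from `model_dir`."""
    required = ("special_tokens_map.json", "tokenizer.model", "tokenizer_config.json")
    return [name for name in required if not (Path(model_dir) / name).exists()]

# An empty list means the copy step above succeeded.
print(missing_tokenizer_files("./cogvlm-chat-hf"))
```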
docs/en/multi_modal/cogvlm.md
Outdated
```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('cogvlm-chat-hf', backend_config=PytorchEngineConfig(tp=1, max_prefill_token_num=4096, cache_max_entry_count=0.8))
```
Do we have to set `max_prefill_token_num`?
Not necessary. Will remove later.
docs/en/multi_modal/cogvlm.md
Outdated
Note xformers depends on torch and you should select a version that won't reinstall torch. The following works for `torch==2.2.0`.

```shell
# for torch==2.2.0
```
In openmmlab/lmdeploy docker images, the torch version is 2.1.0. Should we update it in the Dockerfile?
For torch 2.1.0, users could install `xformers<0.0.23`. As suggested in the docs, users should select a version that won't reinstall torch.
No need to update the Dockerfile to torch 2.2.0, since torch 2.1.0 with triton 2.1.0 is desired.
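The pairings mentioned in this thread can be captured in a small lookup. This is illustrative only, listing just the combinations discussed here; check the xformers release notes before relying on it for other torch versions:

```python
# torch minor version -> pip constraint for a compatible xformers,
# as discussed in this thread (not an exhaustive table).
XFORMERS_FOR_TORCH = {
    "2.1": "xformers<0.0.23",
}

def xformers_constraint(torch_version: str):
    """Return the pip constraint for the given torch version, or None if unknown."""
    key = ".".join(torch_version.split(".")[:2])
    return XFORMERS_FOR_TORCH.get(key)

print(xformers_constraint("2.1.0"))  # xformers<0.0.23
```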
0afe31f to c516afd (Compare)
Hi, I'm using the docker image. I'm just reporting what is said in the docker logs.
@pseudotensor hi, this warning is from the huggingface tokenizer. You can safely ignore it. If you want to avoid the warnings, explicitly set the env
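The exact variable is not quoted above. Assuming the warning is the common huggingface tokenizers fork-parallelism notice, it can be silenced as below; the variable name is an assumption, not confirmed by the thread:

```python
import os

# Assumption: the warning is tokenizers' fork-parallelism notice,
# controlled by TOKENIZERS_PARALLELISM. Set it before the tokenizer loads.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
print(os.environ["TOKENIZERS_PARALLELISM"])
```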
@@ -122,6 +122,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
<li>Mixtral (8x7B, 8x22B)</li>
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
<li>StarCoder2 (3B - 15B)</li>
duplicate starcoder2
congrats!
Motivation
Support cogvlm-chat-hf and CogVLM2 for pytorch engine
Usage:
Warning
CogVLM-Chat-hf uses 'lmsys/vicuna-7b-v1.5' as its tokenizer; you need to copy the tokenizer model and configs into the CogVLM model directory.

Modification
TODOs
ModelInputs.split with vision embeddings

BC-breaking (Optional)
Profiling
tp=1 batch_size=128 num-prompts=3000
cogvlm-chat-hf
Without images
with one image
change profile_throughput.py and prepend 1234 tokens and image embeddings to each prompt

REST API
using PR #1662
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist