
[Feature] Implement COG-VLM2 #1622

Open

isidentical opened this issue May 20, 2024 · 16 comments

@isidentical
Contributor
Motivation

CogVLM2 is now the SOTA open-source VLM for captioning tasks.


@RunningLeon
Collaborator

@isidentical hi, thanks for the information. We will include CogVLM2 after PR #1502 is merged.

@Jayantverma2

any update?

@RunningLeon
Collaborator

> any update?

Hi, it's in progress. Any updates will be synced to this issue.

@RunningLeon
Collaborator

@isidentical @Jayantverma2 hi guys, CogVLM2 models are supported in PR #1502. If you have time, please give it a try, and feel free to leave comments on the PR. Thanks.
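A minimal way to try it, sketched from the lmdeploy VL pipeline docs (the model path below is a placeholder for wherever your CogVLM2 checkpoint lives):

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Placeholder path; point this at your local CogVLM2 checkpoint.
pipe = pipeline('/path/to/cogvlm2-llama3-chat-19B')

# Run a single image-text query.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)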

@Tushar-ml

@RunningLeon Is this the correct way to initialize CogVLM2?

engine = pipeline(model_path, "cogvlm2", log_level="DEBUG")
I have made some changes to config.json:

{
  "architectures": [
    "CogVLMForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "configuration_cogvlm.CogVLMConfig",
    "AutoModelForCausalLM": "modeling_cogvlm.CogVLMForCausalLM"
  },
  "vision_config": {
    "dropout_prob": 0.0,
    "hidden_act": "gelu",
    "in_channels": 3,
    "num_hidden_layers": 63,
    "hidden_size": 1792,
    "patch_size": 14,
    "num_heads": 16,
    "intermediate_size": 15360,
    "layer_norm_eps": 1e-06,
    "num_positions": 9217,
    "image_size": 1344
  },
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_attention_heads": 32,
  "max_position_embeddings": 8192,
  "rms_norm_eps": 1e-05,
  "template_version": "chat",
  "initializer_range": 0.02,
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "pad_token_id": 128002,
  "vocab_size": 128256,
  "num_hidden_layers": 32,
  "hidden_act": "silu",
  "use_cache": true,
  "transformers_version": "4.41.0"
}

But when I run it with this prompt:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image}'}}
        ]
    }
]

it generates b''.

@RunningLeon
Collaborator

RunningLeon commented May 29, 2024

@Tushar-ml hi, please follow the examples in the documentation: https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#vlm-offline-inference-pipeline.

prompts should look like this:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
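For a base64-encoded image like in the question above, one way to build the data URL (a sketch; the model and image paths are placeholders):

import base64

from lmdeploy import pipeline

# Placeholder paths; replace with your own checkpoint and image.
model_path = '/path/to/cogvlm2-llama3-chat-19B'
image_path = '/path/to/image.jpg'

# Encode the local image file as a base64 data URL.
with open(image_path, 'rb') as f:
    b64 = base64.b64encode(f.read()).decode('utf-8')

pipe = pipeline(model_path)
prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{b64}'}}
        ]
    }
]
print(pipe(prompts))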

@Tushar-ml

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

@pseudotensor

Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.

@RunningLeon
Collaborator

> @RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

@Tushar-ml hi, there's no need to do so for CogVLM2, but you should for CogVLM (v1).

@RunningLeon
Collaborator

> Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

@pseudotensor

> Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.
>
> @pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to other people who are interested in deploying LLMs and VLMs. Thanks.

Yes, will gladly do that.

@Tushar-ml

@RunningLeon I am getting OOM on an A40 with 48 GB of VRAM. What is the recommended setup for CogVLM2, given the model is no larger than 40 GB?

@RunningLeon
Collaborator

> @RunningLeon I am getting OOM on an A40 with 48 GB of VRAM. What is the recommended setup for CogVLM2, given the model is no larger than 40 GB?

@Tushar-ml hi, could you provide your sample code? Normally, you can reduce cache_max_entry_count to shrink the KV cache and lower max_prefill_token_num in PytorchEngineConfig:

cache_max_entry_count: float = 0.8
eviction_type: str = 'recompute'
prefill_interval: int = 16
block_size: int = 64
num_cpu_blocks: int = 0
num_gpu_blocks: int = 0
adapters: Dict[str, str] = None
max_prefill_token_num: int = 4096
thread_safe: bool = False
enable_prefix_caching: bool = False
download_dir: str = None
revision: str = None

def __post_init__(self):
    """Check input validation."""
    assert self.tp >= 1, 'invalid tp'
    assert self.max_batch_size >= 1, 'invalid max_batch_size'
    assert self.cache_max_entry_count > 0 and self.cache_max_entry_count < 1, 'invalid cache_max_entry_count'  # noqa
    assert self.eviction_type in ('recompute', 'copy'), 'invalid eviction_type'
    assert self.num_cpu_blocks >= 0, 'invalid num_cpu_blocks'
    assert self.max_prefill_token_num >= 0, 'invalid max_prefill_token_num'
    assert self.num_gpu_blocks >= 0, 'invalid num_gpu_blocks'
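Concretely, a sketch of applying both knobs (the values 0.4 and 2048 are illustrative, not tuned recommendations, and the model path is a placeholder):

from lmdeploy import pipeline, PytorchEngineConfig

# Shrink the KV cache's share of free GPU memory and the prefill chunk size
# to leave more headroom for the model weights and the vision tower.
backend_config = PytorchEngineConfig(
    cache_max_entry_count=0.4,   # default 0.8
    max_prefill_token_num=2048,  # default 4096
)

pipe = pipeline('/path/to/cogvlm2-llama3-chat-19B', backend_config=backend_config)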

@Tushar-ml

Thanks @RunningLeon, I will try this.

@GuoXu-booo

@RunningLeon Hi!
Due to server network limitations, I could not compile and install the latest lmdeploy on the server, so I pulled the lmdeploy 0.4.2 image from Docker Hub and ran it. Running CogVLM2 then reported this error:

root@gpu9:~/data/CogVLM2# python cogvlm_demo.py
2024-05-31 01:31:08,920 - lmdeploy - ERROR - TypeError: expected string or bytes-like object
2024-05-31 01:31:08,920 - lmdeploy - ERROR - test failed!
model /root/data/cogvlm2-llama3-chinese-chat-19B/ requires transformers version None but transformers 4.40.2 is installed.

My code:
from lmdeploy import pipeline
from lmdeploy.vl import load_image

model_path = '/root/data/cogvlm2-llama3-chinese-chat-19B/'

pipe = pipeline(model_path)

image = load_image('/root/data/dataset/misumi_data/images/Misumi000006.jpg')
response = pipe(('图中出现的零件是什么?', image))  # "What are the parts shown in the image?"
print(response)

I look forward to your reply. Thank you.

@RunningLeon
Collaborator

@GuoXu-booo hi, CogVLM is supported in the PyTorch engine, so you can simply clone the code from the PR and run pip install -e . to install it. BTW, you'd better use the latest code from PR #1502. The environment check fails in your case because there's no transformers_version in your config.json, which is fixed in the latest code:

git clone --recursive -b support-cogvlm-dev https://github.com/RunningLeon/lmdeploy.git
cd lmdeploy 
pip install -e .
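After installing, a quick sanity check that the branch is the copy actually imported (this assumes lmdeploy's usual top-level __version__ attribute):

# Run in Python after the editable install above.
import lmdeploy
print(lmdeploy.__version__)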
