Does vlmeval support multi card inference and batch size > 1? #32

Open
John-Ge opened this issue Dec 28, 2023 · 7 comments
Comments

@John-Ge

John-Ge commented Dec 28, 2023

Does vlmeval support multi card inference and batch size > 1?

@kennymckormick
Member

Hi @John-Ge,

  1. For simplicity, VLMEvalKit does not support batch size > 1 inference for now.
  2. VLMEvalKit currently supports two types of multi-GPU inference:
    1). DistributedDataParallel via torchrun, which runs N VLM instances on N GPUs. It requires your VLM to be small enough to run on a single GPU (see the sketch below).
    2). The model is configured by default to use multiple GPUs (like IDEFICS_80B_INSTRUCT). When you launch with python, it will automatically run on all available GPUs.
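An illustrative sketch of the torchrun data-parallel pattern in 1) (the names `build_model` and `model.generate` are placeholders, not VLMEvalKit's actual API): each process loads its own replica and handles every N-th sample, so inference stays at batch size 1 per process while N samples run in parallel.

```python
# Illustrative sketch only, not VLMEvalKit's actual code.
import torch
import torch.distributed as dist

def data_parallel_infer(build_model, samples):
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for each process
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    model = build_model().cuda(rank)      # one full replica per GPU
    shard = samples[rank::world]          # every N-th sample for this rank

    results = [model.generate(s) for s in shard]   # still batch size 1 per step

    # collect every rank's shard so one process can write the final results
    gathered = [None] * world
    dist.all_gather_object(gathered, results)
    return gathered
```

This would be launched with something like `torchrun --nproc_per_node=8 script.py`; the real VLMEvalKit entry point and flags may differ.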

@John-Ge
Author

John-Ge commented Dec 29, 2023

Thanks for your reply!
I would like to know what the normal format of batch size > 1 inference is. Should we deploy the model through something like vLLM or TGI?
Do we need to wait for them to support LLaVA?
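(For context, batched offline inference with a serving engine such as vLLM typically looks like the rough sketch below; the model name is only an example, and whether a given VLM like LLaVA is supported depends on the engine version.)

```python
# Rough illustration of batched offline inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")   # example model only
params = SamplingParams(temperature=0.0, max_tokens=128)

prompts = [
    "Describe the input format expected by the model.",
    "Summarize the evaluation protocol.",
]
outputs = llm.generate(prompts, params)   # the engine batches requests internally
for out in outputs:
    print(out.outputs[0].text)
```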

@darkpromise98

The authors of LLaVA have been working on a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754

@kennymckormick
Member

Hi @darkpromise98, we will try to include this feature in VLMEvalKit soon.

@darkpromise98

> Hi @darkpromise98, we will try to include this feature in VLMEvalKit soon.

That's great!

@John-Ge
Author

John-Ge commented Jan 25, 2024

haotian-liu/LLaVA#754 (comment)
This issue builds a fast batch-inference method for LLaVA; would you add this feature for every benchmark in this repo?

BTW, I find that SGLang may not support LoRA + base model. I trained LLaVA with LoRA, so if possible, I hope you could support loading the base model, merging the LoRA weights, and deploying it for evaluation.
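A minimal sketch of that merge step with the peft library (paths are placeholders, and a real LLaVA checkpoint may need LLaVA's own model class or merge script instead of AutoModelForCausalLM):

```python
# Hedged sketch: fold LoRA adapters into the base weights so the merged
# checkpoint can be deployed/evaluated as a plain model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "path/to/llava-base"       # placeholder
lora_path = "path/to/lora-adapter"     # placeholder
out_path  = "path/to/merged-model"     # placeholder

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora_path)
merged = model.merge_and_unload()      # folds the LoRA deltas into the base weights
merged.save_pretrained(out_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
```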

@kennymckormick
Member

kennymckormick commented Jan 27, 2024

Hi @John-Ge @darkpromise98, I have reviewed the request. I'm sorry, but I may not implement this feature myself, for the following reasons:

  1. Currently, only a few VLMs support the batch_inference interface; adding it for LLaVA may lead to some major changes in the inference pipeline of VLMEvalKit.
  2. The inference of LLaVA is relatively fast: with batch_size=1, llava-v1.5-13b can run at 3~4 fps on a single A100. Thus, I think batch_inference for LLaVA may not be a critical feature for VLMEvalKit.

BTW, I'm willing to review and merge it into the VLMEvalKit main branch if someone is willing to create a PR (which might be relatively heavy) for it.
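For anyone considering such a PR, a purely hypothetical sketch of what a batch_inference-style wrapper could look like (HF-style model/processor placeholders, not VLMEvalKit's actual interface):

```python
# Hypothetical interface sketch; VLMEvalKit's real wrappers expose
# per-sample generation, so this only illustrates the shape of the change.
from typing import List

class BatchedVLM:
    def __init__(self, model, processor, batch_size: int = 8):
        self.model = model              # e.g. a transformers VLM
        self.processor = processor      # its matching processor/tokenizer
        self.batch_size = batch_size

    def batch_inference(self, images: List, prompts: List[str]) -> List[str]:
        answers = []
        for i in range(0, len(prompts), self.batch_size):
            inputs = self.processor(
                images=images[i:i + self.batch_size],
                text=prompts[i:i + self.batch_size],
                return_tensors="pt",
                padding=True,
            ).to(self.model.device)
            ids = self.model.generate(**inputs, max_new_tokens=128)
            answers.extend(
                self.processor.batch_decode(ids, skip_special_tokens=True)
            )
        return answers
```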
