Does vlmeval support multi card inference and batch size > 1? #32

Open
John-Ge opened this issue Dec 28, 2023 · 7 comments
Comments

@John-Ge

John-Ge commented Dec 28, 2023

Does vlmeval support multi card inference and batch size > 1?

@kennymckormick
Member

Hi @John-Ge,

  1. For simplicity, VLMEvalKit does not support batch size > 1 inference for now.
  2. VLMEvalKit currently supports two types of multi-GPU inference:
    1). DistributedDataParallel via torchrun, which runs N VLM instances on N GPUs. It requires your VLM to be small enough to run on a single GPU (see the sketch below).
    2). The model is configured by default to use multiple GPUs (like IDEFICS_80B_INSTRUCT). When you launch with python, it will automatically run on all available GPUs.
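An illustrative sketch of the torchrun data-parallel pattern in 1) (the names `build_model` and `model.generate` are placeholders, not VLMEvalKit's actual API): each process loads its own replica and handles every N-th sample, so inference stays at batch size 1 per process while N samples run in parallel.

```python
# Illustrative sketch only, not VLMEvalKit's actual code.
import torch
import torch.distributed as dist

def data_parallel_infer(build_model, samples):
    # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK for each process
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    model = build_model().cuda(rank)      # one full replica per GPU
    shard = samples[rank::world]          # every N-th sample for this rank

    results = [model.generate(s) for s in shard]   # still batch size 1 per step

    # collect every rank's shard so one process can write the final results
    gathered = [None] * world
    dist.all_gather_object(gathered, results)
    return gathered
```

This would be launched with something like `torchrun --nproc_per_node=8 script.py`; the real VLMEvalKit entry point and flags may differ.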

@John-Ge
Author

John-Ge commented Dec 29, 2023

Thanks for your reply!
I would like to know what the normal format of batch size > 1 inference is. Should we deploy the model through something like vLLM or TGI?
Do we need to wait for them to support LLaVA?
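(For context, batched offline inference with a serving engine such as vLLM typically looks like the rough sketch below; the model name is only an example, and whether a given VLM like LLaVA is supported depends on the engine version.)

```python
# Rough illustration of batched offline inference with vLLM's Python API.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")   # example model only
params = SamplingParams(temperature=0.0, max_tokens=128)

prompts = [
    "Describe the input format expected by the model.",
    "Summarize the evaluation protocol.",
]
outputs = llm.generate(prompts, params)   # the engine batches requests internally
for out in outputs:
    print(out.outputs[0].text)
```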

@darkpromise98

The authors of LLaVA have been working on a beta version of batch inference: https://github.com/haotian-liu/LLaVA/issues/754

@kennymckormick
Member

Hi @darkpromise98, we will try to include this feature in VLMEvalKit soon.

@darkpromise98

> Hi @darkpromise98, we will try to include this feature in VLMEvalKit soon.

That's great!

@John-Ge
Author

John-Ge commented Jan 25, 2024

haotian-liu/LLaVA#754 (comment)
This issue builds a fast batch-inference method for LLaVA; would you add this feature for every benchmark in this repo?

BTW, I find that SGLang may not support LoRA + base model. I trained LLaVA with LoRA, so if possible, I hope you could support loading the base model, merging the LoRA weights, and deploying it for evaluation.
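A minimal sketch of that merge step with the peft library (paths are placeholders, and a real LLaVA checkpoint may need LLaVA's own model class or merge script instead of AutoModelForCausalLM):

```python
# Hedged sketch: fold LoRA adapters into the base weights so the merged
# checkpoint can be deployed/evaluated as a plain model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "path/to/llava-base"       # placeholder
lora_path = "path/to/lora-adapter"     # placeholder
out_path  = "path/to/merged-model"     # placeholder

base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora_path)
merged = model.merge_and_unload()      # folds the LoRA deltas into the base weights
merged.save_pretrained(out_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
```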

@kennymckormick
Member

kennymckormick commented Jan 27, 2024

Hi @John-Ge @darkpromise98, I have reviewed the request. I'm sorry, but I may not implement this feature myself, for the following reasons:

  1. Currently, only a few VLMs support the batch_inference interface; adding it for LLaVA may lead to some major changes in the inference pipeline of VLMEvalKit.
  2. The inference of LLaVA is relatively fast: with batch_size=1, llava-v1.5-13b can run at 3~4 fps on a single A100. Thus, I think batch_inference for LLaVA may not be a critical feature for VLMEvalKit.

BTW, I'm willing to review and merge it into the VLMEvalKit main branch if someone is willing to create a PR (which might be relatively heavy) for it.
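For anyone considering such a PR, a purely hypothetical sketch of what a batch_inference-style wrapper could look like (HF-style model/processor placeholders, not VLMEvalKit's actual interface):

```python
# Hypothetical interface sketch; VLMEvalKit's real wrappers expose
# per-sample generation, so this only illustrates the shape of the change.
from typing import List

class BatchedVLM:
    def __init__(self, model, processor, batch_size: int = 8):
        self.model = model              # e.g. a transformers VLM
        self.processor = processor      # its matching processor/tokenizer
        self.batch_size = batch_size

    def batch_inference(self, images: List, prompts: List[str]) -> List[str]:
        answers = []
        for i in range(0, len(prompts), self.batch_size):
            inputs = self.processor(
                images=images[i:i + self.batch_size],
                text=prompts[i:i + self.batch_size],
                return_tensors="pt",
                padding=True,
            ).to(self.model.device)
            ids = self.model.generate(**inputs, max_new_tokens=128)
            answers.extend(
                self.processor.batch_decode(ids, skip_special_tokens=True)
            )
        return answers
```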
