After finetuning llava, can I run it without xtuner? #382

Closed
StarCycle opened this issue Jan 31, 2024 · 1 comment

StarCycle commented Jan 31, 2024

I have finished finetuning LLaVA and running the benchmark, and I now have files like xtuner/llava-internlm2-7b on Hugging Face.

How can I run the model without xtuner (i.e., without using xtuner chat)? For example, Qwen-VL can be loaded with Hugging Face Transformers alone:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cpu", trust_remote_code=True).eval()
# use cuda device
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'}, # Either a local path or a URL
    {'text': '这是什么?'},  # "What is this?"
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# Expected output (translated): The picture shows a woman playing with a dog on the beach; next to her is a Labrador, and they are on the sand.

# 2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)  # "Draw a box around the high-five in the image"
print(response)
# Expected output: <ref>击掌</ref><box>(536,509),(588,602)</box>  (击掌 = "high-five")
image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
  image.save('1.jpg')
else:
  print("no box")
StarCycle (Author) commented:

Here is the answer from @pppppM:

  1. LLaVA finetuned by xtuner cannot be loaded like Qwen-VL. The Qwen-VL developers also ship a model file (modeling_qwen.py) in their Hugging Face repo, which is what makes this loading style possible. However, it also means Qwen-VL is tied to one fixed model architecture. By contrast, LLaVA finetuned by xtuner allows different model architectures, like CLIP+Vicuna, CLIP+InternLM, CLIP+InternLM2, DINOv2+InternLM, etc.

  2. LLaVA finetuned by xtuner can be deployed in another way (pull [Refactor & Feature] Refactor xtuner chat to support lmdeploy & vLLM #317, still under development). You can deploy the LLaVA model with the huggingface llava chatbot (based on Hugging Face Transformers) or the lmdeploy llava chatbot (based on LMDeploy TurboMind). The two chatbots share the same interface.
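
For reference, the LMDeploy route in point 2 could look roughly like the sketch below with a recent LMDeploy release. This is only an illustration under assumptions: it presumes a LLaVA checkpoint in a layout LMDeploy's VLM pipeline already understands (the path liuhaotian/llava-v1.5-7b is a placeholder), and it does not show the conversion step an xtuner-finetuned checkpoint would need while PR #317 is still under development.

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Assumed: a LLaVA model in a format LMDeploy's VLM pipeline can load directly.
# An xtuner-finetuned checkpoint would first need conversion (planned in PR #317).
pipe = pipeline('liuhaotian/llava-v1.5-7b')

# Reuse the demo image from the Qwen-VL example above.
image = load_image('https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg')

# The VLM pipeline takes a (prompt, image) tuple and returns the generated response.
response = pipe(('What is this?', image))
print(response.text)

The same (prompt, image) interface would apply regardless of which backend serves the model, which is the point of the shared chatbot interface mentioned above.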
