After finetuning llava, can I run it without xtuner? #382

Closed
StarCycle opened this issue Jan 31, 2024 · 1 comment

StarCycle commented Jan 31, 2024

I have finished finetuning LLaVA and running the benchmark, and I now have files like xtuner/llava-internlm2-7b on Hugging Face.

How can I run the model without xtuner (i.e., without using xtuner chat)? For example, Qwen-VL can be loaded with Hugging Face Transformers alone:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
torch.manual_seed(1234)

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cpu", trust_remote_code=True).eval()
# use cuda device
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True).eval()

# Specify hyperparameters for generation
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'}, # Either a local path or a URL
    {'text': '这是什么?'},  # "What is this?"
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# Expected output (translated): The picture shows a woman playing with a dog on the beach; next to her is a Labrador, and they are on the sand.

# 2nd dialogue turn
response, history = model.chat(tokenizer, '框出图中击掌的位置', history=history)  # "Draw a box around the high-five in the image"
print(response)
# Expected output: <ref>击掌</ref><box>(536,509),(588,602)</box>  (击掌 = "high-five")
image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
  image.save('1.jpg')
else:
  print("no box")
StarCycle (Author) commented:

Here is the answer from @pppppM:

  1. LLaVA finetuned by xtuner cannot be loaded like Qwen-VL. The Qwen-VL developers also ship a model file (modeling_qwen.py) in their Hugging Face repo, which is what makes this loading style possible. However, it also means Qwen-VL is tied to one fixed model architecture. By contrast, LLaVA finetuned by xtuner allows different model architectures, like CLIP+Vicuna, CLIP+InternLM, CLIP+InternLM2, DINOv2+InternLM, etc.

  2. LLaVA finetuned by xtuner can be deployed in another way (pull [Refactor & Feature] Refactor xtuner chat to support lmdeploy & vLLM #317, still under development). You can deploy the LLaVA model with the huggingface llava chatbot (based on Hugging Face Transformers) or the lmdeploy llava chatbot (based on LMDeploy TurboMind). The two chatbots share the same interface.
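
For reference, the LMDeploy route in point 2 could look roughly like the sketch below with a recent LMDeploy release. This is only an illustration under assumptions: it presumes a LLaVA checkpoint in a layout LMDeploy's VLM pipeline already understands (the path liuhaotian/llava-v1.5-7b is a placeholder), and it does not show the conversion step an xtuner-finetuned checkpoint would need while PR #317 is still under development.

from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Assumed: a LLaVA model in a format LMDeploy's VLM pipeline can load directly.
# An xtuner-finetuned checkpoint would first need conversion (planned in PR #317).
pipe = pipeline('liuhaotian/llava-v1.5-7b')

# Reuse the demo image from the Qwen-VL example above.
image = load_image('https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg')

# The VLM pipeline takes a (prompt, image) tuple and returns the generated response.
response = pipe(('What is this?', image))
print(response.text)

The same (prompt, image) interface would apply regardless of which backend serves the model, which is the point of the shared chatbot interface mentioned above.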
