
[Refactor & Feature] Refactor xtuner chat to support lmdeploy & vLLM #317

Draft · wants to merge 30 commits into main
Conversation

pppppM (Collaborator) commented Jan 15, 2024

Motivation

  • Hook xtuner chat up to inference engines for acceleration
  • Support direct deployment of models trained with xtuner
  • Make it easier to build gradio apps on top of xtuner
  • Guarantee the chat template is consistent between training and deployment
  • Simplify the deployment workflow

Usage

  1. xtuner chat launch commands
# HF 
python xtuner/tools/new_chat.py internlm/internlm-chat-7b 

# LMDeploy (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --lmdeploy

# vLLM (w/o adapter)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --vllm

# HF Moss
python xtuner/tools/new_chat.py meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search 

# LMDeploy Moss (w/o adapter)
python xtuner/tools/new_chat.py MOSS_MERGED --bot-name Llama2 --prompt-template moss_sft --system-prompt moss_sft --with-plugins calculate solve search  --lmdeploy

# Lagent (only support HF)
python xtuner/tools/new_chat.py internlm/internlm-7b --adapter xtuner/internlm-7b-qlora-msagent-react --lagent

# Llava (only support HF)
python xtuner/tools/new_chat.py internlm/internlm-chat-7b \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-internlm-7b \
  --prompt-template internlm_chat \
  --image $IMAGE_PATH
  2. ChatBot usage

from xtuner.chat import BaseChat, CHAT_TEMPLATE
template = CHAT_TEMPLATE['internlm2-chat']

################# HF inference #####################
from xtuner.chat import HFBot
bot = HFBot('internlm/internlm2-chat-7b')
hf_bot = BaseChat(bot, chat_template=template)

## Chat
print(hf_bot.chat('Who are you?'))

## Streaming output
streamer = hf_bot.create_streamer()
hf_bot.chat('Who are you?', streamer=streamer)

## Iterable streamer (for gradio)
streamer = hf_bot.create_streamer(iterable=True)

from threading import Thread
chat_kwargs = dict(text='Who are you?', streamer=streamer)
thread = Thread(target=hf_bot.chat, kwargs=chat_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
hf_bot.reset_history()

## Offline batch prediction
results = hf_bot.predict(['Who are you?', 'What is your name?'])
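
The same BaseChat front end is intended to sit on top of the accelerated backends this PR adds. Below is a minimal sketch of what that could look like, assuming a hypothetical LMDeployBot class that mirrors HFBot's constructor; the actual backend class names are not shown in this PR, so treat them as placeholders.

################# LMDeploy / vLLM inference (sketch) #####################
# NOTE: LMDeployBot is an assumed name mirroring HFBot above; check
# xtuner.chat for the backend classes this PR actually introduces.
from xtuner.chat import BaseChat, CHAT_TEMPLATE
from xtuner.chat import LMDeployBot  # hypothetical import

template = CHAT_TEMPLATE['internlm2-chat']
bot = LMDeployBot('internlm/internlm2-chat-7b')  # accelerated backend
lmdeploy_bot = BaseChat(bot, chat_template=template)

## The front-end API stays the same as in the HF example
print(lmdeploy_bot.chat('Who are you?'))

## Offline batch prediction also goes through the same interface
results = lmdeploy_bot.predict(['Who are you?', 'What is your name?'])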


################# HF Llava inference #####################
from xtuner.chat import HFLlavaBot, LlavaChat
bot = HFLlavaBot(
    'internlm/internlm2-chat-7b',
    'xtuner/llava-internlm2-7b',
    'openai/clip-vit-large-patch14-336')

image1 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/extreme_ironing.jpg'
image2 = 'https://llava.hliu.cc/file=/nobackup/haotian/code/LLaVA_dev/llava/serve/examples/waterview.jpg'
llava_bot = LlavaChat(bot, image1, chat_template=template)

## Chat
print(llava_bot.chat('What is unusual about this image?'))

## Streaming output
streamer = llava_bot.create_streamer()
llava_bot.chat('What is unusual about this image?', streamer=streamer)

## Iterable streamer (for gradio)
streamer = llava_bot.create_streamer(iterable=True)

from threading import Thread
chat_kwargs = dict(text='What is unusual about this image?', streamer=streamer)
thread = Thread(target=llava_bot.chat, kwargs=chat_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, flush=True, end='')

## Clear history
llava_bot.reset_history()

## Swap the image
llava_bot.reset_image(image2)
print(llava_bot.chat('What are the things I should be cautious about when I visit here?'))
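
Since one of the stated motivations is making gradio apps easier to build on xtuner, here is a minimal sketch of wiring the iterable streamer into gradio's ChatInterface, assuming the BaseChat API shown above; gradio itself is not part of this PR.

import gradio as gr
from threading import Thread

from xtuner.chat import BaseChat, CHAT_TEMPLATE, HFBot

template = CHAT_TEMPLATE['internlm2-chat']
gradio_bot = BaseChat(HFBot('internlm/internlm2-chat-7b'),
                      chat_template=template)

def respond(message, history):
    # Run generation in a background thread and yield the partial text,
    # which gradio renders as a streaming reply.
    streamer = gradio_bot.create_streamer(iterable=True)
    thread = Thread(target=gradio_bot.chat,
                    kwargs=dict(text=message, streamer=streamer))
    thread.start()
    response = ''
    for new_text in streamer:
        response += new_text
        yield response

gr.ChatInterface(respond).launch()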


TODO

  • Test HF Chat
  • Test LMDeploy Chat
  • Test vLLM Chat
  • Test HF Predict
  • Test LMDeploy Predict
  • Test vLLM Predict
  • Test HF Moss Chat
  • Test LMDeploy Moss Chat (w/o adapter)
  • Test HF Lagent Chat
  • Test HF Llava Chat

New Args

  1. repetition-penalty
  2. lmdeploy (LMDeploy)
  3. dynamic-ntk (LMDeploy)
  4. logn-attn (LMDeploy)
  5. rope_scaling_factor (LMDeploy)
  6. batch-size (LMDeploy)
  7. predict, the path of a file to run offline prediction on (see the example below)
  8. predict-repeat
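
An illustrative invocation combining several of the new flags; the file name questions.txt and the values shown are placeholders, not defaults confirmed by this PR.

# Illustrative only: offline prediction through the LMDeploy backend
python xtuner/tools/new_chat.py internlm/internlm-chat-7b --lmdeploy \
  --batch-size 8 \
  --repetition-penalty 1.1 \
  --predict questions.txt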

BC-Breakings

  1. Remove torch-dtype
  2. Remove offload-folder
  3. Remove no-streamer (non-streaming is now the only supported mode)

@pppppM pppppM marked this pull request as draft January 15, 2024 07:08
@pppppM pppppM changed the title [Refactor & Feature] Refactor xtuner chat to support lmdeploy [Refactor & Feature] Refactor xtuner chat to support lmdeploy &vLLM Jan 16, 2024
chynphh commented Mar 14, 2024

@pppppM Is this usable yet?
