FEAT: Add command cal-model-mem #1460

frostyplanet · 2024-05-09T08:28:57Z

Implement model.llm.memory.estimate_llm_gpu_memory, which outputs model_mem, kv_cache, overhead, and active_mem.

* Download config.json from huggingface/modelscope and load the model layers info
* Support kv_cache_dtype 8/16/32 (gpu_poor may only calculate fp32)
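The KV-cache term depends only on a few fields of config.json, so it can be sketched directly. This sketch assumes the standard Hugging Face keys num_hidden_layers, hidden_size and num_attention_heads, and a model without grouped-query attention (so the key/value width equals hidden_size). ModelLayersInfo, load_layers_info and kv_cache_mb are hypothetical names, not the code in this PR.

```python
import json
from dataclasses import dataclass


@dataclass
class ModelLayersInfo:
    # Hypothetical container for the config.json fields used in this sketch
    num_hidden_layers: int
    hidden_size: int
    num_attention_heads: int  # used by the activation estimate, not by the KV cache


def load_layers_info(config_text: str) -> ModelLayersInfo:
    """Parse the relevant fields from a Hugging Face style config.json."""
    cfg = json.loads(config_text)
    return ModelLayersInfo(
        num_hidden_layers=cfg["num_hidden_layers"],
        hidden_size=cfg["hidden_size"],
        num_attention_heads=cfg["num_attention_heads"],
    )


def kv_cache_mb(context_length: int, layers: ModelLayersInfo, kv_cache_dtype: int = 16) -> float:
    """KV cache size in MiB; kv_cache_dtype is the element width in bits (8, 16 or 32)."""
    if kv_cache_dtype not in (8, 16, 32):
        raise ValueError("kv_cache_dtype must be 8, 16 or 32")
    bytes_per_elem = kv_cache_dtype // 8
    # Factor 2: one key and one value vector per token per layer.
    # Assumes K/V width == hidden_size (no grouped-query attention).
    total_bytes = 2 * context_length * layers.num_hidden_layers * layers.hidden_size * bytes_per_elem
    return total_bytes / 2**20


# Subset of Qwen1.5-7B's config.json
qwen_7b = load_layers_info('{"num_hidden_layers": 32, "hidden_size": 4096, "num_attention_heads": 32}')
for bits in (8, 16, 32):
    print(bits, kv_cache_mb(16384, qwen_7b, bits))
```

With Qwen1.5-7B's values at 16384 tokens this prints 4096.0, 8192.0 and 16384.0 MiB for 8, 16 and 32 bits; the fp16 figure matches the kv_cache line in the first usage example, and int8 halves it while fp32 doubles it.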

The algorithm is based on https://github.com/RahulSChand/gpu_poor

model.llm.utils: Add convert_model_size_to_float
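The usage examples below pass -s 1_8 and print "model size: 1.8 B", so model sizes use "_" in place of the decimal point. A minimal sketch of that conversion, assuming "_" is the only special case; model_size_to_float is a hypothetical stand-in, not the body of convert_model_size_to_float in this PR.

```python
def model_size_to_float(model_size_in_billions) -> float:
    """Convert a size such as 7, "7" or "1_8" (meaning 1.8) to float billions.

    Hypothetical sketch: assumes "_" stands for the decimal point, as in -s 1_8.
    """
    if isinstance(model_size_in_billions, str):
        model_size_in_billions = model_size_in_billions.replace("_", ".")
    return float(model_size_in_billions)


print(model_size_to_float("1_8"))  # 1.8
print(model_size_to_float(7))      # 7.0
```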

Usage:

$ env HF_ENDPOINT=https://hf-mirror.com xinference cal-model-mem -s 7 -q Int4 -f gptq -c 16384 -n qwen1.5-chat
model_name: qwen1.5-chat
kv_cache_dtype: 16
model size: 7.0 B
quant: Int4
context: 16384
gpu mem usage:
  model mem: 4139 MB
  kv_cache: 8192 MB
  overhead: 650 MB
  active: 17024 MB
  total: 30005 MB (30 GB)
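The printed components add up: 4139 + 8192 + 650 + 17024 = 30005 MB. The kv_cache and active lines can be reproduced from Qwen1.5-7B's config (32 layers, hidden size 4096, 32 attention heads). The activation formula below is an assumption fitted to the printed numbers (a linear term of five fp16 hidden-size vectors per token plus one fp16 context × context attention-score matrix per head); it matches both examples but may differ from the exact code in estimate_llm_gpu_memory. The model mem and 650 MB overhead values are copied from the output as-is.

```python
MIB = 2**20


def activation_mb(context_length: int, hidden_size: int, num_heads: int) -> float:
    """Assumed inference-only activation estimate in MiB (fitted to this PR's output).

    Linear term: 5 fp16 (2-byte) hidden-size vectors per token.
    Quadratic term: one fp16 context x context attention-score matrix per head.
    """
    linear = context_length * hidden_size * 5 * 2
    quadratic = context_length**2 * num_heads * 2
    return (linear + quadratic) / MIB


# Qwen1.5-7B config: 32 layers, hidden size 4096, 32 heads; fp16 KV cache
context = 16384
kv_cache = 2 * context * 32 * 4096 * 2 / MIB  # keys + values for all layers
active = activation_mb(context, hidden_size=4096, num_heads=32)
model_mem, overhead = 4139, 650  # MB, copied from the printed output
total = model_mem + kv_cache + overhead + active
print(kv_cache, active, total)  # 8192.0 17024.0 30005.0
```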

$ env HF_ENDPOINT=https://hf-mirror.com xinference cal-model-mem -s 1_8 -q Int4 -f gptq -c 32768 -n qwen1.5-chat
model_name: qwen1.5-chat
kv_cache_dtype: 16
model size: 1.8 B
quant: Int4
context: 32768
gpu mem usage:
  model mem: 1065 MB
  kv_cache: 6144 MB
  overhead: 650 MB
  active: 33408 MB
  total: 41267 MB (41 GB)
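The second example shows why long contexts are expensive. Under the same assumed activation formula (fitted to the printed numbers, not confirmed against the PR's code), the attention-score term grows with the square of the context, so at 32768 tokens it alone accounts for 32768 MB of the 33408 MB active line for Qwen1.5-1.8B (24 layers, hidden size 2048, 16 heads). This sketch splits the estimate into its parts and reports each part's share of the total; breakdown is a hypothetical helper.

```python
MIB = 2**20


def breakdown(context_length, num_layers, hidden_size, num_heads, model_mem, overhead, dtype_bits=16):
    """Return {component: (MiB, share of total)} and the total (assumed formulas)."""
    parts = {
        "model mem": model_mem,
        "kv_cache": 2 * context_length * num_layers * hidden_size * (dtype_bits // 8) / MIB,
        "overhead": overhead,
        # Assumed split of the activation estimate into its linear and quadratic terms
        "active (linear)": context_length * hidden_size * 5 * 2 / MIB,
        "active (attention scores)": context_length**2 * num_heads * 2 / MIB,
    }
    total = sum(parts.values())
    return {name: (mb, mb / total) for name, mb in parts.items()}, total


# Qwen1.5-1.8B, Int4 GPTQ, 32768-token context (model mem and overhead from the output)
parts, total = breakdown(32768, num_layers=24, hidden_size=2048, num_heads=16, model_mem=1065, overhead=650)
for name, (mb, share) in parts.items():
    print(f"{name:>26}: {mb:8.0f} MB  {share:6.1%}")
print(f"{'total':>26}: {total:8.0f} MB")
```

Halving the context would cut the attention-score term by a factor of four but the KV cache only by two, which is why the active line overtakes everything else at long contexts.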

xinference/model/llm/llm_family.py

qinxuye

LGTM

XprobeBot added the gpu label May 9, 2024

XprobeBot added this to the v0.11.0 milestone May 9, 2024

frostyplanet force-pushed the feat/model_mem_backport branch 3 times, most recently from f33d2b4 to 20575db May 10, 2024 05:36

XprobeBot modified the milestones: v0.11.0, v0.11.1 May 11, 2024

qinxuye reviewed May 13, 2024

xinference/model/llm/llm_family.py (resolved)

qinxuye changed the title ~~Add command cal-model-mem~~ FEAT: Add command cal-model-mem May 13, 2024

XprobeBot added the feature label May 13, 2024

frostyplanet added 2 commits May 16, 2024 13:01

Update .gitignore for vim

34c7a3b

frostyplanet force-pushed the feat/model_mem_backport branch from 20575db to 29e01f6 May 16, 2024 05:02

XprobeBot modified the milestones: v0.11.1, v0.11.2 May 17, 2024

qinxuye reviewed May 20, 2024

xinference/model/llm/llm_family.py (resolved)

qinxuye approved these changes May 21, 2024

qinxuye merged commit 8464b41 into xorbitsai:main May 21, 2024
10 of 12 checks passed
