FEAT: Add command cal-model-mem #1460

Merged: 2 commits into xorbitsai:main on May 21, 2024

Conversation

frostyplanet (Contributor) commented:

Implement model.llm.memory.estimate_llm_gpu_memory,

which outputs model_mem, kv_cache, overhead, and active_mem.

  • Download config.json from HuggingFace/ModelScope and load the model's layer info.

  • Support kv_cache_dtype 8/16/32 (gpu_poor might only calculate fp32).

The algorithm follows https://github.com/RahulSChand/gpu_poor.

model.llm.utils: add convert_model_size_to_float.
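
For reference, a minimal sketch of calling the new helper from Python. Only the module path and function name come from this PR (assuming the package root is xinference); every keyword name below is a hypothetical mirror of the CLI flags:

from xinference.model.llm.memory import estimate_llm_gpu_memory

# All keyword names here are assumptions inferred from the CLI flags;
# the actual signature may differ.
estimate = estimate_llm_gpu_memory(
    model_size_in_billions="7",    # -s 7 ("1_8" would mean 1.8B)
    quantization="Int4",           # -q Int4
    model_format="gptq",           # -f gptq
    context_length=16384,          # -c 16384
    model_name="qwen1.5-chat",     # -n qwen1.5-chat
)
# The estimate breaks down into the four parts listed above:
# model_mem, kv_cache, overhead, and active_mem.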

Usage:

$ env HF_ENDPOINT=https://hf-mirror.com xinference cal-model-mem -s 7 -q Int4 -f gptq -c 16384 -n qwen1.5-chat
model_name: qwen1.5-chat
kv_cache_dtype: 16
model size: 7.0 B
quant: Int4
context: 16384
gpu mem usage:
  model mem: 4139 MB
  kv_cache: 8192 MB
  overhead: 650 MB
  active: 17024 MB
  total: 30005 MB (30 GB)
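
The kv_cache line can be sanity-checked by hand: for a 16-bit cache it is 2 (K and V) * num_layers * context_length * hidden_size * 2 bytes. Assuming the published Qwen1.5-7B shape of 32 layers and hidden size 4096 (not stated in this PR):

# Assumed Qwen1.5-7B shape: 32 layers, hidden size 4096 (from the public
# config.json, not from this PR). dtype_bytes=2 matches kv_cache_dtype 16.
num_layers, hidden_size, context_len, dtype_bytes = 32, 4096, 16384, 2
kv_cache = 2 * num_layers * context_len * hidden_size * dtype_bytes  # K and V
print(kv_cache // (1024 * 1024))  # 8192 MB, matching the output above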

$ env HF_ENDPOINT=https://hf-mirror.com xinference cal-model-mem -s 1_8 -q Int4 -f gptq -c 32768 -n qwen1.5-chat
model_name: qwen1.5-chat
kv_cache_dtype: 16
model size: 1.8 B
quant: Int4
context: 32768
gpu mem usage:
  model mem: 1065 MB
  kv_cache: 6144 MB
  overhead: 650 MB
  active: 33408 MB
  total: 41267 MB (41 GB)
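
The same formula reproduces the 6144 MB kv_cache here, assuming Qwen1.5-1.8B's published shape of 24 layers and hidden size 2048. As for the -s flag, it takes sizes like 7 or 1_8 (i.e. 1.8B); convert_model_size_to_float turns that string into a float. A minimal sketch of such a helper, assuming underscore-as-decimal-point is the only special case:

def convert_model_size_to_float(model_size: str) -> float:
    # "1_8" is shorthand for 1.8 (billions of parameters); the real
    # helper in model.llm.utils may handle more formats than this.
    return float(model_size.replace("_", "."))

assert convert_model_size_to_float("7") == 7.0
assert convert_model_size_to_float("1_8") == 1.8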


@XprobeBot XprobeBot added the gpu label May 9, 2024
@XprobeBot XprobeBot added this to the v0.11.0 milestone May 9, 2024
@frostyplanet frostyplanet force-pushed the feat/model_mem_backport branch 3 times, most recently from f33d2b4 to 20575db on May 10, 2024 at 05:36
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1 May 11, 2024
@qinxuye qinxuye changed the title Add command cal-model-mem FEAT: Add command cal-model-mem May 13, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.1, v0.11.2 May 17, 2024
@qinxuye (Contributor) left a comment:
LGTM

@qinxuye qinxuye merged commit 8464b41 into xorbitsai:main May 21, 2024
10 of 12 checks passed