Hi! I will update the code soon. In the meantime, you can change `MODEL_PATH` to `meta-llama/Meta-Llama-3-70B`, then launch a vLLM server that hosts `meta-llama/Meta-Llama-3-70B-Instruct` with `CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --disable-log-requests --port 8000`. You can still use `chat_vllm_benchmark.py` to run the benchmark.
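The launch command above can be written out as a short script (a sketch based on the comment; the GPU IDs and port are the values given there, and the `curl` health check assumes vLLM's standard OpenAI-compatible `/v1/models` endpoint):

```shell
# Serve meta-llama/Meta-Llama-3-70B-Instruct across 4 GPUs with tensor parallelism
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --tensor-parallel-size 4 \
    --disable-log-requests \
    --port 8000

# Once the server is up, confirm the OpenAI-compatible endpoint responds
# before pointing chat_vllm_benchmark.py at it
curl http://localhost:8000/v1/models
```

A 70B model in 16-bit weights needs roughly 140 GB for parameters alone, which is why `--tensor-parallel-size 4` shards it across four GPUs here.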
🚀 The feature, motivation and pitch
The documentation and examples cover llama2 benchmarks. We would like to run llama3 benchmarks on-prem.
Alternatives
No response
Additional context
No response