
llama3 on prem benchmarks #490

Closed
ifelsefi opened this issue May 8, 2024 · 3 comments

@ifelsefi

ifelsefi commented May 8, 2024

🚀 The feature, motivation and pitch

The documentation and examples currently cover llama2 benchmarks. We would like to run llama3 on-prem benchmarks.

Alternatives

No response

Additional context

No response

@wukaixingxp
Contributor

wukaixingxp commented May 9, 2024

Hi! I will update the code soon. Meanwhile, you can change MODEL_PATH to meta-llama/Meta-Llama-3-70B, then launch a vLLM server that hosts meta-llama/Meta-Llama-3-70B-Instruct with: CUDA_VISIBLE_DEVICES=0,1,2,3 python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3-70B-Instruct --tensor-parallel-size 4 --disable-log-requests --port 8000. You can still use chat_vllm_benchmark.py to run the benchmark.
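For readers less familiar with the setup above: the vLLM command starts an OpenAI-compatible HTTP server on the given port, and the benchmark script talks to it as an ordinary chat-completions client. A minimal sketch of the request payload such a client would POST is below; the model name and port are taken from the launch command in this comment, while the prompt text, `max_tokens` value, and `base_url` host are illustrative assumptions.

```python
import json

# The vLLM OpenAI-compatible server (started with --port 8000 above)
# accepts standard chat-completions requests at this endpoint.
# "localhost" assumes the client runs on the same machine as the server.
base_url = "http://localhost:8000/v1/chat/completions"

# Request body for the served model; prompt and max_tokens are illustrative.
payload = {
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello, Llama 3!"}],
    "max_tokens": 64,
}

# With the server running, this payload would be POSTed as JSON, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d '<payload>'
body = json.dumps(payload)
```

This is the same request shape chat_vllm_benchmark.py issues in a loop to measure throughput and latency, so any model name that the server was launched with can be benchmarked without code changes.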

@wukaixingxp
Contributor

The PR is merged; please try the latest example. Let me know if there is any problem.

@wukaixingxp
Contributor

Closing this issue as the PR has been merged. Let me know if there is any problem!
