Issues: vllm-project/vllm
[Bug]: ModelRegistry.load_model_cls() circular import error on llama-llava
#4807 · bug (Something isn't working) · opened May 14, 2024 by datta-nimmaturi

[Performance]: Qwen 7b chat model, under 128 concurrency, the CPU utilization rate is 100%, and the GPU SM utilization rate is only about 60%-75%. Is it a CPU bottleneck?
#4806 · performance (Performance-related issues) · opened May 14, 2024 by markluofd

[Usage]: Seems nn.module definition may affect the output tokens. Don't know the reason.
#4805 · usage (How to use vllm) · opened May 14, 2024 by Zhenzhong1

[Performance]: how to test tensorrt-llm serving correctly
#4803 · performance · opened May 14, 2024 by RunningLeon

[Performance]: Deepseek-v2 support
#4802 · performance · opened May 14, 2024 by ZixinxinWang

[Bug]: logprobs is not compatible with the OpenAI spec
#4795 · bug · opened May 13, 2024 by GabrielBianconi

[Bug]: Async engine hangs with 0.4.* releases
#4789 · bug · opened May 13, 2024 by glos-nv

[Bug]: RAM OOM Error Loading 480GB MoE Model Despite Fix in PR #1395
#4786 · bug · opened May 13, 2024 by hxer7963

[Bug]: multi-gpu for baichuan2-13B-Chat benchmark_serving
#4785 · bug · opened May 13, 2024 by shudct

[Bug]: deploy Phi-3-mini-128k-instruct AssertionError
#4784 · bug · opened May 13, 2024 by hxujal

[Usage]: How to change the batch size when testing the throughput of VLLM by running benchmark_throughput
#4783 · usage · opened May 13, 2024 by Ourspolaire1

[Doc]: Doc for using tensorizer_uri with LLM is incorrect
#4782 · documentation (Improvements or additions to documentation) · opened May 13, 2024 by GRcharles

[Feature]: Support the OpenAI Batch Chat Completions file format
#4777 · feature request · opened May 13, 2024 by wuisawesome

[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt
#4772 · bug · opened May 12, 2024 by leejamesss

[Feature]: CI: Test on NVLink-enabled machine
#4770 · feature request · opened May 12, 2024 by youkaichao

[Feature]: could paged_attention_v1 support parameter 'attn_bias'
#4766 · feature request · opened May 11, 2024 by cillinzhang

[Feature]: Support W4A8KV4 Quantization (QServe/QoQ)
#4763 · feature request · opened May 11, 2024 by bratao

[Performance]: Why the avg. throughput generation is low?
#4760 · performance · opened May 11, 2024 by rvsh2

[Bug]: CUDA error when running mistral-7b + lora with tensor_para=8
#4756 · bug · opened May 11, 2024 by sfc-gh-zhwang

Regression in support of customized "role" in OpenAI compatible API (v0.4.2)
#4755 · good first issue (Good for newcomers) · opened May 10, 2024 by simon-mo

[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs
#4744 · usage · opened May 10, 2024 by danielstankw

[RFC]: Support specifying quant_config details in the LLM or Server entrypoints
#4743 · feature request, RFC · opened May 10, 2024 by mgoin

[Bug]: ValueError when using LoRA with CohereForCausalLM model
#4742 · bug · opened May 10, 2024 by onlyfish79