Issues: vllm-project/vllm
[Bug]: ModelRegistry.load_model_cls() circular import error on llama-llava
#4807 · bug (Something isn't working) · opened May 14, 2024 by datta-nimmaturi

[Performance]: Qwen 7b chat model, under 128 concurrency, the CPU utilization rate is 100%, and the GPU SM utilization rate is only about 60%-75%. Is it a CPU bottleneck?
#4806 · performance (Performance-related issues) · opened May 14, 2024 by markluofd

[Usage]: Seems nn.module definition may affect the output tokens. Don't know the reason.
#4805 · usage (How to use vllm) · opened May 14, 2024 by Zhenzhong1

[Performance]: how to test tensorrt-llm serving correctly
#4803 · performance · opened May 14, 2024 by RunningLeon

[Performance]: Deepseek-v2 support
#4802 · performance · opened May 14, 2024 by ZixinxinWang

[Bug]: logprobs is not compatible with the OpenAI spec
#4795 · bug · opened May 13, 2024 by GabrielBianconi

[Bug]: Async engine hangs with 0.4.* releases
#4789 · bug · opened May 13, 2024 by glos-nv

[Bug]: RAM OOM Error Loading 480GB MoE Model Despite Fix in PR #1395
#4786 · bug · opened May 13, 2024 by hxer7963

[Bug]: multi-gpu for baichuan2-13B-Chat benchmark_serving
#4785 · bug · opened May 13, 2024 by shudct

[Bug]: deploy Phi-3-mini-128k-instruct AssertionError
#4784 · bug · opened May 13, 2024 by hxujal

[Usage]: How to change the batch size when testing the throughput of VLLM by running benchmark_throughput
#4783 · usage · opened May 13, 2024 by Ourspolaire1

[Doc]: Doc for using tensorizer_uri with LLM is incorrect
#4782 · documentation (Improvements or additions to documentation) · opened May 13, 2024 by GRcharles

[Feature]: Support the OpenAI Batch Chat Completions file format
#4777 · feature request · opened May 13, 2024 by wuisawesome

[Bug]: Unexpected Special Tokens in prompt_logprobs Output for Llama3 Prompt
#4772 · bug · opened May 12, 2024 by leejamesss

[Feature]: CI: Test on NVLink-enabled machine
#4770 · feature request · opened May 12, 2024 by youkaichao

[Feature]: could paged_attention_v1 support parameter 'attn_bias'
#4766 · feature request · opened May 11, 2024 by cillinzhang

[Feature]: Support W4A8KV4 Quantization (QServe/QoQ)
#4763 · feature request · opened May 11, 2024 by bratao

[Performance]: Why the avg. throughput generation is low?
#4760 · performance · opened May 11, 2024 by rvsh2

[Bug]: CUDA error when running mistral-7b + lora with tensor_para=8
#4756 · bug · opened May 11, 2024 by sfc-gh-zhwang

Regression in support of customized "role" in OpenAI compatible API (v0.4.2)
#4755 · good first issue (Good for newcomers) · opened May 10, 2024 by simon-mo

[Usage]: vLLM AutoAWQ with 4 GPUs doesn't utilize the GPUs
#4744 · usage · opened May 10, 2024 by danielstankw

[RFC]: Support specifying quant_config details in the LLM or Server entrypoints
#4743 · feature request, RFC · opened May 10, 2024 by mgoin

[Bug]: ValueError when using LoRA with CohereForCausalLM model
#4742 · bug · opened May 10, 2024 by onlyfish79