Issues: vllm-project/vllm
[Bug]: Can't use offline inference embedding
bug · #4908 · opened May 19, 2024 by Fanb1ing

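For context on the issue above: vLLM versions from around this period expose an `encode` entry point on the `LLM` class for embedding models. A minimal sketch, assuming a vLLM build with embedding support and a supported embedding model (intfloat/e5-mistral-7b-instruct is an assumption here, not taken from the issue):

```python
# Minimal sketch of offline embedding inference with vLLM.
# Model choice and enforce_eager are assumptions, not from the issue.
from vllm import LLM

llm = LLM(model="intfloat/e5-mistral-7b-instruct", enforce_eager=True)

# encode() returns one output per prompt; each carries an embedding vector.
outputs = llm.encode(["Hello, my name is", "The capital of France is"])
for output in outputs:
    print(len(output.outputs.embedding))  # embedding dimensionality
```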
[Bug]: Cannot use FlashAttention-2 backend because the flash_attn package is not found
bug · #4906 · opened May 19, 2024 by maxin9966

[Bug]: llm_engine_example.py (more requests) gets stuck
bug · #4904 · opened May 19, 2024 by CsRic

[Feature]: Support for Falcon-11B model (Falcon 2)
feature request · #4902 · opened May 18, 2024 by s-smits

[Usage]: Profiling Prefill and Decode Phases Separately
usage · #4900 · opened May 18, 2024 by Msiavashi

[Usage]: Passing a guided_json in offline inference
usage · #4899 · opened May 18, 2024 by ccdv-ai

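At the time of the question above, guided JSON decoding was wired into vLLM's OpenAI-compatible server rather than the offline `LLM` API. As a hedged reference sketch of the documented server-side route, passing guided_json through the OpenAI client's extra_body (the schema below is illustrative, not from the issue):

```python
# Sketch: guided JSON via vLLM's OpenAI-compatible server, not the offline API.
# Assumes a running server, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model <model>
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {  # illustrative JSON schema
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="<model>",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    extra_body={"guided_json": schema},  # vLLM-specific extension field
)
print(completion.choices[0].message.content)
```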
[Bug]: CohereForAI/c4ai-command-r-v01 OSError: [Errno 12] Cannot allocate memory
bug · #4891 · opened May 17, 2024 by epignatelli

[Bug]: assert parts[0] == "base_model" AssertionError
bug · #4883 · opened May 17, 2024 by Edisonwei54

[Usage]: Why can't I set the number of GPUs when using "tensor_parallel_size"?
usage · #4882 · opened May 17, 2024 by GodHforever

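For the tensor_parallel_size question above: in the offline API the GPU count is controlled by this constructor argument, while GPU selection is done outside vLLM. A minimal sketch (the model name is a placeholder):

```python
# Sketch: tensor_parallel_size sets how many GPUs the engine shards the model across.
from vllm import LLM, SamplingParams

# Spawns one worker per GPU; tensor_parallel_size=4 uses 4 GPUs.
# To restrict WHICH GPUs are used, set CUDA_VISIBLE_DEVICES before launching.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=4)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```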
[Installation]: Is there a plan to update the pip package installation method for the CPU backend?
installation · #4881 · opened May 17, 2024 by Zhenzhong1

[Usage]: gpu memory usage when using tensor parallel
usage · #4880 · opened May 17, 2024 by DaiJianghai

[Bug]: A single LoRA request error makes all in-flight requests error
bug · #4879 · opened May 17, 2024 by jinzhen-lin

[Bug]: Shape error encountered in speculative decoding when enable_lora=True
bug · #4872 · opened May 17, 2024 by mitchellstern

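For reproduction context on the shape-error report above, a hedged sketch of a configuration that combines speculative decoding with LoRA. Model names are placeholders, and in vLLM of this era speculative decoding also required the v2 block manager:

```python
# Sketch: speculative decoding plus LoRA, the combination the report above hits.
# All model names are placeholders, not taken from the issue.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-13b-hf",
    speculative_model="meta-llama/Llama-2-7b-hf",  # draft model
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required for spec decode at the time
    enable_lora=True,
)
```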
[Feature]: Health check for restart policy
feature request · #4867 · opened May 16, 2024 by pseudotensor

[Usage]: distributed inference with KubeRay
usage · #4865 · opened May 16, 2024 by hetian127

[Misc]: a question about chunked prefill in the flash-attn backend
misc · #4863 · opened May 16, 2024 by HarryWu99

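On the chunked-prefill question above: chunked prefill is toggled by an engine argument that splits long prompt prefills into chunks so they can be batched alongside decode steps. A minimal sketch (the model name and token budget are illustrative):

```python
# Sketch: enabling chunked prefill so long prefills are split into chunks
# and scheduled together with decode tokens. Values are illustrative.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    enable_chunked_prefill=True,
    max_num_batched_tokens=2048,  # token budget per scheduler step
)
```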
[Bug]: No CUDA GPUs are available on 'CPU' use
bug · #4858 · opened May 16, 2024 by mcr-ksh

[Usage]: How to determine how many concurrent requests can be supported in an acceptable time with the demo API server?
usage · #4853 · opened May 16, 2024 by senbinyu

[Bug]: Qwen1.5-72B L20x8 latest vLLM TPOT slower than v0.4.0.post, 48ms vs 39ms, why?
bug · #4852 · opened May 16, 2024 by DefTruth

[Misc]: Assertion with no description in vllm with DeepSeekMath 7b model, why, how to fix?
misc · #4849 · opened May 16, 2024 by brando90

[Feature]: Build and publish Neuron docker image
feature request · #4838 · opened May 15, 2024 by yaronr

[Bug]: Running vllm docker image with neuron fails
bug · #4836 · opened May 15, 2024 by yaronr

[New Model]: Google's PaliGemma family of models
new model · #4833 · opened May 15, 2024 by nfplay

ProTip! Exclude everything labeled bug with -label:bug.