Pull requests: vllm-project/vllm
#4903 [Bugfix] Fix custom all reduce nvlink check on multi node (opened May 19, 2024 by esmeetu)
#4894 [Core] Eliminate parallel worker per-step task scheduling overhead (opened May 18, 2024 by njhill)
#4893 [Misc] Load FP8 kv-cache scaling factors from checkpoints (opened May 17, 2024 by comaniac)
#4856 [Bugfix] Still download from huggingface while set VLLM_USE_MODELSCOPE = true (opened May 16, 2024 by liuzhenghua)
#4846 [Bugfix / Core] Prefix Caching Guards (merged with main) (opened May 16, 2024 by zhuohan123)
#4841 Add a new kernel for fusing the dequantization in fused-moe gemm (opened May 15, 2024 by RezaYazdaniAminabadi)
#4837 [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (opened May 15, 2024 by afeldman-nm)
#4834 [Build/CI] Enabling AMD Entrypoints Test [rocm] (opened May 15, 2024 by Alexei-V-Ivanov-AMD)
#4830 [Hardware][Intel] Add LoRA adapter support for CPU backend [x86 CPU] (opened May 15, 2024 by Isotr0py)
#4808 [Speculative decoding] Enable TP>1 speculative decoding (opened May 14, 2024 by cadedaniel)