Pull requests: vllm-project/vllm
#4903 [Bugfix] Fix custom all reduce nvlink check on multi node (opened May 19, 2024 by esmeetu)
#4894 [Core] Eliminate parallel worker per-step task scheduling overhead (opened May 18, 2024 by njhill)
#4893 [Misc] Load FP8 kv-cache scaling factors from checkpoints (opened May 17, 2024 by comaniac)
#4856 [Bugfix] Still download from huggingface while set VLLM_USE_MODELSCOPE = true (opened May 16, 2024 by liuzhenghua)
#4846 [Bugfix / Core] Prefix Caching Guards (merged with main) (opened May 16, 2024 by zhuohan123)
#4841 Add a new kernel for fusing the dequantization in fused-moe gemm (opened May 15, 2024 by RezaYazdaniAminabadi)
#4837 [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (opened May 15, 2024 by afeldman-nm)
#4834 [Build/CI] Enabling AMD Entrypoints Test [rocm] (opened May 15, 2024 by Alexei-V-Ivanov-AMD)
#4830 [Hardware][Intel] Add LoRA adapter support for CPU backend [x86 CPU] (opened May 15, 2024 by Isotr0py)
#4808 [Speculative decoding] Enable TP>1 speculative decoding (opened May 14, 2024 by cadedaniel)