Why does vllm flash attention build only support CUDA 12.1 (not 11.8) ? #4801
thangld201 started this conversation in General
Hi @youkaichao @WoosukKwon, I see that the original flash-attention repo supports CUDA 11.6+. However, I am not sure why the vLLM fork only supports building for CUDA 12.1, or why vLLM (master) does not use the original flash-attention build.
How can I enable vLLM to build flash attention with CUDA 11.8? (The master branch currently only uses flash attention from the vLLM fork, which does not support CUDA 11.8.)
EDIT: I cloned the master branch, updated after #4686.
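For reference, here is how I checked which CUDA versions are in play on my machine. This is only a diagnostic sketch using standard PyTorch APIs, not a workaround for the 11.8 limitation; the 12.1 constraint itself comes from the vendored vllm-flash-attn wheels.

```python
# Minimal diagnostic: print the CUDA toolkit versions the environment targets,
# to see where the 12.1 requirement comes into conflict with a CUDA 11.8 setup.
import torch
from torch.utils.cpp_extension import CUDA_HOME

# CUDA version this PyTorch build was compiled against; the vendored
# vllm-flash-attn wheel must match it (currently built for 12.1).
print("torch built with CUDA:", torch.version.cuda)

# Local toolkit that would be used when compiling vLLM from source;
# None means no CUDA toolkit / nvcc was found.
print("local CUDA toolkit (CUDA_HOME):", CUDA_HOME)
```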