Why does vllm flash attention build only support CUDA 12.1 (not 11.8) ? #4801
thangld201 started this conversation in General
Hi @youkaichao @WoosukKwon, I see that the original flash-attention repo supports CUDA 11.6+. However, I am not sure why the vLLM fork only supports building for CUDA 12.1, or why vLLM (master) does not use the original flash-attention build.
How can I enable vLLM to build flash attention with CUDA 11.8? (The master branch currently only uses flash attention from the vLLM fork, which does not support CUDA 11.8.)
EDIT: I cloned the master branch, updated after #4686.
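For reference, here is how I checked which CUDA versions are in play on my machine. This is only a diagnostic sketch using standard PyTorch APIs, not a workaround for the 11.8 limitation; the 12.1 constraint itself comes from the vendored vllm-flash-attn wheels.

```python
# Minimal diagnostic: print the CUDA toolkit versions the environment targets,
# to see where the 12.1 requirement comes into conflict with a CUDA 11.8 setup.
import torch
from torch.utils.cpp_extension import CUDA_HOME

# CUDA version this PyTorch build was compiled against; the vendored
# vllm-flash-attn wheel must match it (currently built for 12.1).
print("torch built with CUDA:", torch.version.cuda)

# Local toolkit that would be used when compiling vLLM from source;
# None means no CUDA toolkit / nvcc was found.
print("local CUDA toolkit (CUDA_HOME):", CUDA_HOME)
```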