attn impl to sdpa... #107

Open
saa1028 opened this issue Jan 24, 2024 · 4 comments

Comments


saa1028 commented Jan 24, 2024

With the new version of transformers there is no need to use BetterTransformer; try setting the attention implementation to sdpa instead.
attn impl: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
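For reference, a minimal sketch of loading a Llama model with the sdpa attention implementation (assuming a transformers release that supports the `attn_implementation` argument, i.e. >= 4.36; the checkpoint id below is only a placeholder, not the one from this issue):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint, replace with your own

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # use PyTorch scaled_dot_product_attention instead of BetterTransformer
)

# On transformers versions current at the time of this issue (Jan 2024),
# this prints the LlamaSdpaAttention class mentioned above.
print(type(model.model.layers[0].self_attn))
```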

@profintegra

I have the same issue

@profintegra

Solution:
In your Python code, insert this line:
model.tokenizer.pad_token = model.tokenizer.eos_token
before this line:
input_tokens = model.tokenizer(input_text, ......
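In context, a minimal sketch of that fix, assuming `model` exposes a Hugging Face tokenizer as `model.tokenizer` and that `input_text` is already defined; the tokenizer call arguments are illustrative, since they are elided in the snippet above:

```python
# Llama tokenizers ship without a pad token, so reuse the EOS token for padding.
model.tokenizer.pad_token = model.tokenizer.eos_token

# Illustrative tokenizer call; adapt the arguments to your own code.
input_tokens = model.tokenizer(input_text, return_tensors="pt", padding=True)
```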


ahmedbr commented Feb 21, 2024

I have the same problem. Any updates on this?

@leedahae340

This is not a bug; it is related to max_new_tokens=20. If it is set to 20, the generation loop runs 20 times; if it is 200, it runs 200 times...
It is just a bit slow.
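A minimal sketch of that point, assuming `input_tokens` is prepared as in the earlier comments and that `hf_model` stands for the underlying Hugging Face model object (an assumption, since the wrapper used in this issue is not shown):

```python
# Each generated token requires one forward pass, so runtime scales with max_new_tokens.
output_ids = hf_model.generate(**input_tokens, max_new_tokens=20)
print(model.tokenizer.decode(output_ids[0], skip_special_tokens=True))
```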
