from airllm import AutoModel

MAX_LENGTH = 128

model = AutoModel.from_pretrained("/mnt/d/miqu-1-70b-sf", compression='4bit')

input_text = [
    "[INST] eloquent high camp prose about a cute catgirl [/INST]",
]

model.tokenizer.pad_token = model.tokenizer.eos_token
input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=False,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
Problem
Generation keeps re-running full `running layers(self.running_device)` passes, each taking roughly 30 minutes to over an hour:
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(self.running_device): 100%|██████████████████████████████████████████████| 83/83 [27:57<00:00, 20.22s/it]
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(self.running_device): 100%|██████████████████████████████████████████████| 83/83 [30:15<00:00, 21.87s/it]
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(self.running_device): 100%|████████████████████████████████████████████| 83/83 [1:04:38<00:00, 46.73s/it]
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(self.running_device): 100%|████████████████████████████████████████████| 83/83 [1:13:57<00:00, 53.47s/it]
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
running layers(self.running_device): 23%|██████████▌ | 19/83 [11:01<37:06, 34.79s/it]
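A back-of-envelope check of the timings above (a sketch, not airllm code): each progress bar is one full pass over the model's 83 layer groups, and the repeated bars suggest generate() runs roughly one such pass per generated token when `use_cache=False` (an assumption based on the logs, not confirmed from airllm's source).

```python
# Sketch: estimate total generation time from the logged per-layer rate.
SECONDS_PER_LAYER = 20.22   # rate shown in the first progress bar
LAYERS = 83                 # layer groups per pass, per the logs
MAX_NEW_TOKENS = 20         # from the repro script

# One full pass over all layers:
pass_minutes = SECONDS_PER_LAYER * LAYERS / 60
print(f"one pass: ~{pass_minutes:.0f} min")   # ~28 min, matching 27:57

# If every new token needs a full pass (assumption), 20 tokens cost:
total_hours = MAX_NEW_TOKENS * pass_minutes / 60
print(f"{MAX_NEW_TOKENS} tokens: ~{total_hours:.1f} h")
```

Under that assumption the 20-token repro would need on the order of 9 hours, which is consistent with five passes completing in roughly 2.5 hours of log output.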
Loading the model didn't give errors, but it prints this:
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
not support prefetching for compression for now. loading with no prepetching mode.
Env
Model used: https://huggingface.co/152334H/miqu-1-70b-sf
The solution from #107 didn't work.