Mac, native conda, mlx installed.
```
(native) taozhiyu@603e5f4a42f1 downloads % pip show mlx airllm
Name: mlx
Version: 0.11.1
Summary: A framework for machine learning on Apple silicon.
Home-page: https://github.com/ml-explore/mlx
Author: MLX Contributors
Author-email: mlx@group.apple.com
License:
Location: /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/site-packages
Requires:
Required-by:
Name: airllm
Version: 2.8.3
Summary: AirLLM allows single 4GB GPU card to run 70B large language models without quantization, distillation or pruning.
Home-page: https://github.com/lyogavin/Anima/tree/main/air_llm
Author: Gavin Li
Author-email: gavinli@animaai.cloud
License:
Location: /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/site-packages
Requires: accelerate, huggingface-hub, optimum, safetensors, scipy, torch, tqdm, transformers
Required-by:
(native) taozhiyu@603e5f4a42f1 downloads % python3 airllm2.py
found index file...
found_layers:{'model.embed_tokens.': False, 'model.layers.0.': False, 'model.layers.1.': False, 'model.layers.2.': False, 'model.layers.3.': False, 'model.layers.4.': False, 'model.layers.5.': False, 'model.layers.6.': False, 'model.layers.7.': False, 'model.layers.8.': False, 'model.layers.9.': False, 'model.layers.10.': False, 'model.layers.11.': False, 'model.layers.12.': False, 'model.layers.13.': False, 'model.layers.14.': False, 'model.layers.15.': False, 'model.layers.16.': False, 'model.layers.17.': False, 'model.layers.18.': False, 'model.layers.19.': False, 'model.layers.20.': False, 'model.layers.21.': False, 'model.layers.22.': False, 'model.layers.23.': False, 'model.layers.24.': False, 'model.layers.25.': False, 'model.layers.26.': False, 'model.layers.27.': False, 'model.layers.28.': False, 'model.layers.29.': False, 'model.layers.30.': False, 'model.layers.31.': False, 'model.layers.32.': False, 'model.layers.33.': False, 'model.layers.34.': False, 'model.layers.35.': False, 'model.layers.36.': False, 'model.layers.37.': False, 'model.layers.38.': False, 'model.layers.39.': False, 'model.layers.40.': False, 'model.layers.41.': False, 'model.layers.42.': False, 'model.layers.43.': False, 'model.layers.44.': False, 'model.layers.45.': False, 'model.layers.46.': False, 'model.layers.47.': False, 'model.layers.48.': False, 'model.layers.49.': False, 'model.layers.50.': False, 'model.layers.51.': False, 'model.layers.52.': False, 'model.layers.53.': False, 'model.layers.54.': False, 'model.layers.55.': False, 'model.layers.56.': False, 'model.layers.57.': False, 'model.layers.58.': False, 'model.layers.59.': False, 'model.layers.60.': False, 'model.layers.61.': False, 'model.layers.62.': False, 'model.layers.63.': False, 'model.layers.64.': False, 'model.layers.65.': False, 'model.layers.66.': False, 'model.layers.67.': False, 'model.layers.68.': False, 'model.layers.69.': False, 'model.layers.70.': False, 'model.layers.71.': False, 'model.layers.72.': False, 'model.layers.73.': False, 'model.layers.74.': False, 'model.layers.75.': False, 'model.layers.76.': False, 'model.layers.77.': False, 'model.layers.78.': False, 'model.layers.79.': False, 'model.norm.': False, 'lm_head.': False}
some layer splits found, some are not, re-save all layers in case there's some corruptions.
  0%|          | 0/83 [00:00<?, ?it/s]Loading shard 1/30
zsh: segmentation fault  python3 airllm2.py
(native) taozhiyu@603e5f4a42f1 downloads % /Users/taozhiyu/miniconda3/envs/native/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
(native) taozhiyu@603e5f4a42f1 downloads %
```
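zsh reports only `segmentation fault` with no Python traceback. As a debugging aid (assuming the crash is reachable from Python-visible code, which is not guaranteed for a native-extension crash), the stdlib `faulthandler` module can dump the Python stack when the process receives SIGSEGV:

```python
import faulthandler

# Dump every thread's Python traceback if the process segfaults.
# Equivalent to launching the script with: python3 -X faulthandler airllm2.py
faulthandler.enable()

print(faulthandler.is_enabled())  # -> True
```

Adding these two lines at the top of airllm2.py (or re-running it with `-X faulthandler`) would show which Python frame was active when the shard loading crashed.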
```python
from airllm import AutoModel

MAX_LENGTH = 128

# could use hugging face model repo id:
# model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")
# or use model's local path...
model = AutoModel.from_pretrained("/Users/taozhiyu/Downloads/Meta-Llama-3-70B-Instruct")

input_text = [
    'What is the capital of United States?',
]

input_tokens = model.tokenizer(input_text,
                               return_tensors="pt",
                               return_attention_mask=False,
                               truncation=True,
                               max_length=MAX_LENGTH,
                               padding=False)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
```
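One detail that stands out in the snippet: `input_tokens['input_ids'].cuda()` assumes a CUDA device, which Apple silicon does not have (whether this is related to the segfault is unconfirmed). A minimal sketch of backend-agnostic device selection, using a hypothetical `pick_device` helper rather than anything from airllm itself:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick the torch device string for the best available backend."""
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    if mps_available:
        return "mps"   # Apple silicon (Metal Performance Shaders)
    return "cpu"       # fallback

# On an M3 Max with a CUDA-less PyTorch build:
print(pick_device(False, True))  # -> mps
```

In the script above this would become `input_tokens['input_ids'].to(pick_device(torch.cuda.is_available(), torch.backends.mps.is_available()))` instead of the unconditional `.cuda()`.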
What MacBook are you using? An M3 Max? 🤔 I've seen an issue like this here as well as in the CoreML Stable Diffusion repo, specific to M3 MacBooks.
M3 Max, 128 GB.
That was my suspicion, yeah. Very odd. I have a 36 GB M3 Max.