When running quickstart.ipynb, loading the model in int8 vs fp16 occupies significantly different amounts of GPU memory. #374

lankuohsing commented Feb 17, 2024

I am trying to fine-tune Llama 2 7B with LoRA by running quickstart.ipynb (https://github.com/facebookresearch/llama-recipes/blob/main/examples/quickstart.ipynb) on a single A100 40GB GPU.
When I load the model in int8 and create a PeftModel in int8 (the original setting in quickstart.ipynb), training occupies 14 GB of GPU memory (with batch_size set to 2).
However, when I load the model in fp16 and create a PeftModel in fp16, training occupies 40 GB of GPU memory (with batch_size set to 1).

The part of the code I modified is shown below:

model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
...
config = {
    'lora_config': lora_config,
    'learning_rate': 1e-5,                 # changed from 1e-4 to 1e-5
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 4,      # changed from 2 to 4
    'per_device_train_batch_size': 1,      # changed from 2 to 1
    'gradient_checkpointing': False,
}
...
# model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)
...

Can someone explain why there is such a huge difference in GPU memory consumption?
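For reference, the int8 path I kept from the notebook looks roughly like the sketch below (a minimal reconstruction from memory, not a verbatim copy of quickstart.ipynb; model_id and peft_config are the same objects as in the snippet above):

import torch
from transformers import LlamaForCausalLM
from peft import prepare_model_for_int8_training, get_peft_model

# Load the base weights quantized to int8 via bitsandbytes instead of fp16.
model = LlamaForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    load_in_8bit=True,
)
# Casts norm layers / lm_head to fp32 and, by default, enables gradient checkpointing.
model = prepare_model_for_int8_training(model)
model = get_peft_model(model, peft_config)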

HamidShojanazeri (Contributor) commented

@lankuohsing Reading the bitsandbytes docs might be helpful. It supports a bunch of functionality besides int8 matrix multiplication; it also provides an 8-bit optimizer that can significantly reduce memory requirements. And if you look at prepare_model_for_int8_training, it also enables gradient checkpointing, which is a big memory saver. That should largely explain the difference.
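A minimal sketch of applying those same two savings to the fp16 path, assuming the Hugging Face Trainer setup from the notebook (model_id and peft_config as above; output_dir is illustrative, and the 8-bit optimizer requires bitsandbytes):

import torch
from transformers import LlamaForCausalLM, TrainingArguments
from peft import get_peft_model

model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
# Needed when gradient checkpointing is combined with a frozen base model + LoRA adapters.
model.enable_input_require_grads()
model = get_peft_model(model, peft_config)

training_args = TrainingArguments(
    output_dir="tmp_llama2_lora",       # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=1e-5,
    fp16=True,
    gradient_checkpointing=True,        # recompute activations in backward instead of storing them
    optim="adamw_bnb_8bit",             # bitsandbytes 8-bit Adam: far smaller optimizer state than fp32 Adam
)

With gradient checkpointing off and a standard fp32 Adam optimizer (as in the modified config above), the fp16 run pays for full activations plus large optimizer states, which accounts for most of the 14 GB vs 40 GB gap.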
