I am trying to finetune Llama 2 7B with LoRA by running quickstart.ipynb (https://github.com/facebookresearch/llama-recipes/blob/main/examples/quickstart.ipynb) on an A100 40GB GPU.
When I load the model in int8 and create the PeftModel in int8 (the original setting in quickstart.ipynb), training occupies 14 GB of GPU memory (batch_size set to 2).
However, when I load the model in fp16 and create the PeftModel in fp16, training occupies 40 GB of GPU memory (batch_size set to 1).
The part of the code I modified is shown below:
model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
...
config = {
    'lora_config': lora_config,
    'learning_rate': 1e-5,            # changed from 1e-4 to 1e-5
    'num_train_epochs': 1,
    'gradient_accumulation_steps': 4, # changed from 2 to 4
    'per_device_train_batch_size': 1, # changed from 2 to 1
    'gradient_checkpointing': False,
}
...
# model = prepare_model_for_int8_training(model)  # disabled for the fp16 run
model = get_peft_model(model, peft_config)
...
Can someone explain why there is such a huge difference in GPU memory consumption?
@lankuohsing Reading the bitsandbytes docs might be helpful. The library supports more than int8 matrix multiplication: it also provides an int8 optimizer that can significantly reduce memory requirements, and if you look at prepare_model_for_int8_training, it also enables gradient checkpointing, which is a big memory saver. That should largely explain the difference.
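If you want to keep the fp16 path but close most of that gap, a minimal sketch along those lines might look like the following. This is not the notebook's exact code; it assumes the notebook's model_id, lora_config, and train_dataset variables are in scope, that bitsandbytes is installed, and it enables gradient checkpointing manually and swaps in the bitsandbytes 8-bit AdamW optimizer:

import torch
import bitsandbytes as bnb
from transformers import LlamaForCausalLM, TrainingArguments, Trainer
from peft import get_peft_model

# Load the base model in fp16 (model_id and lora_config come from the notebook).
model = LlamaForCausalLM.from_pretrained(
    model_id, device_map='auto', torch_dtype=torch.float16
)
model = get_peft_model(model, lora_config)

# Gradient checkpointing trades compute for memory: activations are
# recomputed in the backward pass instead of being stored.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()  # keeps checkpointed inputs differentiable when base weights are frozen

# The 8-bit AdamW optimizer stores optimizer state in int8,
# cutting optimizer memory substantially compared to fp32 Adam.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5)

training_args = TrainingArguments(
    output_dir='tmp',
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    optimizers=(optimizer, None),  # pass the 8-bit optimizer explicitly, default LR scheduler
)
trainer.train()

With gradient checkpointing and an 8-bit optimizer, the fp16 run should land much closer to the int8 numbers, though the fp16 weights themselves still take roughly twice the memory of the int8 weights.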