Fine-tuning Baichuan2-7B-Base fails with AttributeError: 'BaichuanConfig' object has no attribute 'z_loss_weight' #395

qingchen177 · 2024-04-03T03:35:56Z

Command:
deepspeed --hostfile=$hostfile fine-tune.py \
--report_to "none" \
--data_path "data/test.json" \
--model_name_or_path "/data/models/Baichuan2-7B-Base" \
--output_dir "output" \
--model_max_length 512 \
--num_train_epochs 4 \
--per_device_train_batch_size 16 \
--gradient_accumulation_steps 1 \
--save_strategy epoch \
--learning_rate 2e-5 \
--lr_scheduler_type constant \
--adam_beta1 0.9 \
--adam_beta2 0.98 \
--adam_epsilon 1e-8 \
--max_grad_norm 1.0 \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--logging_steps 1 \
--gradient_checkpointing True \
--deepspeed ds_config.json \
--bf16 True \
--tf32 True \
--use_lora True

Error: AttributeError: 'BaichuanConfig' object has no attribute 'z_loss_weight'
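
For anyone hitting the same error, one possible workaround is to give the loaded config a default value for the missing attribute before the model is built. The sketch below is only an illustration: `ensure_config_default` is a hypothetical helper, the `SimpleNamespace` object stands in for the real `BaichuanConfig`, and the assumption that a weight of 0.0 simply disables the z-loss term should be checked against the model's own modeling code. The underlying cause may instead be mismatched config and modeling files in the model directory.

```python
from types import SimpleNamespace


def ensure_config_default(config, name, default):
    """Set `name` on `config` only if it is missing.

    Returns True if the attribute was added, False if it already existed.
    Hypothetical helper, not part of transformers or Baichuan2.
    """
    if hasattr(config, name):
        return False
    setattr(config, name, default)
    return True


# Stand-in for the loaded model config. In practice this would be the
# config object that the fine-tuning script loads for the model.
config = SimpleNamespace(model_type="baichuan")

# Assumption: 0.0 disables the z-loss term; verify against the modeling code.
added = ensure_config_default(config, "z_loss_weight", 0.0)
print(added, config.z_loss_weight)
```

The helper leaves an existing value untouched, so it is safe to call even when the config already defines `z_loss_weight`.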

Switching to Baichuan2-13B-Base immediately runs out of GPU memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.92 GiB (GPU 0; 23.65 GiB total capacity; 15.90 GiB already allocated; 1.04 GiB free; 21.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.92 GiB (GPU 1; 23.65 GiB total capacity; 15.90 GiB already allocated; 1.73 GiB free; 21.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

output = input.matmul(weight.t())

This is running on three 4090 GPUs. Any help would be appreciated.
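
A rough back-of-the-envelope check suggests why the 13B model may not fit. The sketch below estimates the memory needed for the weights alone in bf16, assuming a nominal 13e9 parameters (taken from the "13B" in the model name, not an exact count) and the 23.65 GiB per-GPU capacity reported in the error message. Whether each GPU really has to hold a full copy depends on the ZeRO stage set in `ds_config.json`, which is not shown here, so this is an estimate rather than a diagnosis.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


params_13b = 13e9          # assumption: nominal parameter count from the model name
bf16_bytes = 2             # bf16 stores each parameter in 2 bytes
gpu_capacity_gib = 23.65   # total capacity reported in the error message

needed = weight_memory_gib(params_13b, bf16_bytes)
print(f"weights alone: {needed:.1f} GiB")              # weights alone: 24.2 GiB
print("exceeds one GPU:", needed > gpu_capacity_gib)   # exceeds one GPU: True
```

Under these assumptions the unsharded bf16 weights already exceed a single card's capacity, before counting activations, gradients, or optimizer state, so tuning `max_split_size_mb` alone is unlikely to help unless the parameters are sharded or offloaded.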

pandong2011 · 2024-04-28T03:05:56Z

You can refer to https://www.modelscope.cn/models/baichuan-inc/Baichuan2-7B-Base/feedback/prDetail/9750
