MemoryError on two consecutive training runs with a previously working config, even after lowering gradient accumulation steps #406

Open
testworksizan opened this issue Apr 7, 2024 · 0 comments


The error occurred mid-training (the machine was completely idle during this period):
Average key norm=0.776, Keys Scaled=211, avr_loss=0.0816]
saving checkpoint: ./output\0405end-000003.safetensors
MemoryError
thread '<unnamed>' panicked at C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\pyo3-0.20.2\src\err\mod.rs:788:5:
Python API call failed
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\sd-scripts\sdxl_train_network.py", line 184, in
trainer.train(args)
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\sd-scripts\train_network.py", line 916, in train
save_model(ckpt_name, accelerator.unwrap_model(network), global_step, epoch + 1)
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\sd-scripts\train_network.py", line 732, in save_model
unwrapped_nw.save_weights(ckpt_file, save_dtype, metadata_to_save)
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\sd-scripts\networks\lora.py", line 1116, in save_weights
model_hash, legacy_hash = train_util.precalculate_safetensors_hashes(state_dict, metadata)
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\sd-scripts\library\train_util.py", line 2430, in precalculate_safetensors_hashes
bytes = safetensors.torch.save(tensors, metadata)
File "C:\Users\songz\Documents\app\lora-scripts-v1.8.5\python\lib\site-packages\safetensors\torch.py", line 245, in save
serialized = serialize(_flatten(tensors), metadata=metadata)
pyo3_runtime.PanicException: Python API call failed
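
The MemoryError is raised while the epoch-3 checkpoint is being written: per the traceback, precalculate_safetensors_hashes in library/train_util.py serializes the entire LoRA state dict into a single in-memory bytes object via safetensors.torch.save() before hashing it, so running out of host RAM at that moment surfaces exactly there, and the pyo3 panic is just the Rust layer reporting the failed Python call. A minimal sketch of that step, using a made-up state dict purely for illustration:

import torch
import safetensors.torch

# Illustrative stand-in for the LoRA state dict; the real one is the
# trained network (network_dim = 128 on SDXL is several hundred MB).
state_dict = {
    "lora_unet_example.lora_up.weight": torch.zeros(128, 320, dtype=torch.bfloat16),
    "lora_unet_example.lora_down.weight": torch.zeros(320, 128, dtype=torch.bfloat16),
}
metadata = {"ss_output_name": "0405end"}  # illustrative metadata only

# safetensors.torch.save() returns the fully serialized checkpoint as a
# bytes object held in system RAM; this is the call that fails in the log above.
serialized = safetensors.torch.save(state_dict, metadata=metadata)
print(f"serialized checkpoint size: {len(serialized)} bytes")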

model_train_type = "sdxl-lora"
pretrained_model_name_or_path = "C:/Users/songz/Documents/app/lora-scripts-v1.8.5/sd-models/animagineXLV31_v31.safetensors"
vae = "C:/Users/songz/Documents/app/lora-scripts-v1.8.5/sd-models/sdxl_vae.safetensors"
v2 = false
train_data_dir = "C:/Users/songz/Documents/app/lora-scripts-v1.8.5/train/ai0406/train"
reg_data_dir = "C:/Users/songz/Documents/app/lora-scripts-v1.8.5/train/ai0406/reg"
prior_loss_weight = 1
resolution = "1024,1024"
enable_bucket = true
min_bucket_reso = 128
max_bucket_reso = 2048
bucket_reso_steps = 32
output_name = "0405end"
output_dir = "./output"
save_model_as = "safetensors"
save_precision = "bf16"
save_every_n_epochs = 1
max_train_epochs = 12
train_batch_size = 2
gradient_checkpointing = true
gradient_accumulation_steps = 3
network_train_unet_only = false
network_train_text_encoder_only = false
learning_rate = 0.0001
unet_lr = 0.0000377
text_encoder_lr = 0.00000753
lr_scheduler = "cosine"
lr_warmup_steps = 0
optimizer_type = "Lion8bit"
min_snr_gamma = 5
network_module = "networks.lora"
network_dim = 128
network_alpha = 64
scale_weight_norms = 1
sample_prompts = "1girl, solo,, --n lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts,signature, watermark, username, blurry, --w 1024 --h 1024 --l 7 --s 28 --d 1337"
sample_sampler = "euler_a"
sample_every_n_epochs = 1
log_with = "tensorboard"
logging_dir = "./logs"
caption_extension = ".txt"
shuffle_caption = true
keep_tokens = 0
max_token_length = 255
seed = 1337
mixed_precision = "bf16"
full_bf16 = true
no_half_vae = true
xformers = true
lowram = false
cache_latents = true
cache_latents_to_disk = true
persistent_data_loader_workers = true
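
For reference, the settings in this config that mainly affect host-memory rather than VRAM use can be pulled out programmatically; a minimal sketch, assuming the config above is saved as config_0405end.toml (hypothetical file name) and a Python 3.11+ interpreter with the stdlib tomllib:

import tomllib  # stdlib TOML parser (Python 3.11+); older interpreters need the third-party toml package

# Hypothetical file name for the config pasted above.
with open("config_0405end.toml", "rb") as f:
    cfg = tomllib.load(f)

# Settings tied to host-memory behaviour (cached latents, persistent
# dataloader workers), relevant when the failure is a plain MemoryError
# during the in-memory checkpoint serialization rather than a CUDA OOM.
for key in (
    "train_batch_size",
    "gradient_accumulation_steps",
    "cache_latents",
    "cache_latents_to_disk",
    "persistent_data_loader_workers",
    "lowram",
):
    print(f"{key} = {cfg.get(key)}")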
