trainer.fit from checkpoint without performance improvement will break 'last' link to checkpoint on Windows 11 #19845

Open
workhours opened this issue May 4, 2024 · 0 comments
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers)

Bug description

As the title says: train a model on Windows 11, pass a checkpoint callback to the trainer, and leave ckpt_path as None, as in the code below. After fitting the model, Lightning correctly creates the 'last' link to the checkpoint file.
Then train the same model again, this time loading it from ckpt_path, and make sure the monitored metric does not improve during the fit. Once training is done, the 'last' link is wrong.

What version are you seeing the problem on?

v2.2

How to reproduce the bug

checkpoint_callback = ModelCheckpoint(
    monitor='val_loss',  # metric to monitor
    dirpath='training/checkpoints/',  # directory where checkpoints are saved
    filename=experiment_name + '-{epoch}-{val_loss:.3f}',  # checkpoint filename format
    save_top_k=1,  # keep only the single best model
    mode='min',  # this is a loss, so lower is better
    save_last='link',
    save_on_train_epoch_end=True,
    every_n_epochs=5
)
...
trainer.fit(model, ckpt_path=None if initial else 'last')
trainer.test(model)
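
To make "the 'last' link is wrong" easier to report precisely, a small standard-library check along the following lines could be run after each fit. It is only a sketch: the helper name `describe_last_link` is hypothetical, and it assumes the link is written as `last.ckpt` inside the `dirpath` given to the callback, which may differ if the last-checkpoint name was customized.

```python
import os
from pathlib import Path


def describe_last_link(dirpath, name="last.ckpt"):
    """Report the state of the 'last' checkpoint entry in `dirpath`.

    `name` is an assumption (the usual default); adjust it if needed.
    """
    link = Path(dirpath) / name
    info = {
        # lexists is True even when the entry is a dangling symlink
        "present": os.path.lexists(link),
        "is_symlink": link.is_symlink(),
        # exists() follows symlinks, so a dangling link yields False
        "resolves": link.exists(),
        "target": None,
    }
    if info["is_symlink"]:
        # Raw link target as stored on disk (may be relative to the link's folder)
        info["target"] = os.readlink(link)
    return info


if __name__ == "__main__":
    # Directory taken from the reproduction snippet in this report
    print(describe_last_link("training/checkpoints/"))
```

Comparing the output after the first fit and after the resumed fit would show whether the entry disappeared, stopped being a symlink, or points at a file that no longer exists.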

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

@workhours added the bug and needs triage labels on May 4, 2024