If I want to pretrain BART- or T5-series models with RetroMAE, how should I go about it?
bart-base-chinese-cluecorpussmall-retromae_batch256_max350.log
Currently, this script doesn't support encoder-decoder architectures.
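For context, the core of RetroMAE's objective is asymmetric masked reconstruction: the encoder sees a moderately masked input, while the decoder side reconstructs from a much more heavily masked one. Porting the recipe to BART/T5 would at minimum mean reproducing that dual masking. Below is a minimal sketch of the masking step in plain Python; the token ids, mask rates, and mask id are hypothetical, and this is not the repo's actual implementation:

```python
import random

def mask_tokens(token_ids, mask_rate, mask_id, special_ids=(0, 1, 2), rng=None):
    """Replace a fraction of tokens with mask_id, skipping special tokens.

    Returns (masked_input, labels), where labels hold the original token at
    masked positions and -100 (ignored by the loss) everywhere else.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tid in token_ids:
        if tid not in special_ids and rng.random() < mask_rate:
            masked.append(mask_id)   # corrupt the input
            labels.append(tid)       # target: reconstruct the original token
        else:
            masked.append(tid)
            labels.append(-100)
    return masked, labels

# Toy token ids; 2/3 stand in for BOS/EOS and 103 for [MASK] (all hypothetical).
ids = [2, 45, 67, 89, 12, 34, 56, 78, 90, 11, 3]

# Asymmetric masking: light on the encoder input, heavy on the decoder input.
enc_input, enc_labels = mask_tokens(ids, mask_rate=0.30, mask_id=103, special_ids=(2, 3))
dec_input, dec_labels = mask_tokens(ids, mask_rate=0.50, mask_id=103, special_ids=(2, 3))
```

The rest of the adaptation (feeding the sentence embedding into the decoder, wiring this into a seq2seq model's loss) is where the encoder-only script and an encoder-decoder model genuinely diverge.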
Thanks. One more question: during pretraining, the report looks like this: {'loss': 2.8222, 'learning_rate': 1.1313075087080098e-05, 'step': 103000, 'epoch': 1.3}. In your pretraining experiments, how did you decide when to stop — just by watching the loss curve? I also noticed that the loss occasionally ticks up slightly (small, not dramatic changes) during training. Did you encounter this in your pretraining runs, and how did you judge when to stop?