🐛 Describe the bug
Freezing the top n layers has a different effect depending on the model:
For Qwen, the frozen layers have no gradients, so GPU RAM usage is lower than when nothing is frozen.
But for Qwen1.5 and wizard-8x22B, the frozen layers still seem to have gradients, so GPU RAM stays high. Even when I freeze the top n-1 layers and finetune only the last layer, I get OOM when training on 3 nodes (24x A800 in total).
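For reference, here is a minimal sketch (my own check, not the framework's code) of how one can verify whether the frozen blocks really stop receiving gradients after a backward pass. It assumes an HF-style checkpoint whose decoder blocks live under `model.model.layers` (true for Qwen1.5/Llama-style models; the original Qwen uses `transformer.h` instead), and uses a small model name only as a stand-in:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen1.5-0.5B"  # small stand-in; the issue concerns much larger models
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
tok = AutoTokenizer.from_pretrained(name)

n_freeze = 10  # freeze the first n_freeze decoder blocks ("top n" in the sense used above)
for i, block in enumerate(model.model.layers):
    if i < n_freeze:
        for p in block.parameters():
            p.requires_grad_(False)

batch = tok("hello world", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()

# Frozen blocks should report has_grad=False; if they hold gradient tensors
# here, gradient memory is still being allocated for them.
for i, block in enumerate(model.model.layers):
    has_grad = any(p.grad is not None for p in block.parameters())
    print(f"layer {i:2d} frozen={i < n_freeze} has_grad={has_grad}")
```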
Error logs
CUDA out of memory
Expected behavior
Freezing the top layers should work for any model.
@hjc3613 Sorry for the inconvenience, this feature is not well tested, which is why we didn't mention it much. If you are interested, we would love to work with you on a PR to fix the issues.
I think this feature is necessary: it can save a lot of GPU memory and give the same results as full-parameter training in some practical cases. I have tested it on my task: when freezing the top 40 layers (out of 80 in total), the test results are the same as without freezing!! And it only needs 1 node (8x 80 GB).
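To illustrate where that saving should come from (my own sketch, not the project's implementation): if requires_grad is really switched off on the frozen blocks and the optimizer is built only from the remaining trainable parameters, neither gradient buffers nor AdamW moment states are allocated for the frozen half of the model. The `model.model.layers` layout is again an assumed Llama/Qwen1.5-style structure:

```python
import torch

def freeze_first_n_layers(model, n):
    # Switch off grads for the first n decoder blocks; their parameters then
    # produce no .grad tensors during backward.
    for i, block in enumerate(model.model.layers):
        if i < n:
            block.requires_grad_(False)

def build_optimizer(model, lr=1e-5):
    # Build AdamW only over trainable params, so no exp_avg / exp_avg_sq
    # buffers are kept for the frozen layers.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```

If memory stays high despite this, my guess is that either the optimizer is still being constructed over all of model.parameters(), or the distributed wrapper re-enables grads on the wrapped modules; I haven't confirmed which applies here.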