
Layer-Wise Learning Rate #984

Answered by githubnemo
MohammadJavadD asked this question in Q&A

To implement layer-wise learning rates you can make use of the param group feature of PyTorch optimizers. In essence, you define parameter groups with optimizer-specific attributes (such as the learning rate). In plain PyTorch you would have to pass the actual parameter objects. Since skorch initializes the module lazily, the parameters are not known before initialization, so skorch provides its own version of param groups that matches on parameter names instead of the actual parameter objects.

A basic example from the docs:

from skorch import NeuralNet

net = NeuralNet(
    my_net,  # your torch.nn.Module (class or instance)
    optimizer__param_groups=[
        ('embedding.*', {'lr': 0.0}),  # freeze all parameters under "embedding"
        ('linear0.bias', {'lr': 1}),   # separate learning rate for the bias of "linear0"
    ],
)

Note that embedding and linear0 refer to names inside my_net; the patterns are matched against the module's parameter names, so adapt them to your own module.
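For context, here is a rough, self-contained sketch of how this could look end to end. Everything besides optimizer__param_groups is an assumption made for illustration: MyModule, its layer names, the random data, and the choice of NeuralNetClassifier are not from the discussion.

# Self-contained sketch: the module, data, and hyperparameters below are
# illustrative assumptions, not taken from the discussion.
import numpy as np
import torch
from torch import nn
from skorch import NeuralNetClassifier

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(100, 16)  # parameter: embedding.weight
        self.linear0 = nn.Linear(16, 2)         # parameters: linear0.weight, linear0.bias

    def forward(self, X):
        out = self.embedding(X).mean(dim=1)
        # NeuralNetClassifier's default criterion expects probabilities
        return torch.softmax(self.linear0(out), dim=-1)

net = NeuralNetClassifier(
    MyModule,
    max_epochs=1,
    verbose=0,
    optimizer__param_groups=[
        ('embedding.*', {'lr': 0.0}),   # matched against "embedding.weight"
        ('linear0.bias', {'lr': 1.0}),  # matched against "linear0.bias"
    ],
)

X = np.random.randint(0, 100, size=(64, 5))  # token indices for the embedding
y = np.random.randint(0, 2, size=64)
net.fit(X, y)

# Inspect the resulting optimizer groups: one per pattern, plus a group
# for the remaining, unmatched parameters with the default learning rate.
for group in net.optimizer_.param_groups:
    print(group['lr'], sum(p.numel() for p in group['params']))

Setting the learning rate of a group to 0.0 effectively freezes the matched parameters (at least for plain SGD without weight decay) while keeping them registered in the optimizer.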
