Duplicate division in relative positional encoding #11

Open
songweige opened this issue Jun 15, 2022 · 1 comment
@songweige

Hey @lucidrains, thanks for maintaining these model implementations. At line 88,

```python
num_buckets //= 2
ret += (n < 0).long() * num_buckets
n = torch.abs(n)
max_exact = num_buckets // 2
```

you set max_exact to half of num_buckets, whose value was already halved at line 84.

I think the division is duplicated and the line should be changed to the identity:

```python
max_exact = num_buckets
```
@oxjohanndiep

I suggest reading the paper "On Scalar Embedding of Relative Positions in Attention Models", which explains the bucketing function implemented here.
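
For reference, here is a minimal sketch of the T5-style bucketing scheme the quoted snippet follows (bidirectional branch only; the function name, signature, and defaults are assumptions for illustration, not the repository's exact code):

```python
import math

import torch


def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    # bidirectional case: the first halving splits the buckets between
    # negative and positive relative positions
    ret = 0
    n = -relative_position
    num_buckets //= 2
    ret += (n < 0).long() * num_buckets  # negative offsets land in the upper half
    n = torch.abs(n)

    # the second halving is separate and intentional: of the buckets left for
    # this sign, half map small offsets exactly, the rest cover offsets up to
    # max_distance on a log scale
    max_exact = num_buckets // 2
    is_small = n < max_exact

    # log-spaced buckets for large offsets; only used where is_small is False,
    # so the log(0) case for n == 0 is discarded by torch.where below
    val_if_large = max_exact + (
        torch.log(n.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    val_if_large = torch.min(val_if_large, torch.full_like(val_if_large, num_buckets - 1))

    ret += torch.where(is_small, n, val_if_large)
    return ret
```

With the defaults above, 32 buckets become 16 per sign; offsets 0 through 7 each get their own bucket, and offsets 8 through 127 share the remaining 8 log-spaced buckets (anything farther clamps to the last bucket). So the two divisions serve different purposes rather than duplicating one another.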
