
Is it possible to add support for Infini-attention? #292

Open
sdmorrey opened this issue May 11, 2024 · 2 comments

@sdmorrey

There's some work being done to implement Infini-attention from https://arxiv.org/pdf/2404.07143

In a nutshell, it allows essentially unlimited context length without incurring the quadratic attention penalty. There's a proof of concept running a 10M-token context in less than 32 GB of RAM here:
https://github.com/mustafaaljadery/gemma-2B-10M
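The reason the cost stays linear rather than quadratic is that each segment only attends locally and reads from a fixed-size compressive memory carried between segments. Here's a rough PyTorch sketch of that outer loop (the names and the `attend_fn` hook are illustrative, not taken from either repo):

```python
import torch

def infini_attention_stream(q, k, v, segment_len, attend_fn):
    # q, k, v: (batch, heads, seq, head_dim). The sequence is processed
    # in fixed-size segments; only a constant-size compressive memory
    # (mem, z) is carried between segments, so cost is linear in seq.
    b, h, n, d = q.shape
    mem = q.new_zeros(b, h, d, d)   # compressive memory M
    z = q.new_zeros(b, h, d)        # normalization term z
    outputs = []
    for s in range(0, n, segment_len):
        qs, ks, vs = (t[:, :, s:s + segment_len] for t in (q, k, v))
        # attend_fn: local softmax attention plus a memory read, blended
        # by a learned gate; returns segment output and updated memory.
        out, mem, z = attend_fn(qs, ks, vs, mem, z)
        outputs.append(out)
    return torch.cat(outputs, dim=2)
```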

I believe we will see more models adopting this approach, and official support would be a huge benefit to the community.

I don't have the Rust chops to pull this off, but I thought I'd at least bring it to your attention, since you already have Phi-3 with 128k context working.

Thanks for all your hard work!

@EricLBuehler
Owner

Thank you for letting me know! I think this would be a valuable addition, and I'll try to implement it. I took a look at the implementation you linked, and I think this is the key change. Is that correct?

https://github.com/mustafaaljadery/gemma-2B-10M/blob/main/src/gemma.py#L488-L548

I'm looking forward to implementing this.

@nidhoggr-nil

Looks interesting. I wonder if there are any downsides; it probably depends on how well the model compresses its knowledge (so that information loss stays minimal) and on how the practical side of the memory implementation handles larger contexts.

Looks like that is the correct place. The functions for memory retrieval and storage also need to be implemented, i.e. sections 2.1.1 and 2.1.2 of the paper; a sketch of those follows below.
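For reference, here's a minimal PyTorch sketch of those two pieces plus the gating that blends them, following the paper's notation (sigma is ELU + 1); this is illustrative, not the linked repo's code:

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map sigma(x) = ELU(x) + 1 used for the
    # linear-attention memory.
    return F.elu(x) + 1.0

def memory_retrieval(q, mem, z):
    # Section 2.1.1: A_mem = (sigma(Q) M) / (sigma(Q) z)
    # q: (batch, heads, seg_len, d_k), mem: (b, h, d_k, d_v), z: (b, h, d_k)
    sq = elu_plus_one(q)
    numer = sq @ mem                                   # (b, h, n, d_v)
    denom = (sq @ z.unsqueeze(-1)).clamp(min=1e-6)     # (b, h, n, 1)
    return numer / denom

def memory_update(k, v, mem, z, use_delta=True):
    # Section 2.1.2: linear update M += sigma(K)^T V, or the "delta"
    # variant that first subtracts what the memory already retrieves.
    sk = elu_plus_one(k)
    if use_delta:
        v = v - (sk @ mem) / (sk @ z.unsqueeze(-1)).clamp(min=1e-6)
    mem = mem + sk.transpose(-2, -1) @ v               # (b, h, d_k, d_v)
    z = z + sk.sum(dim=-2)                             # (b, h, d_k)
    return mem, z

def gate(a_mem, a_dot, beta):
    # Learned per-head scalar beta blends long-term (memory) and local
    # (softmax) attention: sigmoid(beta)*A_mem + (1 - sigmoid(beta))*A_dot.
    g = torch.sigmoid(beta).view(1, -1, 1, 1)
    return g * a_mem + (1.0 - g) * a_dot
```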
