In a nutshell, Infini-attention (https://arxiv.org/pdf/2404.07143) allows for essentially unlimited context length without incurring the quadratic attention penalty. There's a proof of concept with a 10M-token context running in under 32 GB of RAM here: https://github.com/mustafaaljadery/gemma-2B-10M
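To make the "quadratic penalty" concrete, here is a back-of-the-envelope sketch. The function names and the 2048-token segment size are illustrative choices of mine, not anything from the linked repo: full self-attention materializes an L×L score matrix, while a segment-wise scheme with a fixed-size compressive memory only ever scores within each segment, so cost grows linearly with length.

```python
def attn_score_entries(seq_len: int) -> int:
    # Full self-attention: one L x L attention-score matrix.
    return seq_len * seq_len

def segmented_score_entries(seq_len: int, segment: int = 2048) -> int:
    # Infini-attention style: attend only within fixed-size segments;
    # long-range context lives in a constant-size memory instead.
    n_segments = -(-seq_len // segment)  # ceiling division
    return n_segments * segment * segment

for length in (8_192, 1_000_000, 10_000_000):
    print(length, attn_score_entries(length), segmented_score_entries(length))
```

At 10M tokens the segmented count is roughly three orders of magnitude smaller, which is the gap that makes runs like the 32 GB proof of concept plausible.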
I believe we will see more models adopting this approach, and if it were officially supported, it would be a huge benefit to the community.
I don't have the Rust chops to pull this off, but I thought I'd at least bring it to your attention since you already have Phi-3 with 128k context working.
Thanks for all your hard work!
Thank you for letting me know! I think this would be a valuable addition, and I'll try to implement it. I took a look at the implementation you linked, and I believe this is the key change — is that correct?
Looks interesting; I wonder if there are any downsides. It probably depends on how well the model compresses the knowledge so that information loss stays minimal, and on how the practical side of the memory implementation is handled for larger contexts.
Looks like that is the correct place. The functions for retrieval, storage, etc. also need to be implemented — basically sections 2.1.1 and 2.1.2 of the paper.
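For anyone picking this up, the retrieval (section 2.1.1) and update (section 2.1.2) steps from the paper can be sketched in NumPy roughly as below. Shapes, names, and the toy dimensions are my own for illustration; the actual code in the linked repo (and any Rust port) will differ:

```python
import numpy as np

def sigma(x):
    # ELU(x) + 1, the non-linearity the paper applies to queries and keys.
    return np.where(x > 0, x + 1.0, np.exp(x))

def retrieve(M, z, Q):
    # Section 2.1.1, memory retrieval: A_mem = (sigma(Q) @ M) / (sigma(Q) @ z)
    sq = sigma(Q)                       # (L, d_k)
    return (sq @ M) / (sq @ z + 1e-8)   # (L, d_v), normalised per row

def update(M, z, K, V):
    # Section 2.1.2, linear memory update:
    #   M <- M + sigma(K)^T V    (fixed-size associative memory)
    #   z <- z + sum_k sigma(K)  (normalisation term)
    sk = sigma(K)
    return M + sk.T @ V, z + sk.sum(axis=0, keepdims=True).T

# Toy dimensions; real models use per-head d_k/d_v and long segments.
d_k, d_v, L = 4, 4, 8
rng = np.random.default_rng(0)
M = np.zeros((d_k, d_v))   # compressive memory, size independent of context
z = np.zeros((d_k, 1))     # running normaliser
K, V, Q = (rng.normal(size=(L, d)) for d in (d_k, d_v, d_k))

M, z = update(M, z, K, V)  # store the current segment
A_mem = retrieve(M, z, Q)  # long-term read-out for the next segment
```

In the paper, `A_mem` is then blended with ordinary local dot-product attention through a learned per-head gate, so memory size stays constant no matter how many segments are processed.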
There's some work being done to implement Infini-attention from https://arxiv.org/pdf/2404.07143