
Add support for ibm granite #758

Merged: 24 commits into ml-explore:main on May 22, 2024

Conversation

@Blaizzy (Contributor) commented May 6, 2024

This PR adds support for IBM Granite.
Twitter: @Prince_Canuma

Todo:

  • 3B and 8B models
  • 20B and 34B models

@Blaizzy changed the title from "Add support for granite" to "Add support for ibm granite" on May 6, 2024
@Blaizzy (Contributor, Author) commented May 14, 2024

The KV cache change broke my original implementation, but I found a workaround :)

Ready for review, @awni

@Blaizzy marked this pull request as ready for review on May 14, 2024 at 16:39
@awni (Member) commented May 19, 2024

@Blaizzy this diff includes some previous PRs. Can you rebase on origin's main and force-push to your branch?
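
For reference, one standard way to do that (assuming the upstream remote is named origin and the PR branch is currently checked out):

git fetch origin                # update remote-tracking branches
git rebase origin/main          # replay the PR commits on top of upstream main
git push --force-with-lease     # rewrite the PR branch, refusing if it moved remotely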

@Blaizzy (Contributor, Author) commented May 20, 2024

Hey @awni,

It's done ✅

Comment on lines +20 to +21
attention_bias: bool = False
mlp_bias: bool = False
awni (Member):

Was there a reason you added these / changed this file?

Blaizzy (Contributor, Author):

Granite 3B and 8B require them, and that differs from the base Llama models.
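
For context, a minimal sketch of how those flags get wired in (assuming the config also carries hidden_size and intermediate_size fields as in the HF Llama config; the actual llama.py layout may differ):

import mlx.nn as nn

class Attention(nn.Module):
    def __init__(self, args):
        super().__init__()
        dim = args.hidden_size
        # Granite checkpoints ship bias terms on these projections,
        # so the flag must reach nn.Linear; base Llama uses bias=False.
        self.q_proj = nn.Linear(dim, dim, bias=args.attention_bias)
        self.o_proj = nn.Linear(dim, dim, bias=args.attention_bias)

class MLP(nn.Module):
    def __init__(self, args):
        super().__init__()
        self.up_proj = nn.Linear(args.hidden_size, args.intermediate_size, bias=args.mlp_bias)
        self.down_proj = nn.Linear(args.intermediate_size, args.hidden_size, bias=args.mlp_bias)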

awni (Member):

Oh I see, I did not realize those models had model_type = llama. Makes sense.

Comment on lines 130 to 134
def create_additive_causal_mask(N: int, offset: int = 0):
    # Indices of all key positions (cache offset + new tokens).
    rinds = mx.arange(offset + N)
    # Indices of the query positions within that range.
    linds = mx.arange(offset, offset + N) if offset else rinds
    # A query may not attend to keys that come after it.
    mask = linds[:, None] < rinds[None]
    return mask * -1e9
awni (Member):

Would you mind refactoring this into base.py and calling it from this model and the llama.py model? Ultimately we'd like all the models to use this same version, but we can start there and update the rest over time rather than duplicating the function everywhere.

Blaizzy (Contributor, Author):

Sure, it's done ✅
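
For reference, a sketch of what that refactor looks like (the function body comes from the diff above; the exact module paths are assumptions):

# base.py
import mlx.core as mx

def create_additive_causal_mask(N: int, offset: int = 0):
    rinds = mx.arange(offset + N)
    linds = mx.arange(offset, offset + N) if offset else rinds
    mask = linds[:, None] < rinds[None]
    return mask * -1e9

# llama.py and the granite model file then import it instead of redefining it:
from .base import create_additive_causal_mask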

Comment on lines 22 to 24
attn_pdrop: float = 0.1
embd_pdrop: float = 0.1
resid_pdrop: float = 0.1
awni (Member):

Remove these unused arguments.

Blaizzy (Contributor, Author):

They are used in the attention, embedding, and MLP blocks :)

awni (Member):

Right, let's remove those Dropout layers as well. We don't add dropout layers to the model files here since we don't use them for training.

Blaizzy (Contributor, Author):

Got it!

Done ✅
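
The underlying point: these model files are inference-only, and dropout is a no-op outside training mode, so the layers and their *_pdrop config fields are dead weight. A minimal sketch of a block without them (illustrative names and activation, not the exact file contents):

import mlx.nn as nn

class MLP(nn.Module):
    # No resid_pdrop / nn.Dropout: the projection output feeds the
    # residual connection directly at inference time.
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.up_proj = nn.Linear(dim, hidden_dim)
        self.down_proj = nn.Linear(hidden_dim, dim)

    def __call__(self, x):
        return self.down_proj(nn.gelu(self.up_proj(x)))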

@awni (Member) commented May 20, 2024

Very nicely done! I just left a couple minor comments. Please check them and then we can merge it!

@Blaizzy (Contributor, Author) commented May 20, 2024

Thank you very much, @awni!

I addressed all the comments.

Let me know if there's anything else.

@awni merged commit b044ce2 into ml-explore:main on May 22, 2024
4 checks passed