CatLIP checkpoints on the hub 🤗 #1

Open
Vaibhavs10 opened this issue Apr 24, 2024 · 5 comments
Comments

@Vaibhavs10

Hey hey! - I'm VB, I work on the open source team at Hugging Face. Massive congratulations on the OpenELM release, it's quite refreshing to see such a brilliant open release from Apple.

I was going through the trained checkpoints and wasn't able to find the CatLIP ones. It'd be great if you could upload them to Hugging Face, similar to the OpenELM checkpoints.

Let me know if you need a hand with that.

Cheers!
VB

@sacmehta
Collaborator

Thank you for your interest in CatLIP. Our checkpoints are ready for use, and we would greatly appreciate your assistance in converting them to HuggingFace format.

@pcuenca

pcuenca commented Apr 25, 2024

Hi @sacmehta! We can certainly help with that :)

In parallel, please note that you can still upload the weights in native format to the Hub, there's no requirement for checkpoints to follow any particular format or be compatible with any given library! If you upload them, people can easily download and use them with your own inference code (or with MLX, if they're compatible). This would allow the community to test them immediately :)
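
For reference, here's a minimal sketch of what a native-format upload could look like with the huggingface_hub client (the repo id and local path below are placeholders, not the actual CatLIP release layout):

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes `huggingface-cli login` has been run, or pass token=...

# Placeholder repo id and local checkpoint directory.
repo_id = "corenet-community/place365-512x512-vit-huge"
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# Upload the checkpoint directory as-is (native weights, configs, etc.);
# no conversion to any particular library format is required.
api.upload_folder(
    folder_path="./checkpoints/place365-512x512-vit-huge",
    repo_id=repo_id,
    repo_type="model",
)
```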

@Vaibhavs10
Author

Hey hey @sacmehta - I hope you are doing well. We've now uploaded all the models on the Hugging Face Hub under the corenet-community org: https://huggingface.co/corenet-community

Should we move this under the Apple org?

@sacmehta
Collaborator

sacmehta commented May 1, 2024

Thanks @Vaibhavs10 and @pcuenca. Really appreciate your help in creating the corenet-community page and converting the OpenELM models to Core ML.

In my opinion, it is good to have these models under corenet-community so that people outside Apple can also contribute to them (similar to the MLX community).

I have a suggestion regarding the organization of the models. Currently, they're structured as corenet-community/place365-512x512-vit-huge, which doesn't make clear their origin in the CoreNet project or their intended task. Perhaps renaming them (e.g., corenet-community/catlip/image_classification/place365-512x512-vit-huge) and mirroring the structure of the CoreNet projects folder would enhance user understanding. This adjustment would also make it easier for future research efforts focused on improving specific models (say, ViT) on a specific task (say, image classification on Places365). What do you think?

@pcuenca

pcuenca commented May 1, 2024

Thanks for the comments @sacmehta!

Hub repositories do not support arbitrary hierarchy. Similar to GitHub, they are structured as a namespace (corenet-community in this case), and then a flat list of repos under that namespace. We could potentially create a repo per task and place all models for that task in the same repo. In our experience, however, we've found that this is more confusing for users, leads to worse discoverability and makes it more difficult for you to collect usage stats. In general, we recommend the one-model-per-repo approach.
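
To make the flat-namespace constraint concrete, here's a small sketch: the nested path from your suggestion would have to be encoded in the repo name itself (the name below is purely illustrative, not an agreed-upon convention):

```python
from huggingface_hub import create_repo

# The Hub only supports "namespace/repo-name"; repos can't be nested in
# folders, so any project/task hierarchy has to live in the name itself.
# Illustrative name only.
create_repo(
    "corenet-community/catlip-image-classification-places365-vit-huge",
    repo_type="model",
    exist_ok=True,
)
```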

I do understand your sentiment that something like corenet-community/vit-large is a bit too opaque. Part of the problem could be solved by populating all the model cards with tags and other searchable metadata fields. For example, we could potentially have a corenet library name just as there's an MLX one. As another example, see how a model like mlx-community/Llama-3-8B-Instruct-1048k-8bit also contains metadata for the task it supports (Text Generation), the file format (Safetensors), language, and other details. We could also use longer names (instead of just vit-large) for additional clarity.
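
As an illustration of the metadata route (assuming a corenet library name were recognized on the Hub; the tags and repo id below are placeholders):

```python
from huggingface_hub import metadata_update

# Adds searchable metadata to an existing model repo's README front matter.
# "library_name": "corenet" is an assumption -- the corresponding Hub filter
# would only appear once such a library integration exists.
metadata_update(
    repo_id="corenet-community/place365-512x512-vit-huge",
    metadata={
        "library_name": "corenet",
        "pipeline_tag": "image-classification",
        "tags": ["catlip", "vit", "places365"],
    },
    overwrite=True,
)
```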

The following practices can also be used to communicate your intended model structure:

  • An organization card, where you can add tables with the different model families and links to the individual models. For example, the meta-llama organization simply enumerates the model families, but we could be much more detailed in the case of CoreNet.
  • The use of collections to group related models together. For example, I created a collection for the Core ML versions of OpenELM when I uploaded them, and the official Apple organization has a few collections set up.
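
Collections can also be created programmatically; here's a minimal sketch with huggingface_hub (the title, description, and model id are placeholders, and write access to the namespace is required):

```python
from huggingface_hub import create_collection, add_collection_item

# Placeholder title/description for a hypothetical CatLIP collection.
collection = create_collection(
    title="CatLIP image classification",
    namespace="corenet-community",
    description="CatLIP checkpoints fine-tuned for image classification",
)

# Add one of the already-uploaded models to the new collection.
add_collection_item(
    collection.slug,
    item_id="corenet-community/place365-512x512-vit-huge",
    item_type="model",
)
```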

As for the target organization, I think it depends on the message you want to convey and the goals you have. Using the apple org sends the message that these are Apple-sanctioned, official assets. Using the community approach is perfectly fine, and beneficial if you expect or encourage community contributions, as you said. What type of contributions do you expect from the community? We can set up a process where people can easily get accepted, similar to how it works for the MLX community, and we can help communicate your goals to incentivize engagement.
