
How are the base model weights loaded into llm2vec encoder model? #63

Closed
xiaoyuqian2 opened this issue May 9, 2024 · 2 comments

xiaoyuqian2 commented May 9, 2024

When running the code snippet below,

import torch
from transformers import AutoConfig, AutoModel

# `config` is assumed to have been created earlier, e.g. via:
config = AutoConfig.from_pretrained("McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)

the original model (in this case, Llama-2-7b) will be downloaded automatically. I'm trying to demystify the automation here.

It looks like the _name_or_path parameter in config.json is not used anywhere in modeling_llama_encoder.py. The Llama weights seem to be loaded when running self.post_init(). Is my understanding correct? I'm not sure how exactly the weights are loaded into LlamaEncoderModel, though. I'm guessing it's based on weight names? I would appreciate it a lot if you could help me dive deeper and understand it better. Thank you!

@xiaoyuqian2 xiaoyuqian2 changed the title How are the pretrained model weights loaded into llm2vec encoder model? How are the base model weights loaded into llm2vec encoder model? May 9, 2024
@vaibhavad
Collaborator

Hi @xiaoyuqian2,

Thanks for your interest in our work. I am not fully familiar with all the details of Hugging Face model loading, but I'll explain to the best of my understanding.

LlamaEncoderModel is subclassed from LlamaModel, so they share the model-loading code. Your understanding is correct that weight loading happens when self.post_init() runs. Here, it calls the post_init of LlamaModel, since LlamaModel is the parent class of LlamaEncoderModel and LlamaEncoderModel itself does not implement this method.
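To illustrate the method resolution with a minimal, self-contained sketch (toy classes, not the actual transformers code): a subclass that does not override post_init falls through to the parent's implementation via Python's method resolution order.

class Parent:
    def post_init(self):
        print("post_init inherited from Parent")

class Child(Parent):
    pass  # no post_init defined here

Child().post_init()  # prints "post_init inherited from Parent"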

Since LlamaEncoderModel shares all of its weight names with LlamaModel, loading based on weight names works as expected. This can be verified by printing a few weight values from both LlamaModel and LlamaEncoderModel, or by comparing their parameter names directly, as in the sketch below.
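If it helps, here is a hedged sketch of one way to check the name overlap without downloading the 7B checkpoint: build tiny, randomly initialized instances of both classes from the same small config and compare their state_dict keys. The LlamaEncoderModel import assumes modeling_llama_encoder.py from this repository is on your path.

from transformers import LlamaConfig, LlamaModel
from modeling_llama_encoder import LlamaEncoderModel  # assumption: repo file is importable

# Tiny config so both models instantiate quickly with random weights.
tiny = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
base_keys = set(LlamaModel(tiny).state_dict())
encoder_keys = set(LlamaEncoderModel(tiny).state_dict())
# Identical parameter names are what lets the same checkpoint load into either class.
assert base_keys == encoder_keys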

I tried to dive deep into the transformers library code to find exactly where the weights are loaded, but so far I haven't been successful. Please let me know if you are able to find the exact code snippet.
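For what it's worth, my (hedged) understanding is that the loading logic lives in transformers' modeling_utils.py under PreTrainedModel.from_pretrained, and that it ultimately copies checkpoint tensors into the instantiated model by matching parameter names, the same mechanism as plain PyTorch load_state_dict. A minimal PyTorch-only sketch of that mechanism:

import torch
import torch.nn as nn

class Base(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

class Encoder(Base):
    pass  # subclass exposes the same parameter names: "proj.weight", "proj.bias"

base, encoder = Base(), Encoder()
encoder.load_state_dict(base.state_dict())  # succeeds purely on matching names
print(torch.equal(base.proj.weight, encoder.proj.weight))  # True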

@vaibhavad
Collaborator

Closing as it is stale. Feel free to re-open if you have any more questions.
