
How are the base model weights loaded into llm2vec encoder model? #63

Closed
xiaoyuqian2 opened this issue May 9, 2024 · 2 comments

xiaoyuqian2 commented May 9, 2024

When running the code snippet below,

import torch
from transformers import AutoConfig, AutoModel

# `config` is assumed to have been created earlier, e.g. via:
config = AutoConfig.from_pretrained("McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)

the original model (in this case, Llama-2-7b) will be downloaded automatically. I'm trying to demystify the automation here.

It looks like the _name_or_path parameter in config.json is not used anywhere in modeling_llama_encoder.py. The Llama weights seem to be loaded when running self.post_init(). Is my understanding correct? I'm not sure how exactly the weights are loaded into LlamaEncoderModel, though. I'm guessing it's based on weight names? I would appreciate it a lot if you could help me dive deeper and understand it better. Thank you!

@xiaoyuqian2 xiaoyuqian2 changed the title How are the pretrained model weights loaded into llm2vec encoder model? How are the base model weights loaded into llm2vec encoder model? May 9, 2024
@vaibhavad
Collaborator

Hi @xiaoyuqian2,

Thanks for your interest in our work. I am not fully familiar with all the details of Hugging Face model loading, but I'll explain to the best of my understanding.

LlamaEncoderModel is subclassed from LlamaModel, so they share the model-loading code. Your understanding is correct that weight loading happens when self.post_init() runs. Here, it calls the post_init of LlamaModel, since LlamaModel is the parent class of LlamaEncoderModel and LlamaEncoderModel itself does not implement this method.
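To illustrate the method resolution with a minimal, self-contained sketch (toy classes, not the actual transformers code): a subclass that does not override post_init falls through to the parent's implementation via Python's method resolution order.

class Parent:
    def post_init(self):
        print("post_init inherited from Parent")

class Child(Parent):
    pass  # no post_init defined here

Child().post_init()  # prints "post_init inherited from Parent"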

Since LlamaEncoderModel shares all of its weight names with LlamaModel, loading based on weight names works as expected. This can be verified by printing a few weight values from both LlamaModel and LlamaEncoderModel, or by comparing their parameter names directly, as in the sketch below.
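If it helps, here is a hedged sketch of one way to check the name overlap without downloading the 7B checkpoint: build tiny, randomly initialized instances of both classes from the same small config and compare their state_dict keys. The LlamaEncoderModel import assumes modeling_llama_encoder.py from this repository is on your path.

from transformers import LlamaConfig, LlamaModel
from modeling_llama_encoder import LlamaEncoderModel  # assumption: repo file is importable

# Tiny config so both models instantiate quickly with random weights.
tiny = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
base_keys = set(LlamaModel(tiny).state_dict())
encoder_keys = set(LlamaEncoderModel(tiny).state_dict())
# Identical parameter names are what lets the same checkpoint load into either class.
assert base_keys == encoder_keys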

I tried to dive deep into the transformers library code to find exactly where the weights are loaded, but so far I haven't been successful. Please let me know if you are able to find the exact code snippet.
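For what it's worth, my (hedged) understanding is that the loading logic lives in transformers' modeling_utils.py under PreTrainedModel.from_pretrained, and that it ultimately copies checkpoint tensors into the instantiated model by matching parameter names, the same mechanism as plain PyTorch load_state_dict. A minimal PyTorch-only sketch of that mechanism:

import torch
import torch.nn as nn

class Base(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

class Encoder(Base):
    pass  # subclass exposes the same parameter names: "proj.weight", "proj.bias"

base, encoder = Base(), Encoder()
encoder.load_state_dict(base.state_dict())  # succeeds purely on matching names
print(torch.equal(base.proj.weight, encoder.proj.weight))  # True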

@vaibhavad
Collaborator

Closing as it is stale. Feel free to re-open if you have any more questions.
