
Different embeddings obtained when running with different batch size #68

Closed
wufeim opened this issue May 15, 2024 · 2 comments

Comments

@wufeim

wufeim commented May 15, 2024

Thanks for sharing this awesome work.

I'm building a simple symmetric text-retrieval demo, which involves computing text embeddings for retrieval. What I don't understand is why I get different embeddings when I run l2v with one caption versus multiple captions:

sentences = [
    "how much protein should a female eat",
    "summit define",
    "As a general guideline",
    "Definition of summit for English Language Learners"]
print(l2v.encode(sentences[0:1])[0:1, :10])
print(l2v.encode(sentences[0:2])[0:1, :10])
print(l2v.encode(sentences[0:3])[0:1, :10])
print(l2v.encode(sentences[0:4])[0:1, :10])

Each print statement outputs the first 10 of the 4096 values of the first caption's embedding. I expected all the print statements to output the same values, but they don't. Am I misunderstanding something here?

Thanks for your help!

@vaibhavad
Collaborator

Hi @wufeim,

Thank you for your interest in our work and for raising this issue. While exploring it, we actually uncovered a bug in our code (#74).

Firstly, the output for batch size 1 is very different from the output for batch size >1 because of a bug in the implementation of bidirectional_llama.py. I have pushed a fix (#75). If you plan to use a batch size of 1 with encode, you should build the llm2vec package from source using the latest changes. The bug has no impact when the batch size is greater than 1.

Regarding the output being different for batch sizes 2, 3, and 4 - this is a known issue with the transformers library. Basically, it happens due to the accumulation of matrix-multiplication errors, which is more pronounced in lower precisions like bf16. Here is a detailed explanation by one of the maintainers of the transformers library.
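The root cause, in one line: floating-point addition is not associative, and changing the batch size changes the padding and accumulation order inside the matrix multiplications. A plain-Python illustration (not the library itself):

```python
# Floating-point addition is not associative: summing the same
# numbers in a different order can give a slightly different result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)      # False
print(abs(a - b))  # a tiny residual, on the order of 1e-16

# Inside large matrix multiplications this rounding accumulates,
# and the effect is larger in low-precision formats like bf16,
# which is why embeddings drift slightly with batch size.
```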

By changing the precision to fp32, the variability with batch size will reduce considerably, but it will never be zero (see the tests run in the detailed explanation above).
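In practice this means embeddings computed under different batch sizes should be compared with a tolerance rather than exact equality. A self-contained sketch, with synthetic vectors standing in for encode outputs (the noise scale here is illustrative, not measured):

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(4096).astype(np.float32)

# Simulate the "same" embedding computed under two batch sizes:
# identical up to tiny accumulated rounding noise.
emb_bs1 = base
emb_bs4 = base + 1e-6 * rng.standard_normal(4096).astype(np.float32)

print(np.array_equal(emb_bs1, emb_bs4))          # False: exact match fails
print(np.allclose(emb_bs1, emb_bs4, atol=1e-4))  # True: close within tolerance

# Cosine similarity, which retrieval actually uses, is essentially
# unchanged by noise at this scale.
cos = float(np.dot(emb_bs1, emb_bs4)
            / (np.linalg.norm(emb_bs1) * np.linalg.norm(emb_bs4)))
print(cos > 0.9999)  # True
```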

Here are more related issues on the transformers library for reference:
huggingface/transformers#26869
huggingface/transformers#27626

Another on llama.cpp
ggerganov/llama.cpp#3014

Hope this answers your question, let me know if you have any more queries.

@vaibhavad
Collaborator

Closing as stale; feel free to re-open if the issue persists.
