
Different embeddings obtained when running with different batch size #68

Closed
wufeim opened this issue May 15, 2024 · 2 comments

Comments

@wufeim

wufeim commented May 15, 2024

Thanks for sharing this awesome work.

I'm building a simple symmetric text-retrieval demo, which involves computing text embeddings for retrieval. What I don't understand is why I get different embeddings when I run l2v with one caption versus multiple captions:

sentences = [
    "how much protein should a female eat",
    "summit define",
    "As a general guideline",
    "Definition of summit for English Language Learners"]
print(l2v.encode(sentences[0:1])[0:1, :10])
print(l2v.encode(sentences[0:2])[0:1, :10])
print(l2v.encode(sentences[0:3])[0:1, :10])
print(l2v.encode(sentences[0:4])[0:1, :10])

Each print statement outputs the first 10 of the 4096 values of the first caption's embedding. I expected all the print statements to output the same values, but they don't. Am I misunderstanding something here?

Thanks for your help!

@vaibhavad
Collaborator

Hi @wufeim,

Thank you for your interest in our work and for raising this issue. While exploring it, we actually uncovered a bug in our code (#74).

Firstly, the output for batch size 1 is very different from the output for batch size >1 because of a bug in the implementation of bidirectional_llama.py. I have pushed a fix (#75). If you plan to use a batch size of 1 with encode, you should build the llm2vec package from source using the latest changes. The bug has no impact when the batch size is greater than 1.

Regarding the output being different for batch sizes 2, 3, and 4 - this is a known issue with the transformers library. Basically, it happens due to the accumulation of matrix-multiplication errors, which is more pronounced in lower precisions like bf16. Here is a detailed explanation by one of the maintainers of the transformers library.
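The root cause, in one line: floating-point addition is not associative, and changing the batch size changes the padding and accumulation order inside the matrix multiplications. A plain-Python illustration (not the library itself):

```python
# Floating-point addition is not associative: summing the same
# numbers in a different order can give a slightly different result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)      # False
print(abs(a - b))  # a tiny residual, on the order of 1e-16

# Inside large matrix multiplications this rounding accumulates,
# and the effect is larger in low-precision formats like bf16,
# which is why embeddings drift slightly with batch size.
```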

By changing the precision to fp32, the variability with batch size will reduce considerably, but it will never be zero (see the tests run in the detailed explanation above).
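In practice this means embeddings computed under different batch sizes should be compared with a tolerance rather than exact equality. A self-contained sketch, with synthetic vectors standing in for encode outputs (the noise scale here is illustrative, not measured):

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(4096).astype(np.float32)

# Simulate the "same" embedding computed under two batch sizes:
# identical up to tiny accumulated rounding noise.
emb_bs1 = base
emb_bs4 = base + 1e-6 * rng.standard_normal(4096).astype(np.float32)

print(np.array_equal(emb_bs1, emb_bs4))          # False: exact match fails
print(np.allclose(emb_bs1, emb_bs4, atol=1e-4))  # True: close within tolerance

# Cosine similarity, which retrieval actually uses, is essentially
# unchanged by noise at this scale.
cos = float(np.dot(emb_bs1, emb_bs4)
            / (np.linalg.norm(emb_bs1) * np.linalg.norm(emb_bs4)))
print(cos > 0.9999)  # True
```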

Here are more related issues on the transformers library for reference:
huggingface/transformers#26869
huggingface/transformers#27626

Another on llama.cpp
ggerganov/llama.cpp#3014

Hope this answers your question, let me know if you have any more queries.

@vaibhavad
Collaborator

Closing as stale; feel free to re-open if the issue persists.
