
Inconsistent embeddings between LlamaCppEmbeddings and llama.cpp #21568

Closed
5 tasks done
r3v1 opened this issue May 11, 2024 · 3 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module

Comments

@r3v1
Contributor

r3v1 commented May 11, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.embeddings.llamacpp import LlamaCppEmbeddings


model = LlamaCppEmbeddings(
    model_path="models/meta-llama-3-8b-instruct.Q4_K_M.gguf",
    seed=198,
)

print(model.embed_query("Hello world!"))

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "mwe.py", line 9, in <module>
    print(model.embed_query("Hello world!"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv/lib/python3.12/site-packages/langchain_community/embeddings/llamacpp.py", line 129, in embed_query
    return list(map(float, embedding))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: float() argument must be a string or a real number, not 'list'

Description

I downloaded the Llama-3-8B model from https://huggingface.co/SanctumAI/Meta-Llama-3-8B-Instruct-GGUF and tried to run it through the typical LangChain flow to save the embeddings in a vector store.

However, I found several errors. The first is that the call to embed_query (or similarly embed_documents) raises the error above. Looking at the implementation of the method, it turns out that the self.client.embed(text) call returns List[List[float]] instead of List[float]:

def embed_query(self, text: str) -> List[float]:
    embedding = self.client.embed(text)
    return list(map(float, embedding))

So, for the example above, self.client.embed("Hello world") returns as many lists as there are tokens (4 tokens, so 4 different embeddings):

[
  [3.7253239154815674, -0.7700189352035522, -1.5746108293533325, ...], 
  [-0.5864148736000061, -1.0474858283996582, -0.11403905600309372, ...], 
  [-1.3635257482528687, -2.6822009086608887, 2.7714433670043945, ...],
  [-0.8518956303596497, -2.877943754196167, 0.94314044713974, ...]
]
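For illustration, here is how those per-token vectors could be collapsed into a single sentence-level embedding via mean pooling. This is a sketch, not LangChain code; mean_pool is my own helper, and whether mean pooling matches what the llama.cpp embedding binary applies by default is an assumption on my part:

```python
# Sketch: mean-pool per-token embeddings (List[List[float]]) into a single
# vector (List[float]). Hypothetical helper, not part of LangChain; whether
# this matches llama.cpp's own pooling is unverified.
from typing import List


def mean_pool(token_embeddings: List[List[float]]) -> List[float]:
    n = len(token_embeddings)
    # Average each dimension across all token vectors.
    return [sum(col) / n for col in zip(*token_embeddings)]


print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```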

However, running the same embedding through the llama.cpp embedding binary:

$ ./embedding -m models/meta-llama-3-8b-instruct.Q4_K_M.gguf -p "Hello world" --seed 198
-1.294132, -2.531020, 2.608500, ...

yields just a single embedding. So:

  • Is either implementation missing some parameterization needed to make the outputs match?
  • Is LlamaCppEmbeddings implemented incorrectly?
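In the meantime, a defensive client-side workaround could normalize whatever shape embed() returns before handing it to the vector store, mean-pooling when it is per-token output. Again just a sketch under my own assumptions: normalize_embedding is a hypothetical name, and the nested-list check is based only on the behavior observed above:

```python
# Sketch of a client-side workaround: accept either a flat embedding or the
# per-token List[List[float]] observed above, and always return List[float].
# normalize_embedding is a hypothetical helper, not part of LangChain.
from typing import List, Sequence, Union


def normalize_embedding(
    embedding: Union[Sequence[float], Sequence[Sequence[float]]]
) -> List[float]:
    if embedding and isinstance(embedding[0], (list, tuple)):
        # Per-token vectors: mean-pool each dimension across tokens.
        n = len(embedding)
        return [sum(col) / n for col in zip(*embedding)]
    return [float(x) for x in embedding]


print(normalize_embedding([[2.0, 4.0], [4.0, 6.0]]))  # [3.0, 5.0]
print(normalize_embedding([1, 2, 3]))                 # [1.0, 2.0, 3.0]
```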

System Info

System Information

OS: Linux
OS Version: #1 SMP PREEMPT_DYNAMIC Tue, 07 May 2024 21:45:29 +0000
Python Version: 3.12.3 (main, Apr 23 2024, 09:16:07) [GCC 13.2.1 20240417]

Package Information

langchain_core: 0.1.52
langchain: 0.1.20
langchain_community: 0.0.38
langsmith: 0.1.56
langchain_llamacpp: Installed. No version info available.
langchain_openai: 0.1.6
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve

@r3v1 r3v1 changed the title Inconsisten embeddings between LlamaCppEmbeddings and llama.cpp Inconsistent embeddings between LlamaCppEmbeddings and llama.cpp May 11, 2024
@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels May 11, 2024
@r3v1
Contributor Author

r3v1 commented May 14, 2024

Related issue on llama.cpp

Maybe some pooling_type parameter value has to be set on the LlamaCpp object. However, the value 1 (mean) results in an error:

GGML_ASSERT: /tmp/pip-install-51da47p5/llama-cpp-python_a140e7d57e1a4b0cb6ef2bf4c1230991/vendor/llama.cpp/llama.cpp:11092: lctx.inp_mean
ptrace: Operation not permitted.
No stack.
The program is not being run.

@OKHand-Zy

I met the same error.
My code:

from langchain.embeddings import LlamaCppEmbeddings

llm = LlamaCppEmbeddings(model_path="./model/gte-qwen1.5-7b-instruct.Q6_K/gte-qwen1.5-7b-instruct.Q6_K.gguf")

text = "This is a test document."
print(llm.embed_query(text))

Error Message:

llama_print_timings:        load time =      99.15 ms
llama_print_timings:      sample time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =      95.23 ms /     8 tokens (   11.90 ms per token,    84.00 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =      99.32 ms /     9 tokens
Traceback (most recent call last):
  File "/root/RAG/ST/st-gte-Qwen1.5-7B-Q6.py", line 19, in <module>
    print(llm.embed_query(text))
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/langchain_community/embeddings/llamacpp.py", line 126, in embed_query
    return list(map(float, embedding))
TypeError: float() argument must be a string or a real number, not 'list'

@r3v1
Contributor Author

r3v1 commented May 21, 2024

Finally, this bug is related to abetlen/llama-cpp-python#1288; downgrading to llama-cpp-python==0.2.55 fixes the issue.

@r3v1 r3v1 closed this as completed May 21, 2024