LLamaCpp embedding returns an empty array for long text (while HuggingFaceEmbeddings works fine) #6996

mokeyish commented Apr 30, 2024

LLamaCpp embedding returns an empty array for long text. The problem seems to occur once the text length exceeds about 680.

Steps to reproduce:

  1. Download the GGUF model from https://huggingface.co/CompendiumLabs/bge-large-zh-v1.5-gguf/blob/main/bge-large-zh-v1.5-f32.gguf
  2. Launch it with docker run -p 10080:80 -v $(pwd):/ggufs/ --rm ghcr.io/ggerganov/llama.cpp:server -m /ggufs/bge-large-zh-v1.5-f32.gguf --embedding -c 8192 --host 0.0.0.0 --port 80 -a bge-large-zh -ngl 100
  3. Query it with text longer than 680. (The server log prints n_past=0.)

Note: I can't find a parameter that adjusts this limit. A minimal request that reproduces the failure is sketched below.
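For reference, here is a sketch of a reproducing request. It assumes the server's OpenAI-compatible /v1/embeddings route exposed by the docker command above, that the 680 limit is measured in characters, and that "bge-large-zh" matches the alias passed via -a; adjust as needed.

import requests

# 700 characters is an arbitrary length above the observed ~680 limit
# (assumption: the limit is in characters).
text = "你好" * 350

resp = requests.post(
    "http://localhost:10080/v1/embeddings",
    json={"model": "bge-large-zh", "input": text},
)
data = resp.json()["data"]
# Expected: a 1024-dimensional vector for bge-large-zh-v1.5;
# observed: an empty array.
print(len(data[0]["embedding"]) if data else data)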

It works fine with HuggingFaceEmbeddings (the downside is that it is much heavier). The workaround server I am using is below:

from typing import Any, Dict, List, Optional, Union
import fire
import uvicorn
from fastapi import FastAPI, Request
from pydantic import BaseModel
from langchain_community.embeddings import HuggingFaceEmbeddings


app = FastAPI()

class UsageInfo(BaseModel):
    """Usage information."""
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0

class EmbeddingsRequest(BaseModel):
    """Embedding request."""
    model: Optional[str] = None
    input: Union[str, List[Any]]
    user: Optional[str] = None


class EmbeddingsResponse(BaseModel):
    """Embedding response."""
    object: str = 'list'
    data: List[Dict[str, Any]]
    model: str
    usage: UsageInfo

embeddings: Optional[HuggingFaceEmbeddings] = None


@app.post('/v1/embeddings')
async def create_embeddings(request: EmbeddingsRequest,
                            raw_request: Request = None):
    """Creates embeddings for the text."""
    embedding = await embeddings.aembed_query(request.input)
    data = [{'object': 'embedding', 'embedding': embedding, 'index': 0}]
    token_num = len(embedding)
    return EmbeddingsResponse(
        data=data,
        model=request.model,
        usage=UsageInfo(
            prompt_tokens=token_num,
            total_tokens=token_num,
            completion_tokens=None,
        ),
    ).dict(exclude_none=True)


def main(model_name: str, host="0.0.0.0", port=8966, **kwargs: Dict[str, Any]):
    # Assign the module-level `embeddings` used by the endpoint; without
    # `global`, the handler would see None and fail on the first request.
    global embeddings
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=kwargs)

    uvicorn.run(app, host=host, port=port, workers=1)

if __name__ == "__main__":
    fire.Fire(main)
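
To run the workaround (a sketch; the file name is illustrative, and any extra flags are forwarded as model_kwargs to HuggingFaceEmbeddings):

python embedding_server.py --model_name BAAI/bge-large-zh-v1.5 --device cuda

Then point the client at http://localhost:8966/v1/embeddings as a drop-in replacement for the llama.cpp endpoint.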