Using AzureOpenAIEmbeddings throws input string is not valid when trying to embed a string #21575

Open
5 tasks done
Govindarajan-D opened this issue May 12, 2024 · 7 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature Ɑ: embeddings Related to text embedding models module 🔌: openai Primarily related to OpenAI integrations

Comments

@Govindarajan-D

Govindarajan-D commented May 12, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

I imported the embeddings class from the langchain_openai library:
from langchain_openai.embeddings import AzureOpenAIEmbeddings

Then I built the embedding model as follows:

embedding_model = AzureOpenAIEmbeddings(
    azure_endpoint=AOAI_ENDPOINT,
    openai_api_key=AOAI_KEY,
)

Calling _tokenize directly succeeds:
print(embedding_model._tokenize(["Test", "Message"], 2048))

But embedding a query throws an error saying 'Input should be a valid string':
print(embedding_model.embed_query("Test Message"))

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "c:\Users\govindarajand\backend-llm-model\stock_model\embed-test.py", line 55, in <module>
    print(embedding_model.embed_query("Test Message"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 530, in embed_query
    return self.embed_documents([text])[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 489, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=engine)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\langchain_openai\embeddings\base.py", line 347, in _get_len_safe_embeddings
    response = self.client.create(
               ^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\resources\embeddings.py", line 114, in create
    return self._post(
           ^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1240, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 921, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "c:\Users\govindarajand\AppData\Local\Programs\Python\Python311\Lib\site-packages\openai\_base_client.py", line 1020, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'detail': [{'type': 'string_type', 'loc': ['body', 'input', 'str'], 'msg': 'Input should be a valid string', 'input': [[2323, 4961]]}, {'type': 'string_type', 'loc': ['body', 'input', 'list[str]', 0], 'msg': 'Input should be a valid string', 'input': [2323, 4961]}]}
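
The 422 body in the traceback above already shows what was sent: the rejected `input` values are lists of integers, not text. That is consistent with the `_get_len_safe_embeddings` frame, which appears to tokenize the text before calling the endpoint, and with an endpoint that only accepts `str` or `list[str]`. A minimal sketch for checking this from the error body; the helper names `is_token_ids` and `rejected_token_payloads` are hypothetical, and only the error dictionary is taken from the report:

```python
from typing import Any

# Error body copied from the 422 response in the traceback.
ERROR_BODY = {
    "detail": [
        {"type": "string_type", "loc": ["body", "input", "str"],
         "msg": "Input should be a valid string", "input": [[2323, 4961]]},
        {"type": "string_type", "loc": ["body", "input", "list[str]", 0],
         "msg": "Input should be a valid string", "input": [2323, 4961]},
    ]
}


def is_token_ids(value: Any) -> bool:
    """Return True if value is a non-empty, possibly nested, list of ints."""
    if isinstance(value, list) and value:
        return all(isinstance(v, int) or is_token_ids(v) for v in value)
    return False


def rejected_token_payloads(body: dict) -> list:
    """Collect the rejected 'input' values that look like token IDs."""
    return [d["input"] for d in body.get("detail", []) if is_token_ids(d.get("input"))]


# Both rejected values are integer lists, i.e. tokens rather than text.
print(rejected_token_payloads(ERROR_BODY))
```

If both entries come back as integer lists, the request body contained token IDs, which suggests the fix lies in how the input is sent rather than in the text itself.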

Description

I am trying to use AzureOpenAIEmbeddings from langchain_openai.embeddings, but I get an error when embedding even a simple string. I first hit the error while using embedding_model with Vector Search, and after a few hours of debugging I found that the embedding model itself was the problem.

To check whether the issue was in my own code, I reduced the embedding code to its simplest form and ran it, but still got the error.

System Info

langchain==0.0.352
langchain-community==0.0.20
langchain-core==0.1.52
langchain-openai==0.1.6

@dosubot dosubot bot added Ɑ: embeddings Related to text embedding models module 🔌: openai Primarily related to OpenAI integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels May 12, 2024
@liugddx
Contributor

liugddx commented May 13, 2024

Let me see.

@jackbullen

Did you deploy your embedding endpoint in Azure? If not then try that.

I don't think this is an issue with langchain.

@Govindarajan-D
Author

@jackbullen Yes, it is deployed. I am converting existing code to run with langchain.

In my existing code, I use AzureOpenAI from the openai library and call embeddings.create, and it works without issue.
As mentioned in the issue description, _tokenize works in langchain but embed_query does not.

@jackbullen

> Yes it is deployed.

OK, I asked did you deploy it.

If you did and it's still giving the same error then idk, but a temporary workaround is passing the string query rather than tokens into OpenAIEmbeddings.client.create
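
A minimal sketch of that workaround, assuming the client exposes `create(input=..., model=...)` and returns an object whose `.data` items carry an `.embedding` list, as the OpenAI Python SDK's embeddings resource does. `embed_strings` and `StandInClient` are hypothetical names; the stand-in exists only so the sketch runs without Azure access, and `"my-embedding-deployment"` is a placeholder:

```python
from types import SimpleNamespace
from typing import Any, List


def embed_strings(client: Any, texts: List[str], model: str) -> List[List[float]]:
    """Send raw strings (not token IDs) to an embeddings client."""
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("texts must be a list of strings, not token IDs")
    response = client.create(input=texts, model=model)
    return [item.embedding for item in response.data]


class StandInClient:
    """Hypothetical offline stand-in for a real embeddings client."""

    def create(self, input: List[str], model: str) -> SimpleNamespace:
        # Fake one-dimensional "embeddings": the length of each string.
        data = [SimpleNamespace(embedding=[float(len(t))]) for t in input]
        return SimpleNamespace(data=data)


print(embed_strings(StandInClient(), ["Test Message"], model="my-embedding-deployment"))
```

With the objects from this issue, the call would presumably be `embed_strings(embedding_model.client, ["Test Message"], model=...)` with the real deployment name, since the traceback shows `self.client.create` resolving to the SDK's embeddings resource.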

@tindo2003

Hi, any updates since? Thanks!

@Govindarajan-D
Author

> > Yes it is deployed.
>
> OK, I asked did you deploy it.
>
> If you did and it's still giving the same error then idk, but a temporary workaround is passing the string query rather than tokens into OpenAIEmbeddings.client.create

Sorry about that. I did not deploy the OpenAI model in Azure myself; I am using it with an API key. Embedding works when I use AzureOpenAI.embeddings.create and fails only when I use langchain.

@tindo2003

I am getting the same issue. Perhaps the deployment name is different from the model name, as indicated by #1560?
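
If the deployment name is the culprit, one thing to try is passing it explicitly instead of relying on the default model name. The sketch below only builds the constructor arguments. `azure_embeddings_kwargs` is a hypothetical helper, the values in the usage line are placeholders, and `check_embedding_ctx_length=False` is, as far as I know, the langchain-openai option that sends raw strings instead of token IDs; verify both against your installed version:

```python
from typing import Any, Dict


def azure_embeddings_kwargs(
    endpoint: str, api_key: str, deployment: str, api_version: str
) -> Dict[str, Any]:
    """Build keyword arguments for AzureOpenAIEmbeddings (hypothetical helper)."""
    if not deployment:
        raise ValueError("an explicit Azure deployment name is required")
    return {
        "azure_endpoint": endpoint,
        "openai_api_key": api_key,
        # Name of the deployment in Azure, which may differ from the model name.
        "azure_deployment": deployment,
        "openai_api_version": api_version,
        # Assumption: skips client-side tokenization so plain strings are sent.
        "check_embedding_ctx_length": False,
    }


# Placeholder values for illustration only.
kwargs = azure_embeddings_kwargs(
    "https://example.openai.azure.com", "<key>", "my-embedding-deployment", "2024-02-01"
)
print(sorted(kwargs))
```

These would then be passed as `AzureOpenAIEmbeddings(**kwargs)`. Whether this resolves the 422 depends on the endpoint, so treat it as something to test rather than a confirmed fix.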
