Send document.metadata to AzureAISearch in addVectors() #5274

keiohtani · 2024-05-02T21:32:16Z

Fixes the issue with document.metadata not being sent to Azure AI Search when using AzureAISearchVectorStore.fromDocuments().
AzureAISearchVectorStore.fromDocuments() accepts Document[] as argument which has a parameter document.metadata, but internally this value is not passed to AzureAISearchVectorStore.addVectors() and it could be confusing to the users.

vercel · 2024-05-02T21:32:20Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
langchainjs-api-refs	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 2, 2024 9:38pm
langchainjs-docs	✅ Ready (Inspect)	Visit Preview		May 2, 2024 9:38pm

sinedied · 2024-05-03T21:49:50Z

libs/langchain-community/src/vectorstores/azure_aisearch.ts

@@ -314,6 +314,7 @@ export class AzureAISearchVectorStore extends VectorStore {
      metadata: {
        source: doc.metadata?.source,
        attributes: doc.metadata?.attributes ?? [],
+        ...doc.metadata,


This won't work and most probably triggers an error.
AFAIK Azure AI Search only supports a fixed schema (see index creation), hence the "workaround" of setting metadata as K/V pair in the generic attributes property.

This is something I plan to fix in a later PR by doing the K/V pair transform automatically, and allowing to customize the index schema.

Yes, that’s correct. I think we can update the LangChain JS doc to provide instructions on what to do in that case. Actually to use the AzureAISearchVectorStore class, I had to create source and attributes fields in the metadata field anyways, otherwise the fromDocuments() method was failing. I am happy to make those changes in the doc as well if you are ok with it.

If you don't provide source or attributes it should work anyways as there's a fallback, do you remember what kind of error you had? Metadata should not be mandatory, if there's an issue with empty metadata I'll be happy to fix it

Sure,

This is the error message when a metadata field is defined in the index,

} Error: Azure AI Search uploadDocuments batch failed: RestError: The request is invalid. Details: Cannot find nested property 'metadata' on the resource type 'search.documentFields'. at EventEmitter.<anonymous> (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:225:19) at EventEmitter.emit (node:events:514:28) at SearchIndexingBufferedSender.submitDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@azure/search-documents/src/searchIndexingBufferedSender.ts:463:22) at processTicksAndRejections (node:internal/process/task_queues:95:5) at SearchIndexingBufferedSender.internalFlush (/home/keiohtani/projects/chatifc-backend/node_modules/@azure/search-documents/src/searchIndexingBufferedSender.ts:400:9) at AzureAISearchVectorStore.addVectors (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:228:9) at AzureAISearchVectorStore.addDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:201:25) at Function.fromDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:576:9) at async Promise.all (index 0) at SummarizationChainService.createEmbeddings (/home/keiohtani/projects/chatifc-backend/src/services/LangChain/SummarizationChainService.ts:94:23)

And Azure AI Search does not allow an empty metadata field when creating from LangChain (in this case, we are using Python)

Also, looking at the other method, fromTexts, The way metadatas is handled is confusing.

static async fromTexts( texts: string[], metadatas: object[] | object, embeddings: EmbeddingsInterface, config: AzureAISearchConfig ): Promise<AzureAISearchVectorStore> { const docs: Document<AzureAISearchDocumentMetadata>[] = []; for (let i = 0; i < texts.length; i += 1) { const metadata = Array.isArray(metadatas) ? metadatas[i] : metadatas; const newDoc = new Document({ pageContent: texts[i], metadata, }); docs.push(newDoc); } return AzureAISearchVectorStore.fromDocuments(docs, embeddings, config); }

At first glance, it seems this metadatas parameter will be used in the document creation, but because it's overwritten in the addVectors(), that's not the case and it's really hard to understand the behavior by just looking at the method itself.

@sinedied Following up on this

For now, if you need metadata to be stored you need to use these two properties: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/azure_aisearch.ts#L64-L65

source and attributes, the latter being able to store a collection of objects that have both a keyandvalue` attributes, see the schema here: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/azure_aisearch.ts#L712-L723

It's not ideal I know, that's why I would like to propose a better implementation for this (the current one was based on the Python implementation). I'll probably work on it starting June, with the current plan:

Automatically take any metadata properties passed in and adapt them to fit in the current schema. That would solve your current issue, but it still would allow only string values.

Propose a way to customize the schema and the attributes used for storage.

Update azure_aisearch.ts

6df2212

dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. auto:bug Related to a bug, vulnerability, unexpected error with an existing feature labels May 2, 2024

vercel bot deployed to Preview – langchainjs-docs May 2, 2024 21:38 View deployment

vercel bot deployed to Preview – langchainjs-api-refs May 2, 2024 21:38 View deployment

sinedied reviewed May 3, 2024

View reviewed changes

bracesproul added the hold On hold label May 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Send document.metadata to AzureAISearch in addVectors() #5274

Send document.metadata to AzureAISearch in addVectors() #5274

keiohtani commented May 2, 2024

vercel bot commented May 2, 2024 •

edited

sinedied May 3, 2024 •

edited

keiohtani May 5, 2024

sinedied May 6, 2024

keiohtani May 6, 2024

keiohtani May 6, 2024

keiohtani May 6, 2024

keiohtani May 21, 2024

sinedied May 23, 2024

Send document.metadata to AzureAISearch in addVectors() #5274

Are you sure you want to change the base?

Send document.metadata to AzureAISearch in addVectors() #5274

Conversation

keiohtani commented May 2, 2024

vercel bot commented May 2, 2024 • edited

sinedied May 3, 2024 • edited

Choose a reason for hiding this comment

keiohtani May 5, 2024

Choose a reason for hiding this comment

sinedied May 6, 2024

Choose a reason for hiding this comment

keiohtani May 6, 2024

Choose a reason for hiding this comment

keiohtani May 6, 2024

Choose a reason for hiding this comment

keiohtani May 6, 2024

Choose a reason for hiding this comment

keiohtani May 21, 2024

Choose a reason for hiding this comment

sinedied May 23, 2024

Choose a reason for hiding this comment

vercel bot commented May 2, 2024 •

edited

sinedied May 3, 2024 •

edited