-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send document.metadata to AzureAISearch in addVectors() #5274
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@@ -314,6 +314,7 @@ export class AzureAISearchVectorStore extends VectorStore { | |||
metadata: { | |||
source: doc.metadata?.source, | |||
attributes: doc.metadata?.attributes ?? [], | |||
...doc.metadata, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work and most probably triggers an error.
AFAIK Azure AI Search only supports a fixed schema (see index creation), hence the "workaround" of setting metadata as K/V pair in the generic attributes
property.
This is something I plan to fix in a later PR by doing the K/V pair transform automatically, and allowing to customize the index schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that’s correct. I think we can update the LangChain JS doc to provide instructions on what to do in that case. Actually to use the AzureAISearchVectorStore
class, I had to create source
and attributes
fields in the metadata field anyways, otherwise the fromDocuments()
method was failing. I am happy to make those changes in the doc as well if you are ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't provide source
or attributes
it should work anyways as there's a fallback, do you remember what kind of error you had? Metadata should not be mandatory, if there's an issue with empty metadata I'll be happy to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure,
This is the error message when a metadata field is defined in the index,
}
Error: Azure AI Search uploadDocuments batch failed: RestError: The request is invalid. Details: Cannot find nested property 'metadata' on the resource type 'search.documentFields'.
at EventEmitter.<anonymous> (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:225:19)
at EventEmitter.emit (node:events:514:28)
at SearchIndexingBufferedSender.submitDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@azure/search-documents/src/searchIndexingBufferedSender.ts:463:22)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at SearchIndexingBufferedSender.internalFlush (/home/keiohtani/projects/chatifc-backend/node_modules/@azure/search-documents/src/searchIndexingBufferedSender.ts:400:9)
at AzureAISearchVectorStore.addVectors (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:228:9)
at AzureAISearchVectorStore.addDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:201:25)
at Function.fromDocuments (/home/keiohtani/projects/chatifc-backend/node_modules/@langchain/community/dist/vectorstores/azure_aisearch.cjs:576:9)
at async Promise.all (index 0)
at SummarizationChainService.createEmbeddings (/home/keiohtani/projects/chatifc-backend/src/services/LangChain/SummarizationChainService.ts:94:23)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, looking at the other method, fromTexts
, The way metadatas
is handled is confusing.
static async fromTexts(
texts: string[],
metadatas: object[] | object,
embeddings: EmbeddingsInterface,
config: AzureAISearchConfig
): Promise<AzureAISearchVectorStore> {
const docs: Document<AzureAISearchDocumentMetadata>[] = [];
for (let i = 0; i < texts.length; i += 1) {
const metadata = Array.isArray(metadatas) ? metadatas[i] : metadatas;
const newDoc = new Document({
pageContent: texts[i],
metadata,
});
docs.push(newDoc);
}
return AzureAISearchVectorStore.fromDocuments(docs, embeddings, config);
}
At first glance, it seems this metadatas
parameter will be used in the document creation, but because it's overwritten in the addVectors()
, that's not the case and it's really hard to understand the behavior by just looking at the method itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sinedied Following up on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, if you need metadata to be stored you need to use these two properties: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/azure_aisearch.ts#L64-L65
source
andattributes, the latter being able to store a collection of objects that have both a
keyand
value` attributes, see the schema here: https://github.com/langchain-ai/langchainjs/blob/main/libs/langchain-community/src/vectorstores/azure_aisearch.ts#L712-L723
It's not ideal I know, that's why I would like to propose a better implementation for this (the current one was based on the Python implementation). I'll probably work on it starting June, with the current plan:
- Automatically take any metadata properties passed in and adapt them to fit in the current schema. That would solve your current issue, but it still would allow only string values.
- Propose a way to customize the schema and the attributes used for storage.
Fixes the issue with
document.metadata
not being sent to Azure AI Search when usingAzureAISearchVectorStore.fromDocuments()
.AzureAISearchVectorStore.fromDocuments()
acceptsDocument[]
as argument which has a parameterdocument.metadata
, but internally this value is not passed toAzureAISearchVectorStore.addVectors()
and it could be confusing to the users.