Replies: 1 comment
-
Based on the current implementation of LlamaIndex, there is no built-in feature that records the exact SentenceTransformer model used to generate embeddings when they are written to the database. The metadata indicating [...]

For managing and storing data embeddings, given how frequently new embedding models are released, here are some strategies you could consider:
To record the embedding type at indexing time, you could store the embedding model name in the collection metadata. This lets you identify which model was used for each collection. The trade-off is that the collection metadata must be managed carefully so that it stays accurate and up to date.

As for avoiding redundant document records, using a single [...]

Please note that these strategies require additional code and systems for versioning, metadata management, and automation, none of which the current LlamaIndex implementation provides. You might need to extend the existing codebase or integrate with other systems to implement them.

I hope this helps! If you have any further questions, feel free to ask.
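As a rough illustration of the collection-metadata idea above, here is a minimal sketch in plain Python. It is not part of LlamaIndex or ChromaDB: the helper names (`collection_name_for`, `build_collection_metadata`, `check_query_model`, `stable_doc_id`) and the metadata keys `embedding_model` and `embedding_dim` are hypothetical choices made for this example. It assumes one collection per embedding model, a small metadata dict that names the model, and a document ID derived from the source path so the same document keeps the same ID in every collection.

```python
import hashlib
import re


def collection_name_for(base: str, model_name: str) -> str:
    """Derive a per-model collection name, e.g. 'docs__all-MiniLM-L6-v2'."""
    # Replace characters such as '/' that a vector store may reject in names.
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", model_name)
    return f"{base}__{safe}"


def build_collection_metadata(model_name: str, dimension: int) -> dict:
    """Metadata to attach to a collection when it is created (hypothetical keys)."""
    return {"embedding_model": model_name, "embedding_dim": dimension}


def check_query_model(collection_metadata, query_model_name: str) -> None:
    """Raise if the query-time model differs from the one recorded at indexing."""
    stored = (collection_metadata or {}).get("embedding_model")
    if stored is None:
        raise ValueError("Collection has no recorded embedding model.")
    if stored != query_model_name:
        raise ValueError(
            f"Collection was indexed with {stored!r}, "
            f"but the query uses {query_model_name!r}."
        )


def stable_doc_id(source_path: str) -> str:
    """Same source path -> same ID, regardless of which model embedded it."""
    return hashlib.sha256(source_path.encode("utf-8")).hexdigest()[:16]
```

With ChromaDB, the dict from `build_collection_metadata` could be passed as the `metadata` argument when creating the collection and read back from the collection's `metadata` attribute before querying, so a mismatch fails loudly instead of silently returning poor matches. Re-indexing with a new model then means creating a new collection and reusing the same document IDs, rather than overwriting vectors in place.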
-
I had an issue where I saved my embeddings and documents in ChromaDB using a SentenceTransformer model. The problem is that I lost track of the exact model used at the time, so when I attempted a semantic search query, my query embedding did not match the document embeddings stored in ChromaDB.
I have two questions:
First, at the time of writing to the database, is the exact document embedding model stored somewhere for future reference? I looked in the metadata for the doc, but it indicates embedding=Null (what is this field used for, anyway?)
Secondly, what is the best practice for managing stored embeddings? For example, new embedding models come out all the time, so how do we keep track of the embedding type used at indexing time? Should I have a separate ChromaDB collection per model and store the embedding name in the collection metadata, or put it somewhere else? My plan was to have a single doc_id= identifier so I do not have redundant document records. I'm not sure what the best strategy is here.
Thanks!