Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community[minor]: Add Upstash Embeddings Support #5150

Merged
merged 19 commits into from
May 25, 2024

Conversation

fahreddinozcan
Copy link
Contributor

@fahreddinozcan fahreddinozcan commented Apr 19, 2024

This PR adds Upstash Embeddings support, which allows users to send their data to Upstash Vector Database without creating the embeddings. Upstash Embeddings will handle the embedding operation for them.

More context on Upstash announcement

Copy link

vercel bot commented Apr 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchainjs-api-refs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 25, 2024 9:55pm
langchainjs-docs ✅ Ready (Inspect) Visit Preview May 25, 2024 9:55pm

@fahreddinozcan fahreddinozcan marked this pull request as ready for review April 24, 2024 13:54
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Apr 24, 2024
@@ -203,7 +272,7 @@ export class UpstashVectorStore extends VectorStore {
static async fromTexts(
texts: string[],
metadatas: UpstashMetadata | UpstashMetadata[],
embeddings: EmbeddingsInterface,
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I mean instead of passing a string here, pass FakeEmbeddings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, passing something explicit like "UpstashEmbeddings" makes it easier to understand you have the option to either get the embeddings from Upstash, or passing a embedding client to create the embeddings your self. In the Vectara case, it seems that there's no option to pass an embedding, so all the embeddings will be created directly by it self. With Upstash, this is optional. Sorry for the delay here.

@jacoblee93
Copy link
Collaborator

Sorry for the delay!

@@ -203,7 +272,7 @@ export class UpstashVectorStore extends VectorStore {
static async fromTexts(
texts: string[],
metadatas: UpstashMetadata | UpstashMetadata[],
embeddings: EmbeddingsInterface,
embeddings: EmbeddingsInterface | "UpstashEmbeddings",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I mean instead of passing a string here, pass FakeEmbeddings

@buggyhunter
Copy link

Hey @jacoblee93 any chance for review again?

@jacoblee93
Copy link
Collaborator

Hey @jacoblee93 any chance for review again?

Hey, sorry for the delay!

I left a note about using FakeEmbeddings and documenting it rather than the string. While I do agree that a string is more descriptive, we must adhere to the base class interface for now otherwise we'll have problems with types etc. We have some refactoring of vectorstores planned in the future and we can apply a better fix then.

@fahreddinozcan
Copy link
Contributor Author

Hey @jacoblee93 any chance for review again?

Hey, sorry for the delay!

I left a note about using FakeEmbeddings and documenting it rather than the string. While I do agree that a string is more descriptive, we must adhere to the base class interface for now otherwise we'll have problems with types etc. We have some refactoring of vectorstores planned in the future and we can apply a better fix then.

Hey, it's ready as you asked.

@@ -78,10 +96,14 @@ export class UpstashVectorStore extends VectorStore {
*/
async addDocuments(
documents: DocumentInterface[],
options?: { ids?: string[] }
options?: { ids?: string[]; UpstashEmbeddings?: boolean }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small style nit - we generally use camel case for properties. I can push a rename

@jacoblee93 jacoblee93 changed the title Add Upstash Embeddings Support community[minor]: Add Upstash Embeddings Support May 25, 2024
@@ -45,22 +50,35 @@ const CONCURRENT_UPSERT_LIMIT = 1000;
export class UpstashVectorStore extends VectorStore {
declare FilterType: string;

declare embeddings: EmbeddingsInterface;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should make this an optional prop

index: UpstashIndex;

caller: AsyncCaller;

embeddings: EmbeddingsInterface;
upstashEmbeddingsConfig?: boolean;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming to useUpstashEmbeddings instead

* @param options Optional object containing the array of ids for the documents.
* @returns Promise that resolves with the ids of the provided documents when the upsert operation is done.
*/
async addData(documents: DocumentInterface[], options?: { ids?: string[] }) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to make this protected since I think redundant vs. just calling addDocuments

@jacoblee93
Copy link
Collaborator

Thank you! I renamed a couple of things but re-ran the integration tests and they passed. Will get this released today.

@jacoblee93 jacoblee93 merged commit 668b0bb into langchain-ai:main May 25, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features lgtm PRs that are ready to be merged as-is size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants