Similarity Score Threshold #5340

Open
5 tasks done
yixiangfeng opened this issue May 13, 2024 · 1 comment
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@yixiangfeng

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
(async () => {
    const embeddings = new OpenAIEmbeddings({
        apiKey: process.env.OPEN_AI_API_KEY,
    });
    const vectorStore = await Milvus.fromExistingCollection(embeddings, {
        url: process.env.ZILLIZ_CLOUD_URL,
        username: process.env.ZILLIZ_CLOUD_USERNAME,
        password: process.env.ZILLIZ_CLOUD_PASSWORD,
        ssl: true,
        collectionName: 'test',
    });

    const result = await vectorStore.similaritySearchWithScore('Hello,');
    console.log(result);
})();
output:

[
  [
    Document { pageContent: 'Hello,', metadata: [Object] },
    0.000004837587312067626
  ],
  [
    Document { pageContent: 'Hi,', metadata: [Object] },
    0.03748885169625282
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ],
  [
    Document { pageContent: 'Hi.', metadata: [Object] },
    0.11468228697776794
  ]
]

Error Message and Stack Trace (if applicable)

No response

Description

When I call similaritySearchWithScore on a Milvus vector store, a score of 0 means most similar and a score of 1 means most dissimilar. This does not match your documentation: when I use ScoreThresholdRetriever to filter by score, minSimilarityScore expects a score of 1 to mean most similar and 0 to mean most dissimilar. As a result, the similarity score threshold cannot be used with Milvus (Zilliz Cloud).
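
To make the mismatch concrete, here is a minimal sketch that applies a lower-bound filter to the scores from the output above. It assumes the retriever keeps a result when its score is at least minSimilarityScore; `keepAtLeast` is a hypothetical stand-in written for this illustration, not a LangChain.js function.

```typescript
// Scores copied from the output above. With Milvus here, smaller means more similar.
type Scored = [pageContent: string, score: number];

const milvusResults: Scored[] = [
  ["Hello,", 0.000004837587312067626],
  ["Hi,", 0.03748885169625282],
  ["Hi.", 0.11468228697776794],
  ["Hi.", 0.11468228697776794],
];

// Hypothetical stand-in for a similarity threshold filter.
// Assumption: a result is kept when score >= minSimilarityScore.
function keepAtLeast(results: Scored[], minSimilarityScore: number): Scored[] {
  return results.filter(([, score]) => score >= minSimilarityScore);
}

// A typical "high similarity" threshold rejects everything, including the exact match.
const strict = keepAtLeast(milvusResults, 0.8);
console.log(strict.length); // 0

// A low threshold keeps the less similar entries and drops the exact match.
const loose = keepAtLeast(milvusResults, 0.01);
console.log(loose.map(([text]) => text)); // [ 'Hi,', 'Hi.', 'Hi.' ]
```

Under that assumption, no threshold value selects only the closest matches, because the filter direction is the opposite of the score direction.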

System Info

yarn version 1.22.19
node version 18.20.2

@langchain/community@0.0.55
@langchain/openai@0.0.28
@langchain/core@0.1.62
langchain@0.1.36


dosubot bot commented May 13, 2024

The discrepancy is that Milvus, in this setup, returns a score where 0 means maximum similarity and 1 means maximum dissimilarity, while the ScoreThresholdRetriever in LangChain.js treats minSimilarityScore as a lower bound on a score where higher means more similar. As a stopgap, you can set minSimilarityScore to a value close to 0 so that the Milvus results are not filtered out. Note that, assuming the retriever keeps results whose score is at least the threshold, this only admits more results; it does not restrict them to the most similar ones.

For a more technical solution, consider preprocessing or transforming the scores from Milvus before they are evaluated by the ScoreThresholdRetriever. This could involve modifying the similaritySearchWithScore method to invert the scores, for example, by subtracting them from 1, to align with the expected scoring system of the ScoreThresholdRetriever.

Here's a conceptual example of how you might invert the score:

// Conceptual fragment: `searchResp` and `fields` come from the surrounding
// Milvus search code, which is not shown here.
const results: [Document, number][] = [];
searchResp.results.forEach((result) => {
  // Invert the Milvus score (0 = most similar) so that higher means more
  // similar, as ScoreThresholdRetriever expects
  const invertedScore = 1 - result.score;
  results.push([new Document(fields), invertedScore]);
});

This approach assumes scores are normalized between 0 and 1. If Milvus uses a different scoring range, you'll need to adjust the inversion logic accordingly.
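
If the scores are not guaranteed to stay within 0 and 1, one option is a mapping that works for any non-negative distance. The sketch below is only an illustration: the issue does not say which metric the collection uses, so it assumes a distance where 0 is an exact match, and `distanceToSimilarity` and `invertScores` are hypothetical helpers, not part of LangChain.js or the Milvus SDK.

```typescript
// A document (of any type) paired with its score.
type ScoredDoc<T> = [doc: T, score: number];

// Maps a non-negative distance to a similarity in (0, 1].
// Distance 0 maps to 1, and larger distances approach 0.
// Assumption: the score is a distance where 0 means an exact match.
function distanceToSimilarity(distance: number): number {
  if (!Number.isFinite(distance) || distance < 0) {
    throw new RangeError(`expected a non-negative finite distance, got ${distance}`);
  }
  return 1 / (1 + distance);
}

// Converts every distance in a result list, keeping documents and order unchanged.
function invertScores<T>(results: ScoredDoc<T>[]): ScoredDoc<T>[] {
  return results.map(([doc, distance]): ScoredDoc<T> => [
    doc,
    distanceToSimilarity(distance),
  ]);
}

// Two of the scores reported in the issue.
const sample: ScoredDoc<string>[] = [
  ["Hello,", 0.000004837587312067626],
  ["Hi.", 0.11468228697776794],
];
const converted = invertScores(sample);
console.log(converted);
```

The mapping is monotonically decreasing, so the ranking is preserved, but the resulting values are not comparable to cosine similarities, and any minSimilarityScore would need to be chosen against the converted scale.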

Alternatively, you could extend or modify the ScoreThresholdRetriever to interpret scores from Milvus correctly, by prioritizing lower scores as indicating higher similarity. This might involve changing the filtering logic within the ScoreThresholdRetriever.
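
A rough sketch of what that filtering could look like for distance-style scores is shown below. It works on plain data rather than the real retriever class, and `filterByMaxDistance` and its `maxDistance` parameter are hypothetical names introduced for illustration; they are not existing LangChain.js options.

```typescript
// A result as [pageContent, distance]; smaller distance means more similar.
type DistanceResult = [pageContent: string, distance: number];

// Keeps results whose distance is at most maxDistance, i.e. an upper bound
// on distance instead of a lower bound on similarity.
function filterByMaxDistance(
  results: DistanceResult[],
  maxDistance: number,
): DistanceResult[] {
  return results.filter(([, distance]) => distance <= maxDistance);
}

// Scores reported in the issue.
const reported: DistanceResult[] = [
  ["Hello,", 0.000004837587312067626],
  ["Hi,", 0.03748885169625282],
  ["Hi.", 0.11468228697776794],
  ["Hi.", 0.11468228697776794],
];

const close = filterByMaxDistance(reported, 0.05);
console.log(close.map(([text]) => text)); // [ 'Hello,', 'Hi,' ]
```

Compared with inverting the scores, this keeps the values exactly as Milvus reports them, at the cost of a threshold whose meaning depends on the metric configured for the collection.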

For implementation details, refer to the LangChain.js repository for the ScoreThresholdRetriever and VectorStore classes.


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label May 13, 2024