Clarity and Understanding around Prefilter performance and Schema design #1286
Comments
The determining factor is how selective the filter is.
We might later implement a composite index, where you can include certain metadata columns within the ANN index itself. The character ID could then be stored alongside the vector in the index, which would make prefiltering a bit faster for that column.
Creating many unique character tables could also work. You'd have to benchmark to see if that's optimal for your use case right now.
As of 0.4.20, I only saw two example files in the Rust examples directory, and neither of them comes close to demonstrating 1 or 2 above. You're going to think I'm being so rude, please excuse me. The issue is a specific conceptual example matching our data, but your response, unfortunately, doesn't point towards examples or code, and it's kind of filled with jargon I'm trying hard to parse. Are there Node.js, Python, or Rust examples that are close to what I'm asking? For example, is there something out there for performing exact KNN versus ANN? I wish I were an expert at LanceDB, like you, but some clues as to best practices with the library would be helpful.
Very cool!
Description
Love the use of the library so far, thank you for all your work. I'm currently using LanceDB inside the Godot game engine via the Rust API, and it's a breeze.
Use Case:
I have multiple characters in an environment, and I am storing all their individual memories in a single table.
Setup:
Issue:
Clarification in Docs Needed:
Example:
Table: characters
Character [Paul] plays basketball at 10:20am.
Character [Cody] plays basketball at 10:30am.
empty_cody_memories has no entries because, when searching for "basketball," the query embedding is close to BOTH Paul's and Cody's memories. The search therefore finds Paul's memory as the closest, then applies the postfilter name == Cody, and now I don't see any of Cody's basketball memories.
cody_prefiltered_memories has an entry because we first remove all the non-Cody memories from the table and then perform the nearest neighbor search, yielding only Cody's memories related to basketball, as expected.
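The difference between the two results can be reproduced without any database. The following is a minimal sketch in plain Rust that does not use the LanceDB API: the `Memory` struct, the two-dimensional vectors, and the `knn`, `postfilter`, `prefilter`, and `sample_table` helpers are all hypothetical and invented for illustration. It runs an exact (brute-force) nearest neighbor search and applies the name filter either after or before it.

```rust
// Toy illustration only: plain Rust, no LanceDB calls.
// The vectors are made-up stand-ins for real embeddings.
#[derive(Debug, Clone, PartialEq)]
struct Memory {
    name: &'static str,
    text: &'static str,
    vector: [f32; 2],
}

/// Squared Euclidean distance between two toy embeddings.
fn l2_sq(a: &[f32; 2], b: &[f32; 2]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact (brute-force) k-nearest-neighbor search over `rows`.
fn knn<'a>(rows: &[&'a Memory], query: &[f32; 2], k: usize) -> Vec<&'a Memory> {
    let mut sorted: Vec<&'a Memory> = rows.to_vec();
    sorted.sort_by(|a, b| l2_sq(&a.vector, query).total_cmp(&l2_sq(&b.vector, query)));
    sorted.truncate(k);
    sorted
}

/// Postfilter: search the whole table first, then drop rows that fail the filter.
fn postfilter<'a>(table: &'a [Memory], query: &[f32; 2], k: usize, name: &str) -> Vec<&'a Memory> {
    let all: Vec<&Memory> = table.iter().collect();
    knn(&all, query, k)
        .into_iter()
        .filter(|m| m.name == name)
        .collect()
}

/// Prefilter: drop rows that fail the filter first, then search what is left.
fn prefilter<'a>(table: &'a [Memory], query: &[f32; 2], k: usize, name: &str) -> Vec<&'a Memory> {
    let only: Vec<&Memory> = table.iter().filter(|m| m.name == name).collect();
    knn(&only, query, k)
}

/// The two memories from the example, with invented embeddings that sit
/// close together because both are about basketball.
fn sample_table() -> Vec<Memory> {
    vec![
        Memory { name: "Paul", text: "plays basketball at 10:20am", vector: [1.0, 0.0] },
        Memory { name: "Cody", text: "plays basketball at 10:30am", vector: [0.9, 0.1] },
    ]
}
```

With a query vector of `[1.0, 0.0]` and `k = 1`, `postfilter(&table, &query, 1, "Cody")` returns nothing, because the single nearest row belongs to Paul and is then filtered out, while `prefilter(&table, &query, 1, "Cody")` returns Cody's memory. Raising `k` would let the postfilter path find Cody as well, which is why the outcome depends on how selective the filter is relative to the number of results requested.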
Link
Postfilter Rust API reference, with a small statement on prefilter