-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
InvertedIndexImmutableRam and index migrations #4220
Conversation
058617e
to
6ed76b4
Compare
|
||
fn migrate_from_v1(path: &Path) -> FileOperationResult<()> { | ||
// Disable migration for now. | ||
SparseVectorIndexVersion::save(path)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You aware, that if you keep it like this and we merge to dev, then the actual migration should be v2 -> v3, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current state of this PR, only a file named version.info
with contents 0.1.0
is created. No file with a v2
in its name is created.
To enable the actual migration, we'll revert the last commit of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, can't we reuse the same file path and rely on the surrounding version number of contents to tell us whether it has been migrated. Or do you deem that as not reliable enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No file with a v2 in its name is created.
Yes, but once implemented, the next migration might possibly see either no version file or 0.1.0
.
I am ok with this approach, but please keep in mind that it is possible, that we will release another patch version in between your PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we reuse the same file path and rely on the surrounding version
It might be potentially corruptive, if the process will be killed in between the migration.
|
||
Ok(inverted_index) | ||
fn open(_path: &Path) -> std::io::Result<Self> { | ||
panic!("InvertedIndexRam is not supposed to be loaded"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation behind making it non-loadable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to this PR, InvertedIndexRam
was used both for mutable and immutable cases.
For the mutable case, it was never loaded:
qdrant/lib/segment/src/index/sparse_index/sparse_vector_index.rs
Lines 95 to 105 in c230a48
let (config, inverted_index, indices_tracker) = if is_appendable { | |
// RAM mutable case - build inverted index from scratch and use provided config | |
let (inverted_index, indices_tracker) = Self::build_inverted_index( | |
id_tracker.clone(), | |
vector_storage.clone(), | |
path, | |
stopped, | |
|| (), | |
)?; | |
(config, inverted_index, indices_tracker) | |
} else if config_path.exists() { |
And never saved:
qdrant/lib/segment/src/index/sparse_index/sparse_vector_index.rs
Lines 492 to 496 in c230a48
// save inverted index | |
if !self.is_appendable { | |
self.indices_tracker.save(&self.path)?; | |
self.inverted_index.save(&self.path)?; | |
} |
And immutable case was not supporting upsert:
qdrant/lib/segment/src/index/sparse_index/sparse_vector_index.rs
Lines 530 to 534 in c230a48
if !self.is_appendable { | |
return Err(OperationError::service_error( | |
"Cannot update vector in non-appendable index", | |
)); | |
} |
In this PR I made this distinction more clear:
InvertedIndexRam
: Mutable case, supports upserts, but no load/save.InvertedIndexImmutableRam
: Immutable case, supports load/save, but no upserts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, I don't see a problem why Mutable index should not be loaded. But in our scenarios it is, indeed, never used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that load&save is invalid here. My question is about design. Can you redesign InvertedIndex
and remove save
and load
functions from trait to avoid unnecessary panic here?
I propose to do the same as vector storage: constructor is a separate function wich know stoarge/index type, ref:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs#L41
6ed76b4
to
078cad1
Compare
078cad1
to
c5edbc8
Compare
* Move StorageVersion from segment crate to common/io * Refine StorageVersion API * Move methods from SparseVectorDataConfig to enum SparseIndexType * Introduce InvertedIndexImmutableRam * Add migrate * Don't migrate
Required for #4143.
This PR splits
InvertedIndexRam
intoInvertedIndexRam
andInvertedIndexImmutableRam
. The behavior/index format is not changed.Additionally, this PR introduces (and reverts in the last commit) the migration mechanism.
It is partially reverted in this PR because currently the index format is not changed yet and there is no point of migration. You could remove the last commit to check the simple migration process that just moves
inverted_index.data
toinverted_index_v2.data
as is.New sparse index directory contents after migration is applied.