Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InvertedIndexImmutableRam and index migrations #4220

Merged
merged 6 commits into from
May 16, 2024
Merged

Conversation

xzfc
Copy link
Contributor

@xzfc xzfc commented May 12, 2024

Required for #4143.

This PR splits InvertedIndexRam into InvertedIndexRam and InvertedIndexImmutableRam. The behavior/index format is not changed.

Additionally, this PR introduces (and reverts in the last commit) the migration mechanism.
It is partially reverted in this PR because currently the index format is not changed yet and there is no point of migration. You could remove the last commit to check the simple migration process that just moves inverted_index.data to inverted_index_v2.data as is.

New sparse index directory contents after migration is applied.

indices_tracker.json
sparse_index_config.json
inverted_index_config.json
inverted_index.data (old index file)
inverted_index_v2.data (new index file)
version.info (new file)

@xzfc xzfc requested a review from generall May 12, 2024 16:03

fn migrate_from_v1(path: &Path) -> FileOperationResult<()> {
// Disable migration for now.
SparseVectorIndexVersion::save(path)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aware, that if you keep it like this and we merge to dev, then the actual migration should be v2 -> v3, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state of this PR, only a file named version.info with contents 0.1.0 is created. No file with a v2 in its name is created.

To enable the actual migration, we'll revert the last commit of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, can't we reuse the same file path and rely on the surrounding version number of contents to tell us whether it has been migrated. Or do you deem that as not reliable enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No file with a v2 in its name is created.

Yes, but once implemented, the next migration might possibly see either no version file or 0.1.0.

I am ok with this approach, but please keep in mind that it is possible, that we will release another patch version in between your PRs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we reuse the same file path and rely on the surrounding version

It might be potentially corruptive, if the process will be killed in between the migration.


Ok(inverted_index)
fn open(_path: &Path) -> std::io::Result<Self> {
panic!("InvertedIndexRam is not supposed to be loaded");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind making it non-loadable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this PR, InvertedIndexRam was used both for mutable and immutable cases.

For the mutable case, it was never loaded:

let (config, inverted_index, indices_tracker) = if is_appendable {
// RAM mutable case - build inverted index from scratch and use provided config
let (inverted_index, indices_tracker) = Self::build_inverted_index(
id_tracker.clone(),
vector_storage.clone(),
path,
stopped,
|| (),
)?;
(config, inverted_index, indices_tracker)
} else if config_path.exists() {

And never saved:

// save inverted index
if !self.is_appendable {
self.indices_tracker.save(&self.path)?;
self.inverted_index.save(&self.path)?;
}

And immutable case was not supporting upsert:

if !self.is_appendable {
return Err(OperationError::service_error(
"Cannot update vector in non-appendable index",
));
}


In this PR I made this distinction more clear:

  • InvertedIndexRam: Mutable case, supports upserts, but no load/save.
  • InvertedIndexImmutableRam: Immutable case, supports load/save, but no upserts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, I don't see a problem why Mutable index should not be loaded. But in our scenarios it is, indeed, never used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that load&save is invalid here. My question is about design. Can you redesign InvertedIndex and remove save and load functions from trait to avoid unnecessary panic here?
I propose to do the same as vector storage: constructor is a separate function wich know stoarge/index type, ref:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs#L41

@xzfc xzfc merged commit ccf7f1d into dev May 16, 2024
18 checks passed
@xzfc xzfc deleted the split-posting-list-2 branch May 16, 2024 06:47
generall pushed a commit that referenced this pull request May 26, 2024
* Move StorageVersion from segment crate to common/io

* Refine StorageVersion API

* Move methods from SparseVectorDataConfig to enum SparseIndexType

* Introduce InvertedIndexImmutableRam

* Add migrate

* Don't migrate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants