Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Add storage_compatibility_version to control for what version the DB has to be serialized. #12110

Merged
merged 40 commits into from
May 21, 2024

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented May 17, 2024

The idea behind the storage compatibility version is that we want to be able to add serialization, without losing the ability for forwards compatibility.

What inspired this functionality (and is introduces by this PR as well) is the ability to serialize CatalogEntry dependencies.
Because we won't rebind entries on load of a database anymore (think indexes, default values, views etc..), we also won't rediscover dependencies, as that is done during binding.

We do not serialize them by default currently.
Using `storage_compatibility_version='v0.10.3' you can opt-in to this, it will be the default in a future release.

storage_compatibility_version

This is a new setting on the DBConfig, it takes a VARCHAR value in the form of: v<major>.<minor>.<patch> or latest
The string internally gets mapped to a serialization version, which is checked at serialization time.

The generate_serialization.py was updated to conditionally serialize items that have been marked with a version field based on this setting.

Tishj added 30 commits April 29, 2024 15:33
…alue has a 'default', and is the last property?
…he future we will remove properties we previously serialized?
@Tishj Tishj requested a review from Mytherin May 17, 2024 19:34
@carlopi
Copy link
Contributor

carlopi commented May 18, 2024

I think this is a cool feature to have, but I worry a bit of the ramifications of this, in particular that using this setting will allow user to lose data / constraint / information and produce a DB file that will not keep track of the fact that this has been produced while being instructed to skip some serialization.

In particular, how can you tell given a db file on your machine whether it's the 'complete' DB version or some limited version of it?

One proposal that could handle this goals could be having this setting be at the attached database level, and with this information serialized to the header of the Database file.
Basically something like:
ATTACH file.db (STORAGE_COMPAT v0.10.2) -> create file.db, write STORAGE_COMPAT to it, and will be used for subsequent writes to file (if the version is aware of storage_compat setting, otherwise we assume it has to be compatible)
ATTACH file.db on an existing file, inherits the STORAGE_COMPAT version stored to the file (defaults to no constraint being there)

This achieve a few things: no impact on the running database (I feel strange that storage_compatibility_version + PRAGMA verify_serializer allows you to change results of computed queries), consistent compatibility for a given db file (even when iterating opening-write-close-open-write-close the setting is owned by the DB), and allows easier migration (basically attach DB 1, attach DB 2 with storage_compat, copy from one to other).

@Mytherin
Copy link
Collaborator

Having the target serialization version attached to a database file makes sense to me, but can likely be added later. (1) we still need a default version in case none is specified, this setting can serve as that default setting, and (2) we might also need this for serialization of e.g. logical plans to e.g. JSON, which does not involve an attached database.

@hannes hannes added this to the v0.10.3 milestone May 21, 2024
@Mytherin Mytherin merged commit a1dc845 into duckdb:main May 21, 2024
39 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request May 21, 2024
Merge pull request duckdb/duckdb#12110 from Tishj/serialized_dependencies
Merge pull request duckdb/duckdb#12153 from carlopi/pyodide_in_nightly
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants