Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Appender] Add AppendDefault #11905

Merged
merged 30 commits into from
Jun 7, 2024
Merged

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented May 2, 2024

This feature has been requested many times, so I figured it was time to add it.

We add AppendDefault to the Appender class (not BaseAppender).
This method can be used in place of Append(Value val)

The behavior of AppendDefault is the same as calling INSERT INTO <name> VALUES (DEFAULT) through SQL

src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Great idea. Some comments:

test/appender/test_appender.cpp Outdated Show resolved Hide resolved
src/include/duckdb/main/appender.hpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
@Tishj Tishj marked this pull request as ready for review May 3, 2024 11:08
Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the PR! These changes come in handy!

The main functionality I am missing in this PR is writing a default value directly to a vector in a data chunk. Any performance-focused appender implementation most likely uses duckdb_append_data_chunk instead of appending value-by-value.

From my understanding, in order to write default values to a vector in a data chunk, we need the context of the appender, as we need to know the context of the table. I.e., we cannot add this functionality to the data chunk API. So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

src/main/appender.cpp Outdated Show resolved Hide resolved
test/appender/test_appender.cpp Show resolved Hide resolved
@duckdb-draftbot duckdb-draftbot marked this pull request as draft May 7, 2024 08:23
@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

That makes sense, I think to be performant it makes more sense to take a list of integers instead of a single row index.
Since internally we use an ExpressionExecutor, running this expression on multiple rows at once is likely more performant.

@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

That makes sense, I think to be performant it makes more sense to take a list of integers instead of a single row index. Since internally we use an ExpressionExecutor, running this expression on multiple rows at once is likely more performant.

@taniabogatsch I've made a commit that starts this, taking a SelectionVector and a count for the amount of defaults that need to be appended (taking the indices from the SelectionVector)

I've had to make some adjustments to the ExpressionExecutor that make me think this is not it's intended purpose (even though it does already take a const SelectionVector *sel)
Please check out this commit first PoC for AppendDefaultToVector

src/main/appender.cpp Outdated Show resolved Hide resolved
@Giorgi
Copy link
Contributor

Giorgi commented May 7, 2024

Any performance-focused appender implementation most likely uses duckdb_append_data_chunk instead of appending value-by-value.

From my understanding, in order to write default values to a vector in a data chunk, we need the context of the appender, as we need to know the context of the table. I.e., we cannot add this functionality to the data chunk API. So, maybe it is sensible to extend this PR with a function Appender::AppendDefaultToVector, which takes a vector and a row index and writes the default value to the expected position... 🤔

Not only performance-focused, duckdb_append_data_chunk is needed to append more "complex" data types (uuid, decimal, enum, list, etc) so we definitely need a way to append a default value to a vector.

@Tishj
Copy link
Contributor Author

Tishj commented May 7, 2024

I didn't feel like the appender was the right place to add methods that query table information.
As such I've created a duckdb_table_description object, which wraps a TableDescription object, which is also present inside the Appender.

This can then be queried for information; duckdb_column_has_default(table, 0)
I imagine this can be extended in the future to include duckdb_column_is_generated, duckdb_column_get_type, duckdb_column_get_name, etc..

Copy link
Contributor

@taniabogatsch taniabogatsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Tishj, the changes and improvements look nice! 😄 I've added mostly nits and also some comments.

src/include/duckdb.h Outdated Show resolved Hide resolved
src/execution/expression_executor/execute_constant.cpp Outdated Show resolved Hide resolved
src/execution/expression_executor.cpp Outdated Show resolved Hide resolved
src/include/duckdb/execution/expression_executor.hpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/include/duckdb/main/appender.hpp Outdated Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Outdated Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Show resolved Hide resolved
test/api/capi/test_capi_appender.cpp Outdated Show resolved Hide resolved
test/appender/test_appender.cpp Outdated Show resolved Hide resolved
@Giorgi
Copy link
Contributor

Giorgi commented May 13, 2024

@Tishj Will you expose AppendDefaultsToVector in the C API as part of this PR?

src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/capi/table_description-c.cpp Outdated Show resolved Hide resolved
@taniabogatsch
Copy link
Contributor

@Giorgi, we plan to expose a way for the appender to write default values when using append_data_chunk in the C API. We'll expose it in a follow-up PR if it is not included in this PR.

@Giorgi
Copy link
Contributor

Giorgi commented May 13, 2024

@Giorgi, we plan to expose a way for the appender to write default values when using append_data_chunk in the C API. We'll expose it in a follow-up PR if it is not included in this PR.

Great! If we don't specify that we want to use default values for some of the columns, will the DuckDB appender automatically use default values for that column? For example, if I have a table with ID auto-increment, will we still need to tell DuckDB that we want to use default for that column?

@taniabogatsch
Copy link
Contributor

Great! If we don't specify that we want to use default values for some of the columns, will the DuckDB appender automatically use default values for that column? For example, if I have a table with ID auto-increment, will we still need to tell DuckDB that we want to use default for that column?

Currently, yes. In this first iteration, it is necessary to specify that a value is a default value (AppendDefault or by using AppendDefaultsToVector). For example, not writing any value (specific or default) to an auto-increment ID will result in an error.

@Tishj Tishj changed the base branch from main to feature May 16, 2024 08:20
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I think this PR has too much stuff going on in it at this point - and a few of the things are also not correct and need to be reworked. I propose the following changes to make this PR simpler and get it merged, then we can do follow-up work in subsequent PRs.

src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/main/appender.cpp Outdated Show resolved Hide resolved
src/include/duckdb.h Outdated Show resolved Hide resolved
@Tishj Tishj marked this pull request as ready for review May 31, 2024 15:59
@Tishj Tishj requested a review from Mytherin June 1, 2024 17:22
@Mytherin Mytherin merged commit 28063a9 into duckdb:feature Jun 7, 2024
40 checks passed
@Mytherin
Copy link
Collaborator

Mytherin commented Jun 7, 2024

Thanks!

@Giorgi
Copy link
Contributor

Giorgi commented Jun 7, 2024

Will append default to vector be implemented in a future PR? The current one isn't very helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants