[Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. #35012

Merged

Conversation

kaka11chen
Contributor

@kaka11chen kaka11chen commented May 17, 2024

Proposed changes

[Configurations](multi-catalog) Add the `enable_parquet_filter_by_min_max` and `enable_orc_filter_by_min_max` session variables as a workaround.
If the column min/max statistics in a file are incorrect, these variables can be turned off so that the reader skips filtering by those statistics.
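
The effect of such a switch can be sketched as follows. This is a minimal illustration, not the Doris implementation: `ColumnStats`, `can_skip_row_group`, and the flag parameter are hypothetical names, and the predicate is assumed to be a simple equality on one integer column.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-row-group statistics for one integer column (not a Doris type).
struct ColumnStats {
    std::int64_t min;
    std::int64_t max;
};

// Decides whether a row group can be skipped for the predicate `col == value`.
// When the flag is off, the statistics are ignored and nothing is skipped.
bool can_skip_row_group(const ColumnStats& stats, std::int64_t value,
                        bool enable_filter_by_min_max) {
    if (!enable_filter_by_min_max) {
        return false;
    }
    return value < stats.min || value > stats.max;
}

// Usage: a file wrongly records [10, 20] although it actually contains the value 5.
void example_incorrect_stats() {
    const ColumnStats wrong{10, 20};
    assert(can_skip_row_group(wrong, 5, true));   // row group dropped, matching row lost
    assert(!can_skip_row_group(wrong, 5, false)); // row group read, matching row found
}
```

With the flag off the reader does more I/O, but the result no longer depends on statistics that may be wrong.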

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen
Contributor Author

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -169,6 +170,7 @@ OrcReader::OrcReader(const TFileScanRangeParams& params, const TFileRangeDesc& r
_file_system(nullptr),
_io_ctx(io_ctx),
_enable_lazy_mat(enable_lazy_mat),
_enable_filter_by_min_max(true),

warning: member initializer for '_enable_filter_by_min_max' is redundant [modernize-use-default-member-init]

Suggested change
_enable_filter_by_min_max(true),
,
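
The clang-tidy check `modernize-use-default-member-init` prefers an in-class default over repeating the same constant in a constructor's initializer list. A minimal sketch of that style, using a hypothetical `Reader` class rather than the actual `OrcReader`:

```cpp
#include <cassert>

// Hypothetical reader used only to illustrate the clang-tidy suggestion.
class Reader {
public:
    // Relies on the in-class default declared below.
    Reader() = default;

    // Overrides the default only when the caller supplies a value.
    explicit Reader(bool enable_filter_by_min_max)
            : _enable_filter_by_min_max(enable_filter_by_min_max) {}

    bool filter_by_min_max_enabled() const { return _enable_filter_by_min_max; }

private:
    // Default member initializer: used by every constructor that does not set the member.
    bool _enable_filter_by_min_max = true;
};

// Usage: the default stays on unless a constructor argument turns it off.
void example_default_member_init() {
    assert(Reader().filter_by_min_max_enabled());
    assert(!Reader(false).filter_by_min_max_enabled());
}
```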

@kaka11chen kaka11chen force-pushed the add_merge_filter_by_min_max_session_vars branch from e792124 to 99324c6 Compare May 17, 2024 08:54
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the add_merge_filter_by_min_max_session_vars branch from 99324c6 to 6a3bb5b Compare May 17, 2024 09:10
@kaka11chen
Contributor Author

run buildall

@morningman morningman left a comment

Please add some tests

@kaka11chen kaka11chen force-pushed the add_merge_filter_by_min_max_session_vars branch 2 times, most recently from ac52f65 to 2d0ac02 Compare May 17, 2024 10:25

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -149,6 +149,8 @@ OrcReader::OrcReader(RuntimeProfile* profile, RuntimeState* state,
_ctz(ctz),
_io_ctx(io_ctx),
_enable_lazy_mat(enable_lazy_mat),
_enable_filter_by_min_max(
state == nullptr ? true : state->query_options().enable_orc_filter_by_min_max),

warning: member initializer for '_dict_cols_has_converted' is redundant [modernize-use-default-member-init]

Suggested change
state == nullptr ? true : state->query_options().enable_orc_filter_by_min_max),
,
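
The initializer in this hunk falls back to `true` when no runtime state is available and otherwise reads the session option. A minimal sketch of that pattern, with hypothetical stand-ins `QueryOptions` and `State` in place of the real Doris types:

```cpp
#include <cassert>

// Hypothetical stand-in for the query options carried by the runtime state.
struct QueryOptions {
    bool enable_orc_filter_by_min_max = true;
};

// Hypothetical stand-in for the runtime state passed to the reader.
class State {
public:
    explicit State(const QueryOptions& opts) : _opts(opts) {}
    const QueryOptions& query_options() const { return _opts; }

private:
    QueryOptions _opts;
};

// Mirrors the ternary in the hunk: no state means filtering stays enabled.
bool resolve_filter_by_min_max(const State* state) {
    return state == nullptr ? true : state->query_options().enable_orc_filter_by_min_max;
}

// Usage: the option only takes effect when a state object is present.
void example_resolve() {
    QueryOptions opts;
    opts.enable_orc_filter_by_min_max = false;
    const State state(opts);
    assert(resolve_filter_by_min_max(nullptr));
    assert(!resolve_filter_by_min_max(&state));
}
```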

@@ -169,6 +171,7 @@
_file_system(nullptr),
_io_ctx(io_ctx),
_enable_lazy_mat(enable_lazy_mat),
_enable_filter_by_min_max(true),

warning: member initializer for '_dict_cols_has_converted' is redundant [modernize-use-default-member-init]

Suggested change
_enable_filter_by_min_max(true),
{

…` and `enable_orc_filter_by_min_max` Session variables.
@kaka11chen kaka11chen force-pushed the add_merge_filter_by_min_max_session_vars branch from 2d0ac02 to 5aa608d Compare May 17, 2024 12:04
@kaka11chen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.71% (9008/25226)
Line Coverage: 27.37% (74509/272236)
Region Coverage: 26.61% (38532/144781)
Branch Coverage: 23.44% (19665/83886)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5aa608d2f450040043a4d093ab45a2434845049b_5aa608d2f450040043a4d093ab45a2434845049b/report/index.html

@morningman morningman left a comment

LGTM

@morningman morningman added the usercase (Important user case type label), dev/2.1.x, and dev/3.0.x labels May 17, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 17, 2024

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@morningman morningman merged commit 6ac8c28 into apache:master May 21, 2024
25 of 28 checks passed
morningman pushed a commit to morningman/doris that referenced this pull request May 21, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 21, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 21, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
morningman pushed a commit that referenced this pull request May 22, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (#35012) (#35164)

backport #35012
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 23, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 23, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 23, 2024
kaka11chen added a commit to kaka11chen/doris that referenced this pull request May 23, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
M1saka2003 pushed a commit to M1saka2003/doris that referenced this pull request May 24, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (apache#35012)
dataroaring pushed a commit that referenced this pull request May 26, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (#35012)
morningman pushed a commit that referenced this pull request May 27, 2024
dataroaring pushed a commit that referenced this pull request May 27, 2024
yiguolei pushed a commit that referenced this pull request May 27, 2024
seawinde pushed a commit to seawinde/doris that referenced this pull request May 27, 2024
morningman pushed a commit that referenced this pull request May 28, 2024
…` and `enable_orc_filter_by_min_max` Session variables. (#35290)

backport #35012 #35320
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.0.11-merged, dev/2.1.4-merged, dev/3.0.x, meta-change, reviewed, usercase (Important user case type label)

4 participants