
HIVE-28266: Iceberg: select count(*) from data_files metadata tables … #5253

Merged (1 commit) on May 22, 2024

@difin (Contributor) commented May 16, 2024

…gives wrong result

What changes were proposed in this pull request?

Modified the Iceberg method "canComputeQueryUsingStats" to return false for queries over metadata tables, so that Hive executes the query against the metadata table instead of answering it from the data table's statistics.
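A minimal sketch of the guard described above, under stated assumptions: the class `StatsGuardSketch`, the property key `metaTable`, the helper `isMetadataTable`, the `statsAvailable` flag, and the list of metadata table names are all hypothetical stand-ins, not Hive's actual signatures. It only illustrates the shape of the check: if the table properties mark the target as a metadata table, the stats shortcut is refused and the query runs normally.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class StatsGuardSketch {
  // Assumption: table properties carry the metadata table name under this key.
  static final String META_TABLE_PROPERTY = "metaTable";

  // Hypothetical subset of Iceberg metadata table names, for illustration only.
  static final Set<String> METADATA_TABLES = Set.of(
      "entries", "files", "data_files", "delete_files",
      "snapshots", "history", "manifests", "partitions");

  // Hypothetical helper: true only for a non-null, known metadata table name.
  static boolean isMetadataTable(String name) {
    return name != null && METADATA_TABLES.contains(name.toLowerCase(Locale.ROOT));
  }

  // Hypothetical stand-in for the stats-shortcut decision;
  // statsAvailable models "the data table has usable statistics".
  static boolean canComputeQueryUsingStats(Map<String, String> tableProps, boolean statsAvailable) {
    if (isMetadataTable(tableProps.get(META_TABLE_PROPERTY))) {
      // The stats describe data table X, not X.data_files: execute the query instead.
      return false;
    }
    return statsAvailable;
  }

  public static void main(String[] args) {
    System.out.println(canComputeQueryUsingStats(Map.of(), true));
    System.out.println(canComputeQueryUsingStats(Map.of(META_TABLE_PROPERTY, "data_files"), true));
  }
}
```

Checking the metadata-table case first keeps it independent of whatever other conditions decide whether statistics are usable: returning false only disables the shortcut, so the query falls back to a normal scan.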

Why are the changes needed?

Currently, running a SELECT COUNT(*) query over the Iceberg metadata table X.data_files, where X is a data table, returns the number of records in X rather than the number of rows in X.data_files.
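To make the discrepancy concrete, here is a toy model (hypothetical, not Hive or Iceberg code, and ignoring delete files and older snapshots): table X has three data files, and its data_files metadata table has one row per data file. The row count in X's statistics is the sum of the files' record counts, while COUNT(*) over X.data_files should count the files.

```java
import java.util.List;

public class CountStarMismatch {
  // Hypothetical model of one data file of table X and the number of records it holds.
  record DataFile(String path, long recordCount) {}

  // What the stats shortcut effectively returned: the row count of data table X.
  static long countFromDataTableStats(List<DataFile> files) {
    return files.stream().mapToLong(DataFile::recordCount).sum();
  }

  // What COUNT(*) over X.data_files should return: one row per data file.
  static long countDataFilesRows(List<DataFile> files) {
    return files.size();
  }

  public static void main(String[] args) {
    List<DataFile> files = List.of(
        new DataFile("f1.parquet", 10),
        new DataFile("f2.parquet", 20),
        new DataFile("f3.parquet", 5));
    System.out.println("stats-based answer: " + countFromDataTableStats(files)); // 35
    System.out.println("correct answer: " + countDataFilesRows(files)); // 3
  }
}
```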

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

New query test added

@zhangbutao (Contributor) left a comment

+1 LGTM. Pending tests.

I also think there are other similar places where queries over Iceberg metadata tables wrongly use the data table's statistics. We can fix them incrementally, like #5215, which I am working on.

@@ -1512,7 +1512,8 @@ private String collectColumnAndReplaceDummyValues(ExprNodeDesc node, String foun
private void fallbackToNonVectorizedModeBasedOnProperties(Properties tableProps) {
Schema tableSchema = SchemaParser.fromJson(tableProps.getProperty(InputFormatConfig.TABLE_SCHEMA));
if (FileFormat.AVRO.name().equalsIgnoreCase(tableProps.getProperty(TableProperties.DEFAULT_FILE_FORMAT)) ||
(tableProps.containsKey("metaTable") && isValidMetadataTable(tableProps.getProperty("metaTable"))) ||
(tableProps.containsKey(IcebergAcidUtil.META_TABLE_PROPERTY) &&
isValidMetadataTable(tableProps.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY))) ||

@deniskuzZ (Member) commented May 21, 2024

Can't we simplify this to isValidMetadataTable(tableProps.getProperty(IcebergAcidUtil.META_TABLE_PROPERTY)) and check for null?
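One way the suggested simplification could look, as a sketch: if the validity check itself treats null as "not a metadata table", the separate containsKey guard becomes redundant, because java.util.Properties.getProperty returns null for a missing key. The class, the enum, the "metaTable" key value, and the method bodies below are hypothetical stand-ins; only the property-lookup pattern mirrors the reviewed diff.

```java
import java.util.Locale;
import java.util.Properties;

public class MetaTableCheckSketch {
  // Assumption: stands in for IcebergAcidUtil.META_TABLE_PROPERTY; the real value may differ.
  static final String META_TABLE_PROPERTY = "metaTable";

  // Hypothetical subset of metadata table types, for illustration only.
  enum MetaTableType { ENTRIES, FILES, DATA_FILES, DELETE_FILES, SNAPSHOTS }

  // Null-safe check: a missing property (null) simply means "not a metadata table".
  static boolean isValidMetadataTable(String name) {
    if (name == null) {
      return false;
    }
    try {
      MetaTableType.valueOf(name.toUpperCase(Locale.ROOT));
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  // Before: tableProps.containsKey(KEY) && isValidMetadataTable(tableProps.getProperty(KEY))
  // After: a single call, since getProperty returns null when the key is absent.
  static boolean targetsMetadataTable(Properties tableProps) {
    return isValidMetadataTable(tableProps.getProperty(META_TABLE_PROPERTY));
  }
}
```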

@difin (Contributor Author) replied:

Done


sonarcloud bot commented May 21, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@deniskuzZ merged commit 18c434f into apache:master on May 22, 2024
5 checks passed
4 participants