-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48286] Fix analysis of column with exists default expression - Add user facing error #46594
Conversation
@@ -525,7 +525,7 @@ private[sql] object CatalogV2Util { | |||
} | |||
|
|||
if (isDefaultColumn) { | |||
val e = analyze(f, EXISTS_DEFAULT_COLUMN_METADATA_KEY) | |||
val e = analyze(f, statementType = "Column conversion", EXISTS_DEFAULT_COLUMN_METADATA_KEY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Statement type is used only for debugging/logging, the main purpose of it is to be present in exception message being thrown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether columns
method of Table
is called only on creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At any rate, this looks like the right fix!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I added more changes.
@urosstan-db can you re-trigger the test? |
@cloud-fan Sure, sorry, I did not see this failed |
Can you retry the github action job? Seems flaky |
can we add a test? Besically any default column value that is unfoldable, like |
It is really strange, but I seen this issue on describe table also. |
Also, alter table add column default should trigger this bug? |
For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good, thanks for fixing it! As Wenchen suggests we should add a test as well.
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
Outdated
Show resolved
Hide resolved
@@ -525,7 +525,7 @@ private[sql] object CatalogV2Util { | |||
} | |||
|
|||
if (isDefaultColumn) { | |||
val e = analyze(f, EXISTS_DEFAULT_COLUMN_METADATA_KEY) | |||
val e = analyze(f, statementType = "Column conversion", EXISTS_DEFAULT_COLUMN_METADATA_KEY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At any rate, this looks like the right fix!
f7faa1c
to
5dcdae8
Compare
assert(e.resolved && e.foldable, | ||
"The existence default value must be a simple SQL string that is resolved and foldable, " + | ||
"but got: " + sql) | ||
if (!e.resolved || !e.foldable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is invoked in planning phase from DataSourceV2Strategy, and this was the first assertion on alter table path. I also modified second one (In CatalogV2Util)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey so I looked at this code with Wenchen in detail.
Ultimately, we should not hit his assertion error ever. This happens at querying time, if any expression other than a literal value is stored in the column existence default metadata. This should never happen.
Instead, we should return a proper error message earlier here [1], when we try to run the ALTER TABLE ADD COLUMN command with a default value that we cannot constant-fold.
I can make a more involved fix later; for now, can you just update that location [1] to add a check if the result of the ConstantFolding
is not a literal? It be something like:
ConstantFolding(ReplaceExpressions(analyzed)) match {
case e: Literal => e
case _ =>
throw new AnalysisException(
errorClass = "COLUMN_DEFAULT_EXPRESSION_NOT_SUPPORTED",
...)
}
This error class can have some message to indicate that the requested expression for the column default value is not supported by Spark, and the user should use a different expression there or remove the default entirely and then retry the command again.
[1]
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala
Line 292 in 5baaa61
ConstantFolding(ReplaceExpressions(analyzed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure Daniel, sorry for late resonse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added isFoldable check to parsed expression. ConstantFolding is plan and its expression has alias which is not foldable, so I did check on parsed expression. Btw, do we need to check that it is resolved, since if something is foldable, that means no column reference are there, and I did not add that condition, is that ok?
"but got: " + f.getExistenceDefaultValue().get) | ||
val e = analyze( | ||
f, | ||
statementType = "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we leaving stetementType
empty? We don't know the statement here, is that why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this happens when columns of table are get, usually, that is on analysis, etc, but I don't know on which statements that happens. I saw exception on describe table also. Statement type is used only for message formed for exception, so it was not much important and I left in initially to empty string, but let's put something more descriptive
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
Outdated
Show resolved
Hide resolved
d0d4c4e
to
f95d68d
Compare
@@ -284,6 +284,11 @@ object ResolveDefaultColumns extends QueryErrorsBase | |||
throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions( | |||
statementType, colName, defaultSQL) | |||
} | |||
|
|||
if (!parsed.foldable) { | |||
throw QueryCompilationErrors.defaultValueIsNotFoldable(colName, defaultSQL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ConstantFolding
rule can apply even if the expression is not foldable. For example [1]:
// Replace ScalarSubquery with null if its maxRows is 0
case s: ScalarSubquery if s.plan.maxRows.contains(0) =>
Literal(null, s.dataType)
For this reason, it seems best to apply the ConstantFolding
rule first and then check if the result is a Literal
before returning this error.
[1]
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Line 92 in 34ac7de
// Replace ScalarSubquery with null if its maxRows is 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, you are right, thank you.
@@ -1387,6 +1387,12 @@ | |||
], | |||
"sqlState" : "42623" | |||
}, | |||
"COLUMN_DEFAULT_VALUE_IS_NOT_FOLDABLE" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just add a new sub-error-condition under INVALID_DEFAULT_VALUE
?
@@ -298,6 +299,11 @@ object ResolveDefaultColumns extends QueryErrorsBase | |||
val analyzed: Expression = plan.collectFirst { | |||
case Project(Seq(a: Alias), OneRowRelation()) => a.child | |||
}.get | |||
|
|||
// ConstantFolding rule should evaluate expression if it is foldable to literal | |||
if (!analyzed.isInstanceOf[Literal]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be extra safe, shall we use the same condition used in assert and do if (e.resolved && e.foldable)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
...src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala
Outdated
Show resolved
Hide resolved
thanks, merging to master/3.5! |
… Add user facing error FIRST CHANGE Pass correct parameter list to `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` when it is invoked from `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column`. `org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze` method accepts 3 parameter 1) Field to analyze 2) Statement type - String 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT Method `org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column` pass `fieldToAnalyze` and `EXISTS_DEFAULT` as second parameter, so it is not metadata key, instead of that, it is statement type, so different expression is analyzed. Pull requests where original change was introduced #40049 - Initial commit #44876 - Refactor that did not touch the issue #44935 - Another refactor that did not touch the issue SECOND CHANGE Add user facing exception when default value is not foldable or resolved. Otherwise, user would see message "You hit a bug in Spark ...". It is needed to pass correct value to `Column` object Yes, this is a bug fix, existence default value has now proper expression, but before this change, existence default value was actually current default value of column. Unit test No Closes #46594 from urosstan-db/SPARK-48286-Analyze-exists-default-expression-instead-of-current-default-expression. Lead-authored-by: Uros Stankovic <uros.stankovic@databricks.com> Co-authored-by: Uros Stankovic <155642965+urosstan-db@users.noreply.github.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 0f21df0) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
FIRST CHANGE
Pass correct parameter list to
org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze
when it is invoked fromorg.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column
.org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze
method accepts 3 parameterMethod
org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column
pass
fieldToAnalyze
andEXISTS_DEFAULT
as second parameter, so it is not metadata key, instead of that, it is statement type, so different expression is analyzed.Pull requests where original change was introduced
#40049 - Initial commit
#44876 - Refactor that did not touch the issue
#44935 - Another refactor that did not touch the issue
SECOND CHANGE
Add user facing exception when default value is not foldable or resolved. Otherwise, user would see message "You hit a bug in Spark ...".
Why are the changes needed?
It is needed to pass correct value to
Column
objectDoes this PR introduce any user-facing change?
Yes, this is a bug fix, existence default value has now proper expression, but before this change, existence default value was actually current default value of column.
How was this patch tested?
Unit test
Was this patch authored or co-authored using generative AI tooling?
No