
Rollup joins across different data sources with different param allocators causing invalidations #8268

Open
pauldheinrichs opened this issue May 15, 2024 · 0 comments · May be fixed by #8279


pauldheinrichs commented May 15, 2024

Describe the bug
When attempting to run orchestration on a Druid data source to rebuild pre-aggregations, the following error is emitted in the API logs.

--- EDIT ---

This may be an underlying issue with something I'm working through. My API node is frequently assuming my Druid models use my default driver's param allocator, in more than just orchestration. It is in fact invalidating my pre-aggregations because it assumes the default driver. I do not have a cube.js file in my instance.

Slack thread: https://cube-js.slack.com/archives/C04NYBJP7RQ/p1715737464066809

(If I'm just doing it wrong... let me know 😓)

{
  "action": "post",
  "selector": {
    "contexts": [{ "securityContext": {} }],
    "timezones": ["UTC"],
    "dataSources": ["druid"]
  }
}
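
For context, this is roughly how I post that body to the Orchestration API. A minimal sketch: /cubejs-system/v1/pre-aggregations/jobs is the documented jobs endpoint, while CUBE_URL, CUBE_TOKEN, and the helper name are placeholders for my deployment.

const CUBE_URL = 'http://localhost:4000'; // placeholder deployment URL
const CUBE_TOKEN = '<JWT signed with the API secret>'; // placeholder auth token

async function triggerPreAggregationBuild() {
  // POST the same selector as above to the pre-aggregations jobs endpoint
  const res = await fetch(`${CUBE_URL}/cubejs-system/v1/pre-aggregations/jobs`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: CUBE_TOKEN,
    },
    body: JSON.stringify({
      action: 'post',
      selector: {
        contexts: [{ securityContext: {} }],
        timezones: ['UTC'],
        dataSources: ['druid'],
      },
    }),
  });
  console.log(await res.json());
}

triggerPreAggregationBuild();

The log output that follows is what comes back after running this.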
{"message":"Downloading external pre-aggregation via query error","queryKey":
[["CREATE TABLE cube_production_pre_aggregations.video_views_video_rollup_count20240415 AS SELECT\n      \"video_views\".video_id \"video_views__video_id\", DATE_TRUNC('day', CAST(TIME_FORMAT(\"video_views\".__time, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS TIMESTAMP)) \"video_views__time_day\", sum(\"video_views\".views) \"video_views__view_count\"\n    FROM\n      TeamVideoView AS \"video_views\"  WHERE (\"video_views\".__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2",["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],{}],[[{"refresh_key":null}]]],"query":"SELECT\n      \"video_views\".video_id \"video_views__video_id\", DATE_TRUNC('day', CAST(TIME_FORMAT(\"video_views\".__time, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS TIMESTAMP)) \"video_views__time_day\", sum(\"video_views\".views) \"video_views__view_count\"\n    FROM\n      TeamVideoView AS \"video_views\"  WHERE (\"video_views\".__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2","values":["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],"targetTableName":"cube_production_pre_aggregations.video_views_video_rollup_count20240415_lnicrtcl_403ejwb4_1j4a3mu","requestId":"6e558a4f-3de1-4e19-a350-fb41acb687a6","newVersionEntry":{"table_name":"cube_production_pre_aggregations.video_views_video_rollup_count20240415","structure_version":"403ejwb4","content_version":"lnicrtcl","last_updated_at":1715801822362,"naming_version":2},"buildRangeEnd":"2024-04-21T23:59:59.999","error":"Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table.__time >= TIME_PARSE($1) AND \"video_views\".__time <= TIME_PARSE($2)) GROUP BY 1, 2","values":["2024-04-15T00:00:00.000Z","2024-04-21T23:59:59.999Z"],"targetTableName":"cube_production_pre_aggregations.video_views_video_rollup_count20240415_lnicrtcl_403ejwb4_1j4a3mu","requestId":"6e558a4f-3de1-4e19-a350-fb41acb687a6","newVersionEntry":{"table_name":"cube_production_pre_aggregations.video_views_video_rollup_count20240415","structure_version":"403ejwb4","content_version":"lnicrtcl","last_updated_at":1715801822362,"naming_version":2},"buildRangeEnd":"2024-04-21T23:59:59.999","error":"Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table

Focusing specifically on Error: org.apache.calcite.runtime.CalciteContextException: From line 4, column 81 to line 4, column 82: Column '$1' not found in any table

at /cube/node_modules/@cubejs-backend/druid-driver/src/DruidClient.ts:92:19
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at DruidDriver.downloadQueryResults (/cube/node_modules/@cubejs-backend/druid-driver/src/DruidDriver.ts:140:31)
at PreAggregationLoader.refreshReadOnlyExternalStrategy (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:1154:19)
at Object.query (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryCache.ts:602:26)
at QueryQueue.processQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:842:25)

To Reproduce
Steps to reproduce the behavior:

1. Set up a Druid data source alongside a default one (Postgres / Redshift in my case):
   CUBEJS_DATASOURCES: "default,druid"
2. Set up a pre-aggregation on the Druid cube with a refresh key like this (a fuller sketch of the cube follows these steps):
   refreshKey: {
     every: `1 hour`,
     incremental: true,
   },
3. Hit the Orchestration API with the API call above.
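
For reference, here is a sketch of the Druid-backed cube this reproduces against. The cube, measure, and dimension names are reconstructed from the generated SQL in the error log above; the partitionGranularity and field types are my assumptions.

cube(`video_views`, {
  dataSource: `druid`,
  sql: `SELECT * FROM TeamVideoView`,

  measures: {
    view_count: {
      sql: `views`,
      type: `sum`,
    },
  },

  dimensions: {
    video_id: {
      sql: `video_id`,
      type: `string`, // assumed type
    },
    time: {
      sql: `__time`,
      type: `time`,
    },
  },

  preAggregations: {
    video_rollup_count: {
      measures: [CUBE.view_count],
      dimensions: [CUBE.video_id],
      timeDimension: CUBE.time,
      granularity: `day`,
      // assumed: the 20240415 table suffix covers 2024-04-15..2024-04-21
      partitionGranularity: `week`,
      refreshKey: {
        every: `1 hour`,
        incremental: true,
      },
    },
  },
});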

Expected behavior
It seems this process isn't using the Druid driver's expected `?` placeholder, but rather the default driver's param allocator (Redshift / Postgres `$1` syntax in my case).

It would appear that this method is being called with `$1` args already passed in:

[screenshot of the method call omitted]
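
To make the placeholder mismatch concrete (illustrative strings only, not Cube internals):

// What the log shows being generated: the default (Postgres/Redshift)
// param allocator's numbered placeholders...
const postgresStyle = `WHERE __time >= TIME_PARSE($1) AND __time <= TIME_PARSE($2)`;
// ...but Druid SQL only accepts positional `?` placeholders, so Calcite
// parses `$1` as a column reference and fails with
// "Column '$1' not found in any table":
const druidStyle = `WHERE __time >= TIME_PARSE(?) AND __time <= TIME_PARSE(?)`;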

Version:
Latest version

pauldheinrichs changed the title from "Orchestration API on multi datasource using improper driver" to "Multi datasource using improper driver for Orchestration / Pre-Aggregation cache checks" on May 16, 2024
pauldheinrichs changed the title to "Druid Driver assuming wrong parameter argument for partitioned pre-aggregation" on May 17, 2024
pauldheinrichs changed the title to "Druid Driver assuming wrong parameter argument for incremental partitioned pre-aggregation" on May 17, 2024
pauldheinrichs changed the title to "Rollup joins across different data source with different param allocators causing invalidations" on May 19, 2024
pauldheinrichs changed the title to "Rollup joins across different data sources with different param allocators causing invalidations" on May 19, 2024