
INCREMENTAL_BY_UNIQUE_KEY models not taking batch_size into account #2609

Closed
erindru opened this issue May 14, 2024 · 0 comments · Fixed by #2616
Labels
Bug Something isn't working

Comments


erindru commented May 14, 2024

On Trino/Iceberg (and potentially other engines; unconfirmed), the following model doesn't backfill correctly:

MODEL (
    name  datalake_iceberg.stg_premium_day4,
    kind INCREMENTAL_BY_UNIQUE_KEY (
         unique_key id,
         batch_size 1
    ),
    start '2020-01-01',
    cron '@daily'
);

select * from datalake_iceberg.seed_model --note: this is the seed_model from `sqlmesh init`
where event_date between @start_dt and @end_dt

The goal here is to backfill a "daily" model, in batches of 1 day.

However, it does not execute correctly. It triggers a DELETE+INSERT (where the DELETE has WHERE TRUE and clears the whole table) for every interval/batch, even though it should only trigger a DELETE+INSERT for the first batch (to support restatements) and then a MERGE for each subsequent batch.
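For illustration, the expected statements per batch would look roughly like the following (the exact SQL is generated by SQLMesh per engine and may differ; the column lists in the MERGE are elided):

```sql
-- Batch 1 (2020-01-01): full DELETE+INSERT, supporting restatements
DELETE FROM datalake_iceberg.stg_premium_day4 WHERE TRUE;
INSERT INTO datalake_iceberg.stg_premium_day4
SELECT * FROM datalake_iceberg.seed_model
WHERE event_date BETWEEN DATE '2020-01-01' AND DATE '2020-01-01';

-- Batch 2 onwards (e.g. 2020-01-02): MERGE on the unique key
MERGE INTO datalake_iceberg.stg_premium_day4 AS t
USING (
    SELECT * FROM datalake_iceberg.seed_model
    WHERE event_date BETWEEN DATE '2020-01-02' AND DATE '2020-01-02'
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...;
```

The observed behaviour instead repeats the first (DELETE WHERE TRUE + INSERT) pattern for every batch, so each batch wipes out the rows inserted by the previous ones.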

Killing sqlmesh plan with Ctrl+C before it finishes and then running sqlmesh plan again (which picks up where it left off) triggers the correct MERGE behaviour.

It appears that batch_size is not being taken into account correctly when deciding whether to clear the table or merge into it.
