Skip to content

fix: Invalid merge when unique_key == merge_update_columns #406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

artem-garmash
Copy link
Contributor

@artem-garmash artem-garmash commented Sep 8, 2023

Description

Incremental model may have unique_key same as merge_update_columns. For example, to skip updates for matched rows and do only inserts and/or deletes. In such cases, the generate merge sql statement is invalid:

merge into alerts as target using alerts__dbt_tmp as src
        on (target.alert_id = src.alert_id)
      when matched
        then update set
      when not matched
        then insert  (...)
          values (...)

Another case when the update column list can be empty and generate similar invalid sql is if merge_exclude_columns removes all columns.

The fix checks the final update list and uses it only if it's not empty, avoid generating incomplete when matched then update part of the merge statement.

The PR adds 2 tests to cover both cases. As I don't see any ready test to cover merge_update_columns usage, TestIncrementalIcebergMergeNoUpdates just test the original case. TestIcebergMergeExcludeAllColumns tests using merge_exclude_columns to get empty update column list. And original TestIcebergMergeExcludeColumns from dbt-core is added as well as it's used by TestIcebergMergeExcludeAllColumns.

Let me know if the tests should be arranged/named in some other way.

The issue was discovered when trying to port elementary CLI to athena, see https://github.com/artem-garmash/elementary/pull/1/files#r1227587134 for the context.

See full model source: https://github.com/elementary-data/elementary/blob/master/elementary/monitor/dbt_project/models/alerts/alerts.sql

Models used to test

{{ config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    merge_update_columns = ['id'],
    table_type = 'iceberg',
) }}

{% if not is_incremental() %}

-- data for first invocation of model

select 1 as id, 'hello' as msg, 'blue' as color
union all
select 2 as id, 'goodbye' as msg, 'red' as color

{% else %}

-- data for subsequent incremental update

select 1 as id, 'hey' as msg, 'blue' as color
union all
select 2 as id, 'yo' as msg, 'green' as color
union all
select 3 as id, 'anyway' as msg, 'purple' as color

{% endif %}
{{ config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy='merge',
    merge_exclude_columns=['msg', 'color']
) }}

{% if not is_incremental() %}

-- data for first invocation of model

select 1 as id, 'hello' as msg, 'blue' as color
union all
select 2 as id, 'goodbye' as msg, 'red' as color

{% else %}

-- data for subsequent incremental update

select 1 as id, 'hey' as msg, 'blue' as color
union all
select 2 as id, 'yo' as msg, 'green' as color
union all
select 3 as id, 'anyway' as msg, 'purple' as color

{% endif %}

Checklist

  • You followed contributing section
  • You kept your Pull Request small and focused on a single feature or bug fix.
  • You added unit testing when necessary
  • You added functional testing when necessary

@nicor88
Copy link
Contributor

nicor88 commented Sep 8, 2023

As follow up of this: #406 (comment)
please add models that you use to test, and eventually consider to write integrations tests for the new supported cases.

@artem-garmash
Copy link
Contributor Author

As follow up of this: #406 (comment) please add models that you use to test, and eventually consider to write integrations tests for the new supported cases.

Sure, added a couple of tests.

@svdimchenko
Copy link
Contributor

@artem-garmash it's great to know you're working on athena support for elementary package 💪 🚀 . Is this the only 1 feature which blocks to to publish the solution to main elementary branch ?

@artem-garmash
Copy link
Contributor Author

@svdimchenko thx! This was the only issue discovered. As those alert tables work with other connectors in Elementary CLI, I think it should be fixed here. Once this fixed, the elementary cli branch can be finalized. I'm testing now the new branches with the port with proper deployment and hopefully can open new PRs next week.

@nicor88
Copy link
Contributor

nicor88 commented Sep 14, 2023

@artem-garmash great work - for me we can merge, thanks for the detailed explanation.

@nicor88 nicor88 merged commit fc00e80 into dbt-labs:main Sep 14, 2023
@artem-garmash artem-garmash deleted the handle-merge-wo-update branch November 7, 2023 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants