
dbt incremental _configuration_changes produces false positive results #955


Open
vvsivaprasadreddy opened this issue Mar 5, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@vvsivaprasadreddy

Describe the bug

The diff checker that compares the existing config with the new config in the Databricks incremental materialization does not evaluate correctly. It detects config changes on every run and tries to apply tags and tblproperties each time.
link to code

Steps To Reproduce

Define tags and table properties in an incremental model's config. Run the model multiple times: each run tries to apply the tags and table properties even though the config has not changed.

Expected behavior

The get_diff function in the TagsConfig class should check only for differences in tags and tblproperties, and apply them only when there is a difference.
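To make the expected behavior concrete, here is a minimal sketch of a tag diff that returns nothing when the tags already match. `TagsSketch` and its `get_diff` are hypothetical stand-ins written for this issue; they are not the adapter's actual `TagsConfig` code, and the choice to return `None` for "no change" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TagsSketch:
    """Hypothetical stand-in for a tags config (not the adapter's real class)."""

    set_tags: Dict[str, str]
    unset_tags: List[str] = field(default_factory=list)

    def get_diff(self, existing: "TagsSketch") -> Optional["TagsSketch"]:
        # Tags that are new, or whose value differs from what the table has.
        to_set = {
            key: value
            for key, value in self.set_tags.items()
            if key not in existing.set_tags or existing.set_tags[key] != value
        }
        # Tags present on the table but no longer declared in the model.
        to_unset = sorted(k for k in existing.set_tags if k not in self.set_tags)

        # Assumption: "no change" is signalled by returning None.
        if not to_set and not to_unset:
            return None
        return TagsSketch(set_tags=to_set, unset_tags=to_unset)


# Same tags as in the log below: nothing should need to be applied.
model_tags = TagsSketch(set_tags={"reporting": "", "dbt": ""})
table_tags = TagsSketch(set_tags={"reporting": "", "dbt": ""})
no_change = model_tags.get_diff(table_tags)
```

With identical tags on both sides, `no_change` is `None`, so a caller could skip the tag statements entirely.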

Screenshots and log output

Sample log of the configs. It reports changes even though the existing config and the model config have the same tags and tblproperties.

Existing config:

config={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'clusteringColumns': '[["employee_id"]]', 'delta.checkpoint.writeStatsAsJson': 'false', 'delta.checkpoint.writeStatsAsStruct': 'true', 'delta.enableDeletionVectors': 'true', 'delta.enableRowTracking': 'true', 'delta.feature.clustering': 'supported', 'delta.feature.deletionVectors': 'supported', 'delta.feature.domainMetadata': 'supported', 'delta.feature.invariants': 'supported', 'delta.feature.rowTracking': 'supported', 'delta.feature.timestampNtz': 'supported', 'delta.minReaderVersion': '3', 'delta.minWriterVersion': '7', 'delta.rowTracking.materializedRowCommitVersionColumnName': '_row-commit-version-col-53fe29f4-05f5-4ae2-8bb0-f07437b6dff1', 'delta.rowTracking.materializedRowIdColumnName': '_row-id-col-eba33e5b-0bc2-4f98-ba1c-dca95cce5c6d'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])}

Model config:

config={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'delta.feature.timestampNtz': 'supported'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])}

Model config changes:

changes={'tags': TagsConfig(set_tags={'reporting': '', 'dbt': ''}, unset_tags=[]), 'tblproperties': TblPropertiesConfig(tblproperties={'delta.feature.timestampNtz': 'supported'}, pipeline_id=None, ignore_list=['pipelines.pipelineId', 'delta.enableChangeDataFeed', 'delta.minReaderVersion', 'delta.minWriterVersion', 'pipeline_internal.catalogType', 'pipelines.metastore.tableName', 'pipeline_internal.enzymeMode', 'clusteringColumns', 'delta.enableRowTracking', 'delta.feature.appendOnly', 'delta.feature.changeDataFeed', 'delta.feature.checkConstraints', 'delta.feature.domainMetadata', 'delta.feature.generatedColumns', 'delta.feature.invariants', 'delta.feature.rowTracking', 'delta.rowTracking.materializedRowCommitVersionColumnName', 'delta.rowTracking.materializedRowIdColumnName', 'spark.internal.pipelines.top_level_entry.user_specified_name'])} requires_full_refresh=False
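For the tblproperties side, the log above suggests the comparison should treat a property as changed only when the model declares it and the table holds a different value. A minimal sketch of that idea follows; `tblproperties_diff` is a hypothetical helper, not the adapter's API, and treating properties that exist only on the table as "no change" is an assumption. The sample values are a subset of those in the log.

```python
from typing import Dict, Iterable


def tblproperties_diff(
    desired: Dict[str, str],
    existing: Dict[str, str],
    ignore_list: Iterable[str],
) -> Dict[str, str]:
    """Return only the declared properties that differ from the table's values."""
    ignored = set(ignore_list)
    return {
        key: value
        for key, value in desired.items()
        # Skip keys the comparison is told to ignore.
        if key not in ignored
        # A missing key yields None, which never equals a string value.
        and existing.get(key) != value
    }


# Subset of the values shown in the log above.
existing_props = {
    "delta.checkpoint.writeStatsAsJson": "false",
    "delta.enableDeletionVectors": "true",
    "delta.feature.timestampNtz": "supported",
    "delta.minReaderVersion": "3",
}
model_props = {"delta.feature.timestampNtz": "supported"}
ignore = ["delta.minReaderVersion", "delta.minWriterVersion"]

pending = tblproperties_diff(model_props, existing_props, ignore)
```

Here `pending` is an empty dict, because the only declared property already has the same value on the table.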

System information

Core:

  • installed: 1.9.2
  • latest: 1.9.2 - Up to date!

Plugins:

  • spark: 1.9.1 - Up to date!
  • databricks: 1.9.4 - Update available!

Windows 11 (Docker dev container in VS Code)

Python 3.10.16

@vvsivaprasadreddy added the bug label Mar 5, 2025
@benc-db
Collaborator

benc-db commented Mar 5, 2025

Thanks for reporting

@benc-db
Collaborator

benc-db commented Mar 13, 2025

Fixes coming in 1.10.0. I'll put together an alpha next week. Let me know if you're willing to verify against your use case next week.

@vvsivaprasadreddy
Author

Hi @benc-db, thank you for the quick turnaround. I would love to test the alpha version.

@benc-db
Collaborator

benc-db commented Mar 17, 2025

#963 Discussion page for the alpha. I found multiple bugs with the incremental comparison logic while working through this, so please let me know what else might still not be behaving.

@benc-db
Collaborator

benc-db commented Apr 29, 2025

Hi, even more fixes around this area went into 1.10.1, please try it out when you get a chance. If there are more incremental change bugs, I would love to get them addressed.
