Loss of UC column-level lineage due to use of 'temporary views' in 'ephemeral' materialization #979
Comments
We don't have an ephemeral materialization in dbt-databricks, and every time I've used an ephemeral materialization in my own pipelines, it has been inserted as a CTE into the models that reference it. Can you provide the evidence that leads you to think otherwise?
Having said this, it is definitely a concern with the new code in 1.10.0, where we use temporary views in several places.
Is this a new issue, or one that has existed for a while? I'm thinking the issue is not with ephemeral, but perhaps with table materialization itself?
Looking at the documented limitations, there are many things that could cause column lineage to not be captured, among them temporary views. But I see conflicting evidence in my own pipeline, where I do have column lineage even in cases where a temporary view was used. One thing that stood out from the limitations (https://docs.databricks.com/aws/en/data-governance/unity-catalog/data-lineage#limitations):
"Complete column-level lineage is not captured by default for MERGE operations. You can turn on lineage capture for MERGE operations by setting the Spark property spark.databricks.dataLineage.mergeIntoV2Enabled to true. Enabling this flag can slow down query performance, particularly in workloads that involve very wide tables."
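For anyone who wants to test whether the MERGE limitation is involved, here is a minimal sketch of setting that property from a notebook or job. The property name comes from the quoted documentation; the helper name `enable_merge_lineage` is hypothetical, and whether a session-level setting takes effect on your compute type is an assumption you would need to verify.

```python
# Property name taken verbatim from the Databricks lineage limitations page.
MERGE_LINEAGE_PROPERTY = "spark.databricks.dataLineage.mergeIntoV2Enabled"


def enable_merge_lineage(spark) -> None:
    """Turn on column-level lineage capture for MERGE in this session.

    `spark` is expected to be a SparkSession (or anything exposing
    `conf.set(key, value)`). Hypothetical helper, not part of dbt-databricks.
    """
    # The docs warn this can slow down queries on very wide tables.
    spark.conf.set(MERGE_LINEAGE_PROPERTY, "true")
```

In a Databricks notebook this would be called as `enable_merge_lineage(spark)` before running the MERGE-based model.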
I spoke with the UC team and found out that they recently fixed an issue where complex CTEs were causing lineage to be lost, which could very well be what is happening in your case! Unfortunately, such fixes sometimes take a little while to trickle out. I will keep this ticket open, and I appreciate any further information you can provide, but my hope is that Databricks fixes this at the source.
Describe the bug
Ephemeral materializations create temporary views. However, when there is a temporary view (ephemeral) between A and B, the column-level lineage is lost (not recorded in UC's system tables or API).
While this might be an issue on the Databricks side, it is easily circumvented on the adapter side by not using temporary views and instead adding the code as a CTE in the model definition at compile time.
Steps To Reproduce
Given a source A and a model B, make a dbt project with A --> X (materialization = ephemeral) --> B.
Databricks does not capture column-level lineage.
Make a transformation A --> B (with the logic of X in B's model definition), and Databricks does capture the column-level lineage.
Expected behavior
Ephemeral materializations should be (optionally?) inlined into the definition of the model that uses them, instead of creating temporary views.
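To illustrate the expected behavior, here is a rough sketch of inlining an ephemeral model as a CTE at compile time. This is a toy illustration, not the dbt or dbt-databricks implementation: `inline_ephemeral` is a hypothetical helper, the `__dbt__cte__` prefix is assumed to mirror dbt's naming, and the sketch ignores nested ephemeral models and downstream SQL that already starts with its own `with` clause.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def inline_ephemeral(model_sql: str, ephemeral: dict[str, str]) -> str:
    """Replace refs to ephemeral models with CTE names and prepend the CTEs.

    `ephemeral` maps model name -> its SQL. Refs to models that are not
    ephemeral are left untouched.
    """
    used: list[str] = []

    def swap(match: re.Match) -> str:
        name = match.group(1)
        if name not in ephemeral:
            return match.group(0)  # not ephemeral: leave the ref as-is
        if name not in used:
            used.append(name)
        return f"__dbt__cte__{name}"

    body = REF_PATTERN.sub(swap, model_sql)
    if not used:
        return body
    ctes = ",\n".join(
        f"__dbt__cte__{name} as (\n{ephemeral[name].strip()}\n)" for name in used
    )
    return f"with {ctes}\n{body.strip()}"


# A --> X (ephemeral) --> B, using made-up table and column names.
x_sql = "select id, upper(name) as name from catalog.schema.a"
b_sql = "select id, name from {{ ref('x') }}"
compiled_b = inline_ephemeral(b_sql, {"x": x_sql})
```

With this shape, B's compiled SQL reads directly from A through a CTE, so no temporary view sits between the two.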
System information
The output of dbt --version:
The operating system you're using:
The output of python --version: Python 3.10.12