Skip to content

How to materialise injected partition projections? #454

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
igoichuk opened this issue Aug 24, 2023 · 3 comments
Open

How to materialise injected partition projections? #454

igoichuk opened this issue Aug 24, 2023 · 3 comments
Labels
pkg:dbt-athena Issue affects dbt-athena type:question

Comments

@igoichuk
Copy link

Hi, I am trying to use dbt over hive Athena tables with various types of partition projections. Particularly for injected partitions. The query requires at least one exact value for that partition in WHERE clause. I understand there could be some incremental materialisation that splits available values into chucks and generates multiple insert statements, similar to how 100 partition problem was solved. Does anything like that exist? Is there any other way?

@nicor88
Copy link
Contributor

nicor88 commented Sep 6, 2023

I don't fully understand the last part of your issue.

In order to us partition projection we need to setup table properties e.g.

TABLEPROPERTIES (
    'projection.enabled' = true
    'projection.date_created.type' = 'date',
    'projection.date_created.format' = 'yyyy/MM/dd',
    'projection.date_created.interval' = '1',
    'projection.date_created.interval.unit' = 'DAYS',
    'projection.date_created.range' = '2021/01/01,NOW',
    'storage.location.template' = 's3://mybucket/data/${date_created}/'

this means that we could extend the adatper to setup those properties - e.g. as we do for iceberg.
Doing so you can add those properties in your model config.

I think that we support something like that -> https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L13C48-L13C64 at least for full loads.

I believe that we need to add something for incremental loads -as it seems totally missing https://github.com/dbt-athena/dbt-athena/blob/main/dbt/include/athena/macros/materializations/models/incremental/incremental.sql

@igoichuk
Copy link
Author

igoichuk commented Sep 8, 2023

@nicor88, the problem is with the source table having an injected partition, not with the target table. According to documentation each injected column must have exact values specified in the WHERE clause, so I cannot omit it and I cannot specify the whole range of values in it (too long). Hence I cannot materialize from such a table.

The range of values for an injected column is in S3 prefixes for that column. Similarly to a "100 partitions fix" or "insert_by_period" where materialisation happens in batches, it can be done for partition projection where S3 prefixes are enumerated and injected into a query replacing some INJECTED_FILTER placeholder with another batch of "column" in ('value1',... 'valueN'). Potentially I could narrow down a range of values to materialize similar to start/end dates in insert_by_period.

Is there any better way you could think of to materialize from injected column without implementing the above or giving up on injected partitions?

@nicor88
Copy link
Contributor

nicor88 commented Sep 8, 2023

AH! I think that I got it now, but let me correct if I'm wrong, you are not able to query entirely the source table because you use partition projections, therefore you need to have specific where statements.

Did you considered to use a custom macro that first spit out the partition values, then it assemble a query based on those values?

@mikealfare mikealfare added the pkg:dbt-athena Issue affects dbt-athena label Jan 10, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-athena Jan 13, 2025
mikealfare pushed a commit that referenced this issue Jan 13, 2025
* Temporary dev-requirements

* Changelog entry

* Implementations and tests for array macros

* Remove `cast_array_to_string` macro

* Restore original dev-requirements.txt
mikealfare pushed a commit that referenced this issue Jan 20, 2025
* Rename flag and get autocommit on

* Add changie

* Add test for issue 451

* Add test for issue #454

* Fix spelling.

* Revert names

* update tests

* update the comment to use pep for rationale

---------

Co-authored-by: Mila Page <[email protected]>
mikealfare pushed a commit that referenced this issue Jan 24, 2025
* convert custom_schema tests to functional

* Replace bigserial with bigint

* nit newline

* import ordering

* Pr comments cleanup

* scope of setUp to function

* Simplify seeds addition

* Nits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg:dbt-athena Issue affects dbt-athena type:question
Projects
None yet
Development

No branches or pull requests

4 participants