partition_filter on s3.read_parquet_table ignores last partition #1094

Closed
@vlieven

Describe the bug

When using s3.read_parquet_table with a partition_filter, the filter works for the first n-1 partition columns, but the final (nth) partition column is missing from the dictionary passed to the filtering function.

I've investigated the issue, and I believe the culprit is line 73 in _extract_partitions_from_path in s3._read:
path_wo_filename: str = path.rpartition("/")[0] + "/"

If the path value is not a filename but a directory that does not end in "/", rpartition treats the last directory segment as the filename, so the nth partition is removed from the path.
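
For illustration, here is a minimal sketch that applies only the expression quoted above to three kinds of path. The helper name strip_filename and the S3 paths are made up for this example; they are not taken from awswrangler.

```python
def strip_filename(path: str) -> str:
    # Same expression as the line quoted above from _extract_partitions_from_path.
    return path.rpartition("/")[0] + "/"


# Hypothetical paths, for illustration only.
file_path = "s3://bucket/table/year=2021/month=12/part-0000.parquet"
dir_with_slash = "s3://bucket/table/year=2021/month=12/"
dir_without_slash = "s3://bucket/table/year=2021/month=12"

# A file path and a directory ending in "/" both keep the last partition.
print(strip_filename(file_path))          # s3://bucket/table/year=2021/month=12/
print(strip_filename(dir_with_slash))     # s3://bucket/table/year=2021/month=12/

# A directory without a trailing "/" loses its last partition (month=12).
print(strip_filename(dir_without_slash))  # s3://bucket/table/year=2021/
```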

Environment

awswrangler = 2.13.0

To Reproduce

This happens when the partition locations returned by AWS Glue do not end in "/".
We are writing these tables using Apache Spark, which might be causing this.

To Solve

Based on the documentation for the get_partitions function in catalog._get, I think the cleanest fix is to ensure that the partition locations retrieved from AWS Glue always end in "/", adding the missing slash where needed.
I will create a PR for this fix.
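
A rough sketch of the idea, not the actual patch: it assumes the partition locations arrive as a dictionary keyed by S3 location with the partition values as a list of strings. The helper name ensure_trailing_slash and the sample data are hypothetical.

```python
from typing import Dict, List


def ensure_trailing_slash(partitions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Hypothetical helper: append "/" to any location that lacks one,
    # leaving locations that already end in "/" untouched.
    return {
        (location if location.endswith("/") else location + "/"): values
        for location, values in partitions.items()
    }


# Made-up partition locations, one without and one with a trailing slash.
raw = {
    "s3://bucket/table/year=2021/month=11": ["2021", "11"],
    "s3://bucket/table/year=2021/month=12/": ["2021", "12"],
}
normalized = ensure_trailing_slash(raw)
for location in normalized:
    print(location)
```

With every location normalized this way, the rpartition call above would only strip an empty trailing segment, so the last partition stays in the path.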

Labels: bug, minor release, ready to release
