partition_filter on s3.read_parquet_table ignores last partition #1094

**Describe the bug**

When using `s3.read_parquet_table` with a `partition_filter`, the filter works for the first n-1 partitions, but the nth (last) partition is missing from the dictionary passed to the filter function.

I've investigated the issue, and I believe the culprit is line 73 of `s3._read`, inside `_extract_partitions_from_path`:
`path_wo_filename: str = path.rpartition("/")[0] + "/"`

If `path` is not a file path but a directory that does not end in "/", this strips the last path segment, which is the nth partition, from the path.
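
The behaviour can be reproduced without AWS access. The snippet below is a minimal sketch that applies the same expression to three paths; the helper name `strip_filename` and the bucket, table, and partition names are made up for illustration and are not taken from awswrangler.

```python
def strip_filename(path: str) -> str:
    # Same expression as the line quoted above: keep everything up to the
    # last "/" and re-append the slash.
    return path.rpartition("/")[0] + "/"


# Hypothetical paths, for illustration only.
file_path = "s3://bucket/table/year=2021/month=12/part-0000.parquet"
dir_with_slash = "s3://bucket/table/year=2021/month=12/"
dir_without_slash = "s3://bucket/table/year=2021/month=12"

# A file path or a directory ending in "/" keeps both partitions.
print(strip_filename(file_path))          # s3://bucket/table/year=2021/month=12/
print(strip_filename(dir_with_slash))     # s3://bucket/table/year=2021/month=12/

# A directory without the trailing "/" loses its last partition.
print(strip_filename(dir_without_slash))  # s3://bucket/table/year=2021/
```

In the last case `month=12` is treated as if it were a filename, so it never reaches the dictionary given to the filter function.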

**Environment**

awswrangler = 2.13.0

**To Reproduce**

This happens when the partition locations returned by AWS Glue do not end in "/".
We write these tables with Apache Spark, which may be why the locations lack the trailing slash.

**To Solve**

Based on the documentation for the `get_partitions` function in `catalog._get`, I think the cleanest fix is to ensure that the partition locations retrieved from AWS Glue end in "/", appending the missing slash where necessary.
I will create a PR for this fix.
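
A rough sketch of the idea, not the actual patch: it assumes the partitions come back as a dictionary mapping each S3 location to its list of partition values, as the `get_partitions` documentation describes. Both helper names are hypothetical.

```python
from typing import Dict, List


def ensure_trailing_slash(location: str) -> str:
    # Append "/" only when it is missing, so already-correct locations
    # pass through unchanged.
    return location if location.endswith("/") else location + "/"


def normalize_partition_locations(
    partitions: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    # Assumed shape: {"s3://bucket/prefix/y=2020/m=10": ["2020", "10"], ...}
    return {
        ensure_trailing_slash(location): values
        for location, values in partitions.items()
    }
```

With the locations normalized this way, `path.rpartition("/")[0] + "/"` returns the directory unchanged, so the last partition is kept.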
