Description
Describe the bug
When using the s3.read_parquet_table function, the partition filter works for the first n-1 partitions, but the final (nth) partition is missing from the dictionary passed to the filtering function.
I've investigated the issue, and I believe the culprit is line 73 in _extract_partitions_from_path in s3._read:
path_wo_filename: str = path.rpartition("/")[0] + "/"
If the path value is not a filename but a directory that does not end in "/", rpartition treats the last path segment as the filename, so the nth partition is stripped from the path.
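The behaviour can be reproduced in isolation with plain string operations. The snippet below is a minimal sketch: the bucket name, partition keys, and the helper name path_without_filename are made up for illustration, and only the rpartition expression is taken from the line quoted above.

```python
def path_without_filename(path: str) -> str:
    # Same expression as the quoted line in s3._read (helper name is hypothetical).
    return path.rpartition("/")[0] + "/"


# Hypothetical paths, for illustration only.
file_path = "s3://bucket/table/year=2021/month=12/part-0000.parquet"
dir_with_slash = "s3://bucket/table/year=2021/month=12/"
dir_without_slash = "s3://bucket/table/year=2021/month=12"

# A real filename is stripped as intended.
print(path_without_filename(file_path))
# -> s3://bucket/table/year=2021/month=12/

# A directory ending in "/" is left unchanged.
print(path_without_filename(dir_with_slash))
# -> s3://bucket/table/year=2021/month=12/

# A directory without the trailing "/" loses its last partition.
print(path_without_filename(dir_without_slash))
# -> s3://bucket/table/year=2021/
```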
Environment
awswrangler = 2.13.0
To Reproduce
This happens when the partition locations returned by AWS Glue do not end in "/".
We write these tables with Apache Spark, which may be why the registered locations lack the trailing slash.
To Solve
Based on the documentation for the get_partitions function in catalog._get, I think the cleanest fix is to ensure that the partition locations retrieved from AWS Glue end in "/", adding the missing slash where it is absent.
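A rough sketch of what that normalization could look like is shown here. It assumes the partitions are available as a mapping from location to a list of partition values; the helper names ensure_trailing_slash and normalize_partition_locations are hypothetical and not part of awswrangler, and the actual PR may take a different shape.

```python
from typing import Dict, List


def ensure_trailing_slash(location: str) -> str:
    # Hypothetical helper: append "/" only when it is missing.
    return location if location.endswith("/") else location + "/"


def normalize_partition_locations(
    partitions: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    # Assumed input shape: {partition location: [partition values]}.
    return {
        ensure_trailing_slash(location): values
        for location, values in partitions.items()
    }


# Hypothetical locations, one without and one with the trailing slash.
partitions = {
    "s3://bucket/table/year=2021/month=11": ["2021", "11"],
    "s3://bucket/table/year=2021/month=12/": ["2021", "12"],
}
normalized = normalize_partition_locations(partitions)
print(list(normalized))
```

Normalizing at the point where the locations are read keeps the path-parsing code in s3._read unchanged, and the check is idempotent, so locations that already end in "/" are not modified.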
I will create a PR for this fix.