Description
Describe the bug
When using the function s3.read_parquet_table(), Data Wrangler first looks up the table's location URI in the Glue catalog and passes it as the path argument to s3.read_parquet(). Relevant code snippet:
```python
res: Dict[str, Any] = client_glue.get_table(**args)
try:
    path: str = res["Table"]["StorageDescriptor"]["Location"]
```
The issue with this piece of code is that the location returned by Glue does not necessarily end with a slash ("/"). When such a path is passed to s3.read_parquet(), the S3 prefix listing expands everything under it, including objects belonging to other tables whose names start with the same string. The issue becomes apparent when you have tables like these in Glue:

- database.prefix
- database.prefix_suffix

Attempting to read from `database.prefix` will also match files under `database.prefix_suffix`, resulting in an error of the form:

```
InvalidArgumentValue: Object s3://<bucket>/<database>/<prefix_suffix>/<file>.parquet is not under the root path (s3://<bucket>/<database>/<prefix>/).
```
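The faulty match can be demonstrated without AWS at all, since S3 listing is a plain string-prefix comparison on object keys. The keys below are hypothetical, just to illustrate the behavior:

```python
# Hypothetical object keys for two tables whose names share a prefix.
keys = [
    "database/prefix/part-0.parquet",
    "database/prefix_suffix/part-0.parquet",
]

# Without a trailing slash, the prefix "database/prefix" matches BOTH
# tables, because S3 compares raw strings, not path components.
assert [k for k in keys if k.startswith("database/prefix")] == keys

# With the trailing slash, only the intended table's objects match.
assert [k for k in keys if k.startswith("database/prefix/")] == [
    "database/prefix/part-0.parquet"
]
```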
A simple fix would be to append the "/" suffix to the path when it is missing:

```python
path: str = res["Table"]["StorageDescriptor"]["Location"]
path = path if path.endswith("/") else f"{path}/"
```
To Reproduce
Tested with Python 3.7.9 and Wrangler 2.6.0, installed via pip.
Steps to reproduce the behavior:
- Create a Glue database "test_database" with two tables "test_table" and "test_table_extra".
- Write some data to these tables so they are populated.
- Use Data Wrangler to read data from "test_table", e.g.:

```python
import awswrangler as wr

df = wr.s3.read_parquet_table(database='test_database', table='test_table')
```
You will get an error indicating that files belonging to `test_table_extra` are not under the root path of `test_table`.