S3 select on a prefix returns empty data frame #1783

Closed
@kukushking

Description

Describe the bug

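wr.s3.select_query returns an empty DataFrame when path is a wildcard prefix, although the same query against a single object in the same bucket returns rows. See the steps to reproduce.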

How to Reproduce

import awswrangler as wr

wr.s3.select_query(
    sql='SELECT * FROM s3object s WHERE s."payment_type" = \'NA \'',
    path="s3://ursa-labs-taxi-data/2010/02/data.parquet",
    input_serialization="Parquet",
    input_serialization_params={},
    scan_range_chunk_size=16 * 1024 * 1024,
)
Out[36]: 
  vendor_id                 pickup_at  ... tolls_amount  total_amount
0       CMT  2010-02-12T23:32:17.000Z  ...          0.0          17.9
1       CMT  2010-02-13T05:09:37.000Z  ...          0.0           6.3
2       CMT  2010-02-13T03:52:53.000Z  ...          0.0           7.1
3       CMT  2010-02-13T22:46:40.000Z  ...          0.0           8.7
4       CMT  2010-02-05T20:14:01.000Z  ...          0.0           7.1
5       CMT  2010-02-05T09:34:56.000Z  ...          0.0          13.0
6       CMT  2010-02-06T12:00:39.000Z  ...          0.0           3.0
7       CMT  2010-02-07T22:15:53.000Z  ...          0.0          23.5
[8 rows x 18 columns]
FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
wr.s3.select_query(
    sql='SELECT * FROM s3object s WHERE s."payment_type" = \'NA \'',
    path="s3://ursa-labs-taxi-data/2010/*.parquet",
    input_serialization="Parquet",
    input_serialization_params={},
    scan_range_chunk_size=16 * 1024 * 1024,
)
Out[37]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7]
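
One way to read Out[37]: pandas prints that three-line form for a frame that has index entries but no columns. The sketch below assumes only pandas and reproduces the same shape locally without touching S3. The helper name is_columnless is hypothetical and not part of awswrangler.

```python
import pandas as pd


def is_columnless(df: pd.DataFrame) -> bool:
    """Return True when a frame has index entries but zero columns.

    Hypothetical helper for illustration; not part of awswrangler.
    """
    return df.shape[1] == 0 and len(df.index) > 0


# Build a frame with eight index entries and no columns, mirroring
# the shape shown in Out[37].
df = pd.DataFrame(index=range(8))

print(df)
print("shape:", df.shape)
print("df.empty:", df.empty)
print("columnless:", is_columnless(df))
```

Note that df.empty is True here even though the index has eight entries, so checking df.empty alone cannot distinguish "no matching rows" from "rows present but columns lost". This only characterizes the symptom; it says nothing about why select_query returns such a frame for a wildcard path.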

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

OSX

Python version

3.9

AWS SDK for pandas version

3.0.0rc2

Additional context

No response

Labels

bug: Something isn't working
