Consider using `ray.data.read_parquet_bulk` when possible #2023

**Is your idea related to a problem? Please describe.**
The `ray.data` module provides `read_parquet_bulk`, which is optimised for reading a large number of small files. It expects a list of file paths rather than a directory.
https://docs.ray.io/en/latest/data/api/input_output.html#parquet
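Because `read_parquet_bulk` takes file paths rather than a directory, a caller holding a directory would have to expand it first. A minimal sketch, assuming local filesystem paths (S3 prefixes would need a different listing step); the helper names `list_parquet_files` and `read_small_parquet_files` are hypothetical:

```python
from pathlib import Path
from typing import List


def list_parquet_files(directory: str) -> List[str]:
    """Hypothetical helper: expand a directory into a sorted list of .parquet file paths.

    Searches recursively and keeps regular files only, so a directory that happens
    to be named "*.parquet" (as some writers produce) is skipped.
    """
    return sorted(str(p) for p in Path(directory).rglob("*.parquet") if p.is_file())


def read_small_parquet_files(directory: str):
    """Hypothetical wrapper: read many small files via ray.data.read_parquet_bulk.

    read_parquet_bulk expects file paths, not a directory, so expand first.
    """
    paths = list_parquet_files(directory)
    if not paths:
        raise FileNotFoundError(f"no .parquet files under {directory!r}")
    import ray  # imported lazily so the listing helper works without Ray installed

    return ray.data.read_parquet_bulk(paths)
```

Sorting the paths keeps the input order deterministic across runs, which makes the resulting dataset easier to compare when debugging.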

The catch, however, is that it does not use PyArrow's `ParquetDataset` abstraction to infer metadata.
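One way to read "when possible" is to pick the bulk reader only when every input is a plain file, and fall back to `ray.data.read_parquet` (which does resolve metadata through `ParquetDataset`) otherwise. A minimal sketch of that heuristic, assuming local paths; `pick_parquet_reader` and `read_parquet_auto` are hypothetical names, and the heuristic does not check whether the caller actually needs the skipped metadata (for example partition columns):

```python
import os
from typing import Sequence


def pick_parquet_reader(paths: Sequence[str]) -> str:
    """Hypothetical heuristic: return the name of the ray.data reader to use.

    read_parquet_bulk only accepts file paths and skips ParquetDataset metadata
    inference, so it is chosen only when every input is an existing file.
    A directory, a missing path, or an empty list falls back to read_parquet.
    """
    if paths and all(os.path.isfile(p) for p in paths):
        return "read_parquet_bulk"
    return "read_parquet"


def read_parquet_auto(paths: Sequence[str]):
    """Hypothetical wrapper: dispatch to the reader chosen above."""
    import ray  # imported lazily so the heuristic can be used without Ray installed

    reader = getattr(ray.data, pick_parquet_reader(paths))
    return reader(list(paths))
```

Keeping the decision in a separate pure function makes the heuristic easy to test without starting a Ray cluster.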
