Description
Is your idea related to a problem? Please describe.
The `ray.data` module has a `read_parquet_bulk` function, which is optimised to read a large number of small files. It expects a list of file paths instead of a directory.
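Because `read_parquet_bulk` takes explicit file paths, a caller whose small files sit under one directory has to expand that directory first. Below is a minimal sketch for a local filesystem only. `list_parquet_files` is a hypothetical helper, not part of Ray, and remote storage such as S3 would need a different listing mechanism.

```python
from pathlib import Path
from typing import List


def list_parquet_files(directory: str) -> List[str]:
    """Hypothetical helper: expand a local directory into a sorted list of
    .parquet file paths, since read_parquet_bulk does not accept a directory."""
    root = Path(directory)
    # rglob also descends into subdirectories (e.g. Hive-style partition
    # folders). is_file() skips directories whose names end in ".parquet",
    # such as the output folders some writers create.
    return sorted(str(p) for p in root.rglob("*.parquet") if p.is_file())
```

The resulting list could then be passed as the paths argument of `ray.data.read_parquet_bulk`. Note that this loses the directory-level view of the data: the caller, not a dataset abstraction, decides which files belong together.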
https://docs.ray.io/en/latest/data/api/input_output.html#parquet
The issue with it, however, is that it does not use the `ParquetDataset` abstraction to infer metadata.