Consider using ray.data.read_parquet_bulk when possible #2023

Closed
@jaidisido

Description

Is your idea related to a problem? Please describe.
The ray.data module provides read_parquet_bulk, which is optimised for reading a large number of small files. It expects a list of file paths rather than a directory.
https://docs.ray.io/en/latest/data/api/input_output.html#parquet

The drawback, however, is that it does not use the ParquetDataset abstraction to infer metadata.
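
To make the "list of paths instead of a directory" point concrete, here is a minimal sketch of what a caller would need to do before handing off to read_parquet_bulk. The helper names `list_parquet_files` and `read_small_files` are hypothetical and only for illustration. The sketch assumes a local filesystem and a Ray version that still ships `ray.data.read_parquet_bulk`.

```python
from pathlib import Path
from typing import List


def list_parquet_files(root: str) -> List[str]:
    """Expand a local directory into a sorted list of Parquet file paths.

    Hypothetical helper: read_parquet_bulk wants explicit file paths,
    so the directory has to be expanded by the caller.
    """
    return sorted(str(p) for p in Path(root).rglob("*.parquet") if p.is_file())


def read_small_files(root: str):
    """Read many small Parquet files via ray.data.read_parquet_bulk.

    Assumes Ray is installed; it is imported lazily so the path helper
    above can be used (and tested) without Ray.
    """
    import ray

    paths = list_parquet_files(root)
    if not paths:
        raise FileNotFoundError(f"no .parquet files found under {root!r}")
    # Pass the explicit file list; a bare directory path is not accepted here.
    return ray.data.read_parquet_bulk(paths)
```

Since this path skips ParquetDataset, anything that relies on dataset-level metadata would have to be obtained separately by the caller; the sketch above does not attempt that.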

Labels

enhancement (New feature or request)
