Write Dataset with file_visitor core dump #46800

Open
@wingerted

Description

Describe the bug, including details regarding any error messages, version, and platform.

Describe

Calling pyarrow.dataset.write_dataset with a file_visitor callback crashes the process with a core dump. If file_visitor is not passed, the same write_dataset call runs successfully.

Reproduction code

import pyarrow as pa
import pyarrow.dataset as ds
import uuid, pathlib, time, os
import json, pyarrow.parquet as pq



table_uri = pathlib.Path("data/my_ds")
data_dir   = table_uri
data_dir.mkdir(parents=True, exist_ok=True)


tbl = pa.table({
    "id":    pa.array([1, 2, 3], pa.int64()),
    "value": pa.array([10, 20, 30], pa.int64()),
    "ds":    pa.array(["2025-06-12"]*3, pa.string())
})


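# Collect the path of every written file. file_visitor is called with a
# WrittenFile object for each file created by write_dataset.
f_list = []
def f_visit(f):
    f_list.append(f.path)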

ds.write_dataset(
    tbl,
    base_dir=data_dir,
    format="parquet",
    basename_template=str(uuid.uuid4()) + "-{i}.parquet",
    existing_data_behavior="delete_matching",
    file_visitor=f_visit
)
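
To capture where the crash happens, one option is to run the reproduction in a child interpreter with Python's faulthandler enabled, so the parent survives and can report the exit status and any traceback dumped on a fatal signal. The sketch below is only a diagnostic aid, not a fix; the helper name `run_isolated` and the placeholder script are hypothetical and not part of the original report.

```python
import subprocess
import sys
import textwrap


def run_isolated(script: str, timeout: float = 120.0) -> tuple[int, str]:
    """Run `script` in a fresh interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means the
    child was terminated by a signal (for example -11 for SIGSEGV).
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump Python tracebacks on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", textwrap.dedent(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Placeholder: paste the reproduction script from this report here.
    repro = """
    print("replace me with the reproduction script")
    """
    code, err = run_isolated(repro)
    print("return code:", code)
    print(err)
```

If the child is killed by a signal, the captured stderr should contain the Python-level stack at the time of the crash, which can be attached to this issue alongside the Arrow and Python versions.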

Environment

  • Python: 3.12
  • Arrow version: 19, 20

Component(s)

C++, Python
