Description
Describe the bug, including details regarding any error messages, version, and platform.
Describe
Calling pyarrow.dataset.write_dataset with the file_visitor argument crashes the process with a core dump. Without file_visitor, the same write_dataset call succeeds.
Reproduction Code
import pyarrow as pa
import pyarrow.dataset as ds
import uuid, pathlib, time, os
import json, pyarrow.parquet as pq

table_uri = pathlib.Path("data/my_ds")
data_dir = table_uri
data_dir.mkdir(parents=True, exist_ok=True)

tbl = pa.table({
    "id": pa.array([1, 2, 3], pa.int64()),
    "value": pa.array([10, 20, 30], pa.int64()),
    "ds": pa.array(["2025-06-12"]*3, pa.string())
})

f_list = []

def f_visit(f):
    f_list.append(f.path)

ds.write_dataset(
    tbl,
    base_dir=data_dir,
    format="parquet",
    basename_template=str(uuid.uuid4()) + "-{i}.parquet",
    existing_data_behavior="delete_matching",
    file_visitor=f_visit
)
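Until the crash is diagnosed, one possible workaround is to skip file_visitor and collect the written paths by scanning the output directory afterwards. The sketch below is an assumption, not a confirmed fix: collect_written_files is a hypothetical helper (not part of pyarrow), and it relies on the files being named with a known prefix, as the basename_template in the reproduction does.

```python
import pathlib


def collect_written_files(base_dir, prefix):
    """Hypothetical helper: list Parquet files under base_dir named '<prefix>-*.parquet'.

    Assumes prefix contains no glob special characters (a UUID string is fine).
    """
    base = pathlib.Path(base_dir)
    # rglob also descends into partition subdirectories, if there are any.
    matches = (p for p in base.rglob(f"{prefix}-*.parquet") if p.is_file())
    return sorted(str(p) for p in matches)
```

To use it with the reproduction, store str(uuid.uuid4()) in a variable, pass that variable plus "-{i}.parquet" as basename_template, call ds.write_dataset without file_visitor, and then call collect_written_files(data_dir, prefix). This only recovers paths; it does not provide the Parquet metadata that the visitor object also carries.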
Environment
- Python: 3.12
- Arrow version: 19, 20
Component(s)
C++, Python