Describe the bug
Using datafusion-cli to read csv files generated by tpchgen-cli sometimes fails.
This issue was originally posted at #66 (comment)
To Reproduce
Steps to reproduce the behavior:
- Clone the latest `main` branch with git (rev: `cb325ad`).
- Build with cargo: `cargo build --release`
- Generate CSV files in a dedicated directory: `./target/release/tpchgen-cli -f csv -o gen`
- Install the latest release of `datafusion-cli`: `cargo install datafusion-cli`
- Ensure `datafusion-cli` is version `46.0.1`.
- Start `datafusion-cli` by running `datafusion-cli` in your terminal.
- Run the SQL `select version()`, which should print a message containing `46.0.1`.
- Use `datafusion-cli` to read the `part.csv` file by running the SQL `select * from './gen/part.csv'`. Note: be sure not to add a `limit` clause.
- `datafusion-cli` should report an error like:
  `Arrow error: Parser error: Error while parsing value p_partkey for column 0 at line 24597`
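The value that fails to parse, `p_partkey`, is the name of the first column of the TPC-H `part` table. One possible explanation (an assumption, not confirmed in this report) is that a header row appears again partway through `part.csv`. The minimal Rust sketch below checks that hypothesis using only the standard library: it treats the first line as the header and lists any later lines identical to it. The function name `repeated_header_lines` is hypothetical, and the default path `./gen/part.csv` is taken from the steps above. Line numbers here are 1-based and may not line up exactly with how the CSV parser counts lines.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: returns the 1-based line numbers (after the first
/// line) whose content is identical to the first line, i.e. candidate
/// repeated header rows. Assumes the first line is the CSV header.
fn repeated_header_lines<R: BufRead>(reader: R) -> io::Result<Vec<usize>> {
    // `lines()` strips trailing "\n" or "\r\n", so comparisons ignore line endings.
    let mut lines = reader.lines();
    let header = match lines.next() {
        Some(line) => line?,
        None => return Ok(Vec::new()), // empty input: nothing to check
    };
    let mut hits = Vec::new();
    for (idx, line) in lines.enumerate() {
        if line? == header {
            // idx 0 corresponds to line 2 of the file.
            hits.push(idx + 2);
        }
    }
    Ok(hits)
}

fn main() {
    // Defaults to the file from the report; pass another path as the first argument.
    let path = env::args()
        .nth(1)
        .unwrap_or_else(|| "./gen/part.csv".to_string());
    let file = match File::open(&path) {
        Ok(f) => f,
        Err(e) => {
            eprintln!("cannot open {path}: {e}");
            return;
        }
    };
    // Streams the file line by line instead of loading it all into memory.
    match repeated_header_lines(BufReader::new(file)) {
        Ok(hits) if hits.is_empty() => println!("no repeated header rows in {path}"),
        Ok(hits) => println!("header row repeated at lines {hits:?} in {path}"),
        Err(e) => eprintln!("error reading {path}: {e}"),
    }
}
```

If the sketch reports a repeated header near the line in the error message, that would support the hypothesis; if it reports none, the cause lies elsewhere.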
Expected behavior
All csv files generated by tpchgen-cli should be readable by datafusion-cli.
Environment (please complete the following information):
- OS: Darwin Mac 24.0.0 Darwin Kernel Version 24.0.0: Mon Aug 12 20:52:18 PDT 2024; root:xnu-11215.1.10~2/RELEASE_ARM64_T8122 arm64
- Compiler Version: 1.85.1