[BUG] datafusion-cli may fail to read csv files generated by tpchgen-cli #73

Closed
@niebayes

Describe the bug

Reading csv files generated by tpchgen-cli with datafusion-cli sometimes fails with a parser error.
This issue was originally reported in #66 (comment).

To Reproduce
Steps to reproduce the behavior:

  1. Clone the latest main branch (rev: cb325ad).
  2. Build with cargo: cargo build --release
  3. Generate the csv files in a dedicated directory: ./target/release/tpchgen-cli -f csv -o gen
  4. Install the latest release of datafusion-cli: cargo install datafusion-cli
  5. Ensure datafusion-cli is version 46.0.1:
     • Start datafusion-cli by running datafusion-cli in your terminal.
     • Run the SQL select version(), which should print a message containing 46.0.1.
  6. Use datafusion-cli to read the part.csv file by running the SQL select * from './gen/part.csv'. Be sure not to add a limit clause.
  7. datafusion-cli should report an error like: Arrow error: Parser error: Error while parsing value p_partkey for column 0 at line 24597
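
The value named in the error, p_partkey, is also the name of the first column, so one possible explanation (an assumption, not something confirmed in this report) is that the header row appears again somewhere inside part.csv. A minimal Python sketch to check for that is below. The function name find_repeated_headers is hypothetical, the script assumes the first line of each file is a header, and the line numbers it prints may not line up exactly with the one in the Arrow error, since the reader may count lines differently.

```python
from pathlib import Path


def find_repeated_headers(csv_path):
    """Return 1-based line numbers (after line 1) whose text equals line 1.

    Assumption: line 1 of the file is the header row.
    """
    repeats = []
    with open(csv_path, "r", encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n")
        if not header:
            # Empty file or blank first line: nothing meaningful to compare.
            return repeats
        for line_no, line in enumerate(f, start=2):
            if line.rstrip("\r\n") == header:
                repeats.append(line_no)
    return repeats


if __name__ == "__main__":
    # "gen" is the output directory used in the reproduction steps.
    for path in sorted(Path("gen").glob("*.csv")):
        print(path, find_repeated_headers(path))
```

An empty list for every file would rule this explanation out; a non-empty list for part.csv would point at how the header is written during generation rather than at datafusion-cli.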

Expected behavior
All csv files generated by tpchgen-cli should be readable by datafusion-cli.

Environment (please complete the following information):

  • OS: Darwin Mac 24.0.0 Darwin Kernel Version 24.0.0: Mon Aug 12 20:52:18 PDT 2024; root:xnu-11215.1.10~2/RELEASE_ARM64_T8122 arm64
  • Compiler Version: 1.85.1

Labels: bug
