
Do not generate CSV header multiple times #78


Merged: 4 commits, merged Mar 29, 2025


@alamb (Collaborator) commented Mar 29, 2025

Previously, the code generated a CSV header at the start of every chunk rather than once per file.

Let's make it once per file.
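A minimal Rust sketch of the intended behavior, assuming a file is assembled from several independently generated chunks of already-formatted rows. `write_csv_file` is a hypothetical helper for illustration only, not the tpchgen-rs API; it writes the header once per file and lets each chunk contribute data rows only.

```rust
use std::io::{self, Write};

/// Hypothetical helper (not the tpchgen-rs API): writes one CSV file from
/// several independently generated chunks of already-formatted rows.
/// The header is written exactly once, before the first chunk. The pre-fix
/// behavior described in this PR corresponds to moving the header write
/// inside the `for chunk` loop, so every chunk would start with a header.
fn write_csv_file<W: Write>(out: &mut W, header: &str, chunks: &[Vec<String>]) -> io::Result<()> {
    // Header: once per file.
    writeln!(out, "{header}")?;
    // Data rows: each chunk contributes rows only, never a header.
    for chunk in chunks {
        for row in chunk {
            writeln!(out, "{row}")?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Two toy chunks standing in for the generator's per-chunk output.
    let chunks = vec![
        vec!["1,a".to_string(), "2,b".to_string()],
        vec!["3,c".to_string()],
    ];
    let mut buf = Vec::new();
    write_csv_file(&mut buf, "id,name", &chunks)?;
    print!("{}", String::from_utf8(buf).expect("rows are valid UTF-8"));
    Ok(())
}
```

Running it prints `id,name` once, followed by the three data rows, even though the rows came from two chunks.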

I tested with

cargo run --release --  --format=csv --tables=part -s 1 --output-dir=/tmp/tpchgen-rs
datafusion-cli  -c "select * from '/tmp/tpchgen-rs/part.csv'"

The query now works as expected:

DataFusion CLI v46.0.1
+-----------+------------------------------------------+----------------+----------+--------------------------+--------+-------------+---------------+------------------------+
| p_partkey | p_name                                   | p_mfgr         | p_brand  | p_type                   | p_size | p_container | p_retailprice | p_comment              |
+-----------+------------------------------------------+----------------+----------+--------------------------+--------+-------------+---------------+------------------------+
| 175121    | deep lemon frosted orchid firebrick      | Manufacturer#5 | Brand#51 | STANDARD BRUSHED BRASS   | 19     | LG JAR      | 1196.12       | s. regul               |
| 175122    | maroon burlywood beige tan slate         | Manufacturer#5 | Brand#53 | STANDARD BRUSHED TIN     | 14     | WRAP CAN    | 1197.12       | its against            |
| 175123    | medium slate turquoise gainsboro orange  | Manufacturer#3 | Brand#31 | LARGE BRUSHED NICKEL     | 15     | WRAP CAN    | 1198.12       |  pending depths cajo   |
| 175124    | chiffon aquamarine light lavender wheat  | Manufacturer#2 | Brand#24 | STANDARD BURNISHED BRASS | 9      | LG BAG      | 1199.12       | quickl                 |
| 175125    | misty tomato ivory cream purple          | Manufacturer#4 | Brand#44 | MEDIUM ANODIZED BRASS    | 34     | SM CASE     | 1200.12       | riously                |
| 175126    | navajo sky steel cornflower snow         | Manufacturer#5 | Brand#51 | LARGE BURNISHED COPPER   | 8      | MED DRUM    | 1201.12       | wake                   |
| 175127    | midnight hot light chartreuse snow       | Manufacturer#2 | Brand#23 | ECONOMY ANODIZED BRASS   | 33     | WRAP BOX    | 1202.12       | ly bold pack           |
| 175128    | green thistle navajo antique drab        | Manufacturer#4 | Brand#44 | SMALL ANODIZED STEEL     | 8      | JUMBO CAN   | 1203.12       | fix blithel            |
| 175129    | chartreuse deep mint royal violet        | Manufacturer#3 | Brand#32 | MEDIUM PLATED TIN        | 34     | JUMBO DRUM  | 1204.12       | ickly express dep      |
| 175130    | drab royal linen plum red                | Manufacturer#3 | Brand#34 | ECONOMY POLISHED COPPER  | 5      | SM BAG      | 1205.13       | uctions                |
| 175131    | plum dodger floral lawn blue             | Manufacturer#2 | Brand#23 | LARGE BRUSHED TIN        | 50     | LG PACK     | 1206.13       |  bold deposi           |
| 175132    | brown dark mint tan orchid               | Manufacturer#3 | Brand#35 | STANDARD POLISHED TIN    | 15     | WRAP BOX    | 1207.13       | excuses sublate re     |
| 175133    | orange burlywood steel turquoise wheat   | Manufacturer#3 | Brand#34 | SMALL ANODIZED COPPER    | 10     | SM PKG      | 1208.13       | e quickly final ex     |
| 175134    | dodger cyan papaya khaki sky             | Manufacturer#1 | Brand#11 | LARGE BURNISHED TIN      | 25     | SM BOX      | 1209.13       | slyly regular pint     |
| 175135    | dark linen coral pink seashell           | Manufacturer#5 | Brand#52 | STANDARD PLATED NICKEL   | 15     | JUMBO DRUM  | 1210.13       |  regularly regular pl  |
| 175136    | ghost thistle navy dark orchid           | Manufacturer#4 | Brand#44 | SMALL POLISHED NICKEL    | 41     | JUMBO CAN   | 1211.13       | ithely b               |
| 175137    | blanched drab ivory orchid sienna        | Manufacturer#2 | Brand#24 | ECONOMY BRUSHED COPPER   | 12     | WRAP DRUM   | 1212.13       | osits haggl            |
| 175138    | snow burnished thistle orange forest     | Manufacturer#1 | Brand#14 | SMALL BRUSHED BRASS      | 22     | WRAP DRUM   | 1213.13       | . carefully eve        |
| 175139    | navy seashell gainsboro chartreuse blue  | Manufacturer#2 | Brand#21 | STANDARD BURNISHED TIN   | 18     | JUMBO DRUM  | 1214.13       | lly about the s        |
| 175140    | rose slate floral orange thistle         | Manufacturer#5 | Brand#54 | ECONOMY BRUSHED BRASS    | 6      | SM CAN      | 1215.14       | efully regu            |
| 175141    | wheat puff pink olive navy               | Manufacturer#4 | Brand#43 | ECONOMY POLISHED STEEL   | 17     | SM DRUM     | 1216.14       | egular p               |
| 175142    | lace orange thistle light linen          | Manufacturer#1 | Brand#13 | PROMO PLATED STEEL       | 42     | WRAP CAN    | 1217.14       | sual ideas             |
| 175143    | slate linen puff navy gainsboro          | Manufacturer#3 | Brand#31 | LARGE PLATED TIN         | 4      | WRAP PACK   | 1218.14       | ar request             |
| 175144    | pale cyan blue thistle pink              | Manufacturer#3 | Brand#35 | ECONOMY PLATED BRASS     | 29     | LG CASE     | 1219.14       | nding instructions alo |
| 175145    | gainsboro spring magenta royal firebrick | Manufacturer#5 | Brand#52 | LARGE POLISHED STEEL     | 26     | MED PACK    | 1220.14       |  theodolites cajo      |
| 175146    | black midnight blush yellow burnished    | Manufacturer#4 | Brand#44 | ECONOMY POLISHED COPPER  | 50     | WRAP CAN    | 1221.14       | arefully regu          |
| 175147    | slate orange steel olive green           | Manufacturer#3 | Brand#34 | ECONOMY POLISHED BRASS   | 21     | MED BOX     | 1222.14       | cross the blithel      |
| 175148    | white ivory beige burnished maroon       | Manufacturer#4 | Brand#42 | SMALL PLATED TIN         | 18     | SM CASE     | 1223.14       | oxes. fluffily         |
| 175149    | almond cornflower ghost rose dodger      | Manufacturer#5 | Brand#53 | MEDIUM BRUSHED TIN       | 17     | MED PKG     | 1224.14       |  across the b          |
| 175150    | mint thistle chartreuse indian pale      | Manufacturer#4 | Brand#44 | SMALL BRUSHED STEEL      | 4      | MED CAN     | 1225.15       | odolites. fl           |
| 175151    | violet orange midnight dim peru          | Manufacturer#5 | Brand#53 | PROMO BURNISHED STEEL    | 48     | JUMBO PACK  | 1226.15       | sts sleep slyly around |
| 175152    | pink black green cyan indian             | Manufacturer#3 | Brand#33 | LARGE ANODIZED COPPER    | 37     | MED CASE    | 1227.15       | ly bol                 |
| 175153    | red orchid almond linen dark             | Manufacturer#2 | Brand#23 | STANDARD PLATED TIN      | 34     | LG CASE     | 1228.15       | nic b                  |
| 175154    | blue sienna brown indian dark            | Manufacturer#2 | Brand#23 | PROMO ANODIZED COPPER    | 36     | JUMBO JAR   | 1229.15       | ymptotes w             |
| 175155    | hot rose maroon thistle wheat            | Manufacturer#1 | Brand#13 | PROMO PLATED NICKEL      | 35     | WRAP PKG    | 1230.15       | lly final re           |
| 175156    | sienna seashell powder brown firebrick   | Manufacturer#3 | Brand#31 | PROMO POLISHED STEEL     | 26     | JUMBO DRUM  | 1231.15       | platelets.             |
| 175157    | hot black gainsboro seashell firebrick   | Manufacturer#4 | Brand#41 | SMALL BRUSHED BRASS      | 38     | JUMBO PKG   | 1232.15       | as engage quickly.     |
| 175158    | frosted midnight powder magenta chiffon  | Manufacturer#3 | Brand#35 | LARGE BURNISHED BRASS    | 26     | MED BOX     | 1233.15       | g foxes throu          |
| 175159    | misty ghost almond moccasin red          | Manufacturer#2 | Brand#22 | SMALL ANODIZED BRASS     | 15     | MED DRUM    | 1234.15       | uietly                 |
| 175160    | salmon white cornsilk frosted metallic   | Manufacturer#4 | Brand#45 | PROMO ANODIZED TIN       | 28     | JUMBO PKG   | 1235.16       | nding, slow d          |
| .                                                                                                                                                                           |
| .                                                                                                                                                                           |
| .                                                                                                                                                                           |
+-----------+------------------------------------------+----------------+----------+--------------------------+--------+-------------+---------------+------------------------+
200000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.021 seconds.
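As a quick sanity check independent of DataFusion, one could count how many lines of the generated file are identical to its first line. With the pre-fix behavior, a repeated header in the middle of the file would presumably be read as a data row, which fits the linked issue about datafusion-cli failing to read these files. `header_occurrences` is a hypothetical helper; the path matches the `--output-dir` used in the test command above.

```rust
use std::fs;

/// Hypothetical helper: counts how many lines in `csv_text` are identical to
/// its first line (the header). A correctly generated file yields 1; a file
/// with the header repeated at the start of every chunk yields more than 1.
fn header_occurrences(csv_text: &str) -> usize {
    let mut lines = csv_text.lines();
    match lines.next() {
        None => 0, // empty file: no header at all
        Some(header) => 1 + lines.filter(|line| *line == header).count(),
    }
}

fn main() {
    // Output location taken from the `--output-dir=/tmp/tpchgen-rs` test command.
    let path = "/tmp/tpchgen-rs/part.csv";
    match fs::read_to_string(path) {
        Ok(text) => println!("{path}: header appears {} time(s)", header_occurrences(&text)),
        Err(e) => eprintln!("could not read {path}: {e}"),
    }
}
```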

FYI @niebayes -- thank you for the report and test

@alamb marked this pull request as ready for review March 29, 2025 10:28
@clflushopt (Owner) left a comment


LGTM

@alamb (Collaborator, Author) commented Mar 29, 2025

🚀

FYI @niebayes -- thanks again for the help

@alamb merged commit 1783782 into clflushopt:main Mar 29, 2025
7 checks passed
@alamb deleted the alamb/fix_csv branch March 29, 2025 23:15
Linked issue (successfully merging this pull request may close it):

[BUG] datafusion-cli may fail to read csv files generated by tpchgen-cli
2 participants