Add ability to read double-quoted csvs to wr.catalog.create_csv_table() #672

Closed
@gballardin

Description

I am creating a table from some CSV files in S3 that I generated in a previous step in Athena with aws wrangler. By default, Athena double-quotes its CSV output.

When I create a table in the Glue catalog with wr.catalog.create_csv_table() on the double-quoted output data mentioned above, there is no way to pass that function the WITH SERDEPROPERTIES ('quoteChar' = '\"') parameter. As a result, my table exists (I can see it listed in my Athena tables) and has the correct partitions, but returns nothing when I query it with Athena. This seems conceptually similar to this other aws wrangler issue, where {'skip.header.line.count': "1"} was missing and was later (v. 1.7.0) added to the wr.catalog.create_csv_table() signature as a one-off change.

Can you please add the ability to pass some {key: value} pairs for the SERDEPROPERTIES to this function? That way this function would replicate what

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('quoteChar' = '\"')

does when creating a table with a CREATE TABLE statement in Athena.
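
To make the request concrete, here is a minimal sketch of the DDL such a {key: value} option would need to reproduce. The helper name `build_csv_table_ddl`, its parameters, and the bucket path are all hypothetical and are not part of aws wrangler; the sketch also leaves out partitions and only builds the statement text.

```python
from typing import Dict, List, Tuple


def build_csv_table_ddl(
    table: str,
    location: str,
    columns: List[Tuple[str, str]],
    serde_properties: Dict[str, str],
) -> str:
    """Build a CREATE EXTERNAL TABLE statement that uses OpenCSVSerde.

    Hypothetical helper for illustration only. Keys and values are
    inserted as-is, so callers must supply any escaping themselves.
    """
    # One "`name` type" entry per column, in the order given.
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    # Render the {key: value} pairs as 'key' = 'value' entries.
    props = ", ".join(
        f"'{key}' = '{value}'" for key, value in serde_properties.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"WITH SERDEPROPERTIES ({props})\n"
        f"LOCATION '{location}'"
    )


# Example with placeholder names; '\\"' is a backslash followed by a double quote.
ddl = build_csv_table_ddl(
    table="my_table",
    location="s3://my-bucket/my-prefix/",
    columns=[("id", "string"), ("name", "string")],
    serde_properties={"quoteChar": '\\"'},
)
print(ddl)
```

As a possible workaround until the function accepts these properties, a statement like this could presumably be submitted to Athena directly, for example through wr.athena.start_query_execution(), although I have not verified that end to end.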
