Description
Describe the bug
Having issues writing data to a Redshift table containing a SUPER type column using awswrangler.redshift.copy. Even with serialize_to_json=True, the SUPER type column is not properly handled.
Environment
- awswrangler version: 3.11.0
- Python version: 3.12.4
- Operating System: Darwin
Table Schema
CREATE TABLE IF NOT EXISTS bss_dv.free_text_translation
(
    md5_hash VARCHAR(32) NOT NULL ENCODE RAW,
    translation SUPER ENCODE RAW,
    translation_date TIMESTAMP WITHOUT TIME ZONE ENCODE az64
)
DISTSTYLE KEY
DISTKEY (md5_hash)
SORTKEY (md5_hash)
My function
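import awswrangler as wr
import pandas as pd
import polars as pl


# JobConfig is my own settings class (not shown here).
def write_table(
    table: pl.DataFrame,
    config: JobConfig,
    dest_table: str = "free_text_translation",
    dest_schema: str = "bss_dv",
) -> None:
    # Convert the Polars frame to pandas, which wr.redshift.copy expects.
    pdf = table.to_pandas()
    pdf["translation_date"] = pd.to_datetime(pdf["translation_date"])
    # A sample row of pdf at this point is shown under "Example data" below.
    with wr.redshift.connect(
        secret_id=config.REDSHIFT_SECRET_ID,
        dbname=config.REDSHIFT_DB,
        timeout=3600,
    ) as con:
        # Stage the frame under the temp bucket path and COPY it into the table.
        wr.redshift.copy(
            df=pdf,
            path=config.TEMP_BUCKET,
            con=con,
            table=dest_table,
            schema=dest_schema,
            mode="append",
            serialize_to_json=True,
        )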
Example data:
{
'md5_hash': '00066f9e57abf748924808004fc504a7',
'translation': '{"text": "...", "source_language": "de", "target_language": "en", "success": true, "position": 0, "should_translate": true, "translated_text": "...", "message": null, "version": "2025-01-06", "text_len_chars": 235, "manual": false}',
'translation_date': Timestamp('2025-02-14 13:00:59')
}
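To make the shape of this row concrete, here is a minimal sketch (not part of my original function) showing that the translation value is a JSON-encoded Python string, not a nested object. The row is a shortened copy of the example above, and parse_translation is a hypothetical helper name. Whether passing parsed objects instead of strings changes what ends up in the SUPER column is an assumption that has not been verified here.

```python
import json

# Shortened copy of the example row; the "..." placeholder is kept as-is.
row = {
    "md5_hash": "00066f9e57abf748924808004fc504a7",
    "translation": '{"text": "...", "source_language": "de", '
                   '"target_language": "en", "success": true, "position": 0}',
}


def parse_translation(value):
    """Hypothetical helper: decode a JSON string, pass other values through."""
    if isinstance(value, str):
        return json.loads(value)
    return value


parsed = parse_translation(row["translation"])

print(type(row["translation"]).__name__)  # str
print(type(parsed).__name__)              # dict
print(parsed["source_language"])          # de
```

The DataFrame column therefore carries plain strings, which is consistent with what later shows up in Redshift.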
When I query the data in Redshift, the translation column holds a plain string rather than a parsed SUPER (JSON) value.
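One way to confirm this symptom is to ask Redshift which JSON type it assigned to each stored value. The sketch below only builds the query text. build_type_check_query is a hypothetical helper, and it assumes Redshift's JSON_TYPEOF function, which reports a type name such as string or object for a SUPER expression. A result of string would match the behavior described here.

```python
def build_type_check_query(schema: str, table: str, column: str, limit: int = 5) -> str:
    """Hypothetical helper: build a query reporting the JSON type of a SUPER column.

    Identifiers are interpolated without escaping, so only pass trusted names.
    """
    return (
        f"SELECT {column}, JSON_TYPEOF({column}) AS stored_type "
        f"FROM {schema}.{table} LIMIT {limit}"
    )


query = build_type_check_query("bss_dv", "free_text_translation", "translation")
print(query)
```

The resulting text could then be run with wr.redshift.read_sql_query and an open connection.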
How to Reproduce
See above.
Expected behavior
The translation column in Redshift should hold a SUPER (JSON) value, not a string.
OS
Mac
Python version
3.12.4
AWS SDK for pandas version
3.11.0