Unable to write SUPER type column to Redshift using redshift.copy #3095

Closed
@duarteocarmo

Describe the bug

Description

I am having trouble writing data to a Redshift table that contains a SUPER column using awswrangler.redshift.copy. Even with serialize_to_json=True, the value does not land in the SUPER column as parsed JSON.

Environment

  • awswrangler version: 3.11.0
  • Python version: 3.12.4
  • Operating System: Darwin

Table Schema

CREATE TABLE IF NOT EXISTS bss_dv.free_text_translation
(
    md5_hash VARCHAR(32) NOT NULL ENCODE RAW,
    translation SUPER ENCODE RAW,
    translation_date TIMESTAMP WITHOUT TIME ZONE ENCODE az64
)
DISTSTYLE KEY
DISTKEY (md5_hash)
SORTKEY (md5_hash)

My function

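import awswrangler as wr
import pandas as pd
import polars as pl

# JobConfig is the project's own settings class (definition not shown here).
def write_table(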
    table: pl.DataFrame,
    config: JobConfig,
    dest_table: str = "free_text_translation",
    dest_schema: str = "bss_dv",
) -> None:
    pdf = table.to_pandas()
    pdf["translation_date"] = pd.to_datetime(pdf["translation_date"])
    # at this point, rows of pdf look like the example data shown below

    with wr.redshift.connect(
        secret_id=config.REDSHIFT_SECRET_ID,
        dbname=config.REDSHIFT_DB,
        timeout=3600
    ) as con:
        wr.redshift.copy(
            df=pdf,
            path=config.TEMP_BUCKET,
            con=con,
            table=dest_table,
            schema=dest_schema,
            mode="append",
            serialize_to_json=True,
        )

Example data:

{
    'md5_hash': '00066f9e57abf748924808004fc504a7',
    'translation': '{"text": "...", "source_language": "de", "target_language": "en", "success": true, "position": 0, "should_translate": true, "translated_text": "...", "message": null, "version": "2025-01-06", "text_len_chars": 235, "manual": false}',
    'translation_date': Timestamp('2025-02-14 13:00:59')
}

When I query the data in Redshift, the translation column holds a plain string rather than a parsed SUPER JSON value.
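One thing that may be worth trying, as a sketch rather than a confirmed fix: in the example data the translation value is a JSON-encoded str, so the Parquet file staged by the copy would carry it as a plain string column. Assuming serialize_to_json=True adds SERIALIZETOJSON to the COPY command (which concerns nested Parquet types), decoding the strings into dicts before the copy may make a difference. The helper name parse_json_cell is hypothetical, and this has not been verified against awswrangler 3.11.0.

```python
import json
from typing import Any


def parse_json_cell(value: Any) -> Any:
    """Decode one JSON-encoded cell; anything that is not a str becomes None."""
    if not isinstance(value, str):
        # Covers None and pandas-style NaN placeholders for missing values.
        return None
    return json.loads(value)


# A trimmed version of the example row from this issue.
row = {
    "md5_hash": "00066f9e57abf748924808004fc504a7",
    "translation": '{"source_language": "de", "target_language": "en", "success": true, "position": 0}',
}
row["translation"] = parse_json_cell(row["translation"])
print(type(row["translation"]).__name__)  # dict
```

With a pandas DataFrame, the same helper could be applied with pdf["translation"] = pdf["translation"].map(parse_json_cell) before calling wr.redshift.copy. Whether the resulting nested column loads cleanly may depend on every row sharing a consistent structure.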

How to Reproduce

See above.

Expected behavior

The translation column in Redshift should contain parsed SUPER JSON values, not strings.
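To make the expected and actual behavior concrete, a query along the following lines could report what kind of value each row holds. It is a minimal sketch that relies on Redshift's JSON_TYPEOF function; the helper name build_super_type_check is hypothetical, and the identifiers are interpolated directly, so it should only be used with trusted names.

```python
def build_super_type_check(schema: str, table: str, column: str) -> str:
    """Return SQL that counts rows per JSON type stored in a SUPER column."""
    return (
        f"SELECT JSON_TYPEOF({column}) AS json_type, COUNT(*) AS n_rows "
        f"FROM {schema}.{table} "
        f"GROUP BY 1"
    )


sql = build_super_type_check("bss_dv", "free_text_translation", "translation")
print(sql)
```

The query could be run with wr.redshift.read_sql_query(sql, con=con). A json_type of string would match the behavior described above, while object is what the expected behavior would produce.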

OS

Mac

Python version

3.12.4

AWS SDK for pandas version

3.11.0

    Labels

    bug: Something isn't working
