
UnicodeDecodeError when using athena.read_sql_query #1156

Closed
@Chintan-D


Describe the bug

Hi,
I am using the latest version of the awswrangler library to extract a number of tables from Athena. For one of my tables, athena.read_sql_query fails with this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 230232: character maps to <undefined>
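A note on where byte 0x9d might come from: the right double quotation mark (U+201D) encodes to the bytes E2 80 9D in UTF-8, and 0x9d has no mapping in the Windows cp1252 code page. The sketch below reproduces the same message in plain Python. It assumes the Windows default text encoding on the affected machine is cp1252, which the issue does not state; `try_decode` is a hypothetical helper written for this illustration.

```python
# Assumption: the machine's default text encoding is cp1252 (not confirmed in the issue).
text = '\u201cNoto Emoji\u201d'      # the string from the view, with curly quotes
utf8_bytes = text.encode('utf-8')    # UTF-8 bytes, as they would appear in a UTF-8 file


def try_decode(data: bytes, encoding: str):
    """Hypothetical helper: return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


decoded_cp1252, cp1252_error = try_decode(utf8_bytes, 'cp1252')
decoded_utf8, utf8_error = try_decode(utf8_bytes, 'utf-8')

print(utf8_bytes[-3:].hex())   # the three bytes of the closing quote
print(cp1252_error)            # the decode error raised by the cp1252 codec
print(decoded_utf8 == text)    # UTF-8 round-trips the original string
```

If this matches what happens on the affected machine, the data itself is valid UTF-8 and only the decoding step is using the wrong encoding.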

Here is the line of code that raises this error:
df = wr.athena.read_sql_query(query, database=database, boto3_session=session, ctas_approach=False)

The same code works fine for other tables.

Here is the detailed error trace:


  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 897, in read_sql_query
    return _resolve_query_without_cache(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 519, in _resolve_query_without_cache
    return _resolve_query_without_cache_regular(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 425, in _resolve_query_without_cache_regular
    return _fetch_csv_result(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 161, in _fetch_csv_result
    ret = s3.read_csv(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 294, in read_csv
    return _read_text(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 149, in _read_text
    ret = _read_text_file(
  File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 91, in _read_text_file
    df: pd.DataFrame = parser_func(f, **pandas_kwargs)
  File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 468, in _read
    return parser.read(nrows)
  File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 1057, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 2036, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 827, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1943, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 230232: character maps to <undefined>

How to Reproduce

Create the following view in Athena:

create view vw_error_test as 
select '“Noto Emoji”' as col

Query this view using awswrangler:

import awswrangler as wr

query = 'select * from vw_error_test'
df = wr.athena.read_sql_query(query, database=database, boto3_session=session, ctas_approach=False)
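The decoding failure can probably be reproduced without Athena or S3. The sketch below is not awswrangler's code path; it only imitates reading a CSV result through a text stream. It assumes the Athena result file is UTF-8 encoded and that the failing read decodes it with the Windows locale default (taken here to be cp1252). `csv_bytes` and `read_rows` are hypothetical names introduced for this illustration.

```python
import csv
import io

# Hypothetical stand-in for the CSV result file that Athena writes to S3.
# Assumption: the file is UTF-8 encoded.
csv_bytes = 'col\n"\u201cNoto Emoji\u201d"\n'.encode('utf-8')


def read_rows(data: bytes, encoding: str) -> list:
    """Decode raw CSV bytes with the given encoding and return all rows."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline='') as text_stream:
        return list(csv.reader(text_stream))


# Decoding with cp1252 (assumed Windows default) hits the unmapped byte 0x9d.
try:
    read_rows(csv_bytes, 'cp1252')
    failed_with_cp1252 = False
except UnicodeDecodeError:
    failed_with_cp1252 = True

# Decoding with an explicit UTF-8 encoding returns the header and the data row.
rows_utf8 = read_rows(csv_bytes, 'utf-8')

print(failed_with_cp1252)
print(len(rows_utf8))
```

If the library relies on the locale default encoding, enabling Python's UTF-8 mode (the `PYTHONUTF8=1` environment variable or `-X utf8`) might avoid the error on Windows; that has not been verified against awswrangler 2.14.0 here.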

Expected behavior

No response

Your project

No response

Screenshots

No response

Environment


OS

Windows

Python version

3.9.2

AWS DataWrangler version

2.14.0

Additional context

No response


Labels

bug: Something isn't working
