Describe the bug
Hi,
I am using the latest version of the awswrangler library to extract a bunch of tables from Athena. For one of my tables, the function athena.read_sql_query fails with the error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 230232: character maps to <undefined>
Here is the part of the code that raises this error:
df = wr.athena.read_sql_query(query, database=database, boto3_session=session, ctas_approach=False)
The code otherwise works fine for other tables.
Here is the detailed error trace:
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 897, in read_sql_query
return _resolve_query_without_cache(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 519, in _resolve_query_without_cache
return _resolve_query_without_cache_regular(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 425, in _resolve_query_without_cache_regular
return _fetch_csv_result(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\athena\_read.py", line 161, in _fetch_csv_result
ret = s3.read_csv(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 294, in read_csv
return _read_text(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 149, in _read_text
ret = _read_text_file(
File "D:\PythonProjects\venvPy392\lib\site-packages\awswrangler\s3\_read_text.py", line 91, in _read_text_file
df: pd.DataFrame = parser_func(f, **pandas_kwargs)
File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 468, in _read
return parser.read(nrows)
File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 1057, in read
index, columns, col_dict = self._engine.read(nrows)
File "D:\PythonProjects\venvPy392\lib\site-packages\pandas\io\parsers.py", line 2036, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 827, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1943, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 230232: character maps to <undefined>
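The byte in the message looks consistent with UTF-8 data being decoded with a Windows code page. The sketch below uses only the Python standard library and does not call awswrangler. It shows that the right curly quote (U+201D) from the reproduction query encodes to bytes ending in 0x9d in UTF-8, and that cp1252 has no mapping for that byte. That cp1252 is the default encoding on this machine is an assumption based on the 'charmap' wording in the error.

```python
# Right double quotation mark, the closing quote in the reproduction query.
quote = "\u201d"

# In UTF-8 this character is three bytes, the last of which is 0x9d.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes)

# Decoding those bytes as cp1252 (assumed to be the Windows locale default)
# fails, because 0x9d is undefined in that code page.
try:
    utf8_bytes.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reproduces the same "character maps to <undefined>" message, the CSV result is most likely being read with the platform default encoding rather than UTF-8.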
How to Reproduce
Create the following view in Athena:
create view vw_error_test as
select '“Noto Emoji”' as col
Query this view using awswrangler:
query = 'select * from vw_error_test'
df = wr.athena.read_sql_query(query, database=database, boto3_session=session, ctas_approach=False)
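To check which default text encoding Python uses in the environment where the query fails, something like the following could be run first. It uses only standard-library calls. Whether awswrangler falls back to this default when reading the CSV result is an assumption, not something confirmed from the library's code.

```python
import locale
import sys

# Encoding Python uses for text files when none is given explicitly.
default_encoding = locale.getpreferredencoding(False)

# True when Python was started in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
utf8_mode = bool(sys.flags.utf8_mode)

print(f"default text encoding: {default_encoding}")
print(f"UTF-8 mode enabled: {utf8_mode}")
```

On Windows this typically reports a code page such as cp1252 unless Python is started with `python -X utf8` or with the `PYTHONUTF8=1` environment variable set. Running the reproduction in UTF-8 mode may be a useful test of whether the default encoding is the cause.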
Expected behavior
No response
Your project
No response
Screenshots
No response
Environment
OS
Windows
Python version
3.9.2
AWS DataWrangler version
2.14.0
Additional context
No response