Releases: aws/aws-sdk-pandas

AWS Data Wrangler 2.15.0

28 Mar 14:36

Noteworthy

⚠️ Dropped Python 3.6 support

⚠️ For platforms without PyArrow 7 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Timestream module - support multi-measure records #1214
  • Warnings for implicit float conversion of nulls in to_parquet #1221
  • Support additional sql params in Redshift COPY operation #1210
  • Add create_ctas_table to Athena module #1207
  • S3 Proxy support #1206
  • Add Athena get_named_query_statement #1183
  • Add manifest parameter to 'redshift.copy_from_files' method #1164
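As a minimal sketch of the new create_ctas_table entry point (#1207) — the database and table names below are hypothetical, and the call is wrapped in a function because it needs AWS credentials to actually run:

```python
def create_ctas_example():
    # Lazy import: awswrangler and AWS credentials are only needed when called
    import awswrangler as wr

    # Materialize a query result as a new Glue/Athena table via CTAS;
    # "my_database" and "my_ctas_table" are hypothetical names.
    wr.athena.create_ctas_table(
        sql="SELECT * FROM source_table WHERE year = 2022",
        database="my_database",
        ctas_table="my_ctas_table",
        wait=True,  # block until the CTAS query finishes
    )
```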

Documentation

  • Update install section #1242
  • Update lambda layers section #1236

Bug Fix

  • Give precedence to user path for Athena UNLOAD S3 Output Location #1216
  • Honor user-specified workgroup in athena.read_sql_query with unload_approach=True #1178
  • Support map type in Redshift copy #1185
  • Fix data_api.rds.read_sql_query() not preserving the data type when a column is all NULLs (it switched to Boolean) #1158
  • Allow decimal values within struct when writing to parquet #1179

Thanks

We thank the following contributors/users for their work on this release:

@bechbd, @sakti-mishra, @mateogianolio, @jasadams, @malachi-constant, @cnfait, @jaidisido, @kukushking


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them, or use them directly from our S3 public bucket!

AWS Data Wrangler 2.14.0

28 Jan 14:24

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Support Athena Unload 🚀 #1038
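A minimal sketch of the new UNLOAD read path — Athena UNLOAD writes results as Parquet instead of CSV, which can significantly speed up reads. Bucket and database names are hypothetical, and the call is wrapped in a function since it needs AWS credentials:

```python
def read_with_unload_example():
    import awswrangler as wr  # requires AWS credentials when called

    # unload_approach=True requires ctas_approach=False and an s3_output
    # location for the unloaded Parquet files.
    return wr.athena.read_sql_query(
        sql="SELECT * FROM my_table",
        database="my_database",
        ctas_approach=False,
        unload_approach=True,
        s3_output="s3://my-bucket/unload/",
    )
```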

Enhancements

  • Add the ExcludeColumnSchema=True argument to the glue.get_partitions call to reduce response size #1094
  • Add PyArrow flavor argument to write_parquet via pyarrow_additional_kwargs #1057
  • Add rename_duplicate_columns and handle_duplicate_columns flags to the sanitize_dataframe_columns_names method #1124
  • Add timestamp_as_object argument to all database read_sql_table methods #1130
  • Add ignore_null to read_parquet_metadata method #1125
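A sketch of the new timestamp_as_object argument (#1130), shown here with the PostgreSQL module — the table name is hypothetical, and the function needs a live database connection to run:

```python
def read_table_example(con):
    import awswrangler as wr  # requires a live database connection when called

    # timestamp_as_object=True keeps timestamps outside the pandas
    # datetime64[ns] range as Python objects instead of failing.
    return wr.postgresql.read_sql_table(
        table="my_table",  # hypothetical
        schema="public",
        con=con,
        timestamp_as_object=True,
    )
```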

Documentation

  • Improve documentation on installing SAR Lambda layers with the CDK #1097
  • Fix broken link to tutorial in to_parquet method #1058

Bug Fix

  • Ensure that partition locations retrieved from AWS Glue always end in a "/" #1094
  • Fix bucketing overflow issue in Athena #1086

Thanks

We thank the following contributors/users for their work on this release:

@dennyau, @kailukowiak, @lucasmo, @moykeen, @RigoIce, @vlieven, @kepler, @mdavis-xyz, @ConstantinoSchillebeeckx, @kukushking, @jaidisido



AWS Data Wrangler 2.13.0

03 Dec 20:09

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Breaking changes

  • Fix sanitize methods to align with Glue/Hive naming conventions #579

New Functionalities

  • AWS Lake Formation Governed Tables 🚀 #570
  • Support for Python 3.10 🔥 #973
  • Add partitioning to JSON datasets #962
  • Add ability to use unbuffered cursor for large MySQL datasets #928
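A minimal sketch of reading from a Lake Formation Governed Table (#570) — the database name is hypothetical, and the call is wrapped in a function since it needs AWS credentials and Lake Formation permissions:

```python
def governed_table_query_example():
    import awswrangler as wr  # requires AWS credentials when called

    # Query a Governed Table straight into a pandas DataFrame.
    return wr.lakeformation.read_sql_query(
        sql="SELECT * FROM my_governed_table",
        database="my_database",  # hypothetical
    )
```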

Enhancements

  • Add awswrangler.s3.list_buckets #997
  • Add partitions_parameters to catalog partitions methods #1035
  • Refactor pagination config in list objects #955
  • Add error message to EmptyDataframe exception #991

Documentation

  • Clarify docs & add tutorial on schema evolution for CSV datasets #964

Bug Fix

  • Fix exception in catalog.add_column() when column_comment is omitted #1017
  • Fix catalog.create_parquet_table when a key in the dictionary does not exist #998
  • Fix Catalog StorageDescriptor get #969

Thanks

We thank the following contributors/users for their work on this release:

@csabz09, @Falydoor, @moritzkoerber, @maxispeicher, @kukushking, @jaidisido



AWS Data Wrangler 2.12.1

18 Oct 12:02

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Patch

  • Remove unnecessary dev dependencies from main #961


AWS Data Wrangler 2.12.0

13 Oct 16:32

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • redshift.read_sql_query - handle empty table corner case #874
  • Refactor read parquet table to reduce file list scan based on available partitions #878
  • Shrink lambda layer with strip command #884
  • Enabling DynamoDB endpoint URL #887
  • EMR jobs concurrency #889
  • Add feature to allow custom AMI for EMR #907
  • wr.redshift.unload_to_files now empties the S3 folder instead of overwriting existing files #914
  • Add catalog_id arg to wr.catalog.does_table_exist #920
  • Add endpoint_url for AWS Secrets Manager #929
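The endpoint enhancements above can be sketched with the library's global configuration object — useful for routing calls through VPC interface endpoints. The URLs below are placeholders:

```python
def configure_endpoints_example():
    import awswrangler as wr

    # Route DynamoDB and Secrets Manager calls through explicit endpoints;
    # replace the placeholder URLs with your own (e.g. VPC endpoint) URLs.
    wr.config.dynamodb_endpoint_url = "https://dynamodb.us-east-1.amazonaws.com"
    wr.config.secretsmanager_endpoint_url = "https://secretsmanager.us-east-1.amazonaws.com"
```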

Documentation

  • Update docs for awswrangler.s3.to_csv #868

Bug Fix

  • wr.mysql.to_sql with use_column_names=True when column names are reserved words #918

Thanks

We thank the following contributors/users for their work on this release:

@AssafMentzer, @mureddy19, @isichei, @DonnaArt, @kukushking, @jaidisido



AWS Data Wrangler 2.11.0

01 Sep 16:49

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Redshift and RDS Data API support #828 🚀 Check out the tutorial. Many thanks to @pwithams for this contribution
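A minimal sketch of the Data API path, which queries Redshift without a direct TCP connection — the cluster, database, and user names are hypothetical, and the function needs AWS credentials to run:

```python
def data_api_example():
    import awswrangler as wr  # requires AWS credentials when called

    # Connect over the Redshift Data API rather than a database driver.
    con = wr.data_api.redshift.connect(
        cluster_id="my-cluster",  # hypothetical
        database="dev",
        db_user="awsuser",
    )
    return wr.data_api.redshift.read_sql_query("SELECT 1", con=con)
```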

Enhancements

  • Upgrade to PyArrow 5 #861
  • Add Pagination for TimestreamDB #838

Documentation

  • Clarifying structure of SSM secrets in connect methods #871

Bug Fix

  • Use botocore's Loader and ServiceModel to extract accepted kwargs #832

Thanks

We thank the following contributors/users for their work on this release:

@pwithams, @maxispeicher, @kukushking, @jaidisido



AWS Data Wrangler 2.10.0

21 Jul 11:35

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Add upsert support for Postgresql #807
  • Add schema evolution parameter to wr.s3.to_csv #787
  • Enable order by in CTAS Athena queries #785
  • Add header to wr.s3.to_csv when dataset = True #765
  • Add CSV as unload format to wr.redshift.unload_to_files #761
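The new PostgreSQL upsert (#807) can be sketched as follows — the Glue Connection, table, and conflict-column names are hypothetical, and the function needs database connectivity to run:

```python
def postgres_upsert_example(df):
    import awswrangler as wr  # requires database connectivity when called

    # Upsert rows, resolving conflicts on the "id" column.
    con = wr.postgresql.connect("my-glue-connection")  # hypothetical
    try:
        wr.postgresql.to_sql(
            df=df,
            con=con,
            schema="public",
            table="my_table",
            mode="upsert",
            upsert_conflict_columns=["id"],
        )
    finally:
        con.close()
```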

Bug Fix

  • Fix deleting CTAS temporary Glue tables #782
  • Ensure safe get of Glue table parameters #779 and #783

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @jaidisido, @mohdaliiqbal



AWS Data Wrangler 2.9.0

18 Jun 13:15

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Enable server-side predicate filtering using S3 Select 🚀 #678
  • Support VersionId parameter for S3 read operations #721
  • Enable prefix in output S3 files for wr.redshift.unload_to_files #729
  • Add option to skip commit on wr.redshift.to_sql #705
  • Move integration test infrastructure to CDK 🎉 #706
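Server-side filtering with S3 Select (#678) can be sketched as below — the bucket and key are hypothetical, and the function needs AWS credentials to run:

```python
def s3_select_example():
    import awswrangler as wr  # requires AWS credentials when called

    # Push the predicate down to S3 Select so only matching rows are
    # transferred over the network.
    return wr.s3.select_query(
        sql='SELECT * FROM s3object s WHERE s."star_rating" >= 4',
        path="s3://my-bucket/data.parquet",  # hypothetical
        input_serialization="Parquet",
        input_serialization_params={},
    )
```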

Bug Fix

  • Wait until Athena query results bucket is created #735
  • Remove explicit Excel engine configuration #742
  • Fix bucketing types #719
  • Change end_time to UTC #720

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @jaidisido



AWS Data Wrangler 2.8.0

19 May 13:40

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
  • Clarified docs around potential in-place mutation of dataframe when using to_parquet #669

Enhancements

  • Enable parallel S3 downloads (~20% speedup) 🚀 #644
  • Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
  • Enable LOCK before concurrent COPY calls in Redshift #665
  • Make use of PyArrow iter_batches (PyArrow >= 3.0.0 only) #660
  • Enable additional options when overwriting Redshift table (drop, truncate, cascade) #671
  • Reuse S3 client across threads for S3 range requests #684
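The additional overwrite options (#671) can be sketched as below — the table name is hypothetical, and the function needs a live Redshift connection to run:

```python
def redshift_overwrite_example(df, con):
    import awswrangler as wr  # requires a live Redshift connection when called

    # Choose how the existing table is replaced: "drop", "cascade",
    # "truncate", or "delete".
    wr.redshift.to_sql(
        df=df,
        con=con,
        schema="public",
        table="my_table",  # hypothetical
        mode="overwrite",
        overwrite_method="truncate",
    )
```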

Bug Fix

  • Add dtypes for empty ctas athena queries #659
  • Add Serde properties when creating CSV table #672
  • Pass SSL properties from Glue Connection to MySQL #554

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.



AWS Data Wrangler 2.7.0

15 Apr 17:17

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Updated documentation to clarify wr.athena.read_sql_query params argument use #609

New Functionalities

  • Supporting MySQL upserts #608
  • Enable prepending S3 parquet files with a prefix in wr.s3.write.to_parquet #617
  • Add exist_ok flag to safely create a Glue database #642
  • Add "Unsupported Pyarrow type" exception #639

Bug Fix

  • Fix chunked mode in wr.s3.read_parquet_table #627
  • Fix missing \ character from wr.s3.read_parquet_table method #638
  • Support postgres as an engine value #630
  • Add default workgroup result configuration #633
  • Raise exception when merge_upsert_table fails or data_quality is insufficient #601
  • Fix nested structure bug in athena2pyarrow method #612

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload and run them!