Append incremental n to duplicate cols recursively #1124
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Static checking has failed. Excerpt from the build logs:
Please run
Thanks, left a few comments.
Note: You can run the ./validate.sh script in the root locally to catch static check errors (isort, black, doc8...) before pushing to the repo.
tests/test_athena.py
@@ -246,6 +246,12 @@ def test_athena_read_list(glue_database):
    wr.athena.read_sql_query(sql="SELECT ARRAY[1, 2, 3]", database=glue_database, ctas_approach=False)


def test_sanitize_dataframe_column_names():
    assert wr.catalog.sanitize_dataframe_columns_names(df=pd.DataFrame({'A': [1, 2]})).equals(pd.DataFrame({'a': [1, 2]}))  # Unsure how to test for warnings
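On the "Unsure how to test for warnings" comment: below is a minimal sketch, using only the standard library, of asserting that a warning is (or is not) emitted. lower_column_names is a hypothetical stand-in for the real sanitizer, which would be called on a DataFrame instead. Under pytest, the with pytest.warns(UserWarning): context manager is a more compact way to write the first test.

```python
import warnings


def lower_column_names(columns):
    """Hypothetical stand-in for a column sanitizer (not the awswrangler API).

    Lowercases every name and emits a UserWarning if that creates duplicates.
    """
    lowered = [name.lower() for name in columns]
    if len(set(lowered)) != len(lowered):
        warnings.warn(f"Duplicate column names after lowercasing: {lowered}", UserWarning)
    return lowered


def test_warns_on_duplicate_columns():
    # record=True collects warnings into a list instead of printing them.
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default once-per-location filter from hiding repeats.
        warnings.simplefilter("always")
        result = lower_column_names(["A", "a"])
    assert result == ["a", "a"]
    assert len(caught) == 1
    assert issubclass(caught[0].category, UserWarning)


def test_no_warning_without_duplicates():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = lower_column_names(["A", "B"])
    assert result == ["a", "b"]
    assert caught == []


if __name__ == "__main__":
    test_warns_on_duplicate_columns()
    test_no_warning_without_duplicates()
```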
Fixes #1119 (hopefully).
I added:
- rename_duplicate_columns, which recursively appends _n to duplicated column names.
- An option on sanitize_dataframe_columns_names which can be ['warn', 'drop', 'rename'] and will either leave the DF alone, delete all but the first of the duplicated columns, or append a number to the duplicated column.
- Tests in athena_test.py for this functionality. I'm not exactly sure this is the right place, but other column sanitizers were there.

In rename_duplicate_columns, I'm not sure I followed how you handle warnings, since I saw different syntax in other parts of the codebase, but it should be easy to modify if I got it wrong.
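A minimal sketch of the duplicate handling described above, working on a plain list of column names rather than a DataFrame so it runs with the standard library alone. The function sanitize_column_names and its handle_duplicates parameter are hypothetical names for illustration, sanitization is reduced to lowercasing (the real sanitizer does more), and the counter-based renaming is one possible reading of "recursively appends _n", not the code in this PR.

```python
import warnings
from typing import Dict, List


def rename_duplicate_columns(columns: List[str]) -> List[str]:
    """Append _1, _2, ... to repeated names, recursing until all are unique.

    One pass is not always enough: ["a", "a", "a_1"] becomes
    ["a", "a_1", "a_1"], so the function calls itself on the result.
    """
    counts: Dict[str, int] = {}
    renamed = []
    for name in columns:
        n = counts.get(name, 0)
        # The first occurrence keeps its name; later ones get a numeric suffix.
        renamed.append(name if n == 0 else f"{name}_{n}")
        counts[name] = n + 1
    if len(set(renamed)) != len(renamed):
        return rename_duplicate_columns(renamed)
    return renamed


def sanitize_column_names(columns: List[str], handle_duplicates: str = "warn") -> List[str]:
    """Hypothetical sanitizer: lowercase names, then resolve duplicates."""
    if handle_duplicates not in ("warn", "drop", "rename"):
        raise ValueError(f"handle_duplicates must be 'warn', 'drop' or 'rename', got {handle_duplicates!r}")
    names = [name.lower() for name in columns]
    if len(set(names)) == len(names):
        return names  # no duplicates, nothing to resolve
    if handle_duplicates == "warn":
        # Leave the names as they are, but tell the caller.
        warnings.warn(f"Duplicate column names: {names}", UserWarning)
        return names
    if handle_duplicates == "drop":
        # dict.fromkeys keeps only the first occurrence, preserving order.
        return list(dict.fromkeys(names))
    return rename_duplicate_columns(names)


if __name__ == "__main__":
    print(sanitize_column_names(["A", "a", "b"], "rename"))
```

On a real DataFrame, the 'drop' case corresponds to df.loc[:, ~df.columns.duplicated()], and a renamed list can be assigned back with df.columns = ... .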
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.