Add generate_create_query to Athena for generating tables' and views' DDL query #1514

Merged on Aug 15, 2022 (6 commits)

KhueNgocDang
Contributor

@KhueNgocDang KhueNgocDang commented Aug 14, 2022

Feature or Bugfix

  • Feature

Detail

  • Add generate_create_query to the Athena module to generate the DDL query for an Athena view or table.
    • The function uses the Glue client instead of the Athena client to generate the DDL.
    • Athena's SHOW CREATE TABLE returns faulty queries when the create query contains special characters, such as the non-ASCII column comments in the example below. Generating the create query through generate_create_query keeps everything intact.

Initial create query

CREATE EXTERNAL TABLE `awswrangler_test_comment`(
  `random_japanese_string` string COMMENT 'らんじゅうにほんご(特殊ブラケット)',
  `random_russian_string` string COMMENT 'случайная русская строка',
  `random_chinese_string` string COMMENT '随机中文字符串',
  `random_vietnamese_string` string COMMENT 'chuỗi tiếng việt ngẫu nhiên')
PARTITIONED BY (
  `par0` bigint,
  `par1` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://edgar-khuedn/test_wr/'
TBLPROPERTIES (
  'classification'='parquet',
  'compressionType'='snappy',
  'projection.enabled'='false',
  'typeOfData'='file')

Results from SHOW CREATE TABLE awswrangler_test_comment;

Time in queue: 305 ms, Run time: 732 ms, Data scanned: -
CREATE EXTERNAL TABLE `awswrangler_test_comment`(
  `random_japanese_string` string COMMENT '��X�Fk{�T�y�����
  `random_russian_string` string COMMENT 'A;CG09=0O @CAA:0O AB@>:0', 
  `random_chinese_string` string COMMENT '�:-�W&2', 
  `random_vietnamese_string` string COMMENT 'chu�i ti�ng vi�t ng�u nhi�n')
PARTITIONED BY ( 
  `par0` bigint, 
  `par1` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://edgar-khuedn/test_wr'
TBLPROPERTIES (
  'classification'='parquet', 
  'compressionType'='snappy', 
  'projection.enabled'='false', 
  'transient_lastDdlTime'='1660516666', 
  'typeOfData'='file')

Results from generate_create_query

>>> import awswrangler as wr
>>> query: str = wr.athena.generate_create_query(table="awswrangler_test_comment", database="mart")
>>> print(query)
CREATE EXTERNAL TABLE `awswrangler_test_comment`(
  `random_japanese_string` string COMMENT 'らんじゅうにほんご(特殊ブラケット)', 
  `random_russian_string` string COMMENT 'случайная русская строка', 
  `random_chinese_string` string COMMENT '随机中文字符串', 
  `random_vietnamese_string` string COMMENT 'chuỗi tiếng việt ngẫu nhiên')
PARTITIONED BY ( 
  `par0` bigint, 
  `par1` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://edgar-khuedn/test_wr'
TBLPROPERTIES (
  'classification'='parquet', 
  'compressionType'='snappy', 
  'projection.enabled'='false', 
  'transient_lastDdlTime'='1660516666', 
  'typeOfData'='file')
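
The example above covers a table; the description also mentions views. The sketch below shows one way a view's DDL might be recovered from Glue metadata. It assumes that Athena stores a view's definition in the Glue `ViewOriginalText` field as `/* Presto View: <base64 JSON> */`, with the SQL under an `originalSql` key. The helper name `build_view_ddl`, the view name `v_sample`, and the sample SQL are hypothetical and not taken from this PR.

```python
import base64
import json
from typing import Any, Dict

# Assumed wrapper around the base64 payload in ViewOriginalText.
_PREFIX = "/* Presto View: "
_SUFFIX = " */"


def build_view_ddl(glue_response: Dict[str, Any], database: str) -> str:
    """Rebuild a CREATE OR REPLACE VIEW statement from Glue metadata (hypothetical helper)."""
    table = glue_response["Table"]
    text = table["ViewOriginalText"]
    if not (text.startswith(_PREFIX) and text.endswith(_SUFFIX)):
        raise ValueError("ViewOriginalText is not in the assumed Presto view format")
    # Strip the comment wrapper, then decode base64 -> UTF-8 JSON -> dict.
    encoded = text[len(_PREFIX):-len(_SUFFIX)]
    payload = json.loads(base64.b64decode(encoded).decode("utf-8"))
    return f"CREATE OR REPLACE VIEW {database}.{table['Name']} AS\n{payload['originalSql']}"


# Build a sample response locally; a real one would come from the Glue get_table call.
sample_sql = "SELECT par0, random_vietnamese_string FROM awswrangler_test_comment"
encoded_view = base64.b64encode(
    json.dumps({"originalSql": sample_sql}).encode("utf-8")
).decode("ascii")
sample_view_response = {
    "Table": {
        "Name": "v_sample",
        "TableType": "VIRTUAL_VIEW",
        "ViewOriginalText": _PREFIX + encoded_view + _SUFFIX,
    }
}

view_ddl = build_view_ddl(sample_view_response, database="mart")
print(view_ddl)
```

Decoding explicitly as UTF-8 keeps any non-ASCII text in the view's SQL intact, which is the same concern that motivates generating table DDL from Glue.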

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 761198e
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

KhueNgocDang and others added 2 commits August 14, 2022 14:06
Removed unnecessary column in `test_athena_generate_create_query`
@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 2a6331c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@KhueNgocDang KhueNgocDang marked this pull request as ready for review August 14, 2022 07:27
@malachi-constant malachi-constant added the feature and minor release (Will be addressed in the next minor release) labels Aug 15, 2022
@malachi-constant malachi-constant added this to the 2.17.0 milestone Aug 15, 2022
Contributor

@jaidisido jaidisido left a comment

LGTM

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 3701c36
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant malachi-constant merged commit a80d58d into aws:main Aug 15, 2022