QuickSight `create_athena_dataset` method has a default name for the SQL query which breaks functionality for created datasets #1511

At the moment, `awswrangler.quicksight.create_athena_dataset` assigns a default name, 'CustomSQL', to the SQL query. This makes the `sql_name` argument optional, so as a user who doesn't care about `sql_name` I'd expect the library to pick a name that doesn't cause any problems.

In reality, creating several datasets with the same SQL query name causes validation conflicts in AWS, such as “Custom SQL tables cannot have the same alias but duplicates were found. All table names: "CustomSQL"”. This is very annoying because both awswrangler and QuickSight allow the dataset to be created, but AWS then rejects some operations on it, for example joining it with other datasets. It gets even worse when QuickSight doesn't show the reason for the error: sometimes it just says that "something is wrong", and one has to investigate what's going on.

To avoid ambiguity in expectations, I see a couple of options:
1. Remove the default 'CustomSQL' name for the `sql_name` argument, and specify in the documentation that it has to be unique for everything to work correctly.
2. Fail fast when creating a dataset with a duplicated `sql_name` and explicitly tell the user that it's not going to work.
3. Make `sql_name` "truly" not important by randomizing the 'default' name to avoid conflicts.

I see pros and cons for every option.
Removing the default 'CustomSQL' name is a breaking change: anybody who didn't pass `sql_name` before (I doubt relying on the default is workable for extensive use, but still) will have to update their code.
Failing fast might become complex, since we'd need to check whether a SQL query with the same name already exists. Also, I don't see a uniqueness requirement in the [AWS documentation](https://docs.aws.amazon.com/quicksight/latest/APIReference/API_CustomSql.html), so handling this case would widen the responsibilities of awswrangler.
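To give an idea of what the check involves, here is a rough sketch, not a proposed implementation. It assumes the `PhysicalTableMap` of every existing dataset has already been fetched (for instance with boto3's QuickSight `list_data_sets` and `describe_data_set`, one call per dataset, which is where the cost comes from). The helper names `custom_sql_names` and `ensure_sql_name_is_free` are made up for illustration.

```python
from typing import Any, Dict, Iterable, Set

# Shape of DataSet["PhysicalTableMap"] in a describe_data_set response:
# {table_id: {"CustomSql": {...}} | {"RelationalTable": {...}} | {"S3Source": {...}}}
PhysicalTableMap = Dict[str, Dict[str, Any]]


def custom_sql_names(physical_table_map: PhysicalTableMap) -> Set[str]:
    """Collect the CustomSql names used in one dataset's PhysicalTableMap."""
    return {
        table["CustomSql"]["Name"]
        for table in physical_table_map.values()
        if "CustomSql" in table
    }


def ensure_sql_name_is_free(
    sql_name: str, existing_maps: Iterable[PhysicalTableMap]
) -> None:
    """Raise ValueError if sql_name is already used by an existing dataset."""
    for physical_table_map in existing_maps:
        if sql_name in custom_sql_names(physical_table_map):
            raise ValueError(
                f"sql_name {sql_name!r} is already used by another dataset; "
                "QuickSight may reject joins between datasets that share it."
            )


# Hypothetical data standing in for describe_data_set results.
existing = [
    {"t1": {"CustomSql": {"Name": "CustomSQL", "SqlQuery": "SELECT 1"}}},
    {"t2": {"RelationalTable": {"Name": "orders"}}},
]
ensure_sql_name_is_free("orders_by_day", existing)  # passes silently
```

Calling `ensure_sql_name_is_free("CustomSQL", existing)` would raise a `ValueError` here, which is the fail-fast behaviour described in option 2.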
The third option is tempting: it makes no additional assumptions about QuickSight behaviour while making names slightly random and unique. But I'm not sure what the best practices are for this kind of randomization, or whether it would just make an already implicit issue even more implicit.
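For reference, the randomization could be as small as the sketch below. It assumes a short random suffix from the standard `uuid` module is acceptable as a name; `default_sql_name` is a hypothetical helper, not something that exists in awswrangler today.

```python
import uuid


def default_sql_name(prefix: str = "CustomSQL") -> str:
    """Return the prefix plus 8 random hex characters, e.g. 'CustomSQL_3f2a9c1e'.

    The suffix makes a collision between two generated names very unlikely,
    but it does not guarantee uniqueness.
    """
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


# Two calls are expected to give different names.
first = default_sql_name()
second = default_sql_name()
print(first, second)
```

With something like this, `sql_name` could default to `None` and be replaced by `default_sql_name()` inside `create_athena_dataset`, so an explicitly passed name would still behave exactly as it does now.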

I'd appreciate your thoughts and any other suggestions. I can try to implement the change once a decision is made.






