
QuickSight create_athena_dataset method has a default SQL query name that breaks functionality of created datasets #1511

Closed
@AndreyKudryavets

Description

At the moment, awswrangler.quicksight.create_athena_dataset assigns a default name, 'CustomSQL', to the SQL query. This makes the sql_name argument optional, so as a user who doesn't care about sql_name, I'd at least expect the library to handle the name in a way that doesn't cause any problems.

In practice, creating several datasets with the same SQL query name causes validation conflicts in AWS, such as “Custom SQL tables cannot have the same alias but duplicates were found. All table names: "CustomSQL"”. This is very annoying because both awswrangler and QuickSight allow the datasets to be created, but AWS then refuses some operations on them, for example joining them with other datasets. It gets even worse when QuickSight doesn't show the reason for the error: sometimes it just says that "something is wrong", and one has to investigate what's going on.
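The collision can be reproduced without touching AWS. The sketch below is illustrative only: custom_sql_table and duplicate_sql_names are hypothetical helpers, and the dict only roughly mimics the CustomSql entry of a QuickSight PhysicalTableMap (DataSourceArn, Name, SqlQuery; the real entry also carries Columns). The ARN is a placeholder. It assumes, as described above, that the name falls back to 'CustomSQL' whenever sql_name is omitted.

```python
from collections import Counter
from typing import Any, Dict, List

# Placeholder ARN, for illustration only.
DATA_SOURCE_ARN = "arn:aws:quicksight:us-east-1:111111111111:datasource/example"


def custom_sql_table(sql: str, sql_name: str = "CustomSQL") -> Dict[str, Any]:
    """Hypothetical helper: build a CustomSql physical table entry.

    The dict roughly follows the CustomSql shape of a QuickSight PhysicalTableMap
    entry; sql_name falls back to the same 'CustomSQL' default.
    """
    return {"CustomSql": {"DataSourceArn": DATA_SOURCE_ARN, "Name": sql_name, "SqlQuery": sql}}


def duplicate_sql_names(tables: List[Dict[str, Any]]) -> List[str]:
    """Return custom SQL names that appear more than once across the given tables."""
    counts = Counter(t["CustomSql"]["Name"] for t in tables if "CustomSql" in t)
    return sorted(name for name, n in counts.items() if n > 1)


# Two datasets created without an explicit sql_name end up with the same alias,
# matching the conflict QuickSight reports when they are combined.
orders = custom_sql_table("SELECT * FROM orders")
customers = custom_sql_table("SELECT * FROM customers")
print(duplicate_sql_names([orders, customers]))  # ['CustomSQL']
```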

To avoid ambiguity in expectations, I see a few options:

  1. Remove the default 'CustomSQL' value of the sql_name argument, and state in the documentation that it must be unique for everything to work correctly.
  2. Fail fast when creating a dataset with a duplicated sql_name, and explicitly tell the user that it won't work.
  3. Make sql_name "truly" unimportant by randomizing the default name to avoid conflicts.
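Option 3 could look roughly like the following. This is a minimal sketch, not awswrangler code: resolve_sql_name and the 'CustomSQL-' prefix are hypothetical, and it assumes a short random suffix is acceptable in a QuickSight custom SQL name.

```python
import uuid
from typing import Optional

DEFAULT_SQL_NAME_PREFIX = "CustomSQL"


def resolve_sql_name(sql_name: Optional[str] = None) -> str:
    """Hypothetical helper for option 3: keep an explicit name, otherwise randomize.

    The suffix is the first 8 hex characters of a UUID4, which keeps the alias
    short while making collisions between generated defaults very unlikely.
    """
    if sql_name is not None:
        return sql_name
    return f"{DEFAULT_SQL_NAME_PREFIX}-{uuid.uuid4().hex[:8]}"


print(resolve_sql_name("orders_sql"))  # orders_sql
print(resolve_sql_name())  # 'CustomSQL-' followed by 8 random hex characters
```

Keeping the 'CustomSQL' prefix may partly address the concern that randomization hides the issue, since a generated name still shows where it came from.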

I see pros and cons for every option.
Removing the default 'CustomSQL' name is a breaking change: anyone who hasn't been passing sql_name (I doubt that's feasible with extensive use, but still) will have to update their code.
Failing fast might get complex, since we'd need to check whether a custom SQL with the same name already exists. Also, I don't see a uniqueness requirement in the AWS documentation, so handling this case would expand awswrangler's responsibilities.
The third option is tempting: it makes no additional assumptions about QuickSight behaviour while making names slightly random and unique. But I'm not sure what the best practices are for this kind of randomization, and it might just make an already implicit issue even more implicit.
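For option 2, the comparison itself is simple once the existing datasets are known; the expensive part is fetching them, for example by paging through boto3's QuickSight list_data_sets and calling describe_data_set for each one. The sketch below assumes that has already been done and works on the 'DataSet' dicts from those responses, whose PhysicalTableMap entries may contain a CustomSql with a Name. The helper names and the error message are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def existing_custom_sql_names(datasets: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect CustomSql names from describe_data_set-style 'DataSet' dicts."""
    names: Set[str] = set()
    for dataset in datasets:
        for table in dataset.get("PhysicalTableMap", {}).values():
            custom_sql = table.get("CustomSql")
            if custom_sql is not None:
                names.add(custom_sql["Name"])
    return names


def check_sql_name_is_unique(sql_name: str, datasets: Iterable[Dict[str, Any]]) -> None:
    """Hypothetical pre-flight check for option 2: raise before creating the dataset."""
    if sql_name in existing_custom_sql_names(datasets):
        raise ValueError(
            f"sql_name {sql_name!r} is already used by another dataset; "
            "QuickSight may reject joins between them. Pass a unique sql_name."
        )


# Minimal stand-in for already-fetched dataset descriptions.
existing = [
    {
        "DataSetId": "orders",
        "PhysicalTableMap": {
            "t1": {"CustomSql": {"Name": "CustomSQL", "SqlQuery": "SELECT * FROM orders"}}
        },
    },
]

check_sql_name_is_unique("customers_sql", existing)  # passes silently
try:
    check_sql_name_is_unique("CustomSQL", existing)
except ValueError as err:
    print(err)
```

Because it checks every dataset in the account, this is stricter than what QuickSight appears to enforce, since the conflict only surfaced when datasets were combined.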

I'd appreciate your thoughts and any other suggestions. I can try to implement the change once a decision is made.

Metadata

Labels

enhancement: New feature or request
