Description
At the moment, awswrangler.quicksight.create_athena_dataset assigns a default name, 'CustomSQL', to the SQL query. This makes the sql_name argument optional, so as a user who doesn't care about sql_name I'd at least expect the library to handle the name for me in a way that doesn't cause any problems.
In reality, creating several datasets with the same SQL query name causes validation conflicts in AWS, such as "Custom SQL tables cannot have the same alias but duplicates were found. All table names: \"CustomSQL\"". This is very annoying because both awswrangler and QuickSight allow the dataset to be created, but AWS then refuses some operations on it, for example joining it with other datasets. It gets even worse when QuickSight doesn't show the reason for the error: sometimes it just says that "something is wrong", and one has to investigate what's going on.
To avoid ambiguity in expectations, I see a couple of options:
- Remove the default 'CustomSQL' name for the sql_name argument, and specify in the documentation that it has to be unique for everything to work correctly.
- Fail fast when creating a dataset with a duplicated sql_name and explicitly tell the user that it's not going to work.
- Make sql_name "truly" not important by randomizing the 'default' name to avoid conflicts.
I see pros and cons for every option.
Removing the default 'CustomSQL' name is a breaking change: anybody who relied on the default and never passed sql_name (I doubt that's viable for extensive use, but still) will have to update their code.
Failing fast might become complex, since we'd need to check whether a SQL query with the same name already exists. Also, I don't see a uniqueness requirement in the AWS documentation, which enlarges the responsibilities of awswrangler if we want it to handle this case.
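To give an idea of what the fail-fast check could involve, here is a rough sketch. It assumes the PhysicalTableMap of every existing dataset has already been fetched (for example via the QuickSight DescribeDataSet API), which is exactly the expensive part. The helper names collect_custom_sql_names and ensure_sql_name_is_unique are hypothetical and are not part of awswrangler.

```python
from typing import Any, Dict, Iterable, Set

# A PhysicalTableMap maps a table id to a table definition; custom SQL
# tables carry their alias under the "CustomSql" -> "Name" key.
PhysicalTableMap = Dict[str, Dict[str, Any]]


def collect_custom_sql_names(table_maps: Iterable[PhysicalTableMap]) -> Set[str]:
    """Gather the custom SQL names used by the given physical table maps."""
    names: Set[str] = set()
    for table_map in table_maps:
        for table in table_map.values():
            custom_sql = table.get("CustomSql")
            if custom_sql is not None:
                names.add(custom_sql["Name"])
    return names


def ensure_sql_name_is_unique(sql_name: str, table_maps: Iterable[PhysicalTableMap]) -> None:
    """Raise early if sql_name is already used by an existing dataset (hypothetical helper)."""
    if sql_name in collect_custom_sql_names(table_maps):
        raise ValueError(
            f"sql_name '{sql_name}' is already used by another dataset; "
            "joining such datasets in QuickSight is likely to fail."
        )
```

Even in this simplified form, the check depends on listing and describing every dataset in the account, so it adds API calls to each create_athena_dataset invocation.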
The third option is tempting: it makes no additional assumptions about QuickSight behaviour while making the names slightly random and unique. But I'm not quite sure what the best practices for this kind of randomization are, and whether it would just make an already implicit issue even more implicit.
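A minimal sketch of what the randomized default could look like, assuming sql_name would default to None instead of 'CustomSQL'. The helper resolve_sql_name and the prefix constant are hypothetical, and I haven't verified which characters or lengths QuickSight accepts for this name, so the underscore separator and the 32-character suffix are assumptions too.

```python
import uuid
from typing import Optional

# Hypothetical prefix: keeps generated names recognizable as defaults.
DEFAULT_SQL_NAME_PREFIX = "CustomSQL"


def resolve_sql_name(sql_name: Optional[str] = None) -> str:
    """Return the user's sql_name, or a generated one when none is given."""
    if sql_name is not None:
        # An explicit name is passed through untouched.
        return sql_name
    # uuid4 gives a random 32-character hex suffix, so two default names
    # are very unlikely to collide.
    return f"{DEFAULT_SQL_NAME_PREFIX}_{uuid.uuid4().hex}"
```

With this, two calls that omit sql_name get different names, while an explicit sql_name="my_query" is kept as is. The downside mentioned above still applies: the uniqueness requirement stays hidden from the user.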
I'd appreciate your thoughts and any other suggestions. I can try to implement the change once a decision is made.