-
Notifications
You must be signed in to change notification settings - Fork 15.1k
feat(oauth): adding necessary changes to support bigquery oauth #30674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -225,6 +225,10 @@ def ping(engine: Engine) -> bool: | |||
# bubble up the exception to return proper status code | |||
raise | |||
except Exception as ex: | |||
if database.is_oauth2_enabled() and database.db_engine_spec.needs_oauth2( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigQuery raises an error when running create_engine()
necessitating another needs_oauth2(ex)
check for the proper oauth2 exceptions to trigger the Oauth dance. Most DB's allow for an engine to be created without valid creds, and instead raises the exception on engine.connect()
@@ -1691,10 +1691,13 @@ def select_star( # pylint: disable=too-many-arguments | |||
return sql | |||
|
|||
@classmethod | |||
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]: | |||
def estimate_statement_cost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database is required to create the BigQuery Client when using OAuth2
@@ -351,7 +351,16 @@ def get_allow_cost_estimate(cls, extra: dict[str, Any]) -> bool: | |||
return True | |||
|
|||
@classmethod | |||
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]: | |||
def estimate_statement_cost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aligning other specs to have database, and adding missing docstring
@@ -365,9 +365,12 @@ def get_schema_from_engine_params( | |||
return parse.unquote(database.split("/")[1]) | |||
|
|||
@classmethod | |||
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]: | |||
def estimate_statement_cost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding database param
superset/models/core.py
Outdated
effective_username, | ||
access_token, | ||
) | ||
# Checking if the function signature can accept database as a param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, here BigQuery needs the Database object to build the client. Checks if the function signature has database
as a param, then passes it.
superset/utils/oauth2.py
Outdated
@@ -192,3 +192,4 @@ class OAuth2ClientConfigSchema(Schema): | |||
) | |||
authorization_request_uri = fields.String(required=True) | |||
token_request_uri = fields.String(required=True) | |||
project_id = fields.String(required=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigQuery takes project_id
as a param in its OAuth2
parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we'd want to store the project id in the URI, to make it compatible with non-oauth2 use cases. And then later when you need you can grab it from database.sqlalchemy_uri.database
.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #30674 +/- ##
===========================================
+ Coverage 60.48% 70.86% +10.37%
===========================================
Files 1931 1988 +57
Lines 76236 80300 +4064
Branches 8568 9151 +583
===========================================
+ Hits 46114 56904 +10790
+ Misses 28017 21177 -6840
- Partials 2105 2219 +114
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but wondering if we need to modify _get_client
here to take a database object.
11f631e
to
96bcb1c
Compare
5872351
to
25cdd3a
Compare
@@ -106,6 +108,15 @@ export const OAuth2ClientField = ({ changeMethods, db }: FieldPropTypes) => { | |||
onChange={handleChange('scope')} | |||
/> | |||
</FormItem> | |||
{db.engine === Engines.BigQuery && ( | |||
<FormItem label="Project ID"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's localize the label
superset/models/core.py
Outdated
) | ||
# PR #30674 changed the signature of the method to include database. | ||
# This ensures that the change is backwards compatible | ||
sig = signature(self.db_engine_spec.update_impersonation_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have sufficient test coverage for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I left a few comments on the handling of project_id
. Happy to hop on a meeting to discuss it in more detail.
|
||
interface OAuth2ClientInfo { | ||
id: string; | ||
secret: string; | ||
authorization_request_uri: string; | ||
token_request_uri: string; | ||
scope: string; | ||
project_id?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is BigQuery specific, it's better to not add it here, otherwise the interface loses its purpose. Ideally we want to keep only the common information that every oauth2 client has.
Can we have this in a separate attribute instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I wanted to place this here is that, due to the flow for BigQuery, users will either add service creds, which contains the project_id
, or they will add OAuth creds, which also contains the project_id
. I agree in theory we should have a separate field, but that possibly gets even messier when you can have a project_id
that can differ from the one used in the service creds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fisjac but the OAuth2ClientInfo
component is designed as a generic component to be used by all databases that support OAuth2, and should contain only the information that is specific to OAuth2.
if this were a small project it could be OK, but what's inevitably going to happen is that people are going to start adding more non-OAuth2 related fields here, and more if (db.engine === Engines.Foo)
snippets, and the dynamic form builder is going to get even more messier than it already is.
Also, from a UX perspective, it's confusing to ask for a BigQuery project name in a panel called "OAuth2 client information".
Can't you just add a field for the project name to the BigQuery dynamic form? It would still be relevant regardless of if we're using OAuth2 of not, and would actually work as a nice sanity check in case someone adds the wrong credentials — we could check that the name in the project is equal to the provided project name.
superset/utils/oauth2.py
Outdated
@@ -192,3 +192,4 @@ class OAuth2ClientConfigSchema(Schema): | |||
) | |||
authorization_request_uri = fields.String(required=True) | |||
token_request_uri = fields.String(required=True) | |||
project_id = fields.String(required=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ideally we'd want to store the project id in the URI, to make it compatible with non-oauth2 use cases. And then later when you need you can grab it from database.sqlalchemy_uri.database
.
25cdd3a
to
6e42328
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, but I have to push back on adding project name/ID to the OAuth2 client information. Can we try adding a field for it?
|
||
interface OAuth2ClientInfo { | ||
id: string; | ||
secret: string; | ||
authorization_request_uri: string; | ||
token_request_uri: string; | ||
scope: string; | ||
project_id?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fisjac but the OAuth2ClientInfo
component is designed as a generic component to be used by all databases that support OAuth2, and should contain only the information that is specific to OAuth2.
if this were a small project it could be OK, but what's inevitably going to happen is that people are going to start adding more non-OAuth2 related fields here, and more if (db.engine === Engines.Foo)
snippets, and the dynamic form builder is going to get even more messier than it already is.
Also, from a UX perspective, it's confusing to ask for a BigQuery project name in a panel called "OAuth2 client information".
Can't you just add a field for the project name to the BigQuery dynamic form? It would still be relevant regardless of if we're using OAuth2 of not, and would actually work as a nice sanity check in case someone adds the wrong credentials — we could check that the name in the project is equal to the provided project name.
@@ -1341,6 +1342,7 @@ def validate_sql(self, pk: int) -> FlaskResponse: | |||
return self.response_404() | |||
|
|||
@expose("/oauth2/", methods=["GET"]) | |||
@transaction() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Does this work even though the method is a GET
? I remember you mentioning that it was not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the transaction decorator works. It's necessary here because it's a GET
request. in POST
and PUT
requests flask-appbuilder
automatically runs a session.commit
which is why they are not needed for the other standard endpoints. It's rare that we are persisting changes to the DB on a GET
request, but necessary in this case.
from superset.databases.types import ( # pylint:disable=unused-import | ||
EncryptedDict, # noqa: F401 | ||
EncryptedField, | ||
EncryptedString, # noqa: F401 | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If EncryptedDict
and EncryptedString
are not used, why are you importing them?
from superset.databases.types import ( # pylint:disable=unused-import | |
EncryptedDict, # noqa: F401 | |
EncryptedField, | |
EncryptedString, # noqa: F401 | |
) | |
from superset.databases.types import EncryptedField |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are being imported by other db engine specs that point to databases.schemas
. Rather than change all of the references in every spec with encrypted extra, I thought it would be cleaner to still point to the Schemas, which is where they really should be located were it not for app_context issues.
# This ensures that the change is backwards compatible | ||
sig = signature(func) | ||
if "database" in (params := sig.parameters.keys()): | ||
args.insert(list(params).index("database"), self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is much more robust than my insert(0, ...)
suggestion.
@@ -432,6 +432,31 @@ def test_get_sqla_engine_user_impersonation(mocker: MockerFixture) -> None: | |||
) | |||
|
|||
|
|||
def test_add_database_to_signature(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
841481e
to
4cc90fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
SUMMARY
Adding needed changes to support future implementation of OAuth2 functionality for BigQuery.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION