feat(oauth): adding necessary changes to support bigquery oauth #30674

Merged
merged 9 commits into from
Oct 30, 2024

Conversation

Contributor

@fisjac fisjac commented Oct 22, 2024

SUMMARY

Adding needed changes to support future implementation of OAuth2 functionality for BigQuery.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@@ -225,6 +225,10 @@ def ping(engine: Engine) -> bool:
# bubble up the exception to return proper status code
raise
except Exception as ex:
if database.is_oauth2_enabled() and database.db_engine_spec.needs_oauth2(
Contributor Author

@fisjac fisjac Oct 22, 2024

BigQuery raises an error as soon as create_engine() runs, so a second needs_oauth2(ex) check is needed here to catch the relevant OAuth2 exceptions and trigger the OAuth2 dance. Most DBs let you create an engine without valid credentials and only raise the exception on engine.connect().
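
The control flow described in this comment can be sketched roughly as follows. This is a minimal illustration, not Superset's code: FakeEngineSpec, OAuth2RedirectError, create_engine_eagerly and check_connection are hypothetical stand-ins, and the string match inside needs_oauth2 is only an assumption about how a driver error might be recognised.

```python
class OAuth2RedirectError(Exception):
    """Hypothetical signal that the user must go through the OAuth2 flow."""


class FakeEngineSpec:
    """Stand-in for an engine spec; not Superset's real class."""

    @staticmethod
    def needs_oauth2(ex: Exception) -> bool:
        # Assumption: the driver error message reveals missing credentials.
        return "credentials" in str(ex).lower()


def create_engine_eagerly() -> object:
    # Mimics a driver that validates credentials while the engine is built,
    # instead of waiting for engine.connect().
    raise RuntimeError("Could not find default credentials")


def check_connection(oauth2_enabled: bool, spec: type) -> bool:
    try:
        create_engine_eagerly()
        return True
    except Exception as ex:
        # Second check: the failure happened before any connect() call,
        # so the OAuth2 test has to be repeated at this level.
        if oauth2_enabled and spec.needs_oauth2(ex):
            raise OAuth2RedirectError("start OAuth2 flow") from ex
        raise
```

With OAuth2 disabled, the original driver error propagates unchanged; with it enabled, the same error is turned into the signal that starts the dance.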

@@ -1691,10 +1691,13 @@ def select_star( # pylint: disable=too-many-arguments
return sql

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
Contributor Author

Database is required to create the BigQuery Client when using OAuth2

@@ -351,7 +351,16 @@ def get_allow_cost_estimate(cls, extra: dict[str, Any]) -> bool:
return True

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
Contributor Author

Aligning the other specs to take database, and adding the missing docstring.

@@ -365,9 +365,12 @@ def get_schema_from_engine_params(
return parse.unquote(database.split("/")[1])

@classmethod
def estimate_statement_cost(cls, statement: str, cursor: Any) -> dict[str, Any]:
def estimate_statement_cost(
Contributor Author

adding database param

effective_username,
access_token,
)
# Checking if the function signature can accept database as a param
Contributor Author

Again, BigQuery needs the Database object here to build the client. This checks whether the function signature has database as a param and, if so, passes it.

@@ -192,3 +192,4 @@ class OAuth2ClientConfigSchema(Schema):
)
authorization_request_uri = fields.String(required=True)
token_request_uri = fields.String(required=True)
project_id = fields.String(required=False)
Contributor Author

BigQuery takes project_id as a param in its OAuth2 parameters

Member

I think ideally we'd want to store the project id in the URI, to make it compatible with non-oauth2 use cases. And then later when you need you can grab it from database.sqlalchemy_uri.database.
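
A rough sketch of reading the project from the connection URI instead of the OAuth2 config. It uses only the standard library rather than the Database model, and project_from_uri is a hypothetical helper. Which URI component carries the project depends on how the URI is written, so the sketch assumes the bigquery://<project>/<dataset> form, checks the host position first, and falls back to the first path segment.

```python
from typing import Optional
from urllib.parse import urlsplit


def project_from_uri(sqlalchemy_uri: str) -> Optional[str]:
    """Return the project encoded in a BigQuery-style URI, if any."""
    parts = urlsplit(sqlalchemy_uri)
    # Host position first (bigquery://<project>/<dataset>), then the
    # first path segment (bigquery:///<project>).
    first_segment = parts.path.strip("/").split("/")[0]
    return parts.hostname or first_segment or None
```

Keeping the project in the URI means the same lookup works whether the database authenticates with service credentials or with OAuth2.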

@fisjac fisjac requested a review from betodealmeida October 22, 2024 17:05
codecov bot commented Oct 22, 2024

Codecov Report

Attention: Patch coverage is 82.05128% with 7 lines in your changes missing coverage. Please review.

Project coverage is 70.86%. Comparing base (76d897e) to head (55b4d1b).
Report is 905 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...aseModal/DatabaseConnectionForm/EncryptedField.tsx | 0.00% | 2 Missing ⚠️ |
| superset/db_engine_specs/bigquery.py | 50.00% | 2 Missing ⚠️ |
| ...eModal/DatabaseConnectionForm/CommonParameters.tsx | 50.00% | 1 Missing ⚠️ |
| superset/commands/database/test_connection.py | 66.66% | 1 Missing ⚠️ |
| superset/db_engine_specs/base.py | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #30674       +/-   ##
===========================================
+ Coverage   60.48%   70.86%   +10.37%     
===========================================
  Files        1931     1988       +57     
  Lines       76236    80300     +4064     
  Branches     8568     9151      +583     
===========================================
+ Hits        46114    56904    +10790     
+ Misses      28017    21177     -6840     
- Partials     2105     2219      +114     
| Flag | Coverage Δ |
| --- | --- |
| hive | 48.89% <58.82%> (-0.27%) ⬇️ |
| javascript | 58.62% <40.00%> (+0.91%) ⬆️ |
| mysql | 76.81% <85.29%> (?) |
| postgres | 76.90% <85.29%> (?) |
| presto | 53.38% <58.82%> (-0.42%) ⬇️ |
| python | 83.93% <88.23%> (+20.45%) ⬆️ |
| sqlite | 76.36% <85.29%> (?) |
| unit | 60.87% <82.35%> (+3.24%) ⬆️ |

@michael-s-molina michael-s-molina added the review:draft and review:checkpoint labels Oct 22, 2024
@fisjac fisjac marked this pull request as ready for review October 22, 2024 20:09
@dosubot dosubot bot added the authentication:sso and data:connect:googlebigquery labels Oct 22, 2024
Member

@betodealmeida betodealmeida left a comment

Looks good, but wondering if we need to modify _get_client here to take a database object.

@michael-s-molina michael-s-molina removed the review:checkpoint and review:draft labels Oct 23, 2024
@pull-request-size pull-request-size bot added size/L and removed size/M labels Oct 24, 2024
@fisjac fisjac force-pushed the bigquery-oauth branch 2 times, most recently from 5872351 to 25cdd3a Compare October 24, 2024 22:25
@@ -106,6 +108,15 @@ export const OAuth2ClientField = ({ changeMethods, db }: FieldPropTypes) => {
onChange={handleChange('scope')}
/>
</FormItem>
{db.engine === Engines.BigQuery && (
<FormItem label="Project ID">
Member

Let's localize the label

)
# PR #30674 changed the signature of the method to include database.
# This ensures that the change is backwards compatible
sig = signature(self.db_engine_spec.update_impersonation_config)
Member

Do we have sufficient test coverage for this case?

Member

@betodealmeida betodealmeida left a comment

Looks good, but I left a few comments on the handling of project_id. Happy to hop on a meeting to discuss it in more detail.


interface OAuth2ClientInfo {
id: string;
secret: string;
authorization_request_uri: string;
token_request_uri: string;
scope: string;
project_id?: string;
Member

Since this is BigQuery specific, it's better to not add it here, otherwise the interface loses its purpose. Ideally we want to keep only the common information that every oauth2 client has.

Can we have this in a separate attribute instead?

Contributor Author

The main reason I wanted to place this here is that, given the BigQuery flow, users will either add service credentials, which contain the project_id, or add OAuth credentials, which also contain the project_id. I agree that in theory we should have a separate field, but that could get even messier when the project_id can differ from the one used in the service credentials.

Member

@fisjac but the OAuth2ClientInfo component is designed as a generic component to be used by all databases that support OAuth2, and should contain only the information that is specific to OAuth2.

If this were a small project it could be OK, but what's inevitably going to happen is that people will start adding more non-OAuth2 fields here, and more if (db.engine === Engines.Foo) snippets, and the dynamic form builder is going to get even messier than it already is.

Also, from a UX perspective, it's confusing to ask for a BigQuery project name in a panel called "OAuth2 client information".

Can't you just add a field for the project name to the BigQuery dynamic form? It would still be relevant regardless of whether we're using OAuth2 or not, and would actually work as a nice sanity check in case someone adds the wrong credentials: we could check that the project name in the credentials is equal to the provided project name.
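
The sanity check suggested here could look roughly like the sketch below. check_project_matches is a hypothetical helper, not part of Superset; it assumes the credentials arrive as a service-account JSON document, which carries a project_id key.

```python
import json


def check_project_matches(credentials_json: str, project_name: str) -> None:
    """Raise if the credentials belong to a different project."""
    info = json.loads(credentials_json)
    found = info.get("project_id")
    if found != project_name:
        raise ValueError(
            f"Credentials are for project {found!r}, expected {project_name!r}"
        )
```

A form validator could call this when the database is saved and surface the ValueError message next to the project field.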

@github-actions github-actions bot added the api label Oct 28, 2024
Member

@betodealmeida betodealmeida left a comment

This looks great, but I have to push back on adding project name/ID to the OAuth2 client information. Can we try adding a field for it?


@@ -1341,6 +1342,7 @@ def validate_sql(self, pk: int) -> FlaskResponse:
return self.response_404()

@expose("/oauth2/", methods=["GET"])
@transaction()
Member

Nice!

Does this work even though the method is a GET? I remember you mentioning that it was not the case?

Contributor Author

The transaction decorator works. It's necessary here because this is a GET request. In POST and PUT requests, flask-appbuilder automatically runs a session.commit, which is why the decorator is not needed for the other standard endpoints. It's rare that we persist changes to the DB on a GET request, but it is necessary in this case.
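
To illustrate the commit-on-success behaviour being discussed, here is a self-contained sketch of a decorator of this shape. It is not Superset's implementation: FakeSession, session and oauth2_callback are hypothetical stand-ins for the real database session and endpoint.

```python
from functools import wraps
from typing import Any, Callable


class FakeSession:
    """Records calls so the example runs without a database."""

    def __init__(self) -> None:
        self.events: list = []

    def commit(self) -> None:
        self.events.append("commit")

    def rollback(self) -> None:
        self.events.append("rollback")


session = FakeSession()


def transaction() -> Callable:
    """Decorator factory: commit if the handler succeeds, else roll back."""

    def decorate(func: Callable) -> Callable:
        @wraps(func)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
                session.commit()
                return result
            except Exception:
                session.rollback()
                raise

        return wrapped

    return decorate


@transaction()
def oauth2_callback(token: str) -> str:
    # A GET handler that stores a token gets no automatic commit,
    # so it relies on the decorator to persist the change.
    return f"stored {token}"
```

Calling oauth2_callback("abc") records a single commit on the fake session; a handler that raises records a rollback instead.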

Comment on lines +50 to +54
from superset.databases.types import ( # pylint:disable=unused-import
EncryptedDict, # noqa: F401
EncryptedField,
EncryptedString, # noqa: F401
)
Member

If EncryptedDict and EncryptedString are not used, why are you importing them?

Suggested change
from superset.databases.types import ( # pylint:disable=unused-import
EncryptedDict, # noqa: F401
EncryptedField,
EncryptedString, # noqa: F401
)
from superset.databases.types import EncryptedField

Contributor Author

@fisjac fisjac Oct 29, 2024

They are imported by other DB engine specs that point to databases.schemas. Rather than change the references in every spec that uses encrypted extra, I thought it would be cleaner to keep pointing to the schemas module, which is where they really should live were it not for app_context issues.

# This ensures that the change is backwards compatible
sig = signature(func)
if "database" in (params := sig.parameters.keys()):
args.insert(list(params).index("database"), self)
Member

Nice! This is much more robust than my insert(0, ...) suggestion.
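
The backwards-compatibility technique praised here can be shown in isolation. The sketch below assumes nothing about Superset beyond the quoted lines: call_with_optional_database, legacy_spec and updated_spec are hypothetical names, and the parameter lists are made up for illustration.

```python
from inspect import signature
from typing import Any, Callable


def call_with_optional_database(func: Callable, database: Any, args: list) -> Any:
    """Call func, inserting database only if its signature declares it."""
    params = signature(func).parameters
    call_args = list(args)  # copy so the caller's list is left untouched
    if "database" in params:
        # Insert at the position where the parameter is declared,
        # rather than assuming it is always first.
        call_args.insert(list(params).index("database"), database)
    return func(*call_args)


def legacy_spec(connect_args: dict, uri: str, username: str) -> tuple:
    # Older signature without a database parameter.
    return (connect_args, uri, username)


def updated_spec(connect_args: dict, uri: str, database: Any, username: str) -> tuple:
    # Newer signature that accepts the database object.
    return (connect_args, uri, database, username)
```

Both call_with_optional_database(legacy_spec, db, [{}, "uri", "alice"]) and the same call with updated_spec succeed, which is what lets third-party engine specs with the old signature keep working.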

@@ -432,6 +432,31 @@ def test_get_sqla_engine_user_impersonation(mocker: MockerFixture) -> None:
)


def test_add_database_to_signature():
Member

Awesome!

Member

@betodealmeida betodealmeida left a comment

Looks great!

@fisjac fisjac merged commit 849d426 into apache:master Oct 30, 2024
38 checks passed
nyohasstium pushed a commit to Webgains/superset that referenced this pull request Jan 2, 2025
Labels
api, authentication:sso, data:connect:googlebigquery, size/L
4 participants