Config: Granular env-based Solution for Connection Strings #57

Open: @Samk13 wants to merge 3 commits into master from calc-connection-strings-util

@Samk13 (Member) commented Nov 26, 2024

Description

Add logic to build connection URLs from environment variables when they are available.
This is needed to follow security best practices in the Helm charts.

This PR includes:

  • Build DB URI
  • Build Redis URL
  • Build MQ URL
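To make the three bullets above concrete, here is a minimal sketch of what such builders could look like. The variable names (`INVENIO_DB_USER`, `INVENIO_REDIS_HOST`, `INVENIO_RABBITMQ_HOST`, and so on), the defaults, and the function names are assumptions for illustration, not the exact code in this PR. Each builder returns `None` when its parts are incomplete, so a fully specified URL can still be used instead.

```python
import os
from urllib.parse import quote


def _enc(value):
    """Percent-encode a URL component so characters like '@' or '/' stay safe."""
    return quote(value, safe="")


def build_db_uri(env=os.environ):
    """Build a SQLAlchemy URI from hypothetical INVENIO_DB_* variables, or return None."""
    user = env.get("INVENIO_DB_USER")
    password = env.get("INVENIO_DB_PASSWORD")
    host = env.get("INVENIO_DB_HOST")
    name = env.get("INVENIO_DB_NAME")
    if not all([user, password, host, name]):
        return None  # incomplete: let a fully specified URI be used instead
    port = env.get("INVENIO_DB_PORT", "5432")
    protocol = env.get("INVENIO_DB_PROTOCOL", "postgresql+psycopg2")
    return f"{protocol}://{_enc(user)}:{_enc(password)}@{host}:{port}/{name}"


def build_redis_url(env=os.environ, db=0):
    """Build a Redis URL from hypothetical INVENIO_REDIS_* variables, or return None."""
    host = env.get("INVENIO_REDIS_HOST")
    if not host:
        return None
    port = env.get("INVENIO_REDIS_PORT", "6379")
    password = env.get("INVENIO_REDIS_PASSWORD")
    auth = f":{_enc(password)}@" if password else ""  # password-only auth
    return f"redis://{auth}{host}:{port}/{db}"


def build_mq_url(env=os.environ):
    """Build an AMQP URL from hypothetical INVENIO_RABBITMQ_* variables, or return None."""
    user = env.get("INVENIO_RABBITMQ_USER")
    password = env.get("INVENIO_RABBITMQ_PASSWORD")
    host = env.get("INVENIO_RABBITMQ_HOST")
    if not all([user, password, host]):
        return None
    port = env.get("INVENIO_RABBITMQ_PORT", "5672")
    vhost = env.get("INVENIO_RABBITMQ_VHOST", "")  # empty path by default (assumption)
    return f"amqp://{_enc(user)}:{_enc(password)}@{host}:{port}/{_enc(vhost)}"
```

Percent-encoding the credentials matters for generated secrets, which often contain characters that are reserved in URLs.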

Partially closes: inveniosoftware/helm-invenio#112
Depends on: inveniosoftware/invenio-app-rdm#2918

Note

I created a test version encapsulated in a cached class (see link). Let me know if you prefer it over simple functions, and I'll update accordingly.

Findings

  • For the database URI, set SQLALCHEMY_DATABASE_URI="" (an empty string) so that the granular variables take effect.

  • For the broker, I opted for the naming convention INVENIO_RABBITMQ_* instead of INVENIO_BROKER_*, as the latter conflicts with RabbitMQ's default variable names. For more details, refer to the documentation here.
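The first finding relies on an empty string being falsy in Python. A minimal sketch of that precedence rule, using a hypothetical `resolve_uri` helper; the builder is passed in as a function, so this is independent of how the parts are read:

```python
from typing import Callable, Optional


def resolve_uri(
    configured: Optional[str], build_from_parts: Callable[[], Optional[str]]
) -> Optional[str]:
    """Return an explicitly configured URI, or build one from parts if it is unset or empty."""
    # "" and None are both falsy, so SQLALCHEMY_DATABASE_URI="" falls through to the builder.
    if configured:
        return configured
    return build_from_parts()
```

Under this rule any non-empty value, including a fallback default from a config module, wins over the parts, which is why the value has to be blanked explicitly.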

For reviewers:

I set up an instance for testing these changes, with a modified .env file and docker-services; see: https://github.com/Samk13/invenio-dev-latest/tree/granular-env-vars-changes

  • Delete the lock file, set up again, and install invenio-app-rdm locally to set these variables.


@Samk13 force-pushed the calc-connection-strings-util branch 2 times, most recently from 8f4b0e6 to 2e1e160 on November 27, 2024 16:40
@Samk13 changed the title "Config: Granular ENV-based Solution for Connection Strings" to "Config: Granular env-based Solution for Connection Strings" on Nov 28, 2024
@Samk13 force-pushed the calc-connection-strings-util branch 6 times, most recently from 5eeef92 to e52709c on November 28, 2024 13:39
@Samk13 added a commit to Samk13/invenio-dev-latest that referenced this pull request on Nov 29, 2024
@Samk13 force-pushed the calc-connection-strings-util branch from e52709c to c8d1038 on December 2, 2024 17:52
@Samk13 added a commit to Samk13/invenio-dev-latest that referenced this pull request on Dec 2, 2024
@Samk13 force-pushed the calc-connection-strings-util branch 5 times, most recently from b9c6913 to 8859a27 on December 3, 2024 14:16
@Samk13 force-pushed the calc-connection-strings-util branch from 49faf05 to f03c861 on January 15, 2025 12:45
@carlinmack moved this to Triage in v13.0.0 on Mar 26, 2025
@max-moser moved this from Triage to 🏗 In progress in v13.0.0 on Mar 29, 2025
@max-moser commented:

i'll check again how the usual configuration workflow in invenio is done:

if i remember correctly, the fetching of config items from environment variables (which have the INVENIO_ prefix, as opposed to the parsed config items which don't) is a dedicated step in that process

at TUW, we used to parse env vars similarly in our invenio.cfg, but i've recently moved away from this because it had weird interactions with multiple app instances in the same process... requiring us to override env vars etc.
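For reference, the env-loading step mentioned above works roughly like the sketch below: variables carrying the prefix are copied into the config with the prefix stripped, and values that parse as Python literals are converted. This is a simplified illustration written for this discussion (the function name `load_prefixed_env` is hypothetical); invenio-config's own environment loader is the authoritative implementation.

```python
import ast
import os


def load_prefixed_env(prefix="INVENIO_", environ=None):
    """Return config items taken from env vars that start with `prefix`, prefix stripped."""
    environ = os.environ if environ is None else environ
    config = {}
    for key, raw in environ.items():
        if not key.startswith(prefix):
            continue  # unprefixed variables are ignored
        name = key[len(prefix):]
        try:
            # Convert literals such as 5432, True or ["a", "b"]; keep other text as a string.
            config[name] = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            config[name] = raw
    return config
```

One side effect of literal conversion is that a numeric-looking password such as `12345` arrives as an `int`, so anything that joins parts into a URL should call `str()` on them first.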

@max-moser commented:

alright, so i just did a quick sweep:

✔️ this PR is actually part of the "env" part of invenio-config

however, it looks like the INVENIO_ prefix is hard-coded here, while invenio theoretically allows it to be changed:

while it's nontrivial to override the prefix to be something other than INVENIO_ (since that's hard-coded in invenio-app), i would still prefer using the configured value rather than hard-coding.
we could also remove the fetching of unprefixed env vars, to align more with the existing workflows

(at TUW we used to also check all sorts of env vars in our invenio.cfg but that was quite messy, and we moved away from that)


that being said, i think this PR would only support the case where all of the parts are specified via env vars – is this correct?
in our setup we have a mixed approach, where the various parts are defined in different places.
e.g. the "fixed" parts are defined in invenio.cfg (or some config.py, or set by an extension, ...) while the dynamic (and/or secret) parts are set via the environment.
i think supporting such use cases would be generally beneficial
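A minimal sketch of that mixed approach, assuming the environment has already been loaded into the config (prefix stripped), so the builder no longer cares whether a part came from invenio.cfg or from an `INVENIO_*` variable. The `DB_*` key names and the `build_db_uri_from_config` helper are hypothetical.

```python
from urllib.parse import quote

DB_PARTS = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_NAME")  # hypothetical key names


def build_db_uri_from_config(config):
    """Assemble a DB URI from parts in `config`; return (uri, missing_keys)."""
    missing = [key for key in DB_PARTS if not config.get(key)]
    if missing:
        return None, missing
    # str() guards against parts that the env loader parsed into non-strings.
    user, password, host, name = (str(config[key]) for key in DB_PARTS)
    port = str(config.get("DB_PORT", 5432))
    uri = f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{name}"
    return uri, []


# Fixed parts, e.g. from invenio.cfg ...
config = {"DB_HOST": "db", "DB_NAME": "invenio", "DB_USER": "invenio"}
# ... plus the secret part, e.g. from INVENIO_DB_PASSWORD after env loading.
config["DB_PASSWORD"] = "s3cret"
```

Returning the missing keys, rather than just `None`, makes it easy to report which part is absent when a deployment only sets some of them.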

would it make sense to piece together the various connection strings in the most closely related packages' extensions on startup?
e.g. on <package_extension>.init_app(app)

@max-moser commented:

⚠️ 💀 CURSE WARNING CURSE WARNING 💀 ⚠️

as a bit of extra context, because i've mentioned TUW moving away from env var parsing in invenio.cfg above.
our current replacement is hooking into the app factory / startup process after loading of env vars is done, but before extensions are initialized [1]

there we perform a very similar logic to what's presented in this PR here: https://gitlab.tuwien.ac.at/crdm/invenio-config-tuw/-/blob/master/invenio_config_tuw/startup/config.py?ref_type=heads#L87-106

(please ignore the overriding of the flask config class, that's related to our use case for supporting multiple domains)

[1] some extensions like to set up their own variables (e.g. redis client) constructed from the connection string, and tracking all of those to change their connection details afterwards is simply not feasible – that's why we hooked into this particular place


because there is no dedicated spot that fulfills our requirement (there is no hook between env loading & extension init), we "simply" made sure that our Invenio-Config-TUW extension is the first one to get loaded.
because out of the box the apps are loaded in random order, we had to override (i.e. monkey-patch) the default app_loader [2]
this logic is executed by registering an entry point that gets executed at an earlier stage of the config loading [3]

[2] https://gitlab.tuwien.ac.at/crdm/invenio-config-tuw/-/blob/master/invenio_config_tuw/config.py?ref_type=heads#L389-421
[3] https://gitlab.tuwien.ac.at/crdm/invenio-config-tuw/-/blob/master/pyproject.toml?ref_type=heads#L112


you are now cursed
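The ordering trick from [2] boils down to making the load order deterministic and putting one extension first. A hedged sketch of just that sorting step, with a hypothetical `order_extensions` helper working on plain extension names; it does not reproduce TUW's actual `app_loader` override or Invenio's entry-point machinery.

```python
def order_extensions(names, priority=("invenio_config_tuw",)):
    """Sort extension names: `priority` entries first (in given order), the rest alphabetically."""
    rank = {name: index for index, name in enumerate(priority)}
    # Tuples compare element-wise: group 0 (priority) before group 1, then by rank or by name.
    return sorted(names, key=lambda n: (0, rank[n], "") if n in rank else (1, 0, n))
```

An explicit priority list avoids relying on an extension's name happening to sort first alphabetically.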

@Samk13 added 3 commits on March 30, 2025 04:01
Add logic to build connection urls from env vars if available
needed for helm charts security best practices. includes:
* Build db uri
* Build redis url
* Build mq url

Partially closes: inveniosoftware/helm-invenio#112
@Samk13 force-pushed the calc-connection-strings-util branch from 1da2e22 to 336e9e7 on March 30, 2025 02:03

@slint (Member) left a comment:

Code LGTM, but I am a bit hesitant about placing it in invenio-config, which so far has been dealing with "how" config is loaded, rather than "what" config is loaded.

Purely based on convention, my first guess for where e.g. DB_HOST is defined would be invenio-db. I think it would be worth it to directly:

  • remove the INVENIO_-prefixing logic in each function, since it gets taken into account in this file anyway
  • move each build_* function to its relevant module under the InvenioXYZ.init_config() method of its extension. That would be:
    • build_db_uri to invenio-db
    • build_search_uri to invenio-search
    • build_broker_uri to invenio-celery
    • build_redis_uri to invenio-cache (though there is a caveat here that might require a bit more discussion, since we're depending on it also in invenio-accounts).
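A sketch of what the per-extension variant could look like, using a hypothetical `InvenioDBSketch` class rather than the real `InvenioDB` code. It follows the shape suggested above (`init_app` calls `init_config`) and only fills in `SQLALCHEMY_DATABASE_URI` when it is unset or empty. A plain `app.config.setdefault(...)` would not replace an existing empty string, hence the explicit check. Any object with a `config` dict stands in for the Flask app here.

```python
from types import SimpleNamespace
from urllib.parse import quote


class InvenioDBSketch:
    """Hypothetical stand-in for an extension such as InvenioDB (not its real code)."""

    def __init__(self, app=None):
        if app is not None:
            self.init_app(app)

    def init_app(self, app):
        """Prepare config first; the real extension would then set up its DB state."""
        self.init_config(app)

    def init_config(self, app):
        """Build SQLALCHEMY_DATABASE_URI from DB_* parts unless one is configured."""
        if app.config.get("SQLALCHEMY_DATABASE_URI"):
            return  # an explicit, non-empty URI always wins
        keys = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_NAME")
        if not all(app.config.get(key) for key in keys):
            return  # incomplete parts: leave the setting alone
        user, password, host, name = (str(app.config[key]) for key in keys)
        port = app.config.get("DB_PORT", 5432)
        app.config["SQLALCHEMY_DATABASE_URI"] = (
            f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{name}"
        )


if __name__ == "__main__":
    app = SimpleNamespace(config={"SQLALCHEMY_DATABASE_URI": "", "DB_USER": "invenio",
                                  "DB_PASSWORD": "secret", "DB_HOST": "db", "DB_NAME": "invenio"})
    InvenioDBSketch(app)
    print(app.config["SQLALCHEMY_DATABASE_URI"])
```

Keeping the logic in `init_config` means each package owns the parts it documents, which is the convention argument made above.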

@max-moser commented:

i just submitted a PR for a slightly adapted version of the code for Invenio-DB (linked above), and will create the other PRs later.

@Samk13 (Member, Author) commented Mar 31, 2025

With @max-moser stepping in, I'll gladly step back and let you work your magic 🪄.
Thanks for the guidance, Max and Alex, I'll keep an eye out for open PRs and assist if needed.

@Samk13 (Member, Author) commented Apr 16, 2025

For context and history: following the discussion during the teleconference, we concluded with the following points, which @max-moser helpfully summarized on Discord and created issues for:

  1. Add a post-config-loading step to process connection strings before extensions initialize.
    Issue: Add config massaging step during configuration loading (invenio-app-rdm#3028)

  2. Sort entry points during loading, using silly alphabetic naming hacks
    (similar to what invenio-config-tuw does)

  3. Add another source for configuration, like external YAML files
    similar to: https://github.com/oarepo/oarepo/blob/rdm-12/oarepo/config/base.py#L106
    avoids reliance on environment variables, which could get exposed (anecdote: PHP debug page printed env vars)
    @mirek.simek uses this in their invenio.cfg, so that it doesn't require an upstream change for them
    Issue: Provide functionality to load configuration from further places (invenio-app-rdm#3029)

  4. Make it the responsibility of some extensions to check if the full connection string is configured already
    and if not, cobble it together from the parts and basically call app.config.setdefault(NAME, VALUE)
    -> this requires removing some fallback/default values from invenio_app_rdm/config.py
    @guillaume

  5. Refuse to start up if some settings are not present.
    Issue: Add sanity checks for configuration on startup (invenio-app-rdm#3030)

  6. Have parts take precedence over the built connection strings
    (Otherwise, the peculiarities of invenio w.r.t. startup may cause pain; at least it did for us at TUW in our experiments)
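Items 1, 5 and 6 fit together as one post-config step. A minimal sketch under these assumptions: `massage_connection_strings`, `check_required` and `ConfigurationError` are hypothetical names, the builders are passed in as functions that return `None` when their parts are incomplete, and the required keys (`SQLALCHEMY_DATABASE_URI`, `BROKER_URL`, `CACHE_REDIS_URL`) are only an example list. Unlike the `setdefault` approach in item 4, a built string here overrides a full string that is already set (item 6).

```python
class ConfigurationError(RuntimeError):
    """Raised when required settings are missing at startup (item 5)."""


REQUIRED = ("SQLALCHEMY_DATABASE_URI", "BROKER_URL", "CACHE_REDIS_URL")  # example list


def massage_connection_strings(config, builders):
    """Run after config loading, before extensions initialize (item 1).

    Each builder returns a connection string or None; a built string overrides
    any full string already in `config`, so the parts take precedence (item 6).
    """
    for key, build in builders.items():
        built = build(config)
        if built is not None:
            config[key] = built


def check_required(config, required=REQUIRED):
    """Refuse to start if any required setting is unset or empty (item 5)."""
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ConfigurationError("Missing required settings: " + ", ".join(missing))
```

Running `check_required` after the massaging step means a half-configured deployment fails loudly at startup instead of connecting to a leftover default.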

Labels: None yet
Projects: v13.0.0 (Status: 🏗 In progress)
Development: successfully merging this pull request may close the issue "Granular env-based solution for "connection string"-like config"
Participants: 5