Skip to content

feat(hybrid): Introduce RegionMapping method contracts #39253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Sep 27, 2022

Conversation

RyanSkonnord
Copy link
Contributor

Set up a type system for region objects and the set of all regions, with placeholder methods for backend operations not implemented yet. These methods should serve as temporary interface contracts to unblock development on application code that would otherwise depend on those operations.

Specifically, the contracts cover:

  • injecting the list of all regions from config (or whatever operational source of truth we settle on);
  • looking up what region I'm in if I'm a region silo; and
  • resolving an organization to its home region.

For the first practical use, I've dropped in a simple call to the second one, replacing some older placeholder code around "snowflake" ID generation.

Set up a type system for region objects and the set of all regions, with
placeholder methods for backend operations not implemented yet.
@RyanSkonnord RyanSkonnord requested a review from a team as a code owner September 23, 2022 18:50
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Sep 23, 2022
"eu.sentry.io"). In a development environment, it may be useful to define
region addresses as a custom hostname or port number (e.g,
"localhost:8001").
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract for the address attribute is a first pass and we'll probably need to iterate on it. In production, we'll definitely need to resolve regions into the subdomain where they're being hosted, and I wanted to generalize that to arbitrary hostnames and ports. But I have no idea whether it will make sense for the strings to look like "eu.sentry.io", "https://eu.sentry.io/", or whatever.

Note the assumption that the production config will be supplying a value of the form "eu.sentry.io" rather than just "eu". If it's the latter, we'll additionally need to specify the "sentry.io" part somewhere -- I'd prefer to keep that domain-name-building logic outside of the application code as much as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that this might need some more thought. We will need to address non-API resources across regions as well, e.g. control silo reaching out to region silo kafka clusters. These might just be better as separate explicit configurations though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subdomain or redirect target for default HTTPS might be useful in general however. This address may be like, "public endpoint"-eseque?

REGION_ID.validate(self.id)

def to_url(self, path: str) -> str:
return self.address + path
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this is just conceptual. We'll handle "do we need to join them with a slash" and other annoying details when the time comes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, this definitely seems more related specifically to endpoint API and not necessary the Region as a whole. I imagine there being a "SiloHTTPApiClient" that handles that and can be constructed from a Region, but not assuming there is a singular such that associated with a Region.

_global_region_mapping = None


def get_region_mapping() -> RegionMapping:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels ugly. I'd prefer to do something like our common "registry" pattern where the region mapping is just initialized with the module, but that leads to circular imports because sentry/conf.py needs to import the Region and RegionMapping types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a slightly less ugly version of this is to use functools.lru_cache(maxsize=1) to hide the global part

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, actually getting the region with get_region_mapping().get_local_region() is a bit wordy. Maybe if this is a class and we shorten to RegionMap?
e.g.

class RegionMap:
    def __new__(self):
        return (
            RegionMapping.load_from_config()
            if not _global_region_mapping
            else _global_region_mapping
        )

but it's essentially the same thing: RegionMap().get_local_region()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an api like,

get_local_region() -> str
get_all_regions() -> List[str]
get_region(key: str) -> Region

? I don't think we need the Mapping object directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE: Essentially flatten the mapping object into module level functions and don't expose it directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a public API made of functions feels better than a global API of shared global data. I recognize that the functions are just wrapping the global state. But the functions could be cloning data so that the global state can't be mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level interfaces make my Java brain sad but I see how it's nicer to abstract away the mapping object. 😅

Part of the reason I didn't like that originally, was that override_settings wouldn't play nice with having the regions cached. (I don't believe the caching is premature optimization, because the mapping could otherwise produce O(n2) costs once we have a growing number of single-tenant regions.) In addition to injecting the region values, you would have to mess around with mocks in the sentry.types.region module, which usually would be two steps removed from whatever you're trying to test, as is the case in test_snowflake.

My solution is a helper named override_region in testutils, which hides away the module-level mocking for reuse in other tests. I think the current test_snowflake implementation works out pretty nicely, where it only has to inject SENTRY_REGION but doesn't have to directly mock any functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also possible to "clear" this cached value via a pytest after hook that can be installed globally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without using a more sophisticated cache than functools.lru_cache, as far as I know.

Copy link
Contributor

@mikejihbe mikejihbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directionally this is looking good 👍 , thanks for putting this together

"eu.sentry.io"). In a development environment, it may be useful to define
region addresses as a custom hostname or port number (e.g,
"localhost:8001").
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that this might need some more thought. We will need to address non-API resources across regions as well, e.g. control silo reaching out to region silo kafka clusters. These might just be better as separate explicit configurations though

Copy link
Member

@leeandher leeandher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Looks good! I appreciate how documented and clear the API is. I cloned and spun this up locally too with no issues!

_global_region_mapping = None


def get_region_mapping() -> RegionMapping:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, actually getting the region with get_region_mapping().get_local_region() is a bit wordy. Maybe if this is a class and we shorten to RegionMap?
e.g.

class RegionMap:
    def __new__(self):
        return (
            RegionMapping.load_from_config()
            if not _global_region_mapping
            else _global_region_mapping
        )

but it's essentially the same thing: RegionMap().get_local_region()

segment_values[VERSION_ID] = msb_0_ordering(settings.SNOWFLAKE_VERSION_ID, VERSION_ID.length)

local_region = get_region_mapping().get_local_region()
segment_values[REGION_ID] = local_region.id if local_region else 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we lock this 0 in a constant to denote a 'default region'? Just thinking that when snowflake IDs are rolled out into prod, we'd want 0 to map the the main US region, if we haven't set up the region config in SaaS yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can think of it as 'default region' or 'unregioned' though, I think we should avoid SnowflakeIDs if we haven't set the region config just yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed elsewhere, I prefer not to represent a "default region" when what we really mean is "there are no regions in my entire system (which may be a self-hosted monolith)".

I think we should avoid SnowflakeIDs if we haven't set the region config just yet.

The plan is to have snowflake IDs be used for all organizations and projects, for the sake of parity, regardless of whether the system is multi-region. So it would be normal for future self-hosted systems to be generating snowflake IDs with zeroes where the region IDs would be if they had regions. (CC @maxiuyuan)

In contrast to not wanting a region object as a "default value", the snowflake IDs force us to represent both the "region" and "no-region" cases with numbers. Therefore, I agree with reserving 0 for "unregioned" and documenting it in code (and error-checking it in Region.__init__) . We'd then count actual regions from 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and did this in b52156c. Let me know if anyone feels strongly about the North American region being 0.

@@ -107,6 +107,7 @@ files = src/sentry/analytics/,
src/sentry/tasks/symbolication.py,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting, so to be clear do all type-checked files need to be included here?

# For now, assume that all region configs can be taken in through Django
# settings. We may investigate other ways of delivering those configs in
# production.
return cls(settings.SENTRY_REGION_CONFIG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -57,3 +63,27 @@ def test_out_of_region_sequences(self):
generate_snowflake_id("test_redis_key")

assert str(context.value) == "No available ID"

@freeze_time(CURRENT_TIME)
@mock.patch("sentry.utils.snowflake.get_region_mapping")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using with override_settings as the test target? Helps test deeper the load from settings mechanism as well. Minor nit though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned here, override_settings isn't enough by itself, but I think the current implementation is as good.

@corps
Copy link
Contributor

corps commented Sep 23, 2022

Thank you for getting this started, honestly I can imagine using this framework for features in the near future.

_global_region_mapping = None


def get_region_mapping() -> RegionMapping:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a public API made of functions feels better than a global API of shared global data. I recognize that the functions are just wrapping the global state. But the functions could be cloning data so that the global state can't be mutated.

@RyanSkonnord RyanSkonnord enabled auto-merge (squash) September 27, 2022 18:02
@RyanSkonnord RyanSkonnord merged commit a7d5d84 into master Sep 27, 2022
@RyanSkonnord RyanSkonnord deleted the region-mapping branch September 27, 2022 18:18
getsentry-bot added a commit that referenced this pull request Sep 27, 2022
This reverts commit a7d5d84.

Co-authored-by: anthony.sottile via Slack <[email protected]>
RyanSkonnord added a commit that referenced this pull request Sep 27, 2022
Set up a type system for region objects and the set of all regions, with
placeholders for backend operations not implemented yet.

Remaking changes from #39253, which was reverted due to an error
now fixed by getsentry/getsentry#8407
ryan953 pushed a commit that referenced this pull request Oct 3, 2022
Set up a type system for region objects and the set of all regions, with
placeholders for backend operations not implemented yet.
ryan953 pushed a commit that referenced this pull request Oct 3, 2022
This reverts commit a7d5d84.

Co-authored-by: anthony.sottile via Slack <[email protected]>
ryan953 pushed a commit that referenced this pull request Oct 3, 2022
Set up a type system for region objects and the set of all regions, with
placeholders for backend operations not implemented yet.

Remaking changes from #39253, which was reverted due to an error
now fixed by getsentry/getsentry#8407
@github-actions github-actions bot locked and limited conversation to collaborators Oct 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants