-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
feat(hybrid): Introduce RegionMapping method contracts #39253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Set up a type system for region objects and the set of all regions, with placeholder methods for backend operations not implemented yet.
"eu.sentry.io"). In a development environment, it may be useful to define | ||
region addresses as a custom hostname or port number (e.g, | ||
"localhost:8001"). | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract for the address
attribute is a first pass and we'll probably need to iterate on it. In production, we'll definitely need to resolve regions into the subdomain where they're being hosted, and I wanted to generalize that to arbitrary hostnames and ports. But I have no idea whether it will make sense for the strings to look like "eu.sentry.io"
, "https://eu.sentry.io/"
, or whatever.
Note the assumption that the production config will be supplying a value of the form "eu.sentry.io"
rather than just "eu"
. If it's the latter, we'll additionally need to specify the "sentry.io"
part somewhere -- I'd prefer to keep that domain-name-building logic outside of the application code as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this might need some more thought. We will need to address non-API resources across regions as well, e.g. control silo reaching out to region silo kafka clusters. These might just be better as separate explicit configurations though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subdomain or redirect target for default HTTPS might be useful in general however. This address may be like, "public endpoint"-eseque?
REGION_ID.validate(self.id) | ||
|
||
def to_url(self, path: str) -> str: | ||
return self.address + path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, this is just conceptual. We'll handle "do we need to join them with a slash" and other annoying details when the time comes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, this definitely seems more related specifically to endpoint API and not necessary the Region as a whole. I imagine there being a "SiloHTTPApiClient" that handles that and can be constructed from a Region, but not assuming there is a singular such that associated with a Region.
src/sentry/types/region.py
Outdated
_global_region_mapping = None | ||
|
||
|
||
def get_region_mapping() -> RegionMapping: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels ugly. I'd prefer to do something like our common "registry" pattern where the region mapping is just initialized with the module, but that leads to circular imports because sentry/conf.py
needs to import the Region
and RegionMapping
types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a slightly less ugly version of this is to use functools.lru_cache(maxsize=1)
to hide the global
part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, actually getting the region with get_region_mapping().get_local_region()
is a bit wordy. Maybe if this is a class and we shorten to RegionMap
?
e.g.
class RegionMap:
def __new__(self):
return (
RegionMapping.load_from_config()
if not _global_region_mapping
else _global_region_mapping
)
but it's essentially the same thing: RegionMap().get_local_region()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about an api like,
get_local_region() -> str
get_all_regions() -> List[str]
get_region(key: str) -> Region
? I don't think we need the Mapping object directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IE: Essentially flatten the mapping object into module level functions and don't expose it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a public API made of functions feels better than a global API of shared global data. I recognize that the functions are just wrapping the global state. But the functions could be cloning data so that the global state can't be mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module-level interfaces make my Java brain sad but I see how it's nicer to abstract away the mapping object. 😅
Part of the reason I didn't like that originally, was that override_settings
wouldn't play nice with having the regions cached. (I don't believe the caching is premature optimization, because the mapping could otherwise produce O(n2) costs once we have a growing number of single-tenant regions.) In addition to injecting the region values, you would have to mess around with mocks in the sentry.types.region
module, which usually would be two steps removed from whatever you're trying to test, as is the case in test_snowflake
.
My solution is a helper named override_region
in testutils
, which hides away the module-level mocking for reuse in other tests. I think the current test_snowflake
implementation works out pretty nicely, where it only has to inject SENTRY_REGION
but doesn't have to directly mock any functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it also possible to "clear" this cached value via a pytest after hook that can be installed globally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without using a more sophisticated cache than functools.lru_cache
, as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directionally this is looking good 👍 , thanks for putting this together
"eu.sentry.io"). In a development environment, it may be useful to define | ||
region addresses as a custom hostname or port number (e.g, | ||
"localhost:8001"). | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this might need some more thought. We will need to address non-API resources across regions as well, e.g. control silo reaching out to region silo kafka clusters. These might just be better as separate explicit configurations though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 Looks good! I appreciate how documented and clear the API is. I cloned and spun this up locally too with no issues!
src/sentry/types/region.py
Outdated
_global_region_mapping = None | ||
|
||
|
||
def get_region_mapping() -> RegionMapping: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, actually getting the region with get_region_mapping().get_local_region()
is a bit wordy. Maybe if this is a class and we shorten to RegionMap
?
e.g.
class RegionMap:
def __new__(self):
return (
RegionMapping.load_from_config()
if not _global_region_mapping
else _global_region_mapping
)
but it's essentially the same thing: RegionMap().get_local_region()
src/sentry/utils/snowflake.py
Outdated
segment_values[VERSION_ID] = msb_0_ordering(settings.SNOWFLAKE_VERSION_ID, VERSION_ID.length) | ||
|
||
local_region = get_region_mapping().get_local_region() | ||
segment_values[REGION_ID] = local_region.id if local_region else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lock this 0
in a constant to denote a 'default region'? Just thinking that when snowflake IDs are rolled out into prod, we'd want 0 to map the the main US region, if we haven't set up the region config in SaaS yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can think of it as 'default region' or 'unregioned' though, I think we should avoid SnowflakeIDs if we haven't set the region config just yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed elsewhere, I prefer not to represent a "default region" when what we really mean is "there are no regions in my entire system (which may be a self-hosted monolith)".
I think we should avoid SnowflakeIDs if we haven't set the region config just yet.
The plan is to have snowflake IDs be used for all organizations and projects, for the sake of parity, regardless of whether the system is multi-region. So it would be normal for future self-hosted systems to be generating snowflake IDs with zeroes where the region IDs would be if they had regions. (CC @maxiuyuan)
In contrast to not wanting a region object as a "default value", the snowflake IDs force us to represent both the "region" and "no-region" cases with numbers. Therefore, I agree with reserving 0 for "unregioned" and documenting it in code (and error-checking it in Region.__init__
) . We'd then count actual regions from 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and did this in b52156c. Let me know if anyone feels strongly about the North American region being 0.
@@ -107,6 +107,7 @@ files = src/sentry/analytics/, | |||
src/sentry/tasks/symbolication.py, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting, so to be clear do all type-checked files need to be included here?
src/sentry/types/region.py
Outdated
# For now, assume that all region configs can be taken in through Django | ||
# settings. We may investigate other ways of delivering those configs in | ||
# production. | ||
return cls(settings.SENTRY_REGION_CONFIG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
tests/sentry/utils/test_snowflake.py
Outdated
@@ -57,3 +63,27 @@ def test_out_of_region_sequences(self): | |||
generate_snowflake_id("test_redis_key") | |||
|
|||
assert str(context.value) == "No available ID" | |||
|
|||
@freeze_time(CURRENT_TIME) | |||
@mock.patch("sentry.utils.snowflake.get_region_mapping") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using with override_settings
as the test target? Helps test deeper the load from settings mechanism as well. Minor nit though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned here, override_settings
isn't enough by itself, but I think the current implementation is as good.
Thank you for getting this started, honestly I can imagine using this framework for features in the near future. |
src/sentry/types/region.py
Outdated
_global_region_mapping = None | ||
|
||
|
||
def get_region_mapping() -> RegionMapping: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a public API made of functions feels better than a global API of shared global data. I recognize that the functions are just wrapping the global state. But the functions could be cloning data so that the global state can't be mutated.
This reverts commit a7d5d84. Co-authored-by: anthony.sottile via Slack <[email protected]>
Set up a type system for region objects and the set of all regions, with placeholders for backend operations not implemented yet. Remaking changes from #39253, which was reverted due to an error now fixed by getsentry/getsentry#8407
Set up a type system for region objects and the set of all regions, with placeholders for backend operations not implemented yet.
This reverts commit a7d5d84. Co-authored-by: anthony.sottile via Slack <[email protected]>
Set up a type system for region objects and the set of all regions, with placeholders for backend operations not implemented yet. Remaking changes from #39253, which was reverted due to an error now fixed by getsentry/getsentry#8407
Set up a type system for region objects and the set of all regions, with placeholder methods for backend operations not implemented yet. These methods should serve as temporary interface contracts to unblock development on application code that would otherwise depend on those operations.
Specifically, the contracts cover:
For the first practical use, I've dropped in a simple call to the second one, replacing some older placeholder code around "snowflake" ID generation.