default LBP: Warn on misconfigured local DC #1373
35fc56e to 084f127
Rate-limited warnings will also let the user know about the issue. However, they won't break the driver by logging a message for every request.
`DefaultPolicy::pick()` begins with some sanity checks, whose goal is to warn the user about misconfiguration. Those checks are extracted into a separate method, `DefaultPolicy::pick_sanity_checks()`, so that `pick()` is decluttered. The next commits introduce more sanity checks.
This makes the next commit cleaner.
DefaultPolicy now issues respective warnings:
1. if the preferred datacenter is not present in the cluster;
2. if all nodes in the preferred datacenter are disabled by the HostFilter.

This helps avoid confusion when the user expects requests to be sent to a specific datacenter, but it is not available.

As warnings are going to be emitted on every request (because the configuration is most likely session-wide), the messages will quickly flood the logs and the user should notice. Alternatively, we could implement some frequency limiting of those warnings, but we prefer to:
- keep it simple for now,
- let the user notice the misconfiguration as quickly as possible.
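The two conditions in this commit message can be sketched roughly as follows. This is only an illustration under assumptions: `NodeInfo`, `DcProblem`, and `check_preferred_dc` are hypothetical names, and the real driver keeps cluster state in its own structures rather than a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical classification of a misconfigured preferred datacenter.
#[derive(Debug, PartialEq)]
enum DcProblem {
    /// No node in the cluster belongs to the preferred datacenter.
    MissingFromCluster,
    /// The datacenter exists, but the host filter rejects every node in it.
    AllNodesFiltered,
}

/// Hypothetical, simplified view of a node: only the filter verdict matters here.
struct NodeInfo {
    enabled_by_filter: bool,
}

/// Returns the problem to warn about, or `None` if the preferred
/// datacenter has at least one node accepted by the filter.
fn check_preferred_dc(
    nodes_by_dc: &HashMap<String, Vec<NodeInfo>>,
    preferred_dc: &str,
) -> Option<DcProblem> {
    match nodes_by_dc.get(preferred_dc) {
        None => Some(DcProblem::MissingFromCluster),
        Some(nodes) if nodes.iter().all(|n| !n.enabled_by_filter) => {
            Some(DcProblem::AllNodesFiltered)
        }
        Some(_) => None,
    }
}
```

A caller would map each `DcProblem` variant to its own warning message.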
084f127 to e2ba54b
Rebased on main.
Added 3 new commits:
e0598e2 to d2e5fdc
RateLimiter is a new utility that allows rate-limiting arbitrary actions. It provides one central method, `try_acquire`, which checks if the action should be performed based on a specified interval. If the action has not been performed:
1. ever, or
2. in the last `interval`,

it updates the last-performed timestamp and returns `true`, allowing the action to proceed. Otherwise, it returns `false`, indicating the action should be skipped due to rate limiting.

This utility is designed to be efficient and thread-safe, using atomic operations to ensure that multiple threads can safely check and update the last-performed timestamp without locking.

The `RateLimiter` is intended for use in scenarios where actions need to be performed at a controlled rate, such as logging, warning messages, or other operations that should not be executed too frequently to avoid spamming logs or overwhelming resources.

For now, it's used to rate-limit warning messages emitted by the driver, in the implementation of `warn_rate_limited!` macro.

NOTE to reviewers: this code has been generated with major help of the Claude Sonnet 4 AI model. Beware of common pitfalls of AI-generated code.
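For readers following along, a lock-free limiter of the kind described could look roughly like the sketch below. It is not the code from this PR: the field layout, the "0 means never" encoding, and the use of `OnceLock` for the time reference point are all assumptions made for illustration; only the names `RateLimiter` and `try_acquire` come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::time::{Duration, Instant};

/// Illustrative sketch only; the PR's actual implementation may differ.
pub struct RateLimiter {
    /// Reference point for measuring time, initialized on first use.
    start: OnceLock<Instant>,
    /// Nanoseconds since `start` (plus one) at which the action was last
    /// performed. The value 0 is reserved for "never performed".
    last: AtomicU64,
}

impl RateLimiter {
    /// `const` so that the limiter can live in a `static`.
    pub const fn new() -> Self {
        Self {
            start: OnceLock::new(),
            last: AtomicU64::new(0),
        }
    }

    /// Returns `true` if the action was never performed or was last
    /// performed at least `interval` ago, and records the new timestamp.
    pub fn try_acquire(&self, interval: Duration) -> bool {
        let start = *self.start.get_or_init(Instant::now);
        // The +1 keeps a stored timestamp distinct from the "never" marker.
        let now = u64::try_from(start.elapsed().as_nanos()).unwrap_or(u64::MAX - 1) + 1;
        let interval = u64::try_from(interval.as_nanos()).unwrap_or(u64::MAX);

        let last = self.last.load(Ordering::Relaxed);
        if last != 0 && now.saturating_sub(last) < interval {
            return false;
        }
        // If several threads race here, only the one whose compare-exchange
        // succeeds gets to perform the action.
        self.last
            .compare_exchange(last, now, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
    }
}
```

The compare-exchange makes the check-and-update a single atomic step, so no lock is needed and at most one of several racing callers proceeds.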
In case of misconfiguration, the default load balancing policy can throw a lot of warnings, which flood the logs and overwhelm the machine. To prevent this, we rate limit the warnings there to one per second. NOTE: each warning is rate limited separately, so if there are multiple misconfigurations, each will still be logged.
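The "each warning is rate limited separately" property follows from giving every call site its own static limiter. A minimal sketch of that pattern, under assumptions: `Limiter` is a simplified `Mutex`-based stand-in (not the atomic `RateLimiter` from this PR), and `rate_limited!` is a hypothetical macro that runs an arbitrary expression instead of emitting a log record like `warn_rate_limited!` does.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Simplified stand-in limiter (hypothetical; uses a Mutex for brevity).
pub struct Limiter(Mutex<Option<Instant>>);

impl Limiter {
    pub const fn new() -> Self {
        Limiter(Mutex::new(None))
    }

    /// Returns `true` at most once per `interval`.
    pub fn try_acquire(&self, interval: Duration) -> bool {
        let mut last = self.0.lock().unwrap();
        let now = Instant::now();
        match *last {
            Some(prev) if now.duration_since(prev) < interval => false,
            _ => {
                *last = Some(now);
                true
            }
        }
    }
}

/// Hypothetical macro: every expansion declares its own `static LIMITER`,
/// so each call site is rate limited independently of the others.
macro_rules! rate_limited {
    ($interval:expr, $action:expr) => {{
        static LIMITER: Limiter = Limiter::new();
        if LIMITER.try_acquire($interval) {
            $action;
            true
        } else {
            false
        }
    }};
}
```

With this shape, two different warnings triggered in the same second are both emitted once, because they never share a limiter.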
There were some more warnings that were not rate-limited, which could flood the logs in case of misconfiguration. This commit adds rate limiting there, too.
d2e5fdc to 1458c5a
I think the API of the newly introduced RateLimiter could be better. My idea:
- The Duration should be stored inside the limiter, because I find it unlikely that we'll need to use multiple durations with a single limiter.
- Instead of having a method returning bool, we can have `maybe_execute` taking a closure that will be called if the interval has elapsed.
- After the above, the macros won't really give much value over having the limiter stored explicitly inside DefaultPolicy (or having it explicitly as a static). Let's avoid macros unless they really do help with something.
- Can you also move the timestamp generator to use the new rate limiter?
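To make the proposal above concrete, here is one way the suggested shape could look. It is a sketch under assumptions, not code from the PR: the type name `IntervalLimiter`, the `Mutex`-based state, and the `Option<R>` return type are all illustrative choices; only the method name `maybe_execute` comes from the review comment.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical limiter that owns its interval, as proposed in the review.
pub struct IntervalLimiter {
    interval: Duration,
    last: Mutex<Option<Instant>>,
}

impl IntervalLimiter {
    /// `const` so the limiter can be stored in a `static` or a struct field.
    pub const fn new(interval: Duration) -> Self {
        Self {
            interval,
            last: Mutex::new(None),
        }
    }

    /// Runs `action` if it was never run or the interval has elapsed since
    /// the last run. Returns `Some(result)` if it ran, `None` if skipped.
    pub fn maybe_execute<R>(&self, action: impl FnOnce() -> R) -> Option<R> {
        {
            let mut last = self.last.lock().unwrap();
            let now = Instant::now();
            if matches!(*last, Some(prev) if now.duration_since(prev) < self.interval) {
                return None;
            }
            *last = Some(now);
        } // The lock is released here, before the action runs.
        Some(action())
    }
}
```

Such a limiter could be declared as `static DC_WARNING: IntervalLimiter = IntervalLimiter::new(Duration::from_secs(1));` and used without any macro.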
Sure, I've also been considering that. Will move this setting to the constructor (and name the constructor
This is limiting. This will, for instance, disallow for executing
The idea I had with the macros there was to make rate-limited warnings maximally welcoming, so that there's minimum effort and code bloat involved when employing them. Keep in mind that the way macro encapsulates the static instance of the
I believe I can. Will try.
This can be solved by also having an async version of this method. What are the other limitations?
In the case of DefaultPolicy, I don't see any reason to have separate limiters for those 4 warnings. Each limiter is another global object that is always present. Macros also hide this overhead.
These warnings are different. We do want the user to get each of them, not only one. If we rate limit them together, then if several of them are triggered close together in time, only the first one is printed.
WDYM by the async version? This seems bloaty. Why not just give users a flexible API that lets them utilize the
This overhead is justified: each warning string is, similarly, another static object that is always present. As the overhead (I assume you mean memory) is tiny and static, I believe it's not worth bikeshedding.
Because it introduces macros for something that can be done without them, without much more verbosity.
It is extremely unlikely for multiple of those warnings to be triggered, so I don't see much sense in caring about this case.
What concerns me is that the driver performs these checks on every request, even though they’re usually unnecessary.
You seem to mix two things. I'm arguing about exposing a boolean from the
I'm actually arguing against both.
I can agree with this reasoning, but I still believe that there's no generic enough way to represent "an action" to create such an API with just one method.
I see it much differently. If I just wanted to avoid writing the condition, I would employ a function, not a macro. It's exactly declaring the exclusive and encapsulated static object at the call site that I need the macro for. I've learned this pattern in TockOS, more specifically in board components. See this for reference. Such macros are the only way to provide the following combination of guarantees:
I really don't see the problem with having two methods: try_execute and try_execute_async.
I don't see why you need such strict guarantees for this use case. A manually written static will be just as good here. It is not some kernel code where such a security level matters.
I also don't understand how a macro is better than a scope:
{
    static TEST: i32 = 42;
    let x = TEST;
}
let y = TEST; // doesn't work
If some code really needs to guarantee that the static won't be accessed outside, then it can introduce a scope. There is no way to access the static without editing code in the scope.
I understand your concerns. However, I think these checks should be cheap. @Lorak-mmk do you agree?
I think so. It is mostly simple comparisons. The most expensive part is querying a hashmap, which I don't think will be too slow here. If it turns out to be too slow, we can think of some workarounds (for example, performing the whole logic of pick first, and only performing the checks if it returned None or a node from a non-preferred DC).
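The workaround mentioned in the last comment boils down to a cheap predicate on the result of `pick`. A rough sketch, assuming a hypothetical `PickedNode` type and `should_run_checks` helper (neither exists in the driver under these names):

```rust
/// Hypothetical, simplified view of the node returned by `pick`.
struct PickedNode {
    datacenter: String,
}

/// Returns `true` when the sanity checks are worth running: either nothing
/// was picked, or the picked node lies outside the preferred datacenter.
fn should_run_checks(picked: Option<&PickedNode>, preferred_dc: &str) -> bool {
    match picked {
        None => true,
        Some(node) => node.datacenter != preferred_dc,
    }
}
```

In the common, correctly configured case the predicate is false, so the hashmap lookups in the sanity checks would be skipped entirely.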
Ref: #1371
What's done
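`DefaultPolicy` now issues respective warnings:
1. if the preferred datacenter is not present in the cluster;
2. if all nodes in the preferred datacenter are disabled by the `HostFilter`.

This helps avoid confusion when the user expects requests to be sent to a specific datacenter, but it is not available.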
Note
As warnings are going to be emitted on every request (because the configuration is most likely session-wide), the messages will quickly flood the logs and the user should notice.
Alternatively, we could implement some frequency limiting of those warnings, but we prefer to:
- keep it simple for now,
- let the user notice the misconfiguration as quickly as possible.
Pre-review checklist
[ ] I have provided docstrings for the public items that I want to introduce.
[ ] I have adjusted the documentation in `./docs/source/`.
[ ] I added appropriate `Fixes:` annotations to PR description.