
add ability to disable deadline propagation #42


Merged
merged 6 commits into from
Mar 3, 2025

Conversation


@bjlaub bjlaub commented Mar 3, 2025

Adds a method to disable further propagation of deadline values via the Deadlines.encodeToRequest method. The use case is to prevent RPC call stacks from continuing to send Expect-Within: 0 after a deadline has expired, which can add considerable noise to reported metrics and logs.

Callers can use Deadlines.disableFurtherDeadlinePropagation() to instruct the library to ignore the internal deadline state on further calls to encodeToRequest. Future calls to Deadlines.getRemainingDeadline() will return Optional.empty().

Callers are expected to use this only when they can ensure that further operations should not be subject to deadline enforcement. A primary use case is a request that spins off background work, which may continue to make requests even after the original request handler has terminated. In those instances, callers may wish to stop enforcing the deadline: a background task may well make a request long after the original deadline has expired, and the right semantics for that are something of a judgement call. This flag lets callers control how such future requests are treated with respect to deadline enforcement.

This change adds the following:

  • adds the Deadlines.disableFurtherDeadlinePropagation() method, which sets a flag on the current deadline state held in the TraceLocal to disable propagation
  • changes the internal state structures to track whether propagation is disabled via a boolean
  • adds an intent tag to the deadline expiration meter reported by this library; the tag is set to ignore when encodeToRequest is called after disableFurtherDeadlinePropagation, and to propagate otherwise
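
The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the library's implementation: DeadlinePropagationSketch, State, and setDeadline are hypothetical names, a ThreadLocal stands in for the TraceLocal, the header value is written as whole seconds purely as an assumption, and doing nothing when no deadline has been set is this sketch's own choice.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the library's deadline state; for illustration only. */
final class DeadlinePropagationSketch {
    // Assumption: the real library keeps this in a TraceLocal; a ThreadLocal keeps the sketch self-contained.
    private static final ThreadLocal<State> STATE = new ThreadLocal<>();

    /** Absolute deadline (System.nanoTime() based) plus the propagation flag. */
    private static final class State {
        final long deadlineNanos;
        final boolean propagationDisabled;

        State(long deadlineNanos, boolean propagationDisabled) {
            this.deadlineNanos = deadlineNanos;
            this.propagationDisabled = propagationDisabled;
        }
    }

    /** Hypothetical helper that plays the role of parsing a deadline from an incoming request. */
    static void setDeadline(Duration remaining) {
        STATE.set(new State(System.nanoTime() + remaining.toNanos(), false));
    }

    /** Keeps the stored deadline but flags it so that later calls stop propagating it. */
    static void disableFurtherDeadlinePropagation() {
        State current = STATE.get();
        if (current != null) { // assumption: nothing to do when no deadline was set
            STATE.set(new State(current.deadlineNanos, true));
        }
    }

    /** Empty when no deadline is set or when propagation has been disabled. */
    static Optional<Duration> getRemainingDeadline() {
        State current = STATE.get();
        if (current == null || current.propagationDisabled) {
            return Optional.empty();
        }
        long remainingNanos = Math.max(0L, current.deadlineNanos - System.nanoTime());
        return Optional.of(Duration.ofNanos(remainingNanos));
    }

    /** Adds the Expect-Within header only while propagation is still enabled. */
    static void encodeToRequest(Map<String, String> headers) {
        getRemainingDeadline().ifPresent(remaining ->
                headers.put("Expect-Within", Long.toString(remaining.getSeconds())));
    }
}
```

In this sketch, a handler that hands work to a background task would call disableFurtherDeadlinePropagation() before the hand-off, so that requests made later by that task carry no Expect-Within header.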

==COMMIT_MSG==
add ability to disable deadline propagation
==COMMIT_MSG==

@bjlaub bjlaub requested a review from carterkozak March 3, 2025 17:36

changelog-app bot commented Mar 3, 2025

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

add ability to disable deadline propagation

Check the box to generate changelog(s)

  • Generate changelog entry

     deadlineState.set(providedDeadline);
 }

- private static void checkExpiration(long deadline, boolean internal) {
+ private static void checkExpiration(long deadline, boolean internal, boolean _disablePropagation) {
Contributor Author

should we encode disablePropagation as a separate tag on the meter?

Contributor

I think that would be helpful

* will result in a no-op assuming a deadline has previously been set for this trace (e.g. via a previous call to
* {@link #parseFromRequest}).
*/
public static void disableFurtherDeadlinePropagation() {
Contributor

Should Optional<Duration> getRemainingDeadline() return an empty optional after this is called?

Contributor

Outside of metrics, I think this should be equivalent to deadlineState.remove()

Contributor Author

i'm not sure. disableFurtherDeadlinePropagation doesn't mean that the deadline no longer exists in the current TraceLocal state, just that we won't add the header on future calls to encodeToRequest. maybe there is a valid use case for checking the deadline value even after we have disabled propagation?

Contributor Author

"Outside of metrics, I think this should be equivalent to deadlineState.remove()"

i think that's perhaps fair, though it narrows some possibilities quite a bit.

Contributor Author

I've updated the implementation to return Optional.empty() in this case.

@@ -18,4 +18,12 @@ namespaces:
docs: A deadline expiration was caused due to the inability to meet an
externally provided deadline, such as a server being unable to
complete required work before a client-provided deadline elapses.
- name: propagationDisabled
Contributor

perhaps something like action or intent with values along the lines of [propagate, ignore]? I often find boolean metric tags a bit harder to understand, and they don't leave much opportunity to add additional states if we decide to roll out a feature flag later on

Contributor Author

ya, i'll rephrase. it's a little nuanced as i don't want to convey that the intent was to ignore enforcement of a deadline expiration originally, just that it should be ignored from a certain point onwards (and, the meter marked with this tag will only be marked after that point and so the intent is a little ambiguous).
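
For illustration, the non-boolean tag suggested in this thread might look something like the sketch below. Only the tag name intent and the values propagate and ignore come from the PR description; the PropagationIntent enum and its methods are hypothetical, and how the tag is attached to the real meter is not shown in this conversation.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical enum-valued replacement for a boolean propagationDisabled metric tag. */
enum PropagationIntent {
    PROPAGATE,
    IGNORE;

    /** Maps the internal boolean flag to a tag value; further states could be added later. */
    static PropagationIntent fromFlag(boolean propagationDisabled) {
        return propagationDisabled ? IGNORE : PROPAGATE;
    }

    /** Renders the tag as a single name/value pair, e.g. intent=ignore. */
    Map<String, String> asTag() {
        return Map.of("intent", name().toLowerCase(Locale.ROOT));
    }
}
```

An enum leaves room for a third value if a feature flag is rolled out later, which a boolean tag would not.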


@bjlaub bjlaub changed the title from "WIP disable further propagation flag" to "add ability to disable deadline propagation" Mar 3, 2025
carterkozak previously approved these changes Mar 3, 2025
@policy-bot policy-bot bot dismissed carterkozak’s stale review March 3, 2025 20:17

Invalidated by push of 3de80d8

@carterkozak
Contributor

👍

@bulldozer-bot bulldozer-bot bot merged commit 9b2d1d3 into develop Mar 3, 2025
5 checks passed
@bulldozer-bot bulldozer-bot bot deleted the blaub/disable-propagation branch March 3, 2025 20:28
3 participants