adapter: move timestamp oracle impl to its own module, put behind trait object #22112

Merged

Conversation

aljoscha
Contributor

@aljoscha aljoscha commented Oct 2, 2023

Builds on #22091, which is not yet merged, but I wanted to keep the momentum going.

Part of MaterializeInc/database-issues#6635

Tips for reviewer

#21671 has this commit and all the follow-up commits for the full MaterializeInc/database-issues#6635; it might be good to look at that for context.

Commits have good messages that explain the rationale. But the gist is that this prepares us for introducing the PostgresTimestampOracle behind that trait, next to the existing oracle. And the code that uses it doesn't have to know.
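
To illustrate the idea of hiding the oracle behind a trait object, here is a minimal sketch. It is not the code from this PR: the real trait is async and generic over the timestamp type, whereas this version is synchronous and uses `u64`. `InMemoryOracle`, `StridedOracle`, and `write_then_read` are hypothetical names invented for the example; only `TimestampOracle`, `read_ts`, and `apply_write` appear in the PR.

```rust
use std::fmt::Debug;

/// Simplified, synchronous stand-in for the oracle trait (assumption:
/// the real trait is async and generic over the timestamp type).
pub trait TimestampOracle: Debug {
    /// Hands out a new write timestamp.
    fn write_ts(&mut self) -> u64;
    /// Returns the current read timestamp.
    fn read_ts(&self) -> u64;
    /// Marks a write at `write_ts` as applied.
    fn apply_write(&mut self, write_ts: u64);
}

/// Hypothetical implementation that increments by one per write.
#[derive(Debug, Default)]
pub struct InMemoryOracle {
    read_ts: u64,
    write_ts: u64,
}

impl TimestampOracle for InMemoryOracle {
    fn write_ts(&mut self) -> u64 {
        self.write_ts += 1;
        self.write_ts
    }
    fn read_ts(&self) -> u64 {
        self.read_ts
    }
    fn apply_write(&mut self, write_ts: u64) {
        self.read_ts = self.read_ts.max(write_ts);
        self.write_ts = self.write_ts.max(write_ts);
    }
}

/// Hypothetical second implementation, standing in for a different backend.
/// It hands out write timestamps in fixed strides.
#[derive(Debug)]
pub struct StridedOracle {
    stride: u64,
    read_ts: u64,
    write_ts: u64,
}

impl StridedOracle {
    pub fn new(stride: u64) -> Self {
        Self { stride, read_ts: 0, write_ts: 0 }
    }
}

impl TimestampOracle for StridedOracle {
    fn write_ts(&mut self) -> u64 {
        self.write_ts += self.stride;
        self.write_ts
    }
    fn read_ts(&self) -> u64 {
        self.read_ts
    }
    fn apply_write(&mut self, write_ts: u64) {
        self.read_ts = self.read_ts.max(write_ts);
        self.write_ts = self.write_ts.max(write_ts);
    }
}

/// Calling code only sees `dyn TimestampOracle`; it does not know which
/// implementation sits behind the trait object.
pub fn write_then_read(oracle: &mut dyn TimestampOracle) -> (u64, u64) {
    let write_ts = oracle.write_ts();
    oracle.apply_write(write_ts);
    (write_ts, oracle.read_ts())
}
```

With this shape, switching implementations only changes the line that constructs the `Box<dyn TimestampOracle>`; `write_then_read` stays the same.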

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@aljoscha aljoscha requested a review from a team as a code owner October 2, 2023 16:52
@aljoscha aljoscha added the T-platform-v2 Theme: Platform v2 label Oct 2, 2023
@shepherdlybot

shepherdlybot bot commented Oct 2, 2023

This PR has moderate risk. Make sure to carefully review the file hotspots. It's still a good idea to have this change reviewed, ensuring enough test coverage and observability.

Risk Score: 🟡 73 / 100
Probability Buggy: 51%
File Hotspots: 2

Buggy File Hotspots:
File               Percentile
../src/coord.rs    99
../src/catalog.rs  100

///
/// All subsequent values of `self.read_ts()` will be greater or equal to
/// `write_ts`.
async fn apply_write(&mut self, lower_bound: T);
Contributor

Should lower_bound be write_ts?

Contributor Author

good catch!
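
For reference, the contract in the doc comment under review can be sketched as follows, with the parameter named `write_ts` so that it matches the comment. This is a simplified, synchronous model, not the PR's code: `ReadTsModel` is a hypothetical name, and `u64` stands in for the generic timestamp type `T`.

```rust
/// Hypothetical in-memory model of the read timestamp kept by an oracle.
#[derive(Debug, Default)]
pub struct ReadTsModel {
    read_ts: u64,
}

impl ReadTsModel {
    /// Returns the current read timestamp.
    pub fn read_ts(&self) -> u64 {
        self.read_ts
    }

    /// Marks a write at `write_ts` as completed.
    ///
    /// All subsequent values of `self.read_ts()` will be greater or equal to
    /// `write_ts`.
    pub fn apply_write(&mut self, write_ts: u64) {
        // Taking the maximum means an older `write_ts` arriving late can
        // never move the read timestamp backwards.
        self.read_ts = self.read_ts.max(write_ts);
    }
}
```

Applying writes at 5 and then at 3 leaves `read_ts()` at 5, which satisfies the guarantee for both calls.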

Comment on lines +257 to +294
/// Returns the current system time while protecting against backwards time
/// jumps.
///
/// The caller is responsible for providing the previously recorded system time
/// via the `previous_now` parameter.
///
/// If `previous_now` is more than `TIMESTAMP_INTERVAL_UPPER_BOUND *
/// TIMESTAMP_PERSIST_INTERVAL` milliseconds ahead of the current system time
/// (i.e., due to a backwards time jump), this function will block until the
/// system time advances.
///
/// The returned time is guaranteed to be greater than or equal to
/// `previous_now`.
// TODO(aljoscha): These internal details of the oracle are leaking through to
// multiple places in the coordinator.
pub(crate) fn monotonic_now(now: NowFn, previous_now: mz_repr::Timestamp) -> mz_repr::Timestamp {
    let mut now_ts = now();
    let monotonic_now = cmp::max(previous_now, now_ts.into());
    let mut upper_bound = catalog_oracle::upper_bound(&mz_repr::Timestamp::from(now_ts));
    while monotonic_now > upper_bound {
        // Cap retry time to 1s. In cases where the system clock has retreated
        // by some large amount of time, this prevents against then waiting for
        // that large amount of time in case the system clock then advances back
        // to near what it was.
        let remaining_ms = cmp::min(monotonic_now.saturating_sub(upper_bound), 1_000.into());
        error!(
            "Coordinator tried to start with initial timestamp of \
            {monotonic_now}, which is more than \
            {TIMESTAMP_INTERVAL_UPPER_BOUND} intervals of size {} larger than \
            now, {now_ts}. Sleeping for {remaining_ms} ms.",
            *TIMESTAMP_PERSIST_INTERVAL
        );
        thread::sleep(Duration::from_millis(remaining_ms.into()));
        now_ts = now();
        upper_bound = catalog_oracle::upper_bound(&mz_repr::Timestamp::from(now_ts));
    }
    monotonic_now
}
Contributor

This function only really makes sense when used with the EpochMillis timeline. That timeline isn't mentioned anywhere in this module and feels like it's an implementation detail of other parts of the adapter. For those reasons it feels a little out of place in this module.

Contributor Author

yeah, I also didn't know what to do about this one. In the end I moved it here with the oracle code because it uses internals of the oracle like upper_bound and the TIMESTAMP_PERSIST_INTERVAL and TIMESTAMP_INTERVAL_UPPER_BOUND. Plus, once we remove this oracle we have more of the code that we need to remove localized in this one file.

But I'm happy to move that back to timeline.rs. And I probably have to add more code here that does a monotonic_now for both oracles, in the interim period where we have both oracles/where we switch over. What do you think?

Contributor

I think it's ok to leave here if we plan on removing this later.
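
The `monotonic_now` logic discussed in this thread can be modelled in a self-contained way, as a rough sketch under several assumptions. Timestamps are plain `u64` milliseconds instead of `mz_repr::Timestamp`. The two constants carry placeholder values, since their real values are not shown here. `upper_bound` is assumed to be "now plus `TIMESTAMP_INTERVAL_UPPER_BOUND` intervals", following the doc comment. The `sleep` parameter is an addition for this example, so the loop can be exercised without actually blocking; the real function calls `thread::sleep` directly.

```rust
use std::cmp;
use std::time::Duration;

// Placeholder values (assumption): the real constants live in the oracle
// module and their values are not part of this excerpt.
const TIMESTAMP_PERSIST_INTERVAL_MS: u64 = 1_000;
const TIMESTAMP_INTERVAL_UPPER_BOUND: u64 = 2;

/// Assumed shape of the oracle's `upper_bound`: the largest timestamp that is
/// still considered "close enough" to the current system time.
fn upper_bound(now_ms: u64) -> u64 {
    now_ms.saturating_add(TIMESTAMP_PERSIST_INTERVAL_MS * TIMESTAMP_INTERVAL_UPPER_BOUND)
}

/// Returns `max(previous_now, now())`, blocking (via `sleep`) while that value
/// is further ahead of the system clock than `upper_bound` allows.
pub fn monotonic_now(
    now: impl Fn() -> u64,
    previous_now: u64,
    mut sleep: impl FnMut(Duration),
) -> u64 {
    let mut now_ts = now();
    let monotonic_now = cmp::max(previous_now, now_ts);
    let mut bound = upper_bound(now_ts);
    while monotonic_now > bound {
        // Cap each wait at 1s so that a clock which jumps forward again is
        // noticed quickly instead of sleeping through the whole gap.
        let remaining_ms = cmp::min(monotonic_now - bound, 1_000);
        sleep(Duration::from_millis(remaining_ms));
        now_ts = now();
        bound = upper_bound(now_ts);
    }
    monotonic_now
}
```

In production-like use, the third argument would be `std::thread::sleep`. In a test, a fake clock that advances on each "sleep" shows the loop terminating once the system time has caught up to within the bound.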

@nrainer-materialize
Contributor

Coverage looks good here as well. 👍

@aljoscha aljoscha force-pushed the adapter-ts-oracle-trait-object branch 2 times, most recently from 501e64a to 4cb3a77 October 4, 2023 12:10
It was not used outside that struct (anymore?) and made the code
slightly less readable/introduced one more hop to go through when
looking at code/introduced one additional thing that has to be named.

We need that name free for the future TimestampOracle trait.

We do this because we want to provide a more decoupled trait in future
commits and the persist_fn leaks the underlying persistence/durability
mechanism.

We introduce a TimestampPersistence trait that we give to
DurableTimestamp which encapsulates the persistence functionality.
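
As a rough illustration of that commit message, the sketch below replaces a persistence closure with a trait that the timestamp holder owns. It is an assumption-heavy model, not the PR's code. The method name `persist_timestamp`, the `RecordingPersistence` test double, the `advance_to`/`current` methods, and the `PERSIST_INTERVAL` value are all hypothetical. The real trait is presumably async and fallible. The "persist ahead by one interval" behaviour is assumed from the interval constants mentioned elsewhere in this PR.

```rust
/// Hypothetical shape of the persistence trait: it hides how (and where) a
/// timestamp is made durable.
pub trait TimestampPersistence {
    fn persist_timestamp(&mut self, ts: u64);
}

/// Test double that records every persisted timestamp in memory.
#[derive(Debug, Default)]
pub struct RecordingPersistence {
    pub persisted: Vec<u64>,
}

impl TimestampPersistence for RecordingPersistence {
    fn persist_timestamp(&mut self, ts: u64) {
        self.persisted.push(ts);
    }
}

// Placeholder value (assumption), standing in for the persist interval.
const PERSIST_INTERVAL: u64 = 10;

/// A timestamp that is only advanced past what has already been persisted
/// after a new, larger upper bound has been handed to the persistence layer.
pub struct DurableTimestamp<P: TimestampPersistence> {
    ts: u64,
    durable_upper: u64,
    persistence: P,
}

impl<P: TimestampPersistence> DurableTimestamp<P> {
    pub fn new(initial: u64, mut persistence: P) -> Self {
        let durable_upper = initial + PERSIST_INTERVAL;
        persistence.persist_timestamp(durable_upper);
        Self { ts: initial, durable_upper, persistence }
    }

    /// Advances the timestamp, persisting a new upper bound only when the
    /// previously persisted one would be exceeded.
    pub fn advance_to(&mut self, ts: u64) {
        if ts <= self.ts {
            return; // never move backwards
        }
        if ts > self.durable_upper {
            self.durable_upper = ts + PERSIST_INTERVAL;
            self.persistence.persist_timestamp(self.durable_upper);
        }
        self.ts = ts;
    }

    pub fn current(&self) -> u64 {
        self.ts
    }

    pub fn persistence(&self) -> &P {
        &self.persistence
    }
}
```

Because `DurableTimestamp` only talks to the trait, the durability mechanism can be swapped out without touching the timestamp logic.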
…ct in usage sites

Further preparation for eventually hooking up a different implementation
of TimestampOracle.
@aljoscha aljoscha force-pushed the adapter-ts-oracle-trait-object branch from 4cb3a77 to 2fc78eb October 4, 2023 18:20
@aljoscha aljoscha merged commit ef91d6b into MaterializeInc:main Oct 4, 2023
@aljoscha aljoscha deleted the adapter-ts-oracle-trait-object branch October 4, 2023 19:24