Skip to content

Get rid of get_blobs_not_found and unify blob errors #2649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

ndr-ds
Copy link
Contributor

@ndr-ds ndr-ds commented Oct 18, 2024

Motivation

We've had these two different error variants for blobs for a while: BlobNotFoundOnRead and BlobsNotFound. The motivation was to differentiate between errors when doing a storage read, and everything else. In the current state of the code, I don't think this distinction is really giving us much, and it complicates the code more.

We also have this get_blobs_not_found function that abstracts away the different nested error types.

Proposal

Remove BlobNotFoundOnRead, and simplify things a bit. Wrote custom/manual From implementations for these different error types, so we can have a BlobsNotFound error everywhere and can get rid of get_blobs_not_found.
Fixes #2628

Test Plan

CI

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

Copy link
Contributor Author

ndr-ds commented Oct 18, 2024

@ndr-ds ndr-ds force-pushed the 10-18-blobnotfoundonread_-_blobsnotfound branch 2 times, most recently from c5174be to b49501f Compare October 18, 2024 20:12
@ndr-ds ndr-ds marked this pull request as ready for review October 18, 2024 21:14
@ndr-ds ndr-ds requested review from afck, jvff and ma2bd October 18, 2024 21:15
#[error("Oracle response mismatch")]
OracleResponseMismatch,
#[error("No recorded response for oracle query")]
MissingOracleResponse,
}

impl SystemExecutionError {
pub fn get_blobs_not_found(&self) -> Option<Vec<BlobId>> {
Copy link
Contributor

@ma2bd ma2bd Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the rest of the code, we try to normalize the errors during conversion instead. In this case, I wonder why we have so many kinds of BlobsNotFound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(There's also #2628 for this.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree this is not the ideal end solution for this, this is just an intermediate step until we get to #2628

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a TODO(#2628) to both get_blobs_not_found functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with @afck a while back, I'm just gonna do #2628 here already

@ndr-ds ndr-ds force-pushed the 10-18-blobnotfoundonread_-_blobsnotfound branch 2 times, most recently from 78912fd to bb341a0 Compare October 23, 2024 13:19
@ndr-ds ndr-ds force-pushed the 10-18-blobnotfoundonread_-_blobsnotfound branch 2 times, most recently from 84566e6 to ca23cb5 Compare November 7, 2024 00:08
@ndr-ds ndr-ds requested review from afck and ma2bd November 7, 2024 01:49
Copy link
Contributor

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. It mainly shows that we have too many error types that are nested in too many ways. But cleaning those up is beyond the scope of this PR, of course.

I'm also a bit concerned about the fact that we now convert BlobsNotFound to the corresponding variant of the higher-level type implicitly everywhere, using ?: Should they really be treated all the same? E.g. aren't there cases where a validator should locally expect to have a particular blob—i.e. should respond with something like an "internal error"—that are different from other places where it's correct to respond with a BlobsNotFound to the client, so they can send them to us?

Anyway, I'm approving this for now because I don't have a simple better idea and I don't think it introduces new problems, but I'd prefer if we could get another opinion on this PR, by a second reviewer.

@ndr-ds ndr-ds force-pushed the 10-18-blobnotfoundonread_-_blobsnotfound branch 2 times, most recently from 0e1d67d to e9a3222 Compare November 8, 2024 19:41
Copy link
Contributor Author

ndr-ds commented Nov 8, 2024

Those are good points. I think right now though, AFAIU, the behavior is that we always return the error to the client, and fail on the client if the blob is actually not found. This is a behavior that could be looked into though, if we think it should be done differently 🤔

error => Self::RemoteNodeError(error),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think this is a conversion we should not make, since we will want to distinguish: If the error is that the remote node doesn't have a blob, we need to send it. If the error is that we don't have a blob, we need to request it or fail. (See #2871.)

@afck afck self-requested a review November 13, 2024 09:30
@ndr-ds ndr-ds changed the title BlobNotFoundOnRead -> BlobsNotFound Get rid of get_blobs_not_found and unify blob errors Nov 13, 2024
} else {
Err(LocalNodeError::BlobsNotFound(blob_ids)) => {
let blobs = remote_node.try_download_blobs(blob_ids.as_slice()).await;
if blobs.len() != blob_ids.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is weird but changed in the next PR, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @afck had a PR that fixed this behavior, so this is already gone in my local version of this PR. Will update it in a bit!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't I already merge that? I think try_download_blobs returns None if it couldn't download all requested blobs? So I think you can remove this comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what I said 😅 already have that done locally, but haven't updated the PR yet

@ndr-ds ndr-ds force-pushed the 10-18-blobnotfoundonread_-_blobsnotfound branch from e9a3222 to b550792 Compare November 13, 2024 19:13
Copy link
Contributor Author

ndr-ds commented Nov 13, 2024

Merge activity

  • Nov 13, 3:58 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Nov 13, 3:58 PM EST: A user merged this pull request with Graphite.

@ndr-ds ndr-ds merged commit 143a051 into main Nov 13, 2024
24 checks passed
@ndr-ds ndr-ds deleted the 10-18-blobnotfoundonread_-_blobsnotfound branch November 13, 2024 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revisit get_blobs_not_found and blob related errors
3 participants