-
Notifications
You must be signed in to change notification settings - Fork 133
Expose request coordinator #1287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
7a45481
to
c2860f5
Compare
Why did you introduce new struct ( |
|
Just |
Is there really an API in cpp-driver to retrieve a shard that the request was sent to? |
No, I might have phrased it wrong. Now that I think of it, I did, because there are no shards in Cassandra. What I meant is just that the request coordinator is identified via ip + port ( |
Than I don't understand why |
We can of course consider wrapping |
But the |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
|
c885a61
to
e72ed77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
scylla/src/client/session.rs
Outdated
pub struct Coordinator { | ||
/// Translated address, i.e., one that the connection is opened against. | ||
connection_address: SocketAddr, | ||
/// Untranslated address, i.e., one that the node broadcast. | ||
node_address: SocketAddr, | ||
/// Unique ID of the node in the cluster. | ||
host_id: Uuid, | ||
/// Number of the shard, if applicable (present for ScyllaDB nodes, absent for Cassandra). | ||
shard: Option<Shard>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit: session: introduce Coordinator
- @muzarski - please make sure that
Coordinator
's API meets the needs
of cpp-rust-driver.
I think this is more than enough. The main use case of cass_future_coordinator
is something like
// CassNode would be a wrapper over `Coordinator`
CassNode *n = cass_future_coordinator(cass_session_execute(...));
// This would enforce `SingleTargetLoadBalancingPolicy` using the host_id/node_address
// (and most probably the shard).
cass_statement_set_node(some_statement, n);
The API you proposed, along with SingleTargetLoadBalancingPolicy
is more than enough to implement this on cpp-rust-driver side.
scylla/src/client/session.rs
Outdated
/// The mock coordinator that is used wherever a [Coordinator] is needed but possibly unknown | ||
/// (e.g., on a control connection before the metadata is fetched), | ||
/// including non-[Session] APIs (`MetadataReader::query_metadata()`, `Connection::query_iter()`, etc.). | ||
/// | ||
/// Care should be taken not to leak this [Coordinator] to any user-facing API. | ||
pub(crate) const MOCK_COORDINATOR: Coordinator = Coordinator { | ||
host_id: Uuid::from_u128(0x_feed_dead_deaf_deed_cafe), | ||
connection_address: SocketAddr::new(std::net::IpAddr::V6(std::net::Ipv6Addr::LOCALHOST), 42), | ||
node_address: SocketAddr::new(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST), 2137), | ||
shard: Some(2137), | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit: codewide: introduce Coordinator to QueryResult
Ok, so I believe it's either this or holding Coordinator
under Option<>
in QueryResult
. TBH, I don't have a strong opinion on this. Both have its pros and cons. Maybe I'm leaning slightly towards Option<Coordinator>
, as it is not as bug prone as MOCK_COORDINATOR
solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to stay by MOCK_COORDINATOR
solution, I'd at least 0-initialize the struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I though that using some crazy values will bring up our (or someone else's) attention quicker should anything go wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you're right in that zero IP address and zero host ID is obviously not a valid state. Maybe let's stay with 2137
shard (or u32::MAX
), though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I believe it's either this or holding
Coordinator
underOption<>
inQueryResult
. TBH, I don't have a strong opinion on this. Both have its pros and cons. Maybe I'm leaning slightly towardsOption<Coordinator>
, as it is not as bug prone asMOCK_COORDINATOR
solution.
If we put Option<Coordinator>
into QueryResult
, shall we return Option<Coordinator>
from QueryResult::request_coordinator()
or rather panic on None
?
What my point was, was that user should never be able to observe a None
Coordinator
. This is because the only requests whose Coordinator
is unknown are driver-private, and their results are not exposed to the user. Thus the user should not be told to even consider the None
case. Out of the two options:
- to assert the correct behaviour in runtime by panicking at invalid use point (
request_coordinator()
is called on aQueryResult
without a knownCoordinator
, or - to hide the type-level possibility of incorrect behaviour by introducing a mock. This let us pretend on the
QueryResult
level that there's no hazard - because I thought there would be no benefit from such visual threat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you're right in that zero IP address and zero host ID is obviously not a valid state. Maybe let's stay with
2137
shard (oru32::MAX
), though.
u32::MAX
sounds good
e72ed77
to
7e9f696
Compare
Solved the problem around lack of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit: codewide: introduce Coordinator to QueryResult
One possible solution (which we should definitely not pursue now, maybe in distant future it this becomes a problem) is to not use QueryResult internally, and instead have some InternalQueryResult that does not contains Coordinator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also imagined that. However, the amount of boilerplate that this would require scared me too much to consider it further.
fn into_query_result_and_paging_state_with_maybe_unknown_coordinator( | ||
self, | ||
request_coordinator: Option<Coordinator>, | ||
) -> Result<(QueryResult, PagingStateResponse), RequestAttemptError> { | ||
let (raw_rows, paging_state_response) = match self.response { | ||
let Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate names that are this long, but I don't see a better solution.
It was a leftover from the deserialization refactor.
A number of Connection's private API methods: - `query_single_page()`, - `query_single_page_with_consistency()`, - `execute_unpaged()`, - `batch()`, were not used at all and were decorated with `#[allow(dead_code)]`. The funny part is that it was I who decorated them as such. I decided it's time to say goodbye to them. There's no use in keeping them, especially that internal refactors often require adjusting them, too (which was the case in this PR). Additionally, `query_raw_unpaged()` and `execute_raw_unpaged()` had their visibility reduced to `pub(self)`, and `execute_raw_unpaged()` was made `#[cfg(test)]`. These measures decrease risk of misusing those APIs from outside `Connection`. One exemplary hazard that justifies this action is that those functions ignore execution profiles.
`QueryResponse` -> `QueryResult` can be performed by calling `query_response.into_non_error_query_response().into_query_result()`, so there's no need for direct `into_query_result()` conversion method on `QueryResponse`. It's removed for easier refactor in further commits.
This is a tiny adjustment to only call `Connection::get_connect_address` once, not unnecessarily twice.
`Session` started to have two separate `impl` blocks after deserialization refactor, as a legacy. Those blocks are now merged.
2ad215a
to
19beb7a
Compare
Rebased on main. |
`Coordinator` represents the coordinator of a CQL request. It's intended to be built by `Session` and `QueryPager` upon requests issued to a given target (node+shard), and exposed in both `QueryResult` and `QueryPager`.
In order to keep structs that are going to contain Coordinator implement auto traits (in our case, UnwindSafe and RefUnwindSafe are important), we need to have all their substructs implement those traits, too. Otherwise, we would break the public API by implicitly removing implementation of those traits from structs that are going to hold Coordinator. Coordinator contains Node, which contains NodeConnectionPool, which in turn contains tokio::mpsc::Sender. This implements UnwindSafe and RefUnwindSafe only since tokio 1.40, so we bump tokio dependency.
When two years ago I added the `endpoint` field to `NodeConnectionPool`, I forgot to include it in its custom `Debug` impl. Now it's fixed.
These implementations are a temporary solution to the following problem: `QueryResult` used to implement `(Ref)UnwindSafe`, but then we wanted it to store a reference to `Node`. This, transitively, made it store `NodeConnectionPool`, which did not implement those traits. Thus, they would no longer be auto-implemented for `QueryResult`, breaking the public API. Not to introduce an API breakage in a minor release, we decided to manually hint that `NodeConnectionPool` is indeed unwind-safe. Even if our we are wrong and the hint is misleading, the documentation of those traits, not being `unsafe` traits, considers them merely guidelines, not strong guarantees.
`QueryResult` now contains `Option<Coordinator>`. For this, `NonErrorQueryResponse::into_query_result()` now accepts `Coordinator`. It is constructed in `Session::run_request()` for each target from `Plan` that is tried. Note that `QueryResult::request_coordinator()` return `&Coordinator` even though `QueryResult` may possibly hold `None`. If `None` is there, the driver panics upon the getter called. Rationale: some private `Connection` APIs (namely, `query_raw_unpaged()`) return `QueryResult` even though they don't have access to `Node`, which is required to construct a `Coordinator`. Even if we redesigned this particular API to be able to create a `Coordinator`, we would still be left with the fundamental chicken-and-egg problem that occurs when we use the `Connection` APIs on the control connection to issue the initial metadata fetch. In such case we have 0 chance to have the `Node` accessible (as it's only constructed based on the first metadata fetch). To solve the problem, those APIs construct `QueryResult` using a dedicated constructor: `new_with_unknown_coordinator()`. It should be used only in APIs that do not leak `QueryResult` to the user, so the panic is not triggered.
Similarly to `QueryResult` (and `QueryRowsResult`), `QueryPager` (as well as `TypedRowStream`) stores coordinators of CQL requests it performed. In the special case of `Connection::{query,execute}_iter()`, Coordinator is unknown. Fortunately, we can simply leave the `Vec<Coordinator>` empty, returning empty Iterator when asked.
`make static` didn't check the code when built with `cpp_rust_unstable` cfg parameter.
19beb7a
to
b5a744e
Compare
v1.3:
|
This is the second approach to #1030.
Coordinator
Coordinator
is defined as follows:It has
pub
getters for all of its fields.Coordinator
is built fromNodeRef
(untranslated_address
andhost_id
),Shard
(shard
), andConnection
(connection_address
).Use
Coordinator
is stored inQueryResult
(as well asQueryRowsResult
) andQueryPager
(together withTypedRowStream
).QueryPager
exposes an iterator of references toCoordinator
s, each of which served one page.Testing
No tests yet.
Fixes: #1030
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.