Apply WaitStabilityMinDuration when syncing blocks #5406
Signed-off-by: Justin Jung <[email protected]>
Thanks, this LGTM!
Please hold on. I'm verifying whether my change could leave a block unavailable at the moment a querier or ruler makes a call.
pkg/storegateway/gateway.go
@@ -370,6 +372,10 @@ func (g *StoreGateway) waitRingStability(ctx context.Context, reason string) {
	minWaiting := g.gatewayCfg.ShardingRing.WaitStabilityMinDuration
	maxWaiting := g.gatewayCfg.ShardingRing.WaitStabilityMaxDuration

	if !g.gatewayCfg.ShardingEnabled || minWaiting <= 0 {
Isn't this a duplicate of shouldWaitRingStability? I think we can keep just one.
Oops, no, that was supposed to be removed. I'll delete it.
Here's my concern:
I'll verify which SGs are returned to the querier when it tries to read B1 at T2 and T4, and add a test case if possible.
It doesn't work like this. If only two new store-gateways are added to the ring, the block will still be assigned to at least one of the three store-gateways that already held it, because of consistent hashing (e.g. SG2, SG11, SG12). Because queriers retry three times, they will eventually reach a store-gateway that has the block. This is why we scale store-gateways one by one: to keep ring disruptions to a minimum.
Closing this PR in favor of using
What this PR does:
Currently, WaitStabilityMinDuration and WaitStabilityMaxDuration are only used when a new store-gateway instance joins the ring, to make sure the ring information has propagated to all members of the ring before the initial sync.
This config is not applied to existing members of the ring. For example, when two new members join within a split second, an existing member may start syncing blocks with only one of them added to its view of the ring, since it simply polls for ring changes every 5 seconds without any stability check. This is undesirable when scaling more than one store-gateway up or down at once, or during a rolling deployment.
This PR makes the periodic and ring-change block syncs wait for ring token and zone stability before kicking off.
Which issue(s) this PR fixes:
n/a
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]