Skip to content

Validate TaskList partition updates via CLI are safe #6682

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

natemort
Copy link
Member

Add additional validation to confirm that the proposed update is safe. Provide a "force" flag to skip this additional validation.

This covers three main potential errors:

  • Prevent updating the wrong TaskList by checking whether it has pollers
  • Prevent removing currently active write partitions as they could have tasks. These have to be removed in a two-step process where they're removed from writing and then from reading.
  • Prevent removing read partitions that still have tasks. These need to stay as read partitions until pollers have taken all the tasks (or the tasks have timed out).

What changed?

  • Add additional validation to the recently added admin tasklist up command before applying the update.

Why?

  • Ensure operators don't accidentally update the wrong tasklist or perform an unsafe action.

How did you test it?

  • Unit tests

Potential risks

  • Immediately after a matching host is assigned a TaskList it might have no pollers and would flag updates as being unsafe. These false positives seem like a reasonable compromise.

Release notes

Documentation Changes

return fmt.Errorf("DescribeTaskList failed: %w", err)
}
if len(description.Pollers) == 0 {
return fmt.Errorf("'%s' has no pollers of type '%s'", tl.Name, tlt.String())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this check should be just warning or move it to the end. Operator seeing this error might just rerun with -f without knowing about the dangerous stuff checked below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call out, I'm going to move this check to the end

Copy link
Member

@taylanisikdemir taylanisikdemir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding these safety checks!

Add additional validation to confirm that the proposed update is safe. Provide a "force" flag to skip this additional validation.

This covers three main potential errors:
- Prevent updating the wrong TaskList by checking whether it has pollers
- Prevent removing currently active write partitions as they could have tasks. These have to be removed in a two-step process where they're removed from writing and then from reading.
- Prevent removing read partitions that still have tasks. These need to stay as read partitions until pollers have taken all the tasks (or the tasks have timed out).
@natemort natemort merged commit d646252 into cadence-workflow:master Feb 21, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants