Skip to content

[domain-deletion]Introduce a feature flag to control domain deletion #6920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gazi-yestemirova
Copy link
Contributor

What changed?
This PR introduces a new feature flag,EnableDomainDeletion, to control access to the domain deletion API.
This flag now gates the DeleteDomain API. If the flag is disabled, domain deletion requests will be rejected.
The flag defaults to false to prevent accidental domain deletions during rollout.

Why?
The primary goal is to make the domain deletion feature backward-compatible. We will enable the flag after the gradual rollout of the domain deletion API across all clusters.

How did you test it?
Unit tests & local testing

Potential risks

Release notes

Documentation Changes

@@ -154,6 +154,11 @@ func (wh *WorkflowHandler) DeleteDomain(ctx context.Context, deleteRequest *type
if err := wh.requestValidator.ValidateDeleteDomainRequest(ctx, deleteRequest); err != nil {
return err
}

if !wh.config.EnableDomainDeletion() {
return &types.BadRequestError{Message: "Domain deletion is not enabled."}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd change this to "Domain deletion feature is not enabled" so it is clear it is not something not enabled for the particular domain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time what could be wrong with using this feature?
Is this allowed for a random persons?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is related to the previous PR and the discussion #6918 (review)
The problem is that we are introducing a new replication task requiring cross-cluster coordination.
@gazi-yestemirova Correct me if I'm wrong :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is correct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add this to the PR description and point to the issue with replication tasks so it will be easier to understand why we need a feature flag in this case?

@3vilhamster
Copy link
Contributor

Could you elaborate on the backward-compatibility of the feature?
As far as I understand, we've just introduced a new feature, so why should it be guarded by the feature flag?

@gazi-yestemirova
Copy link
Contributor Author

Could you elaborate on the backward-compatibility of the feature? As far as I understand, we've just introduced a new feature, so why should it be guarded by the feature flag?

When we delete a domain, if it is a global domain, we trigger a replication task to delete the domain in a replica cluster.
I recently introduced IDL changes in the replicator, specifically DomainOperationDelete to be able to execute delete in a replica cluster.

Sometimes, it might happen so that Cluster A has these IDL changes but Cluster B is not deployed yet (or even when the rollbacks happen). Cluster A starts generating delete tasks, which get replicated to cluster B. When cluster B doesn't recognize it and falls into the default case and the task will be discarded. Nothing will crash, but we will end up with inconsistent data across active - passive clusters.

So this feature flag will be enabled when we have deployed the API changes and removed after the feature is rolled out across all the production environments.

@dkrotx
Copy link
Member

dkrotx commented May 15, 2025

So this feature flag will be enabled when we have deployed the API changes and removed after the feature is rolled out across all the production environments.

Does that mean you will deprecate this flag afterwards?

@gazi-yestemirova
Copy link
Contributor Author

So this feature flag will be enabled when we have deployed the API changes and removed after the feature is rolled out across all the production environments.

Does that mean you will deprecate this flag afterwards?

Yes, it will be removed. After I roll out the domain deletion feature to all the waves

@gazi-yestemirova gazi-yestemirova merged commit f8db816 into cadence-workflow:master May 16, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants