[domain-deletion] Add handler to process delete domain replication task #6918

gazi-yestemirova
Contributor

@gazi-yestemirova gazi-yestemirova commented May 14, 2025

What changed?
This PR introduces the handleDomainDeleteReplicationTask handler, which processes replication tasks for domain deletions.
Related IDL changes: cadence-workflow/cadence-idl#200
cadence-workflow/cadence-idl#201

Why?
Currently, domain deletions are not consistently propagated across all replicas, leading to potential inconsistencies. This task handler ensures that domain delete operations are reliably replicated.
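
For readers unfamiliar with the dispatch pattern, here is a minimal sketch of how a replica-side executor might route a delete task. It is not the code from this PR: `DomainTask`, `replicaStore`, and the in-memory map are hypothetical stand-ins, and treating a delete of an already-missing domain as a success is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// DomainOperation is a hypothetical stand-in for the operation enum
// carried by a domain replication task.
type DomainOperation int

const (
	OperationCreate DomainOperation = iota
	OperationUpdate
	OperationDelete
)

// DomainTask is a simplified, hypothetical replication task.
type DomainTask struct {
	Operation DomainOperation
	Name      string
}

// ErrUnknownOperation is returned for operations the executor does not know.
var ErrUnknownOperation = errors.New("unknown domain operation")

// replicaStore is an in-memory stand-in for a replica cluster's domain metadata.
type replicaStore struct {
	domains map[string]bool
}

// Execute dispatches a task to the matching branch by operation type.
func (s *replicaStore) Execute(task DomainTask) error {
	switch task.Operation {
	case OperationCreate, OperationUpdate:
		// Simplification: both operations just record the domain.
		s.domains[task.Name] = true
		return nil
	case OperationDelete:
		// Assumption: deleting a missing domain is a no-op, so a
		// re-delivered task does not fail.
		delete(s.domains, task.Name)
		return nil
	default:
		return fmt.Errorf("%w: %d", ErrUnknownOperation, task.Operation)
	}
}

func main() {
	replica := &replicaStore{domains: map[string]bool{"orders": true}}
	task := DomainTask{Operation: OperationDelete, Name: "orders"}
	if err := replica.Execute(task); err != nil {
		fmt.Println("replication failed:", err)
		return
	}
	fmt.Println("domain present on replica:", replica.domains["orders"])
}
```

Running it prints `domain present on replica: false`, showing that the delete reached the replica's copy of the metadata.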

How did you test it?
Unit tests & local testing

Potential risks

Release notes

Documentation Changes

@timl3136
Member

This is also an IDL dependency update. Can you link the related IDL commit as well?

@gazi-yestemirova
Contributor Author

> This is also an IDL dependency update. Can you link the related IDL commit as well?

Sure, here are the links: cadence-workflow/cadence-idl#200
cadence-workflow/cadence-idl#201

I will update the summary as well.

getResponse.PreviousFailoverVersion,
isGlobalDomain,
); err != nil {
return err
Member

Wrap the error with additional information, warning that the domain was not deleted correctly in the replicas.

Contributor Author

@gazi-yestemirova gazi-yestemirova May 14, 2025

Thank you, good point! I've updated the PR.
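
The suggestion above can be illustrated with Go's standard `fmt.Errorf` and the `%w` verb. This is only a sketch under assumptions: `deleteFromStore`, `deleteDomainInReplica`, and the error message are hypothetical and do not reproduce the wording used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errStoreUnavailable simulates a low-level failure (hypothetical).
var errStoreUnavailable = errors.New("metadata store unavailable")

// deleteFromStore is a hypothetical persistence call that always fails,
// so the wrapping path is exercised.
func deleteFromStore(name string) error {
	return errStoreUnavailable
}

// deleteDomainInReplica wraps the underlying error with context that says
// which domain failed and what the consequence is.
func deleteDomainInReplica(name string) error {
	if err := deleteFromStore(name); err != nil {
		// %w keeps the original error reachable via errors.Is / errors.As.
		return fmt.Errorf("domain %q was not deleted in replica: %w", name, err)
	}
	return nil
}

func main() {
	err := deleteDomainInReplica("orders")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errStoreUnavailable))
}
```

Because the cause is wrapped rather than replaced, callers still see the added context in logs and can match the original error with `errors.Is`.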

@@ -99,6 +100,8 @@ func (h *domainReplicationTaskExecutorImpl) Execute(task *types.DomainTaskAttrib
return h.handleDomainCreationReplicationTask(ctx, task)
case types.DomainOperationUpdate:
return h.handleDomainUpdateReplicationTask(ctx, task)
case types.DomainOperationDelete:
Member

Is there a forward compatibility issue we should worry about?
For example, suppose cluster A has this change but cluster B has not been deployed yet. Cluster A starts generating DELETE tasks, which get replicated to cluster B. Cluster B doesn't recognize them and falls into the default case below. In that case, does the task get discarded, retried forever, or put into the DLQ?

If such cases would cause hard-to-recover situations, let's split the IDL changes and their usage into separate commits/prereleases.

Contributor Author

Great point! Thank you!
To address these concerns, I've introduced a feature flag that will be enabled only once the changes have been rolled out to both the active and replica clusters. Here is the link: #6920
Please let me know what you think.

Member

I think this is a great way to safely roll out this change. Thanks.
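
The rollout approach discussed in this thread can be sketched as a flag check on the side that produces replication tasks. The contents of #6920 are not shown here, so everything below is an assumption for illustration: the `publisher` type, the `enableDeleteReplication` lookup, and the choice to gate emission rather than handling are all hypothetical.

```go
package main

import "fmt"

// publisher is a hypothetical producer of domain replication tasks.
type publisher struct {
	// enableDeleteReplication stands in for a dynamic-config lookup
	// (hypothetical name); it is evaluated on every call so the flag
	// can be flipped without a redeploy.
	enableDeleteReplication func() bool
	// sent records emitted tasks in place of a real replication queue.
	sent []string
}

// onDomainDeleted emits a DELETE task only when the flag is on, so
// clusters that have not been upgraded never receive an operation
// they cannot recognize.
func (p *publisher) onDomainDeleted(name string) {
	if !p.enableDeleteReplication() {
		return // flag off: skip replication of the delete
	}
	p.sent = append(p.sent, "DELETE "+name)
}

func main() {
	enabled := false
	p := &publisher{enableDeleteReplication: func() bool { return enabled }}

	p.onDomainDeleted("orders") // before every cluster is upgraded
	fmt.Println("tasks sent with flag off:", len(p.sent))

	enabled = true // flipped once every cluster runs the new handler
	p.onDomainDeleted("orders")
	fmt.Println("tasks sent with flag on:", len(p.sent))
}
```

With the flag off nothing is emitted, so an older cluster never reaches its default case. Turning the flag on after all clusters are upgraded starts replication of deletes.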

@gazi-yestemirova gazi-yestemirova merged commit 86d33f5 into cadence-workflow:master May 16, 2025
24 checks passed
4 participants