Skip to content

[service][fx] Move a few more components to fx and make shard distributor service have it's own dependencies #6859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 30, 2025

Conversation

3vilhamster
Copy link
Contributor

@3vilhamster 3vilhamster commented Apr 25, 2025

What changed?
Onboarded server to the FX application, and now all services started as a single server are running in parallel as a separate fx.Applications. Also, onboarded shard distributor to fx instead of relying on resource god object for component dependencies. The old way of running things is still supported, but could be cleaned up later.
Few components that are required for sharddistributor pushed to fx.Module:

  1. clock.Timesource
  2. ringpop.Provider
  3. membership.Rings and Resolver
    These components build and inject themselves into the application lifecycle without a Resource god object.

Notice: all other services still use legacy logic and dependency injection, and sharddistributor exposes the legacy constructor for backward compatibility.

Why?
It allows services and components to define dependencies dynamically.

How did you test it?
Unit tests/local runs/integration tests

Potential risks
Shard distributor can fail to start or fail on start.

Release notes

Documentation Changes

@3vilhamster 3vilhamster changed the title sharddistributorfx [service][fx] Move a few more components to fx and make shard distributor service have it's own dependencies Apr 25, 2025
Copy link
Member

@jakobht jakobht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, left some comments

Comment on lines 197 to 205
stoppedWg.Wait()
// After stoppedWg unblocked all services are stopped to we no longer wait for errors.
close(errChan)
var resErrors error
for err := range errChan {
if err != nil {
resErrors = multierr.Append(resErrors, err)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly we still only return errors when all services has complete? Can we maybe select on stoppedWg and errChan, so we can print errors immediately.
I imaging a case where frontend does not start but the rest of the system starts, then there will not be any error messages indicating this before the whole system is shut down.

I think it would be very nice to keep the errors in a list and then re-print all of them in the end so it's easier to debug, but I think it's important they are also printed when the error happens.

Copy link
Contributor Author

@3vilhamster 3vilhamster Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fx prints failures before stopping and provides breadcrumbs for which OnStart failed or which services failed to start due to missing dependencies.
Concurrency is not obvious here, since we need to stop errChan. I've made a best effort for now:

  1. Cancel the start context as soon as any services fail to start. That does not guarantee the order or operations, since Start can take longer, and all services may run at the time of failure.
  2. I remove context.Canceled error to have a list of failures.
  3. I move stoppedWg.Wait() and close(errChan) to a goroutine.

But it is hard to test this exact behavior. I've tested a failure, though.

@3vilhamster 3vilhamster merged commit 9587808 into cadence-workflow:master Apr 30, 2025
23 checks passed
@3vilhamster 3vilhamster deleted the sharddistributorfx branch April 30, 2025 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants