-
Notifications
You must be signed in to change notification settings - Fork 838
[service][fx] Move a few more components to fx and make shard distributor service have it's own dependencies #6859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[service][fx] Move a few more components to fx and make shard distributor service have it's own dependencies #6859
Conversation
…utor service have it's own dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, left some comments
cmd/server/cadence/cadence.go
Outdated
stoppedWg.Wait() | ||
// After stoppedWg unblocked all services are stopped to we no longer wait for errors. | ||
close(errChan) | ||
var resErrors error | ||
for err := range errChan { | ||
if err != nil { | ||
resErrors = multierr.Append(resErrors, err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly we still only return errors when all services has complete? Can we maybe select on stoppedWg and errChan, so we can print errors immediately.
I imaging a case where frontend does not start but the rest of the system starts, then there will not be any error messages indicating this before the whole system is shut down.
I think it would be very nice to keep the errors in a list and then re-print all of them in the end so it's easier to debug, but I think it's important they are also printed when the error happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fx prints failures before stopping and provides breadcrumbs for which OnStart failed or which services failed to start due to missing dependencies.
Concurrency is not obvious here, since we need to stop errChan. I've made a best effort for now:
- Cancel the start context as soon as any services fail to start. That does not guarantee the order or operations, since Start can take longer, and all services may run at the time of failure.
- I remove context.Canceled error to have a list of failures.
- I move stoppedWg.Wait() and close(errChan) to a goroutine.
But it is hard to test this exact behavior. I've tested a failure, though.
What changed?
Onboarded server to the FX application, and now all services started as a single server are running in parallel as a separate fx.Applications. Also, onboarded shard distributor to fx instead of relying on resource god object for component dependencies. The old way of running things is still supported, but could be cleaned up later.
Few components that are required for sharddistributor pushed to fx.Module:
These components build and inject themselves into the application lifecycle without a Resource god object.
Notice: all other services still use legacy logic and dependency injection, and sharddistributor exposes the legacy constructor for backward compatibility.
Why?
It allows services and components to define dependencies dynamically.
How did you test it?
Unit tests/local runs/integration tests
Potential risks
Shard distributor can fail to start or fail on start.
Release notes
Documentation Changes