fix(gateway): Prevent inoperable state on initial failure to load configuration #4277
I'm not sure why I had marked this as a TODO and a future breaking change. This change simplifies the `updateComposition` API and makes `load` (which is only ever called once) responsible for setting the config.
This commit makes the gateway call `process.exit(1)` if it isn't polling and fails to compose a valid schema on startup.
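A minimal sketch of that startup rule, assuming a hypothetical `startGateway` helper. `loadOnce` stands in for the first composition attempt, and `exit` is injected (rather than calling `process.exit(1)` directly) so the logic can be exercised without ending the process. None of these names come from `@apollo/gateway`.

```typescript
// Hypothetical stand-in for the gateway's first composition attempt.
type LoadOnce = () => Promise<void>;

// Injected for testability; in a real process this would be
// `(code) => process.exit(code)`.
type Exit = (code: number) => void;

interface StartOptions {
  isPolling: boolean; // managed mode or an explicit poll interval
  loadOnce: LoadOnce;
  exit: Exit;
  log: (message: string) => void;
}

// Resolves to true if the first load succeeded.
async function startGateway(opts: StartOptions): Promise<boolean> {
  try {
    await opts.loadOnce();
    return true;
  } catch (err) {
    if (opts.isPolling) {
      // A later poll may still succeed, so stay alive and let polling retry.
      opts.log(`Initial composition failed; will retry on next poll: ${String(err)}`);
      return false;
    }
    // Static config: nothing will ever retry, so fail fast instead of
    // lingering in a state that cannot serve requests.
    opts.log(`Initial composition failed: ${String(err)}`);
    opts.exit(1);
    return false;
  }
}
```

Keeping the exit decision in one place makes the difference between the two modes explicit: only a gateway with no retry mechanism treats an initial failure as fatal.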
(Not doing a full review unless requested, just curious.) Does this state integrate with the Apollo Server built-in health check at all (by default or opt-in)? It seems like it would be nice not to consider a gateway server healthy until it has loaded its schema.
@glasser We aren't changing the existing default behavior of the Apollo Server health check, but we can provide a pre-written function that implements it. That would be opt-in (suggested in the documentation) for now, and would then become the default mode of operation in AS3.
Great, I think we definitely want that for our own use. Let me know if that's something your team is working on or if we should write it.
@glasser Thoughts on this?
That sounds good, though I was actually thinking of starting with the much simpler rule "if we have never successfully loaded the schema, unhealthy; otherwise healthy". Honestly, that seems reasonable to me to have on by default, but I trust that your generally more conservative approach to these sorts of changes is based on far more experience with this project than mine. In other words, base it on "can we even conceivably run any GraphQL query", not "are the backends happy at the moment".
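A sketch of that simpler rule, using a hypothetical `SchemaHealth` class: unhealthy until any schema has ever loaded, healthy from then on, with backend status deliberately ignored. As we understand it, Apollo Server 2's `onHealthCheck` hook treats a rejected promise as a failed check, so `check` rejects while no schema has loaded; wiring it into that hook is an assumption, not part of this PR.

```typescript
// Hypothetical tracker: remembers whether any schema has ever loaded.
class SchemaHealth {
  private everLoaded = false;

  // Call whenever composition succeeds (e.g. from a schema-change listener).
  markLoaded(): void {
    this.everLoaded = true;
  }

  // Resolves when healthy and rejects when unhealthy. It never flips back to
  // unhealthy: the only question is whether any query could run at all.
  async check(): Promise<void> {
    if (!this.everLoaded) {
      throw new Error('Gateway has not loaded a schema yet');
    }
  }
}
```

Hypothetical wiring: something like `onHealthCheck: () => health.check()`, with `health.markLoaded()` called once the gateway reports its first schema.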
I realize this is WIP, but I thought I'd leave some thoughts since, from what I understand, this fixes the Gateway startup problem we've seen internally when downstream services are unavailable! Put another way, I'm excited to ship it!
let logger: Logger;

beforeEach(() => {
  const warn = jest.fn();
  const debug = jest.fn();
  const error = jest.fn();
  const info = jest.fn();

  logger = {
    warn,
    debug,
    error,
    info,
  };
});
I have also been repeating this pattern in many places. Perhaps at some point soon (not now), we should just make a `spyableLogger`?
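One possible shape for such a helper, sketched under assumptions: the `Logger` interface below is assumed from the four methods used in the test above, and `spyableLogger` is hypothetical. Inside Jest, a real version would probably just wrap `jest.fn()` (which already records `.mock.calls`); this one records calls by hand so it runs without Jest.

```typescript
// Assumed logger shape, mirroring the four methods used in the test above.
interface Logger {
  warn(message?: unknown): void;
  debug(message?: unknown): void;
  error(message?: unknown): void;
  info(message?: unknown): void;
}

type LogLevel = keyof Logger;

// A logger plus the arguments of every call, grouped by level.
interface SpyableLogger {
  logger: Logger;
  calls: Record<LogLevel, unknown[][]>;
}

// Hypothetical helper: each invocation returns fresh, unshared state.
function spyableLogger(): SpyableLogger {
  const calls: Record<LogLevel, unknown[][]> = {
    warn: [],
    debug: [],
    error: [],
    info: [],
  };
  // Builds a method that appends its arguments to the matching bucket.
  const record =
    (level: LogLevel) =>
    (...args: unknown[]): void => {
      calls[level].push(args);
    };
  const logger: Logger = {
    warn: record('warn'),
    debug: record('debug'),
    error: record('error'),
    info: record('info'),
  };
  return { logger, calls };
}
```

In a suite, the `beforeEach` above would shrink to `({ logger } = spyableLogger());`, keeping `calls` around when assertions on log output are needed.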
packages/apollo-gateway/src/__tests__/integration/networkRequests.test.ts
This reverts commit 34115ee.
if (isManagedConfig(this.config) || this.experimental_pollInterval) {
  if (!this.pollingTimer) this.pollServices();
  await this.updateComposition();
I'm not sure that this PR accomplishes its goal. If this call to `updateComposition` throws, no polling will happen.
Also note that even once this is fixed, there are a couple of issues to resolve:
- We need special logic to ensure that the `serverWillStart` plugins get called if they were skipped because the original `schemaDerivedData` threw.
- The `Gateway successfully loaded schema` message in `load` won't ever show up if the first update failed. It might make sense to move that log line from `load` into `updateComposition`, e.g. between assigning to `this.schema` and notifying the listeners, if `!previousSchema`.
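A sketch of the second suggestion, with hypothetical names (`Composer`, `compose`, `onSchemaChange`) and the schema modeled as a plain string to keep it self-contained. The success message is logged inside `updateComposition`, between assigning `this.schema` and notifying listeners, only when there was no previous schema, so it still appears when the first attempt fails and a later one succeeds.

```typescript
type Listener = (schema: string) => void;

class Composer {
  schema?: string;
  private listeners: Listener[] = [];
  private compose: () => Promise<string>; // hypothetical composition step
  private log: (message: string) => void;

  constructor(compose: () => Promise<string>, log: (message: string) => void) {
    this.compose = compose;
    this.log = log;
  }

  onSchemaChange(listener: Listener): void {
    this.listeners.push(listener);
  }

  async updateComposition(): Promise<void> {
    const previousSchema = this.schema;
    const next = await this.compose(); // may throw; state is left untouched
    this.schema = next;
    if (!previousSchema) {
      // First success ever, whether it came from load() or a later poll.
      this.log('Gateway successfully loaded schema');
    }
    this.listeners.forEach((listener) => listener(next));
  }
}
```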
…figuration (apollographql/apollo-server#4277)

In managed mode, kick off polling and continue to try to load config on the poll interval even if it fails the first time.

Apollo-Orig-Commit-AS: apollographql/apollo-server@16b7884
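The commit above describes starting the poll loop in managed mode and retrying on every interval even when the first load fails. A minimal sketch under assumptions: `startPolling`, `update`, `intervalMs`, and `onError` are hypothetical names. Each attempt catches its own error, so one failure never ends the loop, and the returned function cancels the pending timer.

```typescript
// Starts polling immediately and keeps going regardless of individual
// failures. Returns a function that stops polling.
function startPolling(
  update: () => Promise<void>,
  intervalMs: number,
  onError: (err: unknown) => void,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      await update();
    } catch (err) {
      // Report and carry on: the next poll gets another chance.
      onError(err);
    }
    if (!stopped) {
      timer = setTimeout(tick, intervalMs);
    }
  };

  void tick(); // first attempt happens right away, not after one interval

  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next tick only after the current attempt settles (rather than using `setInterval`) avoids overlapping updates when composition is slower than the interval.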
In its current state, if the gateway fails to compose in managed mode, it remains in a state where polling never starts and it can't serve requests.
After these changes, the gateway will:
- call `process.exit()` if it fails to load.