-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Config reload [WIP] #499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config reload [WIP] #499
Conversation
Anyone know what's up with the unit tests? |
All builds fail at "TestRouteTLSHandshakeError", you may want to look if your changes have introduced a call to os.Exit() or something that this test may trigger. |
server/log.go
Outdated
@@ -15,9 +15,9 @@ var trace int32 | |||
var debug int32 | |||
|
|||
var log = struct { | |||
sync.Mutex | |||
*sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit test is storing the value of log
before the test and replacing it after to prevent side effects, so govet complains because it's copying the mutex value (motivation for #501).
server/server.go
Outdated
@@ -148,6 +147,18 @@ func New(opts *Options) *Server { | |||
return s | |||
} | |||
|
|||
func (s *Server) getOpts() *Options { | |||
s.optsMu.RLock() | |||
defer s.optsMu.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be expensive for when getOpts is in a fast path. Defer is better, but still slow. Just explicitly unlock like setOpts does.
server/signal.go
Outdated
return | ||
} | ||
c := make(chan os.Signal, 1) | ||
|
||
signal.Notify(c, syscall.SIGINT, syscall.SIGUSR1) | ||
signal.Notify(c, syscall.SIGINT, syscall.SIGUSR1, syscall.SIGHUP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not add this in until we have programmatic tests via API for all options we care about etc. and good test coverage. We should not expose to end users too soon IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @derekcollison. let's build up the features for supporting it and then add in the reload once we have the critical items (Certs and Auth).
Derek, you agree we can merge all the bits to support in prior to having the signal and making the feature available in the server, correct?
Defer adds a bit of overhead which can affect fast-paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll approve once we have agreement on merge/release strategy and confirmation of symlink working xplatform.
server/reload.go
Outdated
} | ||
switch strings.ToLower(field.Name) { | ||
case "trace": | ||
opts = append(opts, &traceOption{newValue.(bool)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to name opts
something like diff_opts
. took me a second to get why you were appending.
} | ||
processOptions(golden) | ||
|
||
if err := os.Symlink("./configs/reload/test.conf", config); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work as expected on windows? cc @ColinSullivan1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should (https://golang.org/src/os/file_windows.go#L522), though I don't have a Windows box to test on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works on windows so long as you have correct privileges - you'll want to add a note someplace, probably the README.md.
Without the right privileges:
reload_test.go:175: Error creating symlink: symlink .\configs\reload\test.conf g:\src\github.com\nats-io\gnatsd\server\tmp.conf: A required privilege is not held by the client.
FAIL
On a network drive (generally not allowed by default):
--- FAIL: TestConfigReload (0.00s)
reload_test.go:175: Error creating symlink: symlink .\configs\reload\test.conf g:\src\github.com\nats-io\gnatsd\server\tmp.conf: Incorrect function.
FAIL
Locally, with privileges (simple case running on C:
as admin), you'll succeed.
=== RUN TestConfigReload
[3792] [INF] Reloaded server configuration
--- PASS: TestConfigReload (0.00s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ColinSullivan1 Thanks for checking that. I can add a note to the README if we think this is acceptable. Since this is just for development purposes, this seems OK to me. What do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is OK; nothing unusual (at least for Windows).
server/signal.go
Outdated
return | ||
} | ||
c := make(chan os.Signal, 1) | ||
|
||
signal.Notify(c, syscall.SIGINT, syscall.SIGUSR1) | ||
signal.Notify(c, syscall.SIGINT, syscall.SIGUSR1, syscall.SIGHUP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @derekcollison. let's build up the features for supporting it and then add in the reload once we have the critical items (Certs and Auth).
Derek, you agree we can merge all the bits to support in prior to having the signal and making the feature available in the server, correct?
Do not expose config reloading yet until we are ready.
Regarding a config reload on windows. some other mechanism will be needed (that signal is not really supported). Either a REST API, or something else. We have the same issue with log rotation. |
@ColinSullivan1 Yep, since we're not exposing the signals yet, I think we can defer the decision for what we want to do on Windows for now. That can come at a later time when we're ready to expose the feature. |
Add a note clarifying symlink behavior on Windows for developers.
README.md
Outdated
FAIL | ||
``` | ||
|
||
Similarly, this can fail when creating a symlink on a network drive, which is typically now allowed by default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, "not"
@@ -0,0 +1,99 @@ | |||
package server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a copyright?
// Copyright 2017 Apcera Inc. All rights reserved.
@@ -0,0 +1,210 @@ | |||
package server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright.
processOptions(golden) | ||
|
||
if err := os.Symlink("./configs/reload/test.conf", config); err != nil { | ||
t.Fatalf("Error creating symlink: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on this, but maybe skip the test here (and in others below), with some output? This makes assumptions about the privileges of the user the test is running under, where the test is running, etc. It's convenience vs ensuring a test runs locally. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's better to fail loudly, otherwise the test might get skipped and the user wouldn't notice the warning message. We could add additional information to the Fatal message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can go with that... IMO an "ensure you have privileges" would go a long way.
server/server.go
Outdated
@@ -584,7 +596,7 @@ func (s *Server) createClient(conn net.Conn) *client { | |||
} | |||
// If there is a max connections specified, check that adding | |||
// this new client would not push us over the max | |||
if s.opts.MaxConn > 0 && len(s.clients) >= s.opts.MaxConn { | |||
if s.getOpts().MaxConn > 0 && len(s.clients) >= s.getOpts().MaxConn { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth optimizing in this function to snapshot MaxConn, TLS options, and AuthTimeout? I realize establishing the connection and TLS are already expensive, but am thinking of a thundering herd of clients reconnecting.
@tylertreat Let's rebase to master. Keep signalling removed and make sure we cover Windows builds, etc. Then we are good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concerns are about making a deep copy of Options, or at least be sure of the way options are accessed, and the use of multiple getOpts() within the same function. Regardless of the safety of accessing a copy or the "live" options, it may be logically incorrect to get the user name and password from two calls to getOpts() when the options are changed in between.
main.go
Outdated
@@ -149,13 +148,16 @@ func main() { | |||
usage() | |||
} | |||
|
|||
// Snapshot flag options. | |||
*server.FlagSnapshot = *opts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to copy the values since they may change after? If so, then we need a clone function for options. This assignment simply copy some of the fields, but for instance, the array of users will still point to the original, same for the tls.Config, or if we later add maps, etc...
server/auth.go
Outdated
@@ -30,13 +30,13 @@ func (s *Server) configureAuthorization() { | |||
} | |||
// Check for multiple users first | |||
// This just checks and sets up the user map if we have multiple users. | |||
if s.opts.Users != nil { | |||
if s.getOpts().Users != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we have replaced all references of s.opts
with s.getOpts()
, is it because s.opts
pointer can change during the reload? Again, think of some of the types in the options that are currently not copied. You are protecting against the value of the pointer not being accessed concurrently, but fields pointed by s.opts or s.getOpts would still point to objects that are shared and may incorrectly be concurrently accessed. Of course, if we clone and make sure that values set/reset during reload are on a copy and then swap the pointer, we are ok.
I also have an issue with the same function invoking getOpts(), not only for performance - which still may make a difference - but because if conceptually we assume that the options may change between 2 calls to getOpts(), then we should snapshot once per function. Again, this assumes that the getOpts() would return an immutable structure, otherwise, there is no reason to do that.
} | ||
s.setOpts(newOpts) | ||
s.applyOptions(changed) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be cases that applying an option may fail - in the long run - so I would make applyOptions return an error, and therefore move the s.setOpts() only after we know that we have a value new options set to swap with the old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with failing an option application is we could end up in an inconsistent state—some new options are applied, others are not—unless we also introduce a "revert" operation to revert any applied options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because we apply the options to a temporary Options structure, and do not swap the server's options with the new one until after success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two parts to swapping Options
: 1) swapping the Options
pointer itself and 2) mutating the server as necessary (e.g. for trace
we need to update the server logger. 2) could result in an inconsistent state of the server if something fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but if we apply the new options to the server's options that a caller has a pointer to (getOpts()), there is a risk that the caller accesses values that are being changed on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, there is, but I'm not sure how we can apply a new config atomically.
server/reload.go
Outdated
// error. | ||
func (s *Server) diffOptions(newOpts *Options) ([]option, error) { | ||
var ( | ||
oldConfig = reflect.ValueOf(s.opts).Elem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we make the entry API threadsafe (with special mutex) or we don't access s.opts
directly here. I think it would be safer to make the whole Reload()
API protected by its own mutex?
return diffOpts, nil | ||
} | ||
|
||
func (s *Server) applyOptions(opts []option) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As indicated above, I believe that there will be cases where applying an option may produce an error. We could add later of course, since those are internal APIs.
I believe I've addressed everyone's feedback. @kozlovic can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerns addressed except for the question about when we switch the server's options when applying the changes, which I understand is a moot point since there is always the risk that a part of the code is accessing options while options are being updated.
server/log.go
Outdated
s.SetLogger(fileLog, s.opts.Debug, s.opts.Trace) | ||
fileLog := logger.NewFileLogger(opts.LogFile, | ||
opts.Logtime, s.getOpts().Debug, opts.Trace, true) | ||
s.SetLogger(fileLog, s.getOpts().Debug, opts.Trace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still using s.getOpts()
here...
@derekcollison I think this is good to go. |
This is a rough first pass at config reload (issue #338). This is just a tracer that implements reload support for the
trace
config option to prove out a working end-to-end example. If we like the overall approach, we can start implementing support for additional config options in a piecemeal fashion. This PR is mostly just to get early feedback.This PR does a few things:
Reload
function toServer
which can be called to reload configuration from a file.trace
option. Attempting to change any other options will result in a reload failure.Open questions:
@nats-io/core