-
Notifications
You must be signed in to change notification settings - Fork 60
Azure-AD-plugin never updates its proxy IP address following DNS TTL and record changes. #558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I guess we could expire the graph client every now and then in the cache (ensuring its shutdown cleanly as well) here: azure-ad-plugin/src/main/java/com/microsoft/jenkins/azuread/GraphClientCache.java Lines 29 to 31 in cc82c54
Not too keen on it though. How often do you change your proxy DNS? From my experience proxies are generally fairly static which I guess is why this hasn't came up before now. |
Sadly, we do quite frequently (once every 2~3 weeks or so) which means our Jenkins server with 120+ nodes that does jobs 24/7 for tens of users in 3 time zones, has suddenly starter requiring restarts at least once per month after we migrated from LDAP to Azure AD authentication (we have no way back). The way these DNS updates are done is inherent to AWS and depending on setup a lot of people will have the same situation. For example our squid proxy is an AWS ASG behind Application Load Balancers. IPs are assigned to load balancer interfaces by AWS and these are then resolved by the DNS name. Any time a network interface for an ELB fails. AWS replaces it giving it a new IP (these are all internal IPs so no elastic IP here) and updates the DNS. Therefore causing this issue... Regarding if this should be in the library. That was my first thought, that's why I logged it with them, but I'm not so sure anymore. It seems an elaborate mechanism based on TTL should be present in the library, but a quick fix is much easier to implement in the client. So If it could be done that the client is recreated on a defined (or even hardcoded) schedule. That would be great. |
Reading the code it looks like the SDK is where this best fits they have a lot of code around proxying, this plugin just delegates to the library. If someone wants to contribute a work around here, that's fine although I probably wouldn't enable it by default. |
Jenkins and plugins versions report
Environment
What Operating System are you using (both controller, and any agents involved in the problem)?
Linux RedHat 7.8
Reproduction steps
Configure a non-authenticated http proxy server in Jenkins by providing its DNS name. Configure the Azure-AD plugin for Azure Auth. Leave Jenkins running. Then change the proxy IP address and update the DNS A record (or /etc/hosts entry for simplicity of testing).
Result: Azure AD no longer works, because it tries to connect to the old proxy IP.
After Jenkins is restarted Azure AD fetches the correct IP after performing a new DNS resolution.
Expected Results
Azure AD observes the DNS record TTL and refreshes the IP resolution.
Alternatively it refreshes the DNS resolution at a configurable interval.
Actual Results
Errorst such as:
Error sending HTTP request: connection timed out: proxy-name/X.X.X.X:3128 where X.X.X.X is an old IP.
Anything else?
When the proxy DNS A record resolved to two IP addresses (for redundancy) and only one of these IPs changes (the DNS record is updated of course). On some Jenkins servers Azure AD fails to connect trying to talk to the old IP. On other servers it appears to work fine at least for a period of time if it happens to be resolving the DNS record to the IP that didn't change.
I've tracked the problem to how the plugin uses azure-sdk-for-java. I've also opened an issue there in case the maintainers want to resolve it in the library: Azure/azure-sdk-for-java#38963 In that issue I demonstrate the issue with a simple example app. I think it would be good if the library resolved it, but I decided to open the issue here as it is mostly affecting this plugin and causes big disruption every time Jenkins masters need to be restarted.
Are you interested in contributing a fix?
I can help in testing.
The text was updated successfully, but these errors were encountered: