HADOOP-18329 - Support for IBM Semeru JVM v>11.0.15.0 Vendor Name Changes #4537
i think just staying with IBM_JAVA simplifies this change. we aren't going to remove the deprecated field in case it is used externally.
hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/util/PlatformName.java
hadoop-common-project/hadoop-minikdc/src/test/java/org/apache/hadoop/minikdc/TestMiniKdc.java
Thank you for taking the time to review this change so far. I have implemented your suggestions.
+1 from me; i trust you to have done the testing.
i made a comment about the spelling in a comment, please could you add that just to keep the US-spelling developers happy. thx.
once that is in, I will merge here and to branch-3.3, which will be releasing an update this year. testing there would be wonderful
hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/util/PlatformName.java
aah, one of the runs now finds a loop in references. now I understand the duplicate code. Can you fix that by restoring the code, removing the pom changes, and adding a comment to the source saying "duplicated to avoid cycles in the build"?
@steveloughran, I will conduct some more testing from a Spark perspective and report back once I'm fully confident, since that is where I initially observed the failures.
🎊 +1 overall
This message was automatically generated.
relates to eclipse-openj9/openj9#14950
one checkstyle
try {
  Thread.currentThread().getContextClassLoader().loadClass(className);
  return true;
} catch(ClassNotFoundException ignored) {
can any other exception get raised here? if so, best to log and downgrade to false too
Thanks @steveloughran - Would you suggest we are catching the generic Exception instead?
yes
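The outcome of this review thread can be sketched as follows. This is a minimal illustration, not the code merged in the PR: the class name `ClassProbe`, the method name `classExists`, and the use of `java.util.logging` are assumptions made to keep the example self-contained. It keeps the `loadClass` call from the snippet under review, but catches the generic `Exception`, logs it, and downgrades to `false`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; the names are not taken from the PR. */
final class ClassProbe {
  private static final Logger LOG = Logger.getLogger(ClassProbe.class.getName());

  private ClassProbe() {
  }

  /**
   * Returns true only if the named class can be loaded through the thread
   * context class loader. Any exception (not just ClassNotFoundException)
   * is logged and treated as "class not present".
   */
  static boolean classExists(String className) {
    try {
      Thread.currentThread().getContextClassLoader().loadClass(className);
      return true;
    } catch (Exception e) {
      // Covers ClassNotFoundException as well as, for example, a
      // NullPointerException when no context class loader is set.
      LOG.log(Level.FINE, "Could not load class " + className, e);
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(classExists("java.lang.String"));
    System.out.println(classExists("com.example.missing.NoSuchLoginModule"));
  }
}
```

Note that `catch (Exception e)` does not cover `Error` subclasses such as `NoClassDefFoundError`; whether those should also be downgraded is a separate design decision that this thread does not settle.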
…22588) Fixes #20754 Hadoop in the current version wrongly checks which login module should be used on some IBM Java versions. We should skip problematic configurations until Hadoop has the fix released. See also apache/hadoop#4537
Has there been any movement on this PR?
I'll try and carve out a few hours this week 👍 I think some slight enhancements could be made to the class loader method I originally came up with (I want to test my concerns).
This should be a bit more robust to extension, and it handles the concerns I had about the class loader from before. Is there any consensus/ruling around adding a test against the IBM JREs? I appreciate it would take a bit of time on CI and this is a once-in-a-blue-moon activity, but it could be a single suite of integration tests against auth that verify the result against the latest Semeru is not IBM, and vice versa.
@JackBuggins merged to trunk, but it has merge conflicts into 3.3 with SSLFactory.java. can you do a quick review and PR there? I'm busy testing the abfs blockers
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/ssl/SSLFactory.java
@steveloughran I've popped up a PR against branch 3.3; should I do the same for 3.3.5?
get it into 3.3 and i will pull to 3.3.5, they are almost identical. if there are merge problems again, then we can worry about it
…ges (apache#4537) The static boolean PlatformName.IBM_JAVA now identifies Java 11+ IBM Semeru runtimes as IBM JVM releases. Contributed by Jack Buggins.
…azelcast#22588) Fixes hazelcast#20754 Hadoop in the current version wrongly checks which login module should be used on some IBM Java versions. We should skip problematic configurations until Hadoop has the fix released. See also apache/hadoop#4537
Hi, thank you for solving this problem. Is there any plan for when 3.3.5 will be released? Thanks for the reply. Best regards, Kamil
@ivrisivris Sorry, I don't know the answer to that one, but there is a workaround you might consider in the meantime which makes use of an agent to manipulate the vendor name - you can see an example here
@JackBuggins Thank you for responding so quickly.
The 3.3.5 RC0 is up for testing. Grab it from the Hadoop web site and make sure it works for you. Do not wait until the final release, as if that is broken it will still take a while for a patched release to ship.
Hi, Errors Environment Thank you for your help Kamil
I'll check this out against 11.0.17 today
jack, be good to know. we are going to do a new RC next week and this is the kind of stabilisation issue we can address
I've just spun up a version of rc-0 3.3.5 against Semeru 11.0.17 from and executed a few of the sample jobs and checked out the dev/null out, e.g.
I'm not seeing anything suspicious in the stderr - @ivrisivris can you give me some more detail on the specific cluster config / actions / workloads that triggered this so I can reproduce and diagnose a bit better? Are you using this with Spark or anything else? Are these being added to the classpath in some other way? I currently don't believe anything packaged in IBM Semeru 11.0.17 or Hadoop rc-0 3.3.5 contains something to trigger the check added (I've even run a jar with just that class to check that, and this looks good). Whilst I wait on a response on the above I'll check that there isn't anywhere else that is not using the
From what I can see, the below could still exhibit the same behaviour @steveloughran - I'll see if I can figure out how to get into these paths to validate it so I understand the scenario @ivrisivris is likely hitting.
When actually using the RC0 against Spark 3.4.0 I can't reproduce this either. From what I can see, the only way you can get into the path of the stack trace above is by including one of the com.ibm.security classes at runtime (the two cases above, which might need some work, should land us elsewhere). Apart from testing against some more base OSes, or understanding how this is being used, I'm stuck here unfortunately. This may present a good case for a config option, in the environment or config files, that overrides the default logic being used here, as well as for debug logging. @ivrisivris if you can give any more details, like the specific base OS, or perhaps provide some basic app with the same dependencies that I can inspect, I'm happy to investigate further and try to make sure it can be resolved for you. Thank you!
maybe start with logging that java.vendor sysprop at debug. reopened the JIRA
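A rough sketch of what logging the `java.vendor` system property at debug level could look like. The class `VendorLogging` and its methods are hypothetical, and `java.util.logging` (where `FINE` plays the role of debug) is used only so the example runs without extra dependencies; the logging framework and placement in Hadoop itself would differ.

```java
import java.util.logging.Logger;

/** Hypothetical diagnostic helper; not part of Hadoop. */
final class VendorLogging {
  private static final Logger LOG = Logger.getLogger(VendorLogging.class.getName());

  private VendorLogging() {
  }

  /** Builds a one-line description of the runtime's vendor and version. */
  static String describeRuntime() {
    return "java.vendor=" + System.getProperty("java.vendor")
        + ", java.version=" + System.getProperty("java.version");
  }

  /** Logs the description at FINE; the message is only built if FINE is enabled. */
  static void logRuntime() {
    LOG.fine(VendorLogging::describeRuntime);
  }

  public static void main(String[] args) {
    logRuntime();
    System.out.println(describeRuntime());
  }
}
```

Having the vendor string in the logs would let a reporter show which branch of the vendor check their runtime takes without needing to reproduce the failure elsewhere.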
hi @JackBuggins, I merged your #5208 code into hadoop 3.3.4, and starting the datanode fails. The error is as follows:
@Tre2878 did you build the branch it was previously merged to from source? It's merged back retrospectively, but once a release is published there are no further updates to that stable stream's distributions. I.e. if you aren't switching versions and aren't building from source, you aren't getting any backported fixes. Please use the 3.3.5 RC or build the branch yourself to pick up the changes 👍🏼
If you are finding some more areas where IBM classes are being called relating to auth, and you can demonstrate a configuration with specific details as well as how you reproduce it, I will take a look. I need the JRE details, OS details, the full hadoop config and the command executed to hit this. If this is specifically relating to Kerberos, please add details for that too, but obviously I won't need any secrets. I've spent some good hours trying to replicate it so far, so having this detail will be awesome for me to understand where it's failing for some. May be best to pop this on the jira only. Thanks!
@Tre2878 - any of the files I listed here look sus to you #4537 (comment)?
ok, i think we can hopefully say this is a cannot reproduce state, especially if @Tre2878 is using their own build.
Yeah, I just need a way I can reliably hit the errors to understand a bit better what isn't covered so far to proceed further. This indeed looks similar to the previous issue reported, so it would seem to be a common use case. Thanks for sharing the process.
I am on the branch https://github.com/apache/hadoop/tree/branch-3.3.4 and merged your #5208 code into it; after compiling and deploying, startup fails with an error. My understanding is that there is no patch for 3.3.4 yet, right?
@Tre2878 - That's correct, it's just branch-3.3 and trunk, which is in turn in the 3.3.5 stream, so right now upgrading/testing out the 3.3.5 stream is the way to get these changes. Looks like the 3.3.4 distro was cut around August 2022 (based on git releases), and the changes went into branch-3.3 during December 2022. I would need to defer to the project owners on whether porting to branch 3.3.4, for others to pick up when building from source, would be an option here. You could try cherry-picking the commit against 3.3 to branch 3.3.4 then build your own copy to test, or otherwise consider applying some type of agent workaround to modify the vendor name until you're in a position to do this.
3.3.5 is the successor to 3.3.4 from the ASF, therefore there is a patch for 3.3.4 and it is called "upgrade to 3.3.5". you can take our 3.3.4 release, fork it and make whatever changes you want, but you will have a private release at that point. |
This is to include a fix for IBM Semeru builds of OpenJ9-based JVMs, which was included in version 3.3.5. Details at apache/hadoop#4537
Description of PR
There are checks within the PlatformName class that use the vendor property of the provided runtime JVM, specifically looking for IBM within the name. Whilst this check worked for IBM's Java Technology Edition, it fails to work on Semeru since 11.0.15.0 due to the following change:
java.vendor system property
In this release, the java.vendor system property has been changed from "International Business Machines Corporation" to "IBM Corporation".
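To see why the rename matters, here is a small sketch that assumes the existing check is a plain substring match for `IBM` on `java.vendor`, as the description suggests. The class and method names are hypothetical. The old vendor string does not contain the literal text `IBM`, whereas the new one does, so Semeru runtimes start matching from 11.0.15.0 onwards.

```java
/** Hypothetical illustration of a substring-based vendor check. */
final class VendorNameCheck {
  private VendorNameCheck() {
  }

  /** Assumed shape of the check: true when the vendor string contains "IBM". */
  static boolean vendorMentionsIbm(String vendor) {
    return vendor != null && vendor.contains("IBM");
  }

  public static void main(String[] args) {
    // Vendor string reported by Semeru before the rename.
    System.out.println(vendorMentionsIbm("International Business Machines Corporation"));
    // Vendor string reported by Semeru from 11.0.15.0 onwards.
    System.out.println(vendorMentionsIbm("IBM Corporation"));
  }
}
```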
Modules such as the below are not provided in these runtimes.
com.ibm.security.auth.module.JAASLoginModule
This change attempts to use reflection to ensure that a class common to IBM JT runtimes exists, extending the vendor check, since IBM-vendored JVMs may not actually require special logic to use custom security modules. The same 3.3.3 versions were working correctly until the vendor name change was observed during routine upgrades by internal CI.
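The combined approach described here might look roughly like the sketch below. It is an assumption-laden illustration rather than the merged code: `IbmRuntimeDetector`, `isIbmJava`, and `classPresent` are hypothetical names, and using `com.ibm.security.auth.module.JAASLoginModule` as the probe class is a guess based on the module mentioned above; the class the PR actually probes may differ.

```java
/** Hypothetical sketch: vendor check extended with a class-presence probe. */
final class IbmRuntimeDetector {
  /** Assumed probe class, taken from the module named in the description. */
  static final String PROBE_CLASS = "com.ibm.security.auth.module.JAASLoginModule";

  private IbmRuntimeDetector() {
  }

  /** True only when the vendor mentions IBM and the probe class is loadable. */
  static boolean isIbmJava(String vendor, String probeClass) {
    return vendor != null && vendor.contains("IBM") && classPresent(probeClass);
  }

  /** Looks the class up without initializing it; any exception means absent. */
  static boolean classPresent(String className) {
    try {
      Class.forName(className, false, IbmRuntimeDetector.class.getClassLoader());
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String vendor = System.getProperty("java.vendor");
    System.out.println("IBM security modules expected: " + isIbmJava(vendor, PROBE_CLASS));
  }
}
```

With this shape, a Semeru runtime that reports "IBM Corporation" but does not ship the IBM security modules would be treated like any other OpenJDK-style runtime.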
How was this patch tested?
CI and unit tests. Some seemingly unrelated failures were observed relating to
java.lang.NoSuchMethodError: java.nio.ByteBuffer.limit(I)Ljava/nio/ByteBuffer;