Skip to content

Relax condition on JITServer rememberClass assert #19226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 26, 2024

Conversation

cjjdespres
Copy link
Contributor

When ignoring the client's SCC, if a server performs an AOT compilation that is not an AOT cache compilation for a client (possible if the client has a valid local SCC) it may cache valid class chain offsets for classes in the client's session data. If the server then performs an AOT cache store for that client, it may receive an invalid class chain from the client for a class that already has a valid cached class chain offset, since the client will be ignoring its local SCC for such a compilation. This situation is valid, and should not trigger the fatal assert in TR_J9JITServerSharedCache::rememberClass() that detects a mismatch between the cached and received class chain offset. That fatal assert is now disabled for AOT cache stores when the server is ignoring the client's SCC.

This fixes an unnecessary crash that could occur in the situation outlined above.

When ignoring the client's SCC, if a server performs an AOT compilation
that is not an AOT cache compilation for a client (possible if the
client has a valid local SCC) it may cache valid class chain offsets for
classes in the client's session data. If the server then performs an
AOT cache store for that client, it may receive an invalid class chain
from the client for a class that already has a valid cached class chain
offset, since the client will be ignoring its local SCC for such a
compilation. This situation is valid, and should not trigger the fatal
assert in TR_J9JITServerSharedCache::rememberClass() that detects a
mismatch between the cached and received class chain offset. That fatal
assert is now disabled for AOT cache stores when the server is ignoring
the client's SCC.

This fixes an unnecessary crash that could occur in the situation
outlined above.

Signed-off-by: Christian Despres <[email protected]>
@cjjdespres cjjdespres requested a review from dsouzai as a code owner March 26, 2024 15:09
@cjjdespres
Copy link
Contributor Author

Attn @mpirvu. I think this may need to go in 0.44 as well, if that is still possible.

@cjjdespres
Copy link
Contributor Author

Notably, this can cause crashes when the client has a local readonly SCC. I didn't catch this in my testing of the default liberty container configuration (with -XX:+JITServerUseAOTCache -XX:+JITServerAOTCacheIgnoreLocalSCC added) because I didn't notice that the embedded SCC was incompatible with the JDK I was using for testing. That would result in a failure to load the cache, so the crash scenario above wouldn't occur.

@mpirvu mpirvu self-assigned this Mar 26, 2024
@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Mar 26, 2024
@pshipton
Copy link
Member

If it's going in please make it happen today. We'll be starting a jdk22 M1 followed by 0.44 M2 builds.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpirvu
Copy link
Contributor

mpirvu commented Mar 26, 2024

jenkins test sanity zlinuxjit jdk17

@mpirvu mpirvu merged commit 87d6563 into eclipse-openj9:master Mar 26, 2024
@cjjdespres
Copy link
Contributor Author

cjjdespres commented Mar 27, 2024

After some further testing, I found a sequence of events across three runs that produced this behaviour:

  1. First run created a local SCC using java -version without talking to a JITServer at all
  2. Second run involved a client running acmeair, with AOT cache enabled, with -Xshareclasses using the cache from (1) marked readonly, and connected to a running JITServer. A certain method was successfully compiled and stored in the JITServer AOT cache.
  3. Third run involved a client with identical options to (2). The method from (2) was served from the cache (I think) but deserialization failed. The client then sent a request for the same method to be compiled, again as AOT, but not as an AOT cache compilation. During compilation, the server requested from the client the chain offset for the class that caused the later assert. Since the compilation wasn't an AOT store, no record was created for the class. The compilation succeeded, and because we enable TR_DisableDelayRelocationForAOTCompilations when using the JITServer AOT cache the method is immediately (and successfully) relocated at the client and not stored in the local SCC. Then, at some point an AOT cache store compilation at the server encountered that same class, needed to create the class serialization record for it, consulted the client in rememberClass(), and found a mismatch between the cached valid offset and the returned invalid offset that I mentioned in the PR description.

In this particular test I was running in containers, and not running with -XX:-JITServerAOTCacheUseTemporaryLayer. That meant that the temporary writable layer was still being initialized on top of the existing readonly cache, and so remote AOT compilations that weren't AOT cache stores or loads could still be requested. It's possible that this scenario can still happen even without the temporary layer, but the layer does seem to have made it much more likely.

(Edit: I should mention that the above steps no longer produce a crash because of this PR, but otherwise this scenario should still occur).

I think it's still correct to disable the assert when we are ignoring the client's local SCC, because in that case we will always get a TR_SharedCache::INVALID_CLASS_CHAIN_OFFSET back from the client, whether or not there is a chain present in the client's SCC. That makes the integrity check represented by the assert inappropriate. Now, it might be undesirable to have plain AOT compilations being scheduled at all when using the AOT cache, but that's a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants