-
Notifications
You must be signed in to change notification settings - Fork 5k
Ensure that embedded mask handling is covering the right scenarios #116201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
11c6aa2
to
ff33a25
Compare
ff33a25
to
be8e1f9
Compare
27a2408
to
3972dab
Compare
e67be1c
to
07f788a
Compare
07f788a
to
8a9e359
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refines embedded mask handling in the JIT lowering and code generation for x86/x64 intrinsics, ensuring that the masking and broadcast behaviors cover the intended scenarios.
- Introduces a check for zero-vector operands in masking logic to optimize constant handling.
- Updates intrinsic flag assignments and centralizes table-driven HW intrinsic logic calls.
- Adjusts EVEX prefix handling to correctly factor in AAA and Z context flags.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/coreclr/jit/lowerxarch.cpp | Adds check for zero-vector operands and refines embedded mask handling logic. |
src/coreclr/jit/instrsxarch.h | Removes KMask_Base flags on certain intrinsics to update masking behavior. |
src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Updates use of table-driven intrinsic helper and introduces case-specific handling. |
src/coreclr/jit/hwintrinsic.h | Introduces a static helper for table-driven HW intrinsic decisions. |
src/coreclr/jit/emitxarch.h | Adjusts EVEX flag checks for embedded mask and broadcast settings. |
src/coreclr/jit/emitxarch.cpp | Consolidates EVEX prefix adjustments by incorporating the Z context flag. |
src/coreclr/jit/codegeninterface.h | Adds a method to check embedded broadcast support for target-specific behavior. |
Comments suppressed due to low confidence (5)
src/coreclr/jit/instrsxarch.h:236
- Removing the KMask_Base flag for movaps appears intentional; please ensure that the updated masking behavior is validated with targeted tests.
INST3(movaps, "movaps", IUM_WR, PCKFLT(0x29), BAD_CODE, PCKFLT(0x28), INS_TT_FULL_MEM, REX_W0_EVEX | Encoding_VEX | Encoding_EVEX)
src/coreclr/jit/hwintrinsiccodegenxarch.cpp:1326
- The new conditional block for op1 containment in NI_AVX512_BlendVariableMask handling should be accompanied by tests covering this edge case.
regNumber op1Reg = REG_NA;
src/coreclr/jit/emitxarch.cpp:2217
- The revised EVEX prefix handling, where the Z context flag is now applied within the mask registration block, should be validated with tests to ensure encoding correctness.
if (id->idIsEvexZContextSet())
src/coreclr/jit/codegeninterface.h:188
- The addition of IsEmbeddedBroadcastEnabled may affect behavior across different targets; please ensure comprehensive tests are in place for embedded broadcast scenarios.
bool IsEmbeddedBroadcastEnabled(instruction ins, GenTree* op);
src/coreclr/jit/emitxarch.h:1290
- Combining the EVEX AAA and Z context flags in this return statement should be carefully verified to ensure it matches the intended encoding behavior.
return id->idIsEvexAaaContextSet() || id->idIsEvexZContextSet();
CC. @dotnet/jit-contrib, @EgorBo |
// TODO-AVX512-CQ: Codegen is currently limited to only handling embedded | ||
// masking for table driven intrinsics. This can be relaxed once that is fixed. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a hard change to make and I actually have the work done already. I just wanted to separate the correctness fix from the codegen improvements
This ensures that embedded broadcasting is being handled for all the currently supported scenarios and is blocked for the scenarios that shouldn't.
Here are the diffs. There is a very minor (0.03%) throughput hit due to the extra checks required. While there are some decent diff wins since we can avoid instantiation the zero vector and can embed a few places we were otherwise missing.