Skip to content

Ensure that embedded mask handling is covering the right scenarios #116201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 5, 2025

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jun 2, 2025

This ensures that embedded broadcasting is being handled for all the currently supported scenarios and is blocked for the scenarios that shouldn't.

Here are the diffs. There is a very minor (0.03%) throughput hit due to the extra checks required. While there are some decent diff wins since we can avoid instantiation the zero vector and can embed a few places we were otherwise missing.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding marked this pull request as ready for review June 3, 2025 17:27
@tannergooding tannergooding requested review from Copilot and EgorBo June 3, 2025 17:27
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request refines embedded mask handling in the JIT lowering and code generation for x86/x64 intrinsics, ensuring that the masking and broadcast behaviors cover the intended scenarios.

  • Introduces a check for zero-vector operands in masking logic to optimize constant handling.
  • Updates intrinsic flag assignments and centralizes table-driven HW intrinsic logic calls.
  • Adjusts EVEX prefix handling to correctly factor in AAA and Z context flags.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/coreclr/jit/lowerxarch.cpp Adds check for zero-vector operands and refines embedded mask handling logic.
src/coreclr/jit/instrsxarch.h Removes KMask_Base flags on certain intrinsics to update masking behavior.
src/coreclr/jit/hwintrinsiccodegenxarch.cpp Updates use of table-driven intrinsic helper and introduces case-specific handling.
src/coreclr/jit/hwintrinsic.h Introduces a static helper for table-driven HW intrinsic decisions.
src/coreclr/jit/emitxarch.h Adjusts EVEX flag checks for embedded mask and broadcast settings.
src/coreclr/jit/emitxarch.cpp Consolidates EVEX prefix adjustments by incorporating the Z context flag.
src/coreclr/jit/codegeninterface.h Adds a method to check embedded broadcast support for target-specific behavior.
Comments suppressed due to low confidence (5)

src/coreclr/jit/instrsxarch.h:236

  • Removing the KMask_Base flag for movaps appears intentional; please ensure that the updated masking behavior is validated with targeted tests.
INST3(movaps,           "movaps",           IUM_WR, PCKFLT(0x29), BAD_CODE,     PCKFLT(0x28),                            INS_TT_FULL_MEM,                                                        REX_W0_EVEX  | Encoding_VEX  | Encoding_EVEX)

src/coreclr/jit/hwintrinsiccodegenxarch.cpp:1326

  • The new conditional block for op1 containment in NI_AVX512_BlendVariableMask handling should be accompanied by tests covering this edge case.
regNumber op1Reg = REG_NA;

src/coreclr/jit/emitxarch.cpp:2217

  • The revised EVEX prefix handling, where the Z context flag is now applied within the mask registration block, should be validated with tests to ensure encoding correctness.
if (id->idIsEvexZContextSet())

src/coreclr/jit/codegeninterface.h:188

  • The addition of IsEmbeddedBroadcastEnabled may affect behavior across different targets; please ensure comprehensive tests are in place for embedded broadcast scenarios.
bool IsEmbeddedBroadcastEnabled(instruction ins, GenTree* op);

src/coreclr/jit/emitxarch.h:1290

  • Combining the EVEX AAA and Z context flags in this return statement should be carefully verified to ensure it matches the intended encoding behavior.
return id->idIsEvexAaaContextSet() || id->idIsEvexZContextSet();

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, @EgorBo

Comment on lines +10385 to +10387
// TODO-AVX512-CQ: Codegen is currently limited to only handling embedded
// masking for table driven intrinsics. This can be relaxed once that is fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a hard change to make and I actually have the work done already. I just wanted to separate the correctness fix from the codegen improvements

@tannergooding tannergooding merged commit 56c80cd into dotnet:main Jun 5, 2025
107 of 109 checks passed
@tannergooding tannergooding deleted the kmask-improvements branch June 5, 2025 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants