Ensure that embedded mask handling is covering the right scenarios #116201

tannergooding · 2025-06-02T02:50:47Z

This ensures that embedded broadcasting is handled for all the currently supported scenarios and is blocked for the scenarios where it shouldn't be used.

Here are the diffs. There is a very minor (0.03%) throughput hit due to the extra checks required, but there are some decent diff wins, since we can avoid instantiating the zero vector and can embed the mask in a few places we were otherwise missing.
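To illustrate why the zero vector can be avoided, here is a minimal scalar model of AVX-512 masking semantics. It is a sketch only, not code from the JIT: `Vec8`, `AddMergeMasked`, and `AddZeroMasked` are hypothetical names invented for this example. It shows that a merge-masked operation whose pass-through operand is all zeros produces the same result as the zero-masked form, which needs no zero operand at all.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of an 8-lane vector; all names here are illustrative only.
using Vec8 = std::array<uint32_t, 8>;

// Merge-masking: lanes whose mask bit is clear keep the value from `passthru`.
Vec8 AddMergeMasked(const Vec8& passthru, uint8_t mask, const Vec8& a, const Vec8& b)
{
    Vec8 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        bool selected = ((mask >> i) & 1) != 0;
        result[i]     = selected ? (a[i] + b[i]) : passthru[i];
    }
    return result;
}

// Zero-masking: lanes whose mask bit is clear are set to zero, so no
// separate zero vector has to be supplied as an operand.
Vec8 AddZeroMasked(uint8_t mask, const Vec8& a, const Vec8& b)
{
    Vec8 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        bool selected = ((mask >> i) & 1) != 0;
        result[i]     = selected ? (a[i] + b[i]) : 0u;
    }
    return result;
}
```

Calling `AddMergeMasked` with an all-zero `passthru` and calling `AddZeroMasked` with the same mask and inputs yield identical vectors, which is the equivalence that lets a zero operand be dropped in favor of zero-masking.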

dotnet-policy-service · 2025-06-02T02:51:36Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This pull request refines embedded mask handling in the JIT lowering and code generation for x86/x64 intrinsics, ensuring that the masking and broadcast behaviors cover the intended scenarios.

Introduces a check for zero-vector operands in masking logic to optimize constant handling.
Updates intrinsic flag assignments and centralizes table-driven HW intrinsic logic calls.
Adjusts EVEX prefix handling to correctly factor in AAA and Z context flags.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

File	Description
src/coreclr/jit/lowerxarch.cpp	Adds check for zero-vector operands and refines embedded mask handling logic.
src/coreclr/jit/instrsxarch.h	Removes KMask_Base flags on certain intrinsics to update masking behavior.
src/coreclr/jit/hwintrinsiccodegenxarch.cpp	Updates use of table-driven intrinsic helper and introduces case-specific handling.
src/coreclr/jit/hwintrinsic.h	Introduces a static helper for table-driven HW intrinsic decisions.
src/coreclr/jit/emitxarch.h	Adjusts EVEX flag checks for embedded mask and broadcast settings.
src/coreclr/jit/emitxarch.cpp	Consolidates EVEX prefix adjustments by incorporating the Z context flag.
src/coreclr/jit/codegeninterface.h	Adds a method to check embedded broadcast support for target-specific behavior.

Comments suppressed due to low confidence (5)

src/coreclr/jit/instrsxarch.h:236

Removing the KMask_Base flag for movaps appears intentional; please ensure that the updated masking behavior is validated with targeted tests.

INST3(movaps,           "movaps",           IUM_WR, PCKFLT(0x29), BAD_CODE,     PCKFLT(0x28),                            INS_TT_FULL_MEM,                                                        REX_W0_EVEX  | Encoding_VEX  | Encoding_EVEX)

src/coreclr/jit/hwintrinsiccodegenxarch.cpp:1326

The new conditional block for op1 containment in NI_AVX512_BlendVariableMask handling should be accompanied by tests covering this edge case.

regNumber op1Reg = REG_NA;

src/coreclr/jit/emitxarch.cpp:2217

The revised EVEX prefix handling, where the Z context flag is now applied within the mask registration block, should be validated with tests to ensure encoding correctness.

if (id->idIsEvexZContextSet())

src/coreclr/jit/codegeninterface.h:188

The addition of IsEmbeddedBroadcastEnabled may affect behavior across different targets; please ensure comprehensive tests are in place for embedded broadcast scenarios.

bool IsEmbeddedBroadcastEnabled(instruction ins, GenTree* op);
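As background for what "embedded broadcast" means at the instruction level, here is a small scalar model. It is a sketch under stated assumptions and says nothing about how `IsEmbeddedBroadcastEnabled` decides; `Vec8`, `BroadcastThenAdd`, and `AddWithEmbeddedBroadcast` are hypothetical names made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of an 8-lane vector; all names here are illustrative only.
using Vec8 = std::array<uint32_t, 8>;

// Explicit form: first build a full vector from the scalar, then add.
Vec8 BroadcastThenAdd(const Vec8& a, const uint32_t* scalarInMemory)
{
    Vec8 broadcast;
    broadcast.fill(*scalarInMemory);

    Vec8 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        result[i] = a[i] + broadcast[i];
    }
    return result;
}

// Embedded form: the operation reads the single scalar from memory and
// applies it to every lane, so no separate broadcast step is needed.
Vec8 AddWithEmbeddedBroadcast(const Vec8& a, const uint32_t* scalarInMemory)
{
    Vec8 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        result[i] = a[i] + *scalarInMemory;
    }
    return result;
}
```

Both functions return the same vector for the same inputs; the embedded form simply folds the broadcast into the consuming operation.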

src/coreclr/jit/emitxarch.h:1290

Combining the EVEX AAA and Z context flags in this return statement should be carefully verified to ensure it matches the intended encoding behavior.

return id->idIsEvexAaaContextSet() || id->idIsEvexZContextSet();
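For readers unfamiliar with the two fields, the following sketch shows where `aaa` and `z` live in the last EVEX payload byte and how a combined "either is set" test looks on the raw byte. It operates on a plain `uint8_t` rather than on the JIT's instruction descriptor, and `SetEmbeddedMask`, `HasEmbeddedMaskBits`, and the constants are hypothetical names chosen for this example, not emitter APIs.

```cpp
#include <cstdint>

// Bit layout of the last EVEX payload byte (often called P2):
//   bit 7    : z   (0 = merge-masking, 1 = zero-masking)
//   bits 6-5 : L'L (vector length / rounding control)
//   bit 4    : b   (broadcast / rounding / SAE)
//   bit 3    : V'
//   bits 2-0 : aaa (opmask register selector, k0-k7)
constexpr uint8_t kEvexZBit    = 0x80;
constexpr uint8_t kEvexAaaMask = 0x07;

// Returns p2 with the aaa field set to maskReg and the z bit set or cleared.
// All other bits of p2 are preserved.
uint8_t SetEmbeddedMask(uint8_t p2, unsigned maskReg, bool zeroing)
{
    unsigned cleared = p2 & ~static_cast<unsigned>(kEvexAaaMask | kEvexZBit);
    unsigned updated = cleared | (maskReg & kEvexAaaMask);
    if (zeroing)
    {
        updated |= kEvexZBit;
    }
    return static_cast<uint8_t>(updated);
}

// True when either the aaa field or the z bit is non-zero.
bool HasEmbeddedMaskBits(uint8_t p2)
{
    return (p2 & (kEvexAaaMask | kEvexZBit)) != 0;
}
```

For example, starting from a byte of `0x48`, selecting mask register 1 with merge-masking gives `0x49`, and the same selection with zero-masking gives `0xC9`.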

tannergooding · 2025-06-03T17:31:57Z

CC. @dotnet/jit-contrib, @EgorBo

tannergooding · 2025-06-03T17:37:05Z

src/coreclr/jit/lowerxarch.cpp

+                                        // TODO-AVX512-CQ: Codegen is currently limited to only handling embedded
+                                        // masking for table driven intrinsics. This can be relaxed once that is fixed.
+


This isn't a hard change to make, and I actually have the work done already. I just wanted to separate the correctness fix from the codegen improvements.

dotnet-policy-service bot assigned tannergooding Jun 2, 2025

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2025

tannergooding force-pushed the kmask-improvements branch from 11c6aa2 to ff33a25 Compare June 2, 2025 03:20

build-analysis bot mentioned this pull request Jun 2, 2025

Timeout in HostFactoryResolverTests.NoSpecialEntryPointPatternCanRunInParallel #114704

Open

tannergooding force-pushed the kmask-improvements branch from ff33a25 to be8e1f9 Compare June 2, 2025 16:30

This was referenced Jun 2, 2025

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

[linux-x64] [mono-aot] Test Runtime_101731.TestConvertToInt64NativeSingle(3.4028235E+38) returns exit code 22 #112557

Open

tannergooding force-pushed the kmask-improvements branch 2 times, most recently from 27a2408 to 3972dab Compare June 2, 2025 22:00

This was referenced Jun 3, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

/root/helix/work/correlation/scripts/<hash>/execute.sh: Permission denied dotnet/dnceng#3412

Open

tannergooding force-pushed the kmask-improvements branch 2 times, most recently from e67be1c to 07f788a Compare June 3, 2025 06:16

Ensure that embedded mask handling is covering the right scenarios

8a9e359

tannergooding force-pushed the kmask-improvements branch from 07f788a to 8a9e359 Compare June 3, 2025 15:15

tannergooding marked this pull request as ready for review June 3, 2025 17:27

tannergooding requested review from Copilot and EgorBo June 3, 2025 17:27

Copilot AI reviewed Jun 3, 2025

tannergooding commented Jun 3, 2025

EgorBo reviewed Jun 4, 2025

EgorBo approved these changes Jun 4, 2025

Update src/coreclr/jit/hwintrinsic.h

04fd724

tannergooding merged commit 56c80cd into dotnet:main Jun 5, 2025
107 of 109 checks passed

tannergooding deleted the kmask-improvements branch June 5, 2025 03:43

build-analysis bot mentioned this pull request Jun 5, 2025

STATUS_UNSUCCESSFUL in RsaCryptRoundtrip_OaepSHA1 #29683

Open
