-
Notifications
You must be signed in to change notification settings - Fork 5k
arm64: Change EQ/NE node to SETCC if the operand supports the zero flag #112235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jonathandavies-arm
commented
Feb 6, 2025
- Add one line tests for adds, ands, bics & subs
- Add tests for negs
- Contributes to Review the multi-op instruction usage for Arm64 #68028 - tst & negs completed
* Add one line tests for adds, ands, bics & subs * Add tests for negs
@a74nh @kunalspathak @dotnet/arm64-contrib |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
src/coreclr/jit/lower.cpp
Outdated
#ifdef TARGET_ARM64 | ||
// Check to see if we have a relopOp1 that supports setting flags e.g. AND & relopOp2 is 0. | ||
// We can then set flags on relopOp1, remove relopOp2 & cmp and insert a SETCC node after Op1. | ||
GenTreeOp* relop = cmp->AsOp(); | ||
GenTree* relopOp1 = relop->gtGetOp1(); | ||
GenTree* relopOp2 = relop->gtGetOp2(); | ||
GenTree* parent = relop->gtGetParent(nullptr); | ||
if (parent && parent->OperIsSimple() && comp->opts.OptimizationEnabled() && relop->OperIs(GT_EQ, GT_NE) && | ||
relopOp2->IsIntegralConst(0) && relopOp1->SupportsSettingZeroFlag() && IsInvariantInRange(relopOp1, parent)) | ||
{ | ||
// Update relopOp1 flags and remove cmp & relopOp2 | ||
relopOp1->gtFlags |= GTF_SET_FLAGS; | ||
relopOp1->SetUnusedValue(); | ||
BlockRange().Remove(relopOp1); | ||
BlockRange().InsertBefore(relop, relopOp1); | ||
BlockRange().Remove(relop); | ||
BlockRange().Remove(relopOp2); | ||
|
||
// Insert a SETCC node after Op1 | ||
GenCondition cmpCondition = GenCondition::FromRelop(relop); | ||
GenTreeCC* setcc = comp->gtNewCC(GT_SETCC, TYP_INT, cmpCondition); | ||
BlockRange().InsertAfter(relopOp1, setcc); | ||
|
||
// Update the parent op with the new SETCC node | ||
GenTreeOp* parentOp = parent->AsOp(); | ||
if (parentOp->gtOp1 == relop) | ||
{ | ||
parentOp->gtOp1 = setcc; | ||
} | ||
else | ||
{ | ||
assert(parentOp->gtOp2 == relop); | ||
parentOp->gtOp2 = setcc; | ||
} | ||
return setcc; | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just repeating the optimization that happens in TryLowerConditionToFlagsNode
. Can you factor the logic out, or change this logic to use that function instead?
Also, I suspect this change does not need to be arm64 specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between TryLowerConditionToFlagsNode
and this code is inserting the relopOp1
.
In TryLowerConditionToFlagsNode
it has BlockRange().InsertBefore(parent, relopOp1);
This block of code has BlockRange().InsertBefore(relop, relopOp1);
Changing TryLowerConditionToFlagsNode
to the before relop breaks when LowerJTrue()
calls it.
A couple of options are:
- Add bool arg to
TryLowerConditionToFlagsNode
to select which insert we do. - Investigate why
LowerJTrue()
doesn't work with the change.
Any other ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked into trying to use TryLowerConditionToFlagsNode
in LowerCompare
and to use it we will need to make 2 changes.
- Instead of inserting
relopOp1
back in before the parent we need to insert it beforerelop
This is so that if you have a parent node such as OR with operands EQ & EQ each with a ADD operand we can lower both EQ nodes to CSET and outputands
in both cases.
Before
N003 ( 3, 3) [000002] -----+----- t2 = * ADD int $100
N004 ( 1, 2) [000003] -----+----- t3 = CNS_INT int 0 $40
/--* t2 int
+--* t3 int
N005 ( 8, 6) [000004] -----+----- t4 = * EQ int $101
N006 ( 1, 1) [000005] -----+----- t5 = LCL_VAR int V02 arg2 u:1 (last use) $82
N007 ( 1, 1) [000006] -----+----- t6 = LCL_VAR int V03 arg3 u:1 (last use) $83
/--* t5 int
+--* t6 int
N008 ( 3, 3) [000007] -----+----- t7 = * ADD int $102
N009 ( 1, 2) [000008] -----+----- t8 = CNS_INT int 0 $40
/--* t7 int
+--* t8 int
N010 ( 8, 6) [000009] -----+----- t9 = * EQ int $103
/--* t4 int
+--* t9 int
N011 ( 17, 13) [000010] -----+----- t10 = * OR int $104
After
N003 ( 3, 3) [000002] -----+----- u2 = * ADD int $100
[000014] ----------- t14 = SETCC int cond=UEQ
N006 ( 1, 1) [000005] -----+----- t5 = LCL_VAR int V02 arg2 u:1 (last use) $82
N007 ( 1, 1) [000006] -----+----- t6 = LCL_VAR int V03 arg3 u:1 (last use) $83
/--* t5 int
+--* t6 int
N008 ( 3, 3) [000007] -----+----- u7 = * ADD int $102
[000015] ----------- t15 = SETCC int cond=UEQ
/--* t14 int
+--* t15 int
N011 ( 17, 13) [000010] -----+----- t10 = * OR int $104
- If we call
TryLowerConditionToFlagsNode
inLowerCompare
and we fail these if conditions.
if (optimizing && relop->OperIs(GT_EQ, GT_NE) && relopOp2->IsIntegralConst(0) &&
relopOp1->SupportsSettingZeroFlag() && IsInvariantInRange(relopOp1, parent))
{
we don't want to go into the else case. It messes up the relop
node and we get asserts.
I've made the changes that would be needed to TryLowerConditionToFlagsNode
if we still want to call it here.
jonathandavies-arm@f34b9ba
I think I that we should either not call TryLowerConditionToFlagsNode
in LowerCompare
or extract out the if statement and 6 common lines into another function that can be called from TryLowerConditionToFlagsNode
and LowerCompare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check what the arm64 and x64 diffs look like if you completely move the optimization from TryLowerConditionToFlagsNode
to LowerCompare
? That is, what you had before except enabled for all platform, and then remove the corresponding optimization in TryLowerConditionToFlagsNode
.
Given that the optimization here produces SETCC
which is removable by TryLowerConditionToFlagsNode
then it might not come with any significant regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done these 2 changes in these commits jonathandavies-arm@3d4c241 & jonathandavies-arm@7858064.
I've tried using superpmi these commits as the base & diff but I get the following asserts in lir.cpp.
[09:03:08] ================ Logging to /home/jondav01/projects/dotnet/runtime/artifacts/spmi/superpmi.7.log
[09:03:08] Using JIT/EE Version from jiteeversionguid.h: cc0e7adf-e397-40b6-9d14-a7149815c991
[09:03:08] Found download cache directory "/home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64" and --force_download not set; skipping download
[09:03:08] SuperPMI replay
[09:03:08] JIT Path: /home/jondav01/projects/dotnet/runtime/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/libclrjit.so
[09:03:08] Using MCH files:
[09:03:08] /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[09:03:08] Running SuperPMI replay of /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch
[09:03:09] Compilation failures
[09:03:10] ============================== Assertions:
[09:03:10] node->IsUnusedValue() && "found an unmarked unused value" (/home/jondav01/projects/dotnet/runtime/src/coreclr/jit/lir.cpp:1706)
[09:03:10] /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch # 82 : IL size 7
[09:03:10] /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch # 29020 : IL size 7
[09:03:10] /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch # 4461 : IL size 8
[09:03:10] /home/jondav01/projects/dotnet/runtime/artifacts/spmi/mch/cc0e7adf-e397-40b6-9d14-a7149815c991.linux.arm64/benchmarks.run.linux.arm64.checked.mch # 29845 : IL size 9
...
Please can you check if the changes I made are what you were thinking of?
I think atm we could remove the ARM64 ifdef but removing that if case from TryLowerConditionToFlagsNode seems to be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove the ARM64 ifdef and fixed a test.
I don't think we should remove that section from TryLowerConditionToFlagsNode
. On x64 when it lowers a SELECT node it adds in a test
to get the flags instead of setting the flags on a OR node (in the example I was looking at).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On x64 when it lowers a SELECT node it adds in a test to get the flags instead of setting the flags on a OR node (in the example I was looking at).
Can you share the jitdump of this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[MethodImpl(MethodImplOptions.NoInlining)]
static void AndAndInner(int op1, int op2, int op3) {
if (op1 % 2 == 0 && op2 % 2 == 0 && op3 % 2 == 0) {
op1 = 5;
}
Consume(op1, op2, op3, 0);
}
G_M49537_IG01: ;; offset=0x0000
push rax
;; size=1 bbWeight=1 PerfScore 1.00
G_M49537_IG02: ;; offset=0x0001
mov ecx, edi
and ecx, 1
mov eax, esi
and eax, 1
or ecx, eax
mov eax, edx
and eax, 1
or ecx, eax
mov eax, 5
test ecx, ecx
cmove edi, eax
xor ecx, ecx
call [TestAdd.Program:Consume(int,int,int,int)]
nop
;; size=38 bbWeight=1 PerfScore 6.25
G_M49537_IG03: ;; offset=0x0027
add rsp, 8
ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why the opt would not be kicking in for this case. I tried the following change locally and it definitely kicks in for me: https://github.com/dotnet/runtime/compare/main...jakobbotsch:move-opt?expand=1
I see negligible diffs for x64 with that change, so I would suggest switching to it. It also cleans up some stuff around the transformation (e.g. gtGetParent
should not be used in LIR, instead we have BlockRange::TryGetUse
) and moves it into OptimizeConstCompare
where other similar optimizations are happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a commit removing my changes to lower.cpp and added yours.
src/coreclr/jit/lower.cpp
Outdated
parentOp->gtOp2 = setcc; | ||
} | ||
return setcc; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Looks like there are some conflicts in the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…ag (dotnet#112235) * arm64: Change EQ/NE node to SETCC if the operand supports the zero flag * Add one line tests for adds, ands, bics & subs * Add tests for negs