-
Notifications
You must be signed in to change notification settings - Fork 5k
Adding push2/pop2 #116035
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Adding push2/pop2 #116035
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
e9713e7
to
7dcab65
Compare
7dcab65
to
b960c6e
Compare
b960c6e
to
f25f57b
Compare
04bdc1b
to
9e1a9f5
Compare
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283571 / 284631 (99.63%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 305223 / 306944 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283315 / 284395 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 299368 / 301100 (99.42%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
66245b1
to
68903ee
Compare
RISC-V Release-CLR-QEMU: 9084 / 9114 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
68903ee
to
908cedb
Compare
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 260891 / 261953 (99.59%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 299817 / 301551 (99.42%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
908cedb
to
bc308aa
Compare
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283658 / 284736 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
bc308aa
to
6b2c687
Compare
6b2c687 is being scheduled for building and testingGIT: |
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces support for the new PUSH2/POP2 instructions and PPX features, updating the unwind, instruction, and code generation infrastructure accordingly.
- Adds new unwind functions (unwindPush2, unwindPush2Pop2CFI, etc.) and APX PPX handling for various architectures.
- Updates configuration, instruction definitions, emitter logic, and unit tests to integrate the new instructions.
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
src/coreclr/jit/unwindx86.cpp | Added an empty unwindPush2() function; may need implementation to simulate push2. |
src/coreclr/jit/unwindriscv64.cpp | Added unwindPush2() with an unreached() call as a placeholder for RISCV64. |
src/coreclr/jit/unwindloongarch64.cpp | Added unwindPush2() with unreached() for LoongArch64. |
src/coreclr/jit/unwindarm64.cpp | Added unwindPush2() with unreached() for ARM64. |
src/coreclr/jit/unwindamd64.cpp | Implements unwindPush2() for AMD64, including a Windows-specific path using two pushes. |
src/coreclr/jit/unwind.cpp | Added unwindPush2Pop2CFI() as a placeholder for CFI unwind handling for push2/pop2. |
src/coreclr/jit/jitconfigvalues.h | Introduces the EnableApxPPX configuration integer. |
src/coreclr/jit/instrsxarch.h | Defines push2 and pop2 instructions with APX-related flags. |
src/coreclr/jit/instr.h | Adds new insOpts related to APX PPX hint. |
src/coreclr/jit/instr.cpp | Updates instruction display and generation routines to handle new APX/PPX context. |
src/coreclr/jit/emitxarch.h & .cpp | Adds helper functions for setting APX PPX context during instruction emission. |
src/coreclr/jit/emit.h & .cpp | Adds new member functions and the HasApxPpx() helper for APX PPX functionality. |
src/coreclr/jit/compiler.h | Declares new unwindPush2 and unwindPush2Windows functions. |
src/coreclr/jit/codegenxarch.cpp | Integrates push2/pop2 emission for unit tests and updates prolog/epilog register handling. |
src/coreclr/jit/codegen.h | Declares the instGen_Push2Pop2Ppx function and APX-specific pop routines. |
src/coreclr/jit/unwindx86.cpp
Outdated
@@ -50,6 +50,10 @@ void Compiler::unwindPush(regNumber reg) | |||
{ | |||
} | |||
|
|||
void Compiler::unwindPush2(regNumber reg1, regNumber reg2) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of unwindPush2 in unwindx86.cpp is empty; consider simulating push2 by calling unwindPush() for each register to match the intent stated in the PR description.
{ | |
{ | |
unwindPush(reg1); | |
unwindPush(reg2); |
Copilot uses AI. Check for mistakes.
@kunalspathak @EgorBo This PR is ready for review |
6b2c687 is being scheduled for building and testingGIT: |
1 similar comment
6b2c687 is being scheduled for building and testingGIT: |
RISC-V Release-FX-QEMU: 261199 / 262260 (99.60%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
6b2c687 is being scheduled for building and testingGIT: |
1 similar comment
6b2c687 is being scheduled for building and testingGIT: |
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283661 / 284755 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 307376 / 309101 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
{ | ||
switch (ins) | ||
{ | ||
case INS_push: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain why INS_push
and INS_pop
is also in this? I was expecting just push2/pop2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INS_push
and INS_pop
has a PPX
feature that's part of APX
PPX (push-pop acceleration) : A PPX hint that helps processor tracks these new instructions internally and fast-forwards register data between matching PUSH2 and POP2 instructions without going through memory. This is also applicable for PUSH/POP with REX2 encoding
@@ -3782,6 +3782,20 @@ const IS_INFO emitter::emitGetSchedInfo(insFormat insFmt) | |||
assert(!"Unsupported insFmt"); | |||
return IS_NONE; | |||
} | |||
|
|||
bool emitter::HasApxPpx(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add summary docs explaining what is the scenario in which this method is used and what inference we can make from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/unwindarm64.cpp
Outdated
@@ -230,6 +230,11 @@ void Compiler::unwindPush(regNumber reg) | |||
unreached(); // use one of the unwindSaveReg* functions instead. | |||
} | |||
|
|||
void Compiler::unwindPush2(regNumber reg1, regNumber reg2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of touching other platform files, can you just add unwindPush2
for TARGET_AMD64
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change
src/coreclr/jit/codegenxarch.cpp
Outdated
// This is not a funclet or an On-Stack Replacement. | ||
assert((compiler->funCurrentFunc()->funKind == FuncKind::FUNC_ROOT) && !compiler->opts.IsOSR()); | ||
// PUSH2 doesn't work for ESP. | ||
assert((rsPushRegs & RBM_SPBASE) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why this code is not in its own method genPushCalleeSavedRegistersFromMaskAPX
similar to genPopCalleeSavedRegistersFromMaskAPX
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a genPushCalleeSavedRegistersFromMaskAPX
and moved code there
} | ||
|
||
int index = 0; | ||
if (regStack.Height() % 2 == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer is NO, but just want to double check -
does it matter (mainly perfwise) if you should use push2/pop2 for same set of registers or is it ok to use different pattern?
push2 rax, rbx
push rcx
...
..
pop2 rcx, rbx
pop rax
instead of
push2 rax, rbx
push rcx
...
..
pop rcx
pop rbx, rax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, it does matter
- The data being pushed/popped by PUSH2/POP2 must be 16B-aligned on the stack. This is why I'm always emitting a single
push
before any 'push2' and a singlepop
before function end and after anypop2'.
alignReg` is used to determine which register is used for this - I'm using the PPX feature whenever I'm using
push2/pop2
. This is because the current guidance is to use push2/pop2 only with PPX
A PUSH and its corresponding POP may be marked with a 1-bit
Push-Pop Acceleration (PPX) hint to indicate that the POP reads the value written by the PUSH from the
stack. The processor tracks these marked instructions internally and fast-forwards register data between
matching PUSH and POP instructions, without going through memory or through the training loop of the
Fast Store Forwarding Predictor (FSFP).
When applying the PPX hint, the compiler needs to make sure that it always marks both the PUSH and its
matching POP (i.e., the POP which reads from the same stack memory address that the PUSH writes to).
This balancing rule naturally applies to PUSH/POP sequences in function prologs/epilogs, respectively. It
does not apply to standalone PUSH sequences, such as function argument pushes onto the stack. Such
sequences should not be marked with the PPX hint
src/coreclr/jit/emitxarch.h
Outdated
} | ||
else | ||
{ | ||
assert((instOptions & INS_OPTS_APX_ppx_MASK) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is always true because it is inside else
of the opposite condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed else
@@ -20309,6 +20341,15 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins | |||
} | |||
break; | |||
|
|||
case INS_push2: | |||
// TODO-XArch-APX: to be verified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we verify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked internally and this is the guidance I received. https://uops.info/table.html doesn't have these new instructions yet. That's why I marked it as a ToDo
// id - instruction descriptor | ||
// instOptions - emit options | ||
// | ||
void SetApxPpxIfNeeded(instrDesc* id, insOpts instOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(HasApxPpx(id->idIns()))
asserts id it's not any of the 4 instructions. Did you mean a difference check?
// Setting EVEX.W = 1 bit indicates a push-pop acceleration (PPX) hint | ||
// The current recommendation is to use PUSH2/POP2 only with PPX hint | ||
// So, it is used only in Epilog/Prolog code generation | ||
if (id->idIsApxPpxContextSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it always true for push2
and pop2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push2
/pop2
are new instructions. PPX
is a feature which push'/
pop/
pop2/
push2can use. So theoretically, we might have a
push2/
pop2without
PPX enabled. We currently only use push2
/pop2
with PPX
since that's the guidance(the guidance is for performance reasons) but that doesn't mean push2
/pop2
cannot be used without PPX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments.
Superpmi result with/without PPX feature
Did you get a chance to check if all the perfscore is coming just from prolog and epilog? Also, is it because we have not setup the perf latency/throughput numbers accurately (I see a TODO there and added a comment about it) and hence perfscore is just because of reduced in number of instructions?
I did check and it's entirely coming from The guidance I got was |
PR Overview
This PR does the following
APX and PPX
As part of Intel APX(Intel Advanced Performance Extensions), a couple of new features are available for working with stack
This write up with be focused on push2/pop2.
PUSH2/POP2
PUSH2 and POP2 are two new instructions for (respectively) pushing/popping 2 GPRs at a time to/from
the stack. These instructions use eEVEX encoding. The data being pushed/popped by PUSH2/POP2 must be 16B-aligned on the stack.
Guidance from Intel
It’s not part of the spec but in current implementations, push2/pop2 should really only be used with PPX hints and thus should only be used in matching “pairs”. i.e.
Unwind code
Windows does not current support unwind for push2. After discussion with Kunal, I decided to use 2
unwwind_push()
to simulatepush2
. This will need to be updated later once we have supportTesting done
Superpmi result with/without PPX feature
Diffs are based on 2,602,472 contexts (1,012,864 MinOpts, 1,589,608 FullOpts).
MISSED contexts: 17 (0.00%)
Base JIT options: JitBypassApxCheck=1
Diff JIT options: EnableApxPPX=1;JitBypassApxCheck=1
Overall (+15,581,843 bytes)
MinOpts (+527,206 bytes)
FullOpts (+15,054,637 bytes)
Sample diff
-6 (-17.65%) : 17249.dasm - System.Globalization.CalendarData:LoadCalendarDataFromSystemCore(System.String,ushort):bool:this (FullOpts)
-6 (-16.67%) : 7885.dasm - Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:b__1(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this (FullOpts)
-9 (-16.67%) : 13852.dasm - System.Linq.Enumerable+ArrayWhereSelectIterator`2[System.__Canon,System.__Canon]:GetCount(bool,System.ReadOnlySpan`1[System.__Canon],System.Func`2[System.__Canon,bool],System.Func`2[System.__Canon,System.__Canon]):int (FullOpts)