Stack Allocation Enhancements #104936
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
For the delegate case, if we add

```diff
index 89e28c5978c..a069d98503e 100644
--- a/src/coreclr/jit/objectalloc.cpp
+++ b/src/coreclr/jit/objectalloc.cpp
@@ -719,12 +719,23 @@ bool ObjectAllocator::CanLclVarEscapeViaParentStack(ArrayStack<GenTree*>* parent
             case GT_CALL:
             {
-                GenTreeCall* asCall = parent->AsCall();
+                GenTreeCall* const call = parent->AsCall();
-                if (asCall->IsHelperCall())
+                if (call->IsHelperCall())
                 {
                     canLclVarEscapeViaParentStack =
-                        !Compiler::s_helperCallProperties.IsNoEscape(comp->eeGetHelperNum(asCall->gtCallMethHnd));
+                        !Compiler::s_helperCallProperties.IsNoEscape(comp->eeGetHelperNum(call->gtCallMethHnd));
+                }
+                else if (call->gtCallType == CT_USER_FUNC)
+                {
+                    // Delegate invoke won't escape the delegate, which is passed as "this"
+                    // and gets expanded inline later.
+                    //
+                    if ((call->gtCallMoreFlags & GTF_CALL_M_DELEGATE_INV) != 0)
+                    {
+                        GenTree* const thisArg = call->gtArgs.GetThisArg()->GetNode();
+                        canLclVarEscapeViaParentStack = thisArg != tree;
+                    }
```

then the example above becomes:

```asm
; Method Y:Test():int (FullOpts)
G_M53607_IG01:  ;; offset=0x0000
       sub      rsp, 88
       vxorps   xmm4, xmm4, xmm4
       vmovdqu  ymmword ptr [rsp+0x20], ymm4
       vmovdqa  xmmword ptr [rsp+0x40], xmm4
       xor      eax, eax
       mov      qword ptr [rsp+0x50], rax
       ;; size=27 bbWeight=1 PerfScore 5.83
G_M53607_IG02:  ;; offset=0x001B
       mov      rcx, 0x7FFD4BB04580      ; Y+<>c__DisplayClass0_0
       call     CORINFO_HELP_NEWSFAST
       mov      dword ptr [rax+0x08], 100
       mov      rcx, 0x7FFD4BB04B30      ; System.Func`1[int]
       mov      qword ptr [rsp+0x20], rcx
       mov      gword ptr [rsp+0x28], rax
       mov      rcx, 0x7FFD4B8783F0      ; code for Y+<>c__DisplayClass0_0:<Test>b__0():int:this
       mov      qword ptr [rsp+0x38], rcx
       lea      rax, [rsp+0x20]
       mov      rcx, gword ptr [rax+0x08]
       call     [rax+0x18]System.Func`1[int]:Invoke():int:this
       nop
       ;; size=70 bbWeight=1 PerfScore 11.50
G_M53607_IG03:  ;; offset=0x0061
       add      rsp, 88
       ret
       ;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 102
```

Here the closure is still on the heap, and we're invoking the delegate func "directly" but via a convoluted path: we store the func's (indirection cell) address to the stack-allocated delegate, then fetch it back and indirect through it. Ideally we'd like to be able to inline and perhaps realize the closure doesn't escape either, but that seems far off. Perhaps we can just summarily claim the closure can't escape; I am not sure. Moving delegate invoke expansion earlier does not look to be simple -- currently there is some prep work in morph and then the actual expansion in lower, and tail calls are a complication.
@AndyAyersMS With the array (non-gc elems) support + my field analysis prototype + the above delegate handling (branch at https://github.com/hez2010/runtime/tree/field-stackalloc), the codegen becomes:

```asm
G_M30166_IG01:  ;; offset=0x0000
       sub      rsp, 104
       vxorps   xmm4, xmm4, xmm4
       vmovdqu  ymmword ptr [rsp+0x20], ymm4
       vmovdqa  xmmword ptr [rsp+0x40], xmm4
       xor      eax, eax
       mov      qword ptr [rsp+0x50], rax
       ;; size=27 bbWeight=1 PerfScore 5.83
G_M30166_IG02:  ;; offset=0x001B
       xor      ecx, ecx
       mov      qword ptr [rsp+0x58], rcx
       mov      dword ptr [rsp+0x60], ecx
       mov      rcx, 0x7FFEBB4211B8      ; Y+<>c__DisplayClass7_0
       mov      qword ptr [rsp+0x58], rcx
       mov      dword ptr [rsp+0x60], 100
       mov      rcx, 0x7FFEBB420EE8      ; System.Func`1[int]
       mov      qword ptr [rsp+0x20], rcx
       lea      rcx, [rsp+0x58]
       mov      qword ptr [rsp+0x28], rcx
       mov      rcx, 0x7FFEBB3F0678      ; code for Y+<>c__DisplayClass7_0:<Test>b__0():int:this
       mov      qword ptr [rsp+0x38], rcx
       lea      rax, [rsp+0x20]
       mov      rcx, gword ptr [rax+0x08]
       call     [rax+0x18]System.Func`1[int]:Invoke():int:this
       nop
       ;; size=87 bbWeight=1 PerfScore 14.25
G_M30166_IG03:  ;; offset=0x0072
       add      rsp, 104
       ret
```
Added
Some more useful links:

- An overview of the implementation of escape analysis, including interprocedural analysis, in the JVM: https://cr.openjdk.org/~cslucas/escape-analysis/EscapeAnalysis.html
- Some insights on allocating objects in a loop, etc.: https://devblogs.microsoft.com/java/improving-openjdk-scalar-replacement-part-2-3/
Implement a very simplistic "field sensitive" analysis for `Span<T>` where we model the span as simply its byref field. If the span is only consumed locally, and the array does not otherwise escape, then the array does not escape. This is a subset of dotnet#112543 that does not try and reason interprocedurally. Contributes to dotnet#104936 / dotnet#108913
Continuation of dotnet#113141. Implement a very simplistic "field sensitive" analysis for gc structs where we model each struct as simply its gc field(s). That is, given a gc struct `G` with GC field `f`, we model

```
G.f = ...
```

as an assignment to `G`, and

```
 = G.f
```

as a read from `G`. Since we know `G` itself cannot escape, "escaping" of `G` means `G.f` can escape. Note this conflates the fields of `G`: either they all escape or none of them do. We could make this more granular, but it's not clear that doing so is worthwhile, and it requires more up-front work to determine how to track each field's status. If the struct or struct fields are only consumed locally, we may be able to prove the gc referents don't escape. This is a subset/elaboration of dotnet#112543 that does not try to reason interprocedurally. Contributes to dotnet#104936 / dotnet#108913. Fixes dotnet#113236 (once the right inlines happen).
Abstractly a non-GC local can't cause escape. On 32 bit platforms tracking connections to `TYP_I_IMPL` leads to confusion once we start struct field tracking, as we logically "overlay" GC and non-GC fields. Contributes to dotnet#104936
Mark delegate invoke calls as non-escaping for `this`. Later, in the stack allocation code rewriting post-pass, if the `this` at a delegate invoke is definitely a stack pointing local, expand the delegate invoke so that physical promotion can see all the accesses. Contributes to dotnet#104936.
Stack allocation of non-escaping ref classes and boxed value classes was enabled in #103361, but only works in limited cases. This issue tracks further enhancements (see also #11192).
Abilities:

- See note below.
- `localloc` instead of fixed allocations (at least for non-gc types; for GC types there's currently no way to do proper GC reporting)

Analysis:

- `List<>` and `[]` #111838
- `Span` capture to give us "cheap" interprocedural analysis: `Span<T>` capture #112543

Implementation:

- `ALLOCOBJ`
- `ALLOCOBJ` assigned to single-def temp in importer
- `ClassLayout` instances with GC refs #103362
- `stackalloc`: it also inhibits implicit tail calls

NAOT:

Advanced:

Diagnostics:
FYI @dotnet/jit-contrib