Commit b10ffa2: .NET 10 P5 Runtime (#9912)

* Update for Runtime
* Add JIT notes
* Fix GC notes

Co-authored-by: Aman Khalid (from Dev Box) <[email protected]>

release-notes/10.0/preview/preview5/runtime.md

Lines changed: 99 additions & 2 deletions
Here's a summary of what's new in the .NET Runtime in this preview release:

- [What's new in .NET 10](https://learn.microsoft.com/dotnet/core/whats-new/dotnet-10/overview) documentation

## Escape Analysis for Delegates

Preview 5 extends the JIT compiler's escape analysis implementation to model delegate invokes. When compiling source code to IL, each delegate is transformed into a closure class with a method corresponding to the delegate's definition, and fields matching any captured variables. At runtime, a closure object is created to instantiate the captured variables, along with a `Func` object to invoke the delegate. Thanks to [dotnet/runtime #115172](https://github.com/dotnet/runtime/pull/115172), if escape analysis determines the `Func` object will not outlive its current scope, the JIT will allocate it on the stack.
Consider the following example:
```csharp
public static int Main()
{
    int local = 1;
    int[] arr = new int[100];
    var func = (int x) => x + local;
    int sum = 0;

    foreach (int num in arr)
    {
        sum += func(num);
    }

    return sum;
}
```
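Before looking at the generated assembly, it helps to see roughly what the compiler lowers this lambda to. The sketch below is illustrative: `DisplayClass0_0` and `LambdaBody` stand in for the compiler-generated `Program+<>c__DisplayClass0_0` and `<Main>b__0` that appear in the listings that follow.

```csharp
// Simplified sketch of the lowering; the real generated names are not valid C#.
sealed class DisplayClass0_0
{
    public int local;                  // each captured variable becomes a field

    internal int LambdaBody(int x)     // the lambda body becomes an instance method
        => x + local;
}

// Main is then effectively compiled as:
//
//   var closure = new DisplayClass0_0 { local = 1 };   // closure object
//   Func<int, int> func = closure.LambdaBody;          // Func bound to the closure
//   ...
//   sum += func(num);                                   // Func invocation in the loop
//
// In .NET 9, both the closure object and the Func object are heap allocations.
```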
Here is the abbreviated x64 assembly the .NET 9 JIT produces for `Main`:
```asm
; prolog omitted for brevity
       mov      rdi, 0x7DD0AE362E28      ; Program+<>c__DisplayClass0_0
       call     CORINFO_HELP_NEWSFAST
       mov      rbx, rax
       mov      dword ptr [rbx+0x08], 1
       mov      rdi, 0x7DD0AE268A98      ; int[]
       mov      esi, 100
       call     CORINFO_HELP_NEWARR_1_VC
       mov      r15, rax
       mov      rdi, 0x7DD0AE4A9C58      ; System.Func`2[int,int]
       call     CORINFO_HELP_NEWSFAST
       mov      r14, rax
       lea      rdi, bword ptr [r14+0x08]
       mov      rsi, rbx
       call     CORINFO_HELP_ASSIGN_REF
       mov      rsi, 0x7DD0AE461140      ; code for Program+<>c__DisplayClass0_0:<Main>b__0(int):int:this
       mov      qword ptr [r14+0x18], rsi
       xor      ebx, ebx
       add      r15, 16
       mov      r13d, 100

G_M24375_IG03:  ;; offset=0x0075
       mov      esi, dword ptr [r15]
       mov      rdi, gword ptr [r14+0x08]
       call     [r14+0x18]System.Func`2[int,int]:Invoke(int):int:this
       add      ebx, eax
       add      r15, 4
       dec      r13d
       jne      SHORT G_M24375_IG03
; epilog omitted for brevity
```
Before entering the loop, `arr`, `func`, and the closure object backing `func` (an instance of `Program+<>c__DisplayClass0_0`) are all allocated on the heap, as indicated by the `CORINFO_HELP_NEW*` calls. Preview 1 introduced stack allocation for fixed-size arrays of value types, so we can expect the heap allocation for `arr` to go away. Because `func` is never referenced outside the scope of `Main`, it will also be allocated on the stack. Here is the code generated in Preview 5:
```asm
; prolog omitted for brevity
       mov      rdi, 0x7B52F7837958      ; Program+<>c__DisplayClass0_0
       call     CORINFO_HELP_NEWSFAST
       mov      rbx, rax
       mov      dword ptr [rbx+0x08], 1
       mov      rsi, 0x7B52F7718CC8      ; int[]
       mov      qword ptr [rbp-0x1C0], rsi
       lea      rsi, [rbp-0x1C0]
       mov      dword ptr [rsi+0x08], 100
       lea      r15, [rbp-0x1C0]
       xor      r14d, r14d
       add      r15, 16
       mov      r13d, 100

G_M24375_IG03:  ;; offset=0x0099
       mov      esi, dword ptr [r15]
       mov      rdi, rbx
       mov      rax, 0x7B52F7901638      ; address of definition for "func"
       call     rax
       add      r14d, eax
       add      r15, 4
       dec      r13d
       jne      SHORT G_M24375_IG03
; epilog omitted for brevity
```
Notice there is only one `CORINFO_HELP_NEW*` call remaining: the heap allocation for the closure. We plan to expand escape analysis to support stack allocation of closures in a future release.

## Inlining Improvements
Preview 5 further enhances the JIT's inlining policy to take better advantage of profile data. Among its many heuristics, the JIT's inliner will not consider methods over a certain size, so as to avoid bloating the caller. When the caller has profile data suggesting an inlining candidate is frequently executed, the inliner increases its size tolerance for that candidate.
Suppose the JIT inlines some callee `Bar` without profile data into some caller `Foo` with profile data. This discrepancy can happen if the callee is too small to be worth instrumenting, or if it is inlined too often to accumulate a sufficient call count. If `Bar` has its own inlining candidates, the JIT would previously consider them with its default size limit, because `Bar` lacks profile data. With [dotnet/runtime #115119](https://github.com/dotnet/runtime/pull/115119), the JIT now recognizes that `Foo` has profile data and loosens its size restriction (though not to the same degree as if `Bar` had profile data, to account for the loss of precision). Like the inlining policy changes in Preview 4, this brought [hundreds](https://github.com/dotnet/runtime/pull/115119#issuecomment-2914325071) of improvements to our microbenchmarks.
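To make the scenario concrete, here is a minimal, hypothetical sketch; the method bodies and the `Baz` helper are invented for illustration and are not taken from the PR.

```csharp
// Hypothetical illustration only: Foo and Bar mirror the names used above,
// and Baz is an invented helper standing in for a larger inlining candidate.
public static class InliningExample
{
    // Foo is hot, so tiered compilation gives it profile data.
    public static int Foo(int[] data)
    {
        int sum = 0;
        foreach (int x in data)
        {
            sum += Bar(x); // Bar is small and gets inlined into Foo.
        }
        return sum;
    }

    // Bar is inlined so often that it never accumulates profile data of its own.
    private static int Bar(int x) => Baz(x) + 1;

    // Baz is an inlining candidate discovered while inlining Bar. Previously it
    // was judged against the default size limit because Bar has no profile data;
    // now the JIT can fall back to Foo's profile data and relax that limit.
    private static int Baz(int x)
    {
        int result = x;
        for (int i = 0; i < 4; i++)
        {
            result = result * 31 + i;
        }
        return result;
    }
}
```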
## ARM64 Write Barrier Improvements

The .NET garbage collector (GC) is generational, meaning it separates live objects by age to improve collection performance. The GC collects younger generations more often, under the assumption that long-lived objects are less likely to be unreferenced (or "dead") at any given time. However, suppose an old object starts referencing a young object: the GC needs to know it cannot collect the young object, yet having to scan older objects just to collect a young one would defeat the performance gains of a generational GC.
To solve this problem, the JIT inserts write barriers before object reference updates to keep the GC informed. On x64, the runtime can dynamically switch between write barrier implementations to balance write speed against collection efficiency, depending on the GC's configuration. [dotnet/runtime #111636](https://github.com/dotnet/runtime/pull/111636) brings this functionality to ARM64 as well. In particular, the new default write barrier implementation on ARM64 handles GC regions more precisely, improving collection performance at a slight cost to write barrier throughput: our benchmarks show GC pause improvements of anywhere from eight to over twenty percent with the new GC defaults.
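As a rough illustration of where these barriers come from, consider the hypothetical reference store below; the types are invented, while the helper named in the comment is the one visible in the first assembly listing above.

```csharp
// Hypothetical types used only to show where the JIT emits a write barrier.
class Node
{
    public int Value;
    public Node? Next;
}

class Cache
{
    private readonly Node _head = new Node();

    public void Push(int value)
    {
        var node = new Node { Value = value, Next = _head.Next };

        // This reference store into a heap object is compiled with a write
        // barrier (for example, the CORINFO_HELP_ASSIGN_REF call visible in the
        // first assembly listing above). If _head is in an older generation and
        // node is young, the barrier records the cross-generation reference so
        // the GC can collect young generations without scanning the older heap.
        _head.Next = node;
    }
}
```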
