Description
I have a tiny function, Op2, that takes a single generic struct argument and is called in a hot loop. The JIT refuses to inline this function. However, if I make the argument non-generic, the function is inlined.
This happens even though the function is marked with the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute.
Reproduction Steps
The code (also on Compiler Explorer, with BenchmarkDotNet removed: https://csharp.godbolt.org/z/Gxej17TsP):
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace umbee_cli;

[DisassemblyDiagnoser(printSource: true)]
public class Test
{
    private int sum = 0;

    record struct Iu1();
    record struct Iu2();

    interface IVariables
    {
        public int Get<T>(T v);
        public void Set<T>(T v, int i);
    }

    struct Variables : IVariables
    {
        private int iu1;
        private int iu2;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public int Get<T>(T v)
        {
            switch (v)
            {
                case Iu1 _: return iu1;
                case Iu2 _: return iu2;
            }
            return 0;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public void Set<T>(T v, int i)
        {
            switch (v)
            {
                case Iu1 _: iu1 = i; break;
                case Iu2 _: iu2 = i; break;
            }
        }
    }

    [Benchmark]
    public void ScanProduce()
    {
        Variables variables = new Variables();
        for (int i = 0; i < 100; i++)
        {
            variables.Set(new Iu1(), i);
            Op2(variables);
        }
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    void Op2<TVar>(TVar variables) where TVar : struct, IVariables
    {
        sum += variables.Get(new Iu1());
    }
}

class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Test>();
    }
}
The ASM:
.NET 10.0.0 (10.0.25.27814), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
; umbee_cli.Test.ScanProduce()
; Variables variables = new Variables();
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
; for (int i = 0; i < 100; i++)
; ^^^^^^^^^
; variables.Set(new Iu1(), i);
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
; Op2(variables);
; ^^^^^^^^^^^^^^^
push rsi
push rbx
sub rsp,28
mov rbx,rcx
xor esi,esi
M00_L00:
mov [rsp+20],esi
xor edx,edx
mov [rsp+24],edx
mov rdx,[rsp+20]
mov rcx,rbx
call qword ptr [7FFCD27EF150]; umbee_cli.Test.Op2[[umbee_cli.Test+Variables, umbee-cli]](Variables)
inc esi
cmp esi,64
jl short M00_L00
add rsp,28
pop rbx
pop rsi
ret
; Total bytes of code 49
; umbee_cli.Test.Op2[[umbee_cli.Test+Variables, umbee-cli]](Variables)
; sum += variables.Get(new Iu1());
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
add [rcx+8],edx
ret
; Total bytes of code 4
Although Op2 is extremely tiny, it is not inlined. This happens even when I pass the struct by ref.
Expected behavior
I would expect the JIT to inline Op2 within ScanProduce.
Here is the ASM when I change the signature of Op2 to void Op2(Variables variables):
.NET 10.0.0 (10.0.25.27814), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
; umbee_cli.Test.ScanProduce()
; Variables variables = new Variables();
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
; for (int i = 0; i < 100; i++)
; ^^^^^^^^^
; variables.Set(new Iu1(), i);
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
; Op2(variables);
; ^^^^^^^^^^^^^^^
xor eax,eax
M00_L00:
add [rcx+8],eax
inc eax
cmp eax,64
jl short M00_L00
ret
; Total bytes of code 13
Actual behavior
JIT does not inline Op2.
Regression?
No response
Known Workarounds
No response
Configuration
.NET 10.0.0 (10.0.25.27814), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Windows 11
The release version of .NET 9 also has this issue, so it does not seem to be a recent regression.
Other information
Some context: I am trying to build an efficient data pipeline in C#. My goal is to implement operators using generics and instantiate these generics at runtime based on a user query.
For example, for the following query:
select sum(x)
from generate_series(0, 99) t(x);
I have ScanProduce, which produces the values between 0 and 99, and Op1, which adds up the values given by ScanProduce in a result.
I need a way of setting and getting values such that these operators can pass information amongst themselves based on the user-defined query. The hope would be that the JIT is capable of inlining everything avoiding copies and spills.
The rough sketch of the processing would be:
Query -> Prepare struct Variables with reflection -> Generic instantiation -> JIT -> Very efficient code for executing query
I hope not to have to use reflection within the implementation of the operators and to rely on generics as much as possible, so that I can use the debugger effectively (breakpoints, etc.) when debugging operators.
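To make the "generic instantiation" step concrete, here is a minimal sketch of how reflection could be confined to setup while the operators themselves stay plain generic code. The names Pipeline and Run are hypothetical, and the IVariables interface is simplified to non-generic Get/Set for brevity; this is an illustration under those assumptions, not my actual pipeline.

```csharp
using System;
using System.Reflection;

// Simplified stand-in for the IVariables interface above (assumption: non-generic Get/Set).
interface IVariables
{
    int Get();
    void Set(int i);
}

struct Variables : IVariables
{
    private int value;
    public int Get() => value;
    public void Set(int i) => value = i;
}

static class Pipeline
{
    // Hypothetical fused pipeline: generic over the variables struct, so the JIT
    // compiles a specialized body for each struct type argument.
    public static int Run<TVar>() where TVar : struct, IVariables
    {
        TVar variables = default;
        int sum = 0;
        for (int i = 0; i < 100; i++)
        {
            variables.Set(i);       // scan: produce the next value
            sum += variables.Get(); // aggregate: add it to the result
        }
        return sum;
    }
}

static class Program
{
    static void Main()
    {
        // In the real system this type would be chosen (or built) from the user query.
        Type variablesType = typeof(Variables);

        // Reflection is used once, at setup time, to close the generic method.
        MethodInfo open = typeof(Pipeline).GetMethod(nameof(Pipeline.Run))
            ?? throw new InvalidOperationException("Pipeline.Run not found");
        MethodInfo closed = open.MakeGenericMethod(variablesType);

        // After this point the query runs through an ordinary delegate call.
        var run = (Func<int>)closed.CreateDelegate(typeof(Func<int>));
        Console.WriteLine(run()); // prints 4950, the sum of 0..99
    }
}
```

With this shape, a breakpoint inside Run behaves like one in any other method, and the performance of the loop depends on the JIT inlining the generic calls within the specialized body, which is the behavior this issue is about.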