
Implementation of ldftn and calli #116449


Merged
merged 13 commits into from
Jun 12, 2025

Conversation


@davidwrighton davidwrighton commented Jun 9, 2025

  • This implements ldftn by matching the implementation in the JIT. Integration with the delegate construction path is not done, so we always go down the slow path for delegate creation, but it does work.
  • This implementation leverages our CallStubGenerator to create stubs for calli instructions. To get the appropriate call stub to the right location, we use a new JIT interface API called GetCookieForInterpreterCalliSig to pass the needed cookie around.
  • As a bonus, it is now possible to call delegates by calling the Invoke method on the delegate. We will likely want to translate that to using an INTOP_CALLI in the future, but it does work for now.

NOTE: This logic results in going through interpreter->native->interpreter calling thunks to make interpreter-to-interpreter calls. We will likely want to build an optimized path that uses NonVirtualEntry2MethodDesc or an interpreter-specific form to avoid bouncing through the calling convention trampolines, but I'd like to have a fully functional system before diving into that sort of optimization.
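For context, a rough sketch of what a per-signature call stub behind the calli cookie does: it moves arguments from the interpreter stack into the native calling convention and transfers control to the raw target pointer. All names below are invented for illustration and do not reflect the actual CallStubGenerator API; the real argument routing is done by assembly routines, modeled here as plain C++ parameter passing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a per-signature call stub for INTOP_CALLI.
// The stub knows the signature shape; the target pointer is supplied
// at call time, as with a real calli.
using Target2 = int64_t (*)(int64_t, int64_t);

struct CallStub {
    int argCount; // derived from the signature the cookie was created for
};

static int64_t InvokeThroughStub(const CallStub& stub, void* target, const int64_t* interpStack)
{
    assert(stub.argCount == 2); // this sketch only models an (int64, int64) signature
    int64_t a0 = interpStack[0]; // copy args off the interpreter stack
    int64_t a1 = interpStack[1];
    return reinterpret_cast<Target2>(target)(a0, a1); // transfer to the raw target
}

static int64_t Add(int64_t x, int64_t y) { return x + y; }
```

In the real implementation the "copy args" step is a chain of precompiled routines selected per argument, and the result is written back to the interpreter stack rather than returned.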

@Copilot Copilot AI review requested due to automatic review settings June 9, 2025 22:24
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 9, 2025
@davidwrighton davidwrighton added area-CodeGen-Interpreter-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 9, 2025
@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements ldftn and calli instruction support in the interpreter and the JIT-EE interface, along with associated test cases and improvements in call stub generation. Key changes include adding tests for delegate and calli execution in Interpreter.cs, modifications to call stub generation in CallStubGenerator for both method-based and signature-based stubs, and updates to interpreter opcode definitions and compilation to support calli functionality.

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

src/tests/JIT/interpreter/Interpreter.cs: Added tests for delegate and calli, and updated catch block structures
src/coreclr/vm/typehashingalgorithms.h: Introduced an AddPointer method for pointer hashing
src/coreclr/vm/siginfo.hpp: Added const qualifiers to the NumFixedArgs and HasThis methods
src/coreclr/vm/jitinterface.{h,cpp}: Added GetCookieForPInvokeCalliSig to support calli cookie propagation
src/coreclr/vm/interpexec.cpp: Extended interpreter execution to handle INTOP_CALLI and ldptr.deref
src/coreclr/vm/ceemain.cpp: Initialized CallStubGenerator for interpreter support
src/coreclr/vm/callstubgenerator.{h,cpp}: Updated call stub generation logic and caching for calli support
src/coreclr/interpreter/{intops.def,compiler.{h,cpp}}: Added new opcodes and modified EmitCall to handle calli cases
src/coreclr/inc/{crsttypes_generated.h,CrstTypes.def}: Updated CRST enumerations to include the CallStubCache
Comments suppressed due to low confidence (1)

src/coreclr/interpreter/compiler.cpp:2309

  • [nitpick] The variable name 'callIFunctionPointerVar' could be renamed to 'calliFunctionPointerVar' to better reflect its purpose and maintain consistency with the 'isCalli' flag.
int callIFunctionPointerVar = m_pStackPointer[-1].var;

@@ -2239,6 +2239,32 @@ InterpCompiler::InterpEmbedGenericResult InterpCompiler::EmitGenericHandle(CORIN
return result;
}

void InterpCompiler::EmitCORINFO_LOOKUPToStack(const CORINFO_LOOKUP& lookup)
Member

Just a question - what does the "ToStack" mean in the function name?

Member Author

Uh, nothing good. In this case it's stuffing its result onto the interpreter stack.

Member Author

@janvorli How about renaming it to EmitPushCORINFO_LOOKUP, does that sound good?

Member

Sounds good

// plus one slot for the target pointer and reallocated to the real size at the end.
PCODE *pRoutines = (PCODE*)alloca(ComputeTempStorageSize(sig));

ComputeCallStub(sig, pRoutines);
Member

Couldn't we use the signature data to generate the hash instead of building the call stub and then throwing it away when we find it in the cache? It seems it defeats the purpose of caching the stub - we could then just generate it every time again and the perf would be the same.

Member Author

The store exists to handle the storage problem, not the cost of computing the call stub. Computing the call stub is actually very fast; managing the storage is what we're doing here. Really I shouldn't call this a cache, but calling things like this caches is extremely prevalent in our codebase.
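The distinction being made here (deduplicating storage, not avoiding recomputation) can be sketched with a toy hash-consing table. All names are invented; the real cache is keyed differently and uses proper loader-heap allocation:

```cpp
#include <cstdlib>
#include <cstring>
#include <map>
#include <vector>

// Toy "intern" table: the stub is cheap to compute into a temporary buffer;
// the map exists only so that identical stubs share one permanent copy.
static std::map<std::vector<unsigned char>, unsigned char*> g_stubStorage;

static unsigned char* InternStub(const unsigned char* tempStub, size_t size)
{
    std::vector<unsigned char> key(tempStub, tempStub + size);
    auto it = g_stubStorage.find(key);
    if (it != g_stubStorage.end())
        return it->second;            // discard the temp copy, reuse existing storage
    unsigned char* permanent = (unsigned char*)malloc(size);
    memcpy(permanent, tempStub, size);
    g_stubStorage[key] = permanent;   // first occurrence becomes the shared copy
    return permanent;
}
```

Recomputing into the temp buffer on every lookup is fine under this model, which is the point of the reply above: the table bounds memory, not CPU.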

Comment on lines 3207 to 3208
// Generate a cookie based on the signature that would needs to be passed
// to INTOP_CALLI in the interpreter
Member

Suggested change
// Generate a cookie based on the signature that would needs to be passed
// to INTOP_CALLI in the interpreter
// Generate a cookie from the signature to pass to INTOP_CALLI in the interpreter.

Member

(Also, fix the comment on GetCookieForPInvokeCalliSig.)

@janvorli janvorli left a comment

LGTM, thank you!

…ubGenerator to be initialized to some well known value (and by explicitly setting m_interpreterToNative to true in GenerateCallStubForSig (which is what actually fixed the problem)
CONTRACTL_END

// CallStubHeaders encode their destination addresses in the Routines array, so they need to be
// copied to a local buffer before we can actually set their target address.
Member

In my opinion, call stubs shouldn't include the target pointer to call. They could be tied to the signature only and we could cache them for interp->native calls as well, instead of creating a new stub for every method.

Member

Not having the target in the routines list would mean an unnecessary extra return and call, because we would need to put a routine containing just a ret at the end of the routines list and then call the target from one of the CallJittedMethodRetXXX functions. Also, having the stub per method means we don't need extra storage for the call stub pointers or a lookup based on the signature. I'm not sure the benefit of having fewer call stubs carries enough weight, especially since these stubs are created only when interpreter code calls native code.
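A toy model of the trailing-target point (invented names; the real routines are assembly helpers chained by tail transfers, not C++ functions called in a loop): because the last entry in the Routines array is the target itself, the stub walk ends directly in the callee, rather than returning to the caller for a second call.

```cpp
#include <cstdint>

// Each "routine" stands in for one step of the call stub; in this toy model
// they thread a single value through instead of shuffling registers.
using Routine = int64_t (*)(int64_t);

static int64_t DoubleIt(int64_t v) { return v * 2; }   // arg-setup routine stand-in
static int64_t Callee(int64_t v) { return v + 1; }     // the per-method target

static int64_t RunStub(Routine* routines, int count, int64_t value)
{
    // The final slot holds the target, so the walk terminates in the callee
    // with no extra ret/call pair.
    for (int i = 0; i < count; i++)
        value = routines[i](value);
    return value;
}
```

Under the signature-only alternative, the last slot would be a bare ret-style routine and the caller would then issue a separate call to the target, which is the extra hop the reply above is describing.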

Member

This works ok for Apple platforms.

Once we get to wasm, these call stubs will need to be pre-compiled, and we will want to share them per signature to minimize binary size. I believe that is how Mono works today. @BrzVlad Is that correct? (An alternative is to make all managed methods in wasm have a uniform signature, but that comes with a different set of problems, like less-than-ideal perf.)

Member Author

I thought we only had a variety of different stubs in Mono on wasm to support calling native functions, and the managed-to-managed function interface was more of the uniform-signature model. Honestly, I'd be tempted to use the interpreter calling convention entirely, including the parallel stack.

Member

From https://github.com/dotnet/runtimelab/blob/feature/CoreclrInterpreter/docs/design/interpreter/compiled-code-interop.md#interpreter-exit-1 :

When the application is AOT compiled, we will include a compiled wrapper for every signature of a compiled method as well as for every pinvoke signature. The wrapper will receive the target pointer to call and the address of the interpreter stack where the arguments are present. On mono these wrappers are written in C with dynamically generated code during app compilation time.

Honestly, I'd be tempted to use the interpreter calling convention entirely, including the parallel stack.

This should be a data-driven decision. If we were to do this, we should try to quantify the perf loss before we build the whole system around it.

@BrzVlad BrzVlad Jun 11, 2025

Yes, on mono wasm we share transition wrappers per signature and we would probably have to do the same on CoreCLR. We already have to spill refs on the stack (in mono wasm aot), so the GC can detect references and pin the objects. Maybe having this spilling also as part of the call convention is not that bad.

Transitions on mono wasm are kind of messy; hoping I'm not making any mistakes, but a rough overview of how they work is the following:

  • interp -> compiled. When compiling the aot image, we include compiled IL wrappers for all signatures of compiled methods. These transitions invoke the compiled wrapper, passing the target IP as an argument.
  • interp -> pinvoke. When building the app, we traverse all pinvoke/icall signatures and generate a C thunk that does the transition. We invoke it, passing the target IP as an argument.
  • compiled -> interp. When compiling the aot image, as we compile each call we keep track of signatures that might serve as interpreter entry points, and include a per-signature wrapper for interpreter entry. Compiled code calls another method through a special ftndesc structure; for interpreter code, this is a pair of the InterpMethod to execute and the wrapper to be used.
  • pinvoke -> interp. When building the app, we traverse all UnmanagedCallersOnly methods and generate a C thunk per method to be invoked. There is no signature sharing here.
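The per-signature wrapper shape described in these transitions (and in the design doc quoted earlier: the wrapper receives the target pointer to call and the address of the interpreter stack where the arguments are present) can be sketched as follows. Everything here is invented for illustration; the real wrappers are generated per signature at app compilation time.

```cpp
#include <cstdint>

// One wrapper per signature: this one is specialized for "int64 f(int64, int64)".
// The AOT build would emit one such wrapper per distinct compiled-method signature.
using WrapperFn = void (*)(void* target, int64_t* interpStack);

static void Wrapper_I8_I8I8(void* target, int64_t* interpStack)
{
    auto fn = reinterpret_cast<int64_t (*)(int64_t, int64_t)>(target);
    // Arguments are read from interpreter stack slots; the result is
    // written back to slot 0, mirroring the interpreter's result convention.
    interpStack[0] = fn(interpStack[0], interpStack[1]);
}

static int64_t Mul(int64_t a, int64_t b) { return a * b; }
```

Because the wrapper depends only on the signature, many targets share it; only the (target, stack) pair varies per call, which is what makes per-signature sharing attractive for binary size on wasm.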

if (EmitCallIntrinsics(callInfo.hMethod, callInfo.sig))
callIFunctionPointerVar = m_pStackPointer[-1].var;
m_pStackPointer--;
calliCookie = m_compHnd->GetCookieForInterpreterCalliSig(&callInfo.sig);
Member

It seems like adding a new method to the jit interface API is quite invasive. I'm wondering if it is even needed. Long term, I would imagine the result of ldftn is either a compiled ftnptr or a tagged pointer to some interp information. In the case of interpreter invocation, the call stub wouldn't even be used.

I'm wondering whether it might be easier not to compute anything at compile time: when executing the CALLI, in the call-to-compiled-code case, we could obtain the EECodeInfo from the code pointer, then the MethodDesc, and then create the CallStub and store it in some data slot for the CALLI opcode instead. Thus the call stub would be lazily generated during the first invocation. Not entirely sure whether this EECodeInfo stuff works with the interpreter precodes, though.
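The lazy alternative suggested here could look roughly like the following sketch (all names invented): the CALLI opcode carries a data slot that starts null, and the first execution creates the stub and publishes it with a compare-exchange so concurrent first invocations agree on one stub.

```cpp
#include <atomic>

struct CallStub { int id; };

// Stand-in for resolving the code pointer to a MethodDesc and building a stub.
static CallStub* CreateStubForTarget(int methodId)
{
    return new CallStub{methodId};
}

static CallStub* GetOrCreateStub(std::atomic<CallStub*>& slot, int methodId)
{
    CallStub* stub = slot.load(std::memory_order_acquire);
    if (stub != nullptr)
        return stub;                     // fast path after the first invocation
    CallStub* fresh = CreateStubForTarget(methodId);
    CallStub* expected = nullptr;
    if (slot.compare_exchange_strong(expected, fresh))
        return fresh;                    // we published our stub
    delete fresh;                        // lost the race; use the winner's stub
    return expected;
}
```

One wrinkle this sketch glosses over: with lazy creation the slot is per call site, so a call site that sees targets of different signatures would still resolve to the same signature-derived stub, since calli signatures are fixed at the IL level.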

Member Author

It is really annoying to add things to the jit interface, but it's not an unreasonable thing to do. One detail you probably didn't know is that most of the changes are auto-generated after editing the ThunkInput.txt file and running the gen script next to it. The only actually tricky part is handling the subset of SPMI changes that are not auto-generated, which for interpreter-only changes is technically optional (although I have hopes that once we have a stable implementation of the interpreter, we can use the SPMI infrastructure as we implement optimizations in it). In general, I'm not convinced that lazily updating bits as the interpreter executes is a good plan for anything other than dynamic optimization opportunities (such as a monomorphic cache of MethodDesc to execution strategy or something).

key.token = (DWORD)szMetaSig->token;

DLDL value;
value.A = CastPointer(result);
Member

If you are not using the second field of the DLDL then the value can just be a DWORDLONG.

@davidwrighton davidwrighton enabled auto-merge (squash) June 12, 2025 20:39
Contributor

Tagging @dotnet/jit-contrib for JIT-EE GUID update

@davidwrighton davidwrighton merged commit e713f13 into dotnet:main Jun 12, 2025
105 checks passed
@janvorli janvorli mentioned this pull request Jun 19, 2025
61 tasks
5 participants