Implementation of ldftn and calli #116449

Conversation
- This implements ldftn by matching the implementation in the JIT. It appears to work correctly with shared generics, but the interpreter stub handling of the generic param argument is missing, so that doesn't work yet.
- This implementation leverages our CallStubGenerator to create stubs for calli instructions. To get the appropriate call stub to the right location, we misuse GetCookieForPInvokeCalliSig to pass the needed cookie around.
- As a bonus, it's now possible to call delegates which take no parameters.
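To make the mechanism concrete, here is a standalone C++ sketch (illustrative only; the names and types are not the runtime's): the cookie stands in for the value returned by GetCookieForPInvokeCalliSig, and the per-signature "call stub" moves interpreter-stack arguments into the native calling convention before invoking the ldftn-produced function pointer.

```cpp
#include <cstdint>
#include <cstdio>

// Interpreter-stack slots are modeled as untyped 64-bit values.
using StackSlot = int64_t;
using Target = int (*)(int, int);

// A "call stub" for the signature int(int, int): it loads the argument slots
// and performs the native call. In the real runtime the routine sequence is
// produced by CallStubGenerator and selected via the calli cookie.
static int CallStub_IntIntRetInt(const StackSlot* args, Target target)
{
    return target((int)args[0], (int)args[1]);
}

static int Add(int a, int b) { return a + b; }

int main()
{
    StackSlot interpStack[2] = { 40, 2 };
    Target fnptr = &Add;                 // conceptually, what ldftn produces
    // INTOP_CALLI would resolve its cookie to the stub and dispatch through it.
    std::printf("%d\n", CallStub_IntIntRetInt(interpStack, fnptr));
    return 0;
}
```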
Pull Request Overview
This PR implements the ldftn and calli instruction support in the interpreter and JIT, along with associated test cases and improvements in call stub generation. Key changes include adding tests for delegate and calli execution in Interpreter.cs, modifications to call stub generation in CallStubGenerator for both method and signature-based stubs, and updates to interpreter opcode definitions and compilation to support calli functionality.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
File | Description
---|---
src/tests/JIT/interpreter/Interpreter.cs | Added tests for delegate and calli, and updated catch block structures |
src/coreclr/vm/typehashingalgorithms.h | Introduced an AddPointer method for pointer hashing |
src/coreclr/vm/siginfo.hpp | Added const qualifiers to NumFixedArgs and HasThis methods |
src/coreclr/vm/jitinterface.{h,cpp} | Added GetCookieForPInvokeCalliSig to support calli cookie propagation |
src/coreclr/vm/interpexec.cpp | Extended interpreter execution to handle INTOP_CALLI and ldptr.deref |
src/coreclr/vm/ceemain.cpp | Initialized CallStubGenerator for interpreter support |
src/coreclr/vm/callstubgenerator.{h,cpp} | Updated call stub generation logic and caching for calli support |
src/coreclr/interpreter/{intops.def,compiler.{h,cpp}} | Added new opcodes and modified EmitCall to handle calli cases |
src/coreclr/inc/{crsttypes_generated.h,CrstTypes.def} | Updated CRST enumerations to include the CallStubCache |
Comments suppressed due to low confidence (1)
src/coreclr/interpreter/compiler.cpp:2309
- [nitpick] The variable name 'callIFunctionPointerVar' could be renamed to 'calliFunctionPointerVar' to better reflect its purpose and maintain consistency with the 'isCalli' flag.
int callIFunctionPointerVar = m_pStackPointer[-1].var;
src/coreclr/interpreter/compiler.cpp (outdated)

@@ -2239,6 +2239,32 @@ InterpCompiler::InterpEmbedGenericResult InterpCompiler::EmitGenericHandle(CORIN
    return result;
}

void InterpCompiler::EmitCORINFO_LOOKUPToStack(const CORINFO_LOOKUP& lookup)
Just a question - what does the "ToStack" mean in the function name?
Uh, nothing good. In this case it's stuffing its result onto the interpreter stack.
@janvorli How about renaming it to EmitPushCORINFO_LOOKUP, does that sound good?
Sounds good
// plus one slot for the target pointer and reallocated to the real size at the end.
PCODE *pRoutines = (PCODE*)alloca(ComputeTempStorageSize(sig));

ComputeCallStub(sig, pRoutines);
Couldn't we use the signature data to generate the hash instead of building the call stub and then throwing it away when we find it in the cache? It seems to defeat the purpose of caching the stub - we could just generate it every time and the perf would be the same.
The store exists to handle the storage problem, not the cost of computing the call stub. Computing the call stub is actually very fast; managing the storage is what we're doing here. Really I shouldn't call this a cache, but calling things like this caches is extremely prevalent in our codebase.
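A minimal sketch of that idea (illustrative only; the key, hash, and container are not the actual CallStubGenerator types): the per-signature map mainly gives each generated routine list a stable home, so recomputing into a temporary buffer and discarding it on a hit is acceptable.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical signature identity: module id plus signature token.
struct SigKey
{
    uint32_t moduleId;
    uint32_t token;
    bool operator==(const SigKey& o) const { return moduleId == o.moduleId && token == o.token; }
};
struct SigKeyHash
{
    size_t operator()(const SigKey& k) const { return ((size_t)k.moduleId * 1000003u) ^ k.token; }
};

using Routines = std::vector<void*>;   // stands in for the PCODE routine list

static std::unordered_map<SigKey, Routines, SigKeyHash> g_stubStorage;

// Compute into a temporary buffer first; on a hit the temporary is discarded,
// on a miss the storage adopts it. The point of the map is lifetime/storage
// management, not avoiding the (cheap) computation.
const Routines& GetOrCreateStub(const SigKey& key, Routines&& computed)
{
    auto it = g_stubStorage.find(key);
    if (it != g_stubStorage.end())
        return it->second;
    return g_stubStorage.emplace(key, std::move(computed)).first->second;
}
```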
src/coreclr/inc/corinfo.h (outdated)

// Generate a cookie based on the signature that would needs to be passed
// to INTOP_CALLI in the interpreter
Suggested change:
- // Generate a cookie based on the signature that would needs to be passed
- // to INTOP_CALLI in the interpreter
+ // Generate a cookie from the signature to pass to INTOP_CALLI in the interpreter.
(Also, fix the comment on GetCookieForPInvokeCalliSig.)
LGTM, thank you!
CONTRACTL_END

// CallStubHeaders encode their destination addresses in the Routines array, so they need to be
// copied to a local buffer before we can actually set their target address.
In my opinion, call stubs shouldn't include the target pointer to call. They could be tied to the signature only and we could cache them for interp->native calls as well, instead of creating a new stub for every method.
Not having the target in the routines list would mean an unnecessary extra return and call, because we would need to put a routine with just a ret at the end of the routines list and then call the target from one of the CallJittedMethodRetXXX functions. Also, having it per method allows us not to need extra storage for the call stub pointers and a lookup based on the signature. I am not sure the benefit of having fewer call stubs has enough weight, especially when these stubs are created only for the cases when interpreter code calls native code.
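A rough, self-contained illustration of the trade-off being discussed (the layout and names are invented, not the actual CallStubHeader): with the target embedded as the final entry of the routine list, the last copying routine can transfer control straight to the callee, whereas a signature-only stub would have to return to a dispatcher that then performs a separate call.

```cpp
#include <cstdint>
#include <cstdio>

static int Add(int a, int b) { return a + b; }

int main()
{
    // Per-method layout modeled as [arg-copy routines..., target]; only the
    // final slot holding the target address is shown here.
    uintptr_t routines[] = { reinterpret_cast<uintptr_t>(&Add) };

    // The last routine reads the next slot and transfers control to it
    // (modeled here as an ordinary call rather than a tail jump).
    auto target = reinterpret_cast<int (*)(int, int)>(routines[0]);
    std::printf("%d\n", target(40, 2));
    return 0;
}
```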
This works OK for Apple platforms.

Once we get to wasm, these call stubs will need to be pre-compiled and we will want to share them per signature to minimize binary size. I believe that is how Mono works today. @BrzVlad Is that correct? (An alternative is to make all managed methods in wasm have a uniform signature - but that comes with a different set of problems, like less than ideal perf.)
I thought we only had a variety of different stubs in Mono on wasm to support calling native functions, and the managed to managed function interface was more of the uniform signature model. Honestly, I'd be tempted to use the interpreter calling convention entirely, including the parallel stack.
When the application is AOT compiled, we will include a compiled wrapper for every signature of a compiled method, as well as for every pinvoke signature. The wrapper will receive the target pointer to call and the address of the interpreter stack where the arguments are present. On Mono these wrappers are written in C, with the code generated dynamically at app compilation time.

> Honestly, I'd be tempted to use the interpreter calling convention entirely, including the parallel stack.

This should be a data-driven decision. If we were to do this, we should try to quantify the perf loss before we build the whole system around it.
Yes, on Mono wasm we share transition wrappers per signature and we would probably have to do the same on CoreCLR. We already have to spill refs to the stack (in Mono wasm AOT) so the GC can detect references and pin the objects. Maybe having this spilling also be part of the calling convention is not that bad.

Transitions on Mono wasm are kind of messy; hoping I'm not making any mistakes, but a rough overview of how they work is the following:
- interp -> compiled. When compiling the AOT image we include compiled IL wrappers for all signatures of compiled methods. These transitions invoke the compiled wrapper, passing the target IP as an argument.
- interp -> pinvoke. When building the app, we traverse all pinvoke/icall signatures and generate a C thunk that does the transition. We invoke it, passing the target IP as an argument.
- compiled -> interp. When compiling the AOT image, as we compile a call we keep track of possible signatures that might serve as interpreter entry points, and we include a per-signature wrapper for interpreter entry. Compiled code calls another method through a special ftndesc structure; in the case of interpreter code, this is a pair of the InterpMethod to execute and the wrapper to be used.
- pinvoke -> interp. When building the app, we traverse all UnmanagedCallersOnly methods and generate a C thunk per method to be invoked. There is no signature sharing here.
if (EmitCallIntrinsics(callInfo.hMethod, callInfo.sig))
callIFunctionPointerVar = m_pStackPointer[-1].var;
m_pStackPointer--;
calliCookie = m_compHnd->GetCookieForInterpreterCalliSig(&callInfo.sig);
It seems like adding a new method to the JIT interface API is quite invasive. I'm wondering if it is even needed. Long term, I would imagine the result of ldftn is either a compiled ftnptr or a tagged pointer to some interp information. In the case of interpreter invocation, the call stub wouldn't even be used.

I'm wondering whether it might be easier to not compute anything at compile time and, when doing the CALLI in the call-to-compiled-code case, obtain the EECodeInfo from the code pointer, then the MethodDesc, and then create the CallStub and store it in some data slot for the CALLI opcode instead. Thus the call stub would be lazily generated during the first invocation. Not entirely sure whether this EECodeInfo stuff works with the interpreter precodes, though.
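A sketch of that lazy alternative, purely as an illustration (the types and the BuildStubForTarget helper are hypothetical, not runtime APIs): the CALLI opcode would carry a data slot that starts out null, and the first execution would resolve the target, build the call stub, and publish it for later executions.

```cpp
#include <atomic>

struct CallStubHeaderLike { /* routine list + metadata would live here */ };

// Hypothetical helper standing in for "resolve the MethodDesc from the code
// pointer and generate a call stub for it".
static CallStubHeaderLike* BuildStubForTarget(void* /*codePointer*/)
{
    return new CallStubHeaderLike();
}

// Lazily populate the opcode's data slot on first execution; later CALLI
// executions reuse the published stub.
CallStubHeaderLike* GetOrCreateStub(std::atomic<CallStubHeaderLike*>& dataSlot,
                                    void* codePointer)
{
    CallStubHeaderLike* stub = dataSlot.load(std::memory_order_acquire);
    if (stub != nullptr)
        return stub;

    CallStubHeaderLike* fresh = BuildStubForTarget(codePointer);
    CallStubHeaderLike* expected = nullptr;
    if (dataSlot.compare_exchange_strong(expected, fresh, std::memory_order_acq_rel))
        return fresh;        // we won the race and published our stub

    delete fresh;            // another thread published first; use theirs
    return expected;
}
```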
It is really annoying to add things to the JIT interface, but it's not an unreasonable thing to do. One detail you probably didn't know is that most of the changes are auto-generated after editing the ThunkInput.txt file and then running the gen script which sits next to it. The only actually tricky part is handling the subset of SPMI changes which are not auto-generated, which for interpreter-only changes is technically optional (although I have hopes that once we get around to having a stable implementation of the interpreter, we can use the SPMI infrastructure as we implement optimizations in the interpreter). In general, I'm not convinced that lazily updating bits as the interpreter executes is a good plan for anything other than dynamic optimization opportunities (such as a monomorphic cache of MethodDesc to execution strategy or something).
key.token = (DWORD)szMetaSig->token;

DLDL value;
value.A = CastPointer(result);
If you are not using the second field of the DLDL, then the value can just be a DWORDLONG.
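A tiny sketch of that suggestion (the map shape is invented; DLDL here only models a pair of 64-bit fields): when only the first field is ever populated, the stored value can be a single 64-bit integer.

```cpp
#include <cstdint>
#include <map>

using DWORDLONG = uint64_t;

struct DLDL { DWORDLONG A; DWORDLONG B; };     // two fields, but B is never used here

// Instead of storing a DLDL with an unused B, store the single value directly.
std::map<uint32_t, DWORDLONG> g_cookieBySigToken;

void Record(uint32_t token, void* result)
{
    g_cookieBySigToken[token] = reinterpret_cast<DWORDLONG>(result);
}
```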
Tagging @dotnet/jit-contrib for JIT-EE GUID update
…GetCookieForInterpreterCalliSig to pass the needed cookie around.

NOTE: This logic results in going through interpreter->native->interpreter calling thunks to make interpreter-to-interpreter calls. We will likely want to build an optimized path which uses NonVirtualEntry2MethodDesc or an interpreter-specific form to avoid bouncing through the calling-convention trampolines, but I'd like to have a fully functional system before diving into that sort of thing as an optimization.