Skip to content

Commit b1ab309

Browse files
EgorBojakobbotsch
andauthored
Never use heap for return buffers (#112060)
Co-authored-by: jakobbotsch <[email protected]>
1 parent 7e82ad9 commit b1ab309

File tree

14 files changed

+130
-51
lines changed

14 files changed

+130
-51
lines changed

docs/design/coreclr/botr/clr-abi.md

+2-6
Original file line numberDiff line numberDiff line change
@@ -108,9 +108,7 @@ Passing/returning structs according to hardware floating-point calling conventio
108108

109109
## Return buffers
110110

111-
The same applies to some return buffers. See `MethodTable::IsStructRequiringStackAllocRetBuf()`. When that returns `false`, the return buffer might be on the heap, either due to reflection/remoting code paths mentioned previously or due to a JIT optimization where a call with a return buffer that then assigns to a field (on the GC heap) are changed into passing the field reference as the return buffer. Conversely, when it returns true, the JIT does not need to use a write barrier when storing to the return buffer, but it is still not guaranteed to be a compiler temp, and as such the JIT should not introduce spurious writes to the return buffer.
112-
113-
NOTE: This optimization is now disabled for all platforms (`IsStructRequiringStackAllocRetBuf()` always returns `false`).
111+
Since .NET 10, return buffers must always be allocated on the stack by the caller. After the call, the caller is responsible for copying the return buffer to the final destination using write barriers if necessary. The JIT can assume that the return buffer is always on the stack and may optimize accordingly, such as by omitting write barriers when writing GC pointers to the return buffer. In addition, the buffer is allowed to be used for temporary storage within the method since its content must not be aliased or cross-thread visible.
114112

115113
ARM64-only: When a method returns a structure that is larger than 16 bytes the caller reserves a return buffer of sufficient size and alignment to hold the result. The address of the buffer is passed as an argument to the method in `R8` (defined in the JIT as `REG_ARG_RET_BUFF`). The callee isn't required to preserve the value stored in `R8`.
116114

@@ -778,9 +776,7 @@ The return value is handled as follows:
778776
1. Floating-point values are returned on the top of the hardware FP stack.
779777
2. Integers up to 32 bits long are returned in EAX.
780778
3. 64-bit integers are passed with EAX holding the least significant 32 bits and EDX holding the most significant 32 bits.
781-
4. All other cases require the use of a return buffer, through which the value is returned.
782-
783-
In addition, there is a guarantee that if a return buffer is used a value is stored there only upon ordinary exit from the method. The buffer is not allowed to be used for temporary storage within the method and its contents will be unaltered if an exception occurs while executing the method.
779+
4. All other cases require the use of a return buffer, through which the value is returned. See [Return buffers](#return-buffers).
784780

785781
# Control Flow Guard (CFG) support on Windows
786782

docs/design/features/tailcalls-with-helpers.md

+3-3
Original file line numberDiff line numberDiff line change
@@ -259,9 +259,9 @@ stack, even on ARM architectures, due to its return address hijacking mechanism.
259259

260260
When the result is returned by value the JIT will introduce a local and pass a
261261
pointer to it in the second argument. For ret bufs the JIT will typically
262-
directly pass along its own return buffer parameter to DispatchTailCalls. It is
263-
possible that this return buffer is pointing into GC heap, so the result is
264-
always tracked as a byref in the mechanism.
262+
directly pass along its own return buffer parameter to DispatchTailCalls. The
263+
return buffer is always located on the stack, so the result does not need to be
264+
tracked as a byref.
265265

266266
In certain cases the target function pointer is also stored. For some targets
267267
this might require the JIT to perform the equivalent of `ldvirtftn` or `ldftn`

src/coreclr/inc/readytorun.h

+3-2
Original file line numberDiff line numberDiff line change
@@ -19,10 +19,10 @@
1919
// src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h
2020
// If you update this, ensure you run `git grep MINIMUM_READYTORUN_MAJOR_VERSION`
2121
// and handle pending work.
22-
#define READYTORUN_MAJOR_VERSION 11
22+
#define READYTORUN_MAJOR_VERSION 12
2323
#define READYTORUN_MINOR_VERSION 0x0000
2424

25-
#define MINIMUM_READYTORUN_MAJOR_VERSION 11
25+
#define MINIMUM_READYTORUN_MAJOR_VERSION 12
2626

2727
// R2R Version 2.1 adds the InliningInfo section
2828
// R2R Version 2.2 adds the ProfileDataInfo section
@@ -39,6 +39,7 @@
3939
// R2R Version 10.0 adds support for the statics being allocated on a per type basis instead of on a per module basis, disable support for LogMethodEnter helper
4040
// R2R Version 10.1 adds Unbox_TypeTest helper
4141
// R2R Version 11 uses GCInfo v4, which encodes safe points without -1 offset and does not track return kinds in GCInfo
42+
// R2R Version 12 requires all return buffers to be always on the stack
4243

4344
struct READYTORUN_CORE_HEADER
4445
{

src/coreclr/jit/compiler.h

+4-3
Original file line numberDiff line numberDiff line change
@@ -4938,7 +4938,7 @@ class Compiler
49384938
Statement** pAfterStmt = nullptr,
49394939
const DebugInfo& di = DebugInfo(),
49404940
BasicBlock* block = nullptr);
4941-
GenTree* impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel);
4941+
GenTree* impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel, GenTreeFlags indirFlags = GTF_EMPTY);
49424942

49434943
GenTree* impGetNodeAddr(GenTree* val, unsigned curLevel, GenTreeFlags* pDerefFlags);
49444944

@@ -5228,8 +5228,8 @@ class Compiler
52285228
static LONG jitNestingLevel;
52295229
#endif // DEBUG
52305230

5231-
static bool impIsInvariant(const GenTree* tree);
5232-
static bool impIsAddressInLocal(const GenTree* tree, GenTree** lclVarTreeOut = nullptr);
5231+
bool impIsInvariant(const GenTree* tree);
5232+
bool impIsAddressInLocal(const GenTree* tree, GenTree** lclVarTreeOut = nullptr);
52335233

52345234
void impMakeDiscretionaryInlineObservations(InlineInfo* pInlineInfo, InlineResult* inlineResult);
52355235

@@ -6793,6 +6793,7 @@ class Compiler
67936793

67946794
public:
67956795
bool fgAddrCouldBeNull(GenTree* addr);
6796+
bool fgAddrCouldBeHeap(GenTree* addr);
67966797
void fgAssignSetVarDef(GenTree* tree);
67976798

67986799
private:

src/coreclr/jit/flowgraph.cpp

+36
Original file line numberDiff line numberDiff line change
@@ -928,6 +928,42 @@ bool Compiler::fgAddrCouldBeNull(GenTree* addr)
928928
return true; // default result: addr could be null.
929929
}
930930

931+
//------------------------------------------------------------------------------
932+
// fgAddrCouldBeHeap: Check whether the address tree may represent a heap address.
933+
//
934+
// Arguments:
935+
// addr - Address to check
936+
//
937+
// Return Value:
938+
// True if address could be a heap address; false otherwise (i.e. stack, native memory, etc.)
939+
//
940+
bool Compiler::fgAddrCouldBeHeap(GenTree* addr)
941+
{
942+
GenTree* op = addr;
943+
while (op->OperIs(GT_FIELD_ADDR) && op->AsFieldAddr()->IsInstance())
944+
{
945+
op = op->AsFieldAddr()->GetFldObj();
946+
}
947+
948+
target_ssize_t offset;
949+
gtPeelOffsets(&op, &offset);
950+
951+
// Ignore the offset for locals
952+
953+
if (op->OperIs(GT_LCL_ADDR))
954+
{
955+
return false;
956+
}
957+
958+
if (op->OperIsScalarLocal() && (op->AsLclVarCommon()->GetLclNum() == impInlineRoot()->info.compRetBuffArg))
959+
{
960+
// RetBuf is known to be on the stack
961+
return false;
962+
}
963+
964+
return true;
965+
}
966+
931967
//------------------------------------------------------------------------------
932968
// fgOptimizeDelegateConstructor: try and optimize construction of a delegate
933969
//

src/coreclr/jit/gentree.cpp

+1-1
Original file line numberDiff line numberDiff line change
@@ -18301,7 +18301,7 @@ bool GenTreeIndir::IsAddressNotOnHeap(Compiler* comp)
1830118301
return true;
1830218302
}
1830318303

18304-
if (HasBase() && Base()->gtSkipReloadOrCopy()->OperIs(GT_LCL_ADDR))
18304+
if (HasBase() && !comp->fgAddrCouldBeHeap(Base()->gtSkipReloadOrCopy()))
1830518305
{
1830618306
return true;
1830718307
}

src/coreclr/jit/importer.cpp

+48-9
Original file line numberDiff line numberDiff line change
@@ -843,7 +843,23 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
843843
// TODO-Bug?: verify if flags matter here
844844
GenTreeFlags indirFlags = GTF_EMPTY;
845845
GenTree* destAddr = impGetNodeAddr(store, CHECK_SPILL_ALL, &indirFlags);
846-
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(wellKnownArgType);
846+
847+
// Make sure we don't pass something other than a local address to the return buffer arg.
848+
// It is allowed to pass current's method return buffer as it is a local too.
849+
if (fgAddrCouldBeHeap(destAddr) && !eeIsByrefLike(srcCall->gtRetClsHnd))
850+
{
851+
unsigned tmp = lvaGrabTemp(false DEBUGARG("stack copy for value returned via return buffer"));
852+
lvaSetStruct(tmp, srcCall->gtRetClsHnd, false);
853+
854+
GenTree* spilledCall = gtNewStoreLclVarNode(tmp, srcCall);
855+
spilledCall = impStoreStruct(spilledCall, curLevel, pAfterStmt, di, block);
856+
store->Data() = gtNewOperNode(GT_COMMA, store->TypeGet(), spilledCall,
857+
gtNewLclvNode(tmp, lvaGetDesc(tmp)->TypeGet()));
858+
859+
return impStoreStruct(store, curLevel, pAfterStmt, di, block);
860+
}
861+
862+
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(wellKnownArgType);
847863

848864
if (destAddr->OperIs(GT_LCL_ADDR))
849865
{
@@ -953,6 +969,27 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
953969
// TODO-Bug?: verify if flags matter here
954970
GenTreeFlags indirFlags = GTF_EMPTY;
955971
GenTree* destAddr = impGetNodeAddr(store, CHECK_SPILL_ALL, &indirFlags);
972+
973+
// Make sure we don't pass something other than a local address to the return buffer arg.
974+
// It is allowed to pass current's method return buffer as it is a local too.
975+
if (fgAddrCouldBeHeap(destAddr) && !eeIsByrefLike(call->gtRetClsHnd))
976+
{
977+
unsigned tmp = lvaGrabTemp(false DEBUGARG("stack copy for value returned via return buffer"));
978+
lvaSetStruct(tmp, call->gtRetClsHnd, false);
979+
destAddr = gtNewLclVarAddrNode(tmp, TYP_I_IMPL);
980+
981+
// Insert address of temp into existing call
982+
NewCallArg retBufArg = NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer);
983+
call->gtArgs.InsertAfterThisOrFirst(this, retBufArg);
984+
985+
// Now the store needs to copy from the new temp instead.
986+
call->gtType = TYP_VOID;
987+
src->gtType = TYP_VOID;
988+
var_types tmpType = lvaGetDesc(tmp)->TypeGet();
989+
store->Data() = gtNewOperNode(GT_COMMA, tmpType, src, gtNewLclvNode(tmp, tmpType));
990+
return impStoreStruct(store, CHECK_SPILL_ALL, pAfterStmt, di, block);
991+
}
992+
956993
call->gtArgs.InsertAfterThisOrFirst(this,
957994
NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer));
958995

@@ -1010,21 +1047,22 @@ GenTree* Compiler::impStoreStruct(GenTree* store,
10101047
// impStoreStructPtr: Store (copy) the structure from 'src' to 'destAddr'.
10111048
//
10121049
// Arguments:
1013-
// destAddr - address of the destination of the store
1014-
// value - value to store
1015-
// curLevel - stack level for which a spill may be being done
1050+
// destAddr - address of the destination of the store
1051+
// value - value to store
1052+
// curLevel - stack level for which a spill may be being done
1053+
// indirFlags - flags to be used on the store node
10161054
//
10171055
// Return Value:
10181056
// The tree that should be appended to the statement list that represents the store.
10191057
//
10201058
// Notes:
10211059
// Temp stores may be appended to impStmtList if spilling is necessary.
10221060
//
1023-
GenTree* Compiler::impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel)
1061+
GenTree* Compiler::impStoreStructPtr(GenTree* destAddr, GenTree* value, unsigned curLevel, GenTreeFlags indirFlags)
10241062
{
10251063
var_types type = value->TypeGet();
10261064
ClassLayout* layout = (type == TYP_STRUCT) ? value->GetLayout(this) : nullptr;
1027-
GenTree* store = gtNewStoreValueNode(type, layout, destAddr, value);
1065+
GenTree* store = gtNewStoreValueNode(type, layout, destAddr, value, indirFlags);
10281066
store = impStoreStruct(store, curLevel);
10291067

10301068
return store;
@@ -11247,18 +11285,19 @@ bool Compiler::impReturnInstruction(int prefixFlags, OPCODE& opcode)
1124711285

1124811286
if (info.compRetBuffArg != BAD_VAR_NUM)
1124911287
{
11288+
var_types retBuffType = lvaGetDesc(info.compRetBuffArg)->TypeGet();
1125011289
// Assign value to return buff (first param)
1125111290
GenTree* retBuffAddr =
11252-
gtNewLclvNode(info.compRetBuffArg, TYP_BYREF DEBUGARG(impCurStmtDI.GetLocation().GetOffset()));
11291+
gtNewLclvNode(info.compRetBuffArg, retBuffType DEBUGARG(impCurStmtDI.GetLocation().GetOffset()));
1125311292

11254-
op2 = impStoreStructPtr(retBuffAddr, op2, CHECK_SPILL_ALL);
11293+
op2 = impStoreStructPtr(retBuffAddr, op2, CHECK_SPILL_ALL, GTF_IND_TGT_NOT_HEAP);
1125511294
impAppendTree(op2, CHECK_SPILL_NONE, impCurStmtDI);
1125611295

1125711296
// There are cases where the address of the implicit RetBuf should be returned explicitly.
1125811297
//
1125911298
if (compMethodReturnsRetBufAddr())
1126011299
{
11261-
op1 = gtNewOperNode(GT_RETURN, TYP_BYREF, gtNewLclvNode(info.compRetBuffArg, TYP_BYREF));
11300+
op1 = gtNewOperNode(GT_RETURN, retBuffType, gtNewLclvNode(info.compRetBuffArg, retBuffType));
1126211301
}
1126311302
else
1126411303
{

src/coreclr/jit/lclvars.cpp

+1-1
Original file line numberDiff line numberDiff line change
@@ -485,7 +485,7 @@ void Compiler::lvaInitRetBuffArg(InitVarDscInfo* varDscInfo, bool useFixedRetBuf
485485
info.compRetBuffArg = varDscInfo->varNum;
486486

487487
LclVarDsc* varDsc = varDscInfo->varDsc;
488-
varDsc->lvType = TYP_BYREF;
488+
varDsc->lvType = TYP_I_IMPL;
489489
varDsc->lvIsParam = 1;
490490
varDsc->lvIsRegArg = 0;
491491

src/coreclr/jit/rationalize.cpp

+1-1
Original file line numberDiff line numberDiff line change
@@ -157,7 +157,7 @@ void Rationalizer::RewriteNodeAsCall(GenTree** use,
157157
tmpNum = comp->lvaGrabTemp(true DEBUGARG("return buffer for hwintrinsic"));
158158
comp->lvaSetStruct(tmpNum, sig->retTypeClass, false);
159159

160-
GenTree* destAddr = comp->gtNewLclVarAddrNode(tmpNum, TYP_BYREF);
160+
GenTree* destAddr = comp->gtNewLclVarAddrNode(tmpNum, TYP_I_IMPL);
161161
NewCallArg newArg = NewCallArg::Primitive(destAddr).WellKnown(WellKnownArg::RetBuffer);
162162

163163
call->gtArgs.InsertAfterThisOrFirst(comp, newArg);

src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ struct ReadyToRunHeaderConstants
1111
{
1212
static const uint32_t Signature = 0x00525452; // 'RTR'
1313

14-
static const uint32_t CurrentMajorVersion = 11;
14+
static const uint32_t CurrentMajorVersion = 12;
1515
static const uint32_t CurrentMinorVersion = 0;
1616
};
1717

src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs

+1-1
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ internal struct ReadyToRunHeaderConstants
1515
{
1616
public const uint Signature = 0x00525452; // 'RTR'
1717

18-
public const ushort CurrentMajorVersion = 11;
18+
public const ushort CurrentMajorVersion = 12;
1919
public const ushort CurrentMinorVersion = 0;
2020
}
2121
#if READYTORUN

src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/GCRefMapBuilder.cs

-7
Original file line numberDiff line numberDiff line change
@@ -182,13 +182,6 @@ private void FakeGcScanRoots(MethodDesc method, ArgIterator argit, CORCOMPILE_GC
182182
return;
183183
}
184184

185-
// Also if the method has a return buffer, then it is the first argument, and could be an interior ref,
186-
// so always promote it.
187-
if (argit.HasRetBuffArg())
188-
{
189-
frame[_transitionBlock.GetRetBuffArgOffset(argit.HasThis)] = CORCOMPILE_GCREFMAP_TOKENS.GCREFMAP_INTERIOR;
190-
}
191-
192185
//
193186
// Now iterate the arguments
194187
//

src/coreclr/vm/frames.cpp

-14
Original file line numberDiff line numberDiff line change
@@ -1552,13 +1552,6 @@ void TransitionFrame::PromoteCallerStackHelper(promote_func* fn, ScanContext* sc
15521552
(fn)(PTR_PTR_Object(pThis), sc, CHECK_APP_DOMAIN);
15531553
}
15541554

1555-
if (argit.HasRetBuffArg())
1556-
{
1557-
PTR_PTR_VOID pRetBuffArg = dac_cast<PTR_PTR_VOID>(pTransitionBlock + argit.GetRetBuffArgOffset());
1558-
LOG((LF_GC, INFO3, " ret buf Argument promoted from" FMT_ADDR "\n", DBG_ADDR(*pRetBuffArg) ));
1559-
PromoteCarefully(fn, PTR_PTR_Object(pRetBuffArg), sc, GC_CALL_INTERIOR|CHECK_APP_DOMAIN);
1560-
}
1561-
15621555
int argOffset;
15631556
while ((argOffset = argit.GetNextOffset()) != TransitionBlock::InvalidOffset)
15641557
{
@@ -2147,13 +2140,6 @@ void FakeGcScanRoots(MetaSig& msig, ArgIterator& argit, MethodDesc * pMD, BYTE *
21472140
return;
21482141
}
21492142

2150-
// Also if the method has a return buffer, then it is the first argument, and could be an interior ref,
2151-
// so always promote it.
2152-
if (argit.HasRetBuffArg())
2153-
{
2154-
FakePromote((Object **)(pFrame + argit.GetRetBuffArgOffset()), &sc, GC_CALL_INTERIOR);
2155-
}
2156-
21572143
//
21582144
// Now iterate the arguments
21592145
//

src/coreclr/vm/reflectioninvocation.cpp

+29-2
Original file line numberDiff line numberDiff line change
@@ -492,6 +492,9 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
492492
// If an exception occurs a gc may happen but we are going to dump the stack anyway and we do
493493
// not need to protect anything.
494494

495+
// Allocate a local buffer for the return buffer if necessary
496+
PVOID pLocalRetBuf = nullptr;
497+
495498
{
496499
BEGINFORBIDGC();
497500
#ifdef _DEBUG
@@ -501,8 +504,19 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
501504
// Take care of any return arguments
502505
if (fHasRetBuffArg)
503506
{
504-
PVOID pRetBuff = gc.retVal->GetData();
505-
*((LPVOID*) (pTransitionBlock + argit.GetRetBuffArgOffset())) = pRetBuff;
507+
_ASSERT(hasValueTypeReturn);
508+
PTR_MethodTable pMT = retTH.GetMethodTable();
509+
size_t localRetBufSize = retTH.GetSize();
510+
511+
// Allocate a local buffer. The invoked method will write the return value to this
512+
// buffer which will be copied to gc.retVal later.
513+
pLocalRetBuf = _alloca(localRetBufSize);
514+
ZeroMemory(pLocalRetBuf, localRetBufSize);
515+
*((LPVOID*) (pTransitionBlock + argit.GetRetBuffArgOffset())) = pLocalRetBuf;
516+
if (pMT->ContainsGCPointers())
517+
{
518+
pValueClasses = new (_alloca(sizeof(ValueClassInfo))) ValueClassInfo(pLocalRetBuf, pMT, pValueClasses);
519+
}
506520
}
507521

508522
// copy args
@@ -572,6 +586,19 @@ extern "C" void QCALLTYPE RuntimeMethodHandle_InvokeMethod(
572586
// Call the method
573587
CallDescrWorkerWithHandler(&callDescrData);
574588

589+
if (fHasRetBuffArg)
590+
{
591+
// Copy the return value from the return buffer to the object
592+
if (retTH.GetMethodTable()->ContainsGCPointers())
593+
{
594+
memmoveGCRefs(gc.retVal->GetData(), pLocalRetBuf, retTH.GetSize());
595+
}
596+
else
597+
{
598+
memcpyNoGCRefs(gc.retVal->GetData(), pLocalRetBuf, retTH.GetSize());
599+
}
600+
}
601+
575602
// It is still illegal to do a GC here. The return type might have/contain GC pointers.
576603
if (fConstructor)
577604
{

0 commit comments

Comments
 (0)