Skip to content

[cDAC] Implement sign extension for ClrDataAddresses #116222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 5, 2025

Conversation

max-charlamb
Copy link
Contributor

@max-charlamb max-charlamb commented Jun 2, 2025

Found as part of #116072. Previously we have been 0 extending all 32-bit values.

The ISOSDacInterface APIs use sign extended values to convert 32-bit pointers to 64-bit values. These are represented using the type CLRDATA_ADDRESS. Changed all CLRDATA_ADDRESS out values in relevant legacy interfaces to use ClrDataAddress to force an explicit conversion. Added a helper to sign extend a TargetPointer to a ClrDataAddress

For example, take the 32-bit sentinel value 0xFFFFFFFF:

with 0 extension (previous behavior) the 64-bit CLRDATA_ADDRESS would be 0x00000000_FFFFFFFF.
with sign extension (after PR) the 64-bit CLRDATA_ADDRESS is 0xFFFFFFFF_FFFFFFFF.

More context given in this DAC comment:

// Convert between CLRDATA_ADDRESS and TADDR.
// Note that CLRDATA_ADDRESS is sign-extended (for compat with Windbg and OS conventions). Converting
// from pointer-size values to CLRDATA_ADDRESS should ALWAYS use this TO_CDADDR macro to avoid bugs when
// dealing with 3/4GB 32-bit address spaces. You must not rely on the compiler's implicit conversion
// from ULONG32 to ULONG64 - it is incorrect. Ideally we'd use some compiler tricks or static analysis
// to help detect such errors (they are nefarious since 3/4GB addresses aren't well tested) .
//
// Note: We're in the process of switching the implementation over to CORDB_ADDRESS instead, which is also
// 64 bits, but 0-extended. This means that conversions between TADDR and CORDB_ADDRESS are simple and natural,
// but as long as we have some legacy code, conversions involving CLRDATA_ADDRESS are a pain. Eventually we
// should eliminate CLRDATA_ADDRESS entirely from the implementation, but that will require moving SOS off of
// the old DAC stuff etc.
//
// Here are the possible conversions:
// TADDR -> CLRDATA_ADDRESS: TO_CDADDR
// CORDB_ADDRESS -> CLRDATA_ADDRESS: TO_CDADDR
// CLRDATA_ADDRESS -> TADDR: CLRDATA_ADDRESS_TO_TADDR
// CORDB_ADDRESS -> TADDR: CORDB_ADDRESS_TO_TADDR
// TADDR -> CORDB_ADDRESS: implicit
// CLRDATA_ADDRESS -> CORDB_ADDRESS: CLRDATA_ADDRESS_TO_TADDR
//
#define TO_CDADDR(taddr) ((CLRDATA_ADDRESS)(LONG_PTR)(taddr))

@Copilot Copilot AI review requested due to automatic review settings June 2, 2025 18:21
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 2, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements sign extension for converting 32-bit pointers to 64-bit CLR data addresses in DAC by using a new helper function. Key changes include replacing direct calls to OutputBufferHelpers with the new static method CopyStringToBuffer, wrapping pointer values with ToClrDataAddress for proper sign extension, and renaming the OutputBufferHelpers class to OutputHelpers.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/native/managed/cdac/mscordaccore_universal/Legacy/SOSDacImpl.cs Replaced calls to OutputBufferHelpers.CopyStringToBuffer with CopyStringToBuffer and applied ToClrDataAddress for pointer conversion
src/native/managed/cdac/mscordaccore_universal/Legacy/OutputHelpers.cs Renamed class from OutputBufferHelpers to OutputHelpers and added ToClrDataAddress implementation for sign extension
Comments suppressed due to low confidence (1)

src/native/managed/cdac/mscordaccore_universal/Legacy/OutputHelpers.cs:27

  • Consider adding unit tests to verify the behavior of ToClrDataAddress on both 32-bit and 64-bit targets.
public static ulong ToClrDataAddress(TargetPointer address, Target target)

@max-charlamb max-charlamb added area-Diagnostics-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 2, 2025
Copy link
Contributor

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb max-charlamb force-pushed the cdac-clrdata-address branch from 36c8003 to 72d04e4 Compare June 3, 2025 18:14
@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@max-charlamb max-charlamb requested a review from jkotas June 3, 2025 21:17
/// <summary>
/// Converts a TargetPointer to a ClrDataAddress using sign extension if required.
/// </summary>
public static ClrDataAddress ToClrDataAddress(this TargetPointer address, Target target)
Copy link
Member

@jkotas jkotas Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in the other direction?

DAC checks the address and throws in the other direction:

inline TADDR CLRDATA_ADDRESS_TO_TADDR(CLRDATA_ADDRESS cdAddr)
{
SUPPORTS_DAC;
#ifndef HOST_64BIT
static_assert_no_msg(sizeof(TADDR)==sizeof(UINT));
INT64 iSignedAddr = (INT64)cdAddr;
if (iSignedAddr > INT_MAX || iSignedAddr < INT_MIN)
{
_ASSERTE_MSG(false, "CLRDATA_ADDRESS out of range for this platform");
DacError(E_INVALIDARG);
}
#endif
return (TADDR)cdAddr;
. Do we have the same behavior or do we just ignore the high bits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not modified the ClrDataAddress -> TargetPointer conversion. You are correct that we currently just ignore the high bits.

Ideally, we would handle this case, I discuss this more below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the custom marshaller, I have added the logic to convert from ClrDataAddress to TargetPointer/TargetCodePointer.

@@ -12,7 +12,7 @@ namespace Microsoft.Diagnostics.DataContractReader.Legacy;

internal struct CLRDataModuleExtent
{
public ulong /* CLRDATA_ADDRESS */ baseAddress;
public ClrDataAddress baseAddress;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would an alternative where this stays as ulong look like? I assume that you have considered it, but rejected it.

Copy link
Contributor Author

@max-charlamb max-charlamb Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,

I initially prototyped this without changing the managed COM interface definitions. However, I found it very confusing to keep track of which ulong values should be treated as a CLRDATA_ADDRESS and which are normal ULONG64 values.

For example, the DacpModuleData struct has several unsigned 64 bit integer fields that are not CLRDATA_ADDRESS and should not be sign extended.

struct MSLAYOUT DacpModuleData
{
CLRDATA_ADDRESS Address = 0;
CLRDATA_ADDRESS PEAssembly = 0; // Actually the module address in .NET 9+
CLRDATA_ADDRESS ilBase = 0;
CLRDATA_ADDRESS metadataStart = 0;
ULONG64 metadataSize = 0;
CLRDATA_ADDRESS Assembly = 0; // Assembly pointer
BOOL bIsReflection = FALSE;
BOOL bIsPEFile = FALSE;
ULONG64 dwBaseClassIndex = 0;
ULONG64 dwModuleID = 0;
DWORD dwTransientFlags = 0;
CLRDATA_ADDRESS TypeDefToMethodTableMap = 0;
CLRDATA_ADDRESS TypeRefToMethodTableMap = 0;
CLRDATA_ADDRESS MethodDefToDescMap = 0;
CLRDATA_ADDRESS FieldDefToDescMap = 0;
CLRDATA_ADDRESS MemberRefToDescMap = 0;
CLRDATA_ADDRESS FileReferencesMap = 0;
CLRDATA_ADDRESS ManifestModuleReferencesMap = 0;
CLRDATA_ADDRESS LoaderAllocator = 0;
CLRDATA_ADDRESS ThunkHeap = 0;
ULONG64 dwModuleIndex = 0;

I was hoping to use type aliasing to make it more clear which values should be sign extended.

global using CLRDATA_ADDRESS = ulong;

internal struct DacpModuleData
{
    public CLRDATA_ADDRESS Address;
    public ulong metadataSize;
    ...
}

However, the aliasing seems to be resolved in VSCode's C# introspection, making it less impactful.

image

This led me to use the current approach of a thin struct wrapper, this gives the conversion behavior I want, where a TargetPointer can't be implicitly cast to a ClrDataAddress. But does have some downsides you mentioned above relating to differences in argument passing of a struct vs integer type on some platforms.

As you mentioned, using this for only out variables should have no impact as it will be passed as a pointer type anyways. Ideally, I'd like to make this change for in variables as well to properly convert from CLRDATA_ADDRESS -> TargetPointer when truncating. I have done some experimenting and reading about passing primitives vs primitive sized structs. My understanding is that this should be the same on: ARM, ARM64, x64 MSVC, x64 System V, x86 fastcall, and x86 vectorcall

I do understand this isn't ideal due to some ABIs potentially treating ulong differently from struct Wrapper { ulong Value; } when passing to a method.

I was considering the case of C++ enum types. I'm trying to understand if these are treated as primitive integers of the underlying type when passed as a parameter. If so, does managed code treat interop with enums differently to match this behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can be solved by adding custom marshalling to the struct, which source generated COM appears to support. While there is some amount of overhead, this appears to be pretty much what I am looking for. Would you have any concerns with this approach?

[NativeMarshalling(typeof(ClrDataAddressMarshaller))]
internal struct ClrDataAddress : IEquatable<ClrDataAddress>
{
    public ulong Address;

    public ClrDataAddress(ulong address) => Address = address;

    public static implicit operator ulong(ClrDataAddress a) => a.Address;
    public static implicit operator ClrDataAddress(ulong v) => new ClrDataAddress(v);

    public override bool Equals(object? obj) => obj is ClrDataAddress address && Equals(address);
    public readonly bool Equals(ClrDataAddress other) => Address == other.Address;
    public override readonly int GetHashCode() => Address.GetHashCode();

    public override readonly string ToString() => $"0x{Address:x}";
}

[CustomMarshaller(typeof(ClrDataAddress), MarshalMode.Default, typeof(ClrDataAddressMarshaller))]
internal static unsafe class ClrDataAddressMarshaller
{
    public static ClrDataAddress ConvertToManaged(ulong address) => new ClrDataAddress(address);
    public static ulong ConvertToUnmanaged(ClrDataAddress address) => address.Address;
}

@max-charlamb max-charlamb requested a review from jkotas June 4, 2025 18:08
@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@max-charlamb
Copy link
Contributor Author

/azp run runtime-diagnostics

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

/// <summary>
/// Represents the native CLRDATA_ADDRESS 64-bit type which uses sign extending
/// when converting from 32-bit values to 64-bit values.
/// For more information see TO_CDADDR in dacimpl.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We hope dacimpl.h is going to disappear eventually, so this comment is not durable

/// <summary>
/// Converts a ClrDataAddress to a TargetCodePointer, ensuring the address is within the valid range for the target platform.
/// </summary>
public static TargetCodePointer ToTargetCodePointer(this ClrDataAddress address, Target target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The marshaller-based solution looks good. Thank you

@max-charlamb max-charlamb merged commit 088856e into dotnet:main Jun 5, 2025
150 checks passed
@max-charlamb max-charlamb deleted the cdac-clrdata-address branch June 5, 2025 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants