[API Proposal]: Embrace spans in System.Reflection.Metadata. #85280

Open
@teo-tsirpanis

Background and motivation

System.Reflection.Metadata is a performance-oriented library, but its API lacks methods that accept spans or provide zero-copy memory access in some scenarios. I propose adding span overloads for methods that work with memory buffers. Some of these APIs can now be implemented efficiently thanks to the work started in #81059.

API Proposal

namespace System.Reflection.Metadata;

public readonly struct BlobContentId
{
    public BlobContentId(ReadOnlySpan<byte> id);
    public static BlobContentId FromHash(ReadOnlySpan<byte> hashCode);
}

public struct BlobReader
{
    public ReadOnlySpan<byte> RemainingSpan { get; }
    public ReadOnlySpan<byte> Span { get; }
}

public struct BlobWriter
{
    public void WriteBytes(ReadOnlySpan<byte> buffer);
}

public sealed class MetadataReader
{
    public BlobReader GetBlobReader(UserStringHandle handle);
    // Taken from #103169 and expanded
    public bool TryGetBlob(BlobHandle handle, Span<byte> span, out int length);
    public bool TryGetString(StringHandle handle, Span<char> span, out int length);
    public bool TryGetString(DocumentNameBlobHandle handle, Span<char> span, out int length);
    public bool TryGetString(NamespaceDefinitionHandle handle, Span<char> span, out int length);
    public bool TryGetString(UserStringHandle handle, Span<char> span, out int length);
}

namespace System.Reflection.Metadata.Ecma335;

public struct ArrayShapeEncoder
{
    public void Shape(int rank, ReadOnlySpan<int> sizes, ReadOnlySpan<int> lowerBounds);
}

public sealed class MetadataBuilder
{
    // These APIs will not allocate if the blob/string already exists.
    public BlobHandle GetOrAddBlob(ReadOnlySpan<byte> value);
    public BlobHandle GetOrAddBlobUTF8(ReadOnlySpan<char> value, bool allowUnpairedSurrogates = true);
    public BlobHandle GetOrAddBlobUTF16(ReadOnlySpan<char> value);
    public BlobHandle GetOrAddDocumentName(ReadOnlySpan<char> value);
    public StringHandle GetOrAddString(ReadOnlySpan<char> value);
    public UserStringHandle GetOrAddUserString(ReadOnlySpan<char> value);
}

public readonly partial struct PermissionSetEncoder
{
    public PermissionSetEncoder AddPermission(string typeName, ReadOnlySpan<byte> encodedArguments);
}

namespace System.Reflection.PortableExecutable;

public sealed partial class DebugDirectoryBuilder
{
    public void AddEntry(DebugDirectoryEntryType type, uint version, uint stamp, ReadOnlySpan<byte> data);
    // Existing API; adds `allows ref struct` to TData on supported frameworks
    public void AddEntry<TData>(DebugDirectoryEntryType type, uint version, uint stamp, TData data, System.Action<BlobBuilder, TData> dataSerializer)
#if NET9_0_OR_GREATER
        where TData : allows ref struct
#endif
    ;
    public void AddPdbChecksumEntry(string algorithmName, ReadOnlySpan<byte> checksum);
}

API Usage

The proposed APIs correspond to existing ones that work with strings and (immutable) byte arrays. Their usage will be similar.
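
As a rough illustration of the difference, the sketch below uses only the existing `MetadataBuilder.GetOrAddBlob(byte[])` overload, which forces a caller holding a span to copy it into an array first. The `SpanUsageSketch` class and its method names are hypothetical, and the proposed span overload appears only in a comment because it is not assumed to exist yet.

```csharp
using System;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;

// Hypothetical helper class, for illustration only.
public static class SpanUsageSketch
{
    // Existing API: a caller that holds a span must copy it into a byte[] first.
    public static BlobHandle AddBlobToday(MetadataBuilder builder, ReadOnlySpan<byte> data)
    {
        return builder.GetOrAddBlob(data.ToArray());

        // With the proposed overload (not assumed to exist yet) this would become:
        //     return builder.GetOrAddBlob(data);
    }

    // Adds the same stack-allocated bytes twice and reports whether the
    // builder handed back the same handle both times.
    public static bool Demo()
    {
        var builder = new MetadataBuilder();
        Span<byte> stackData = stackalloc byte[] { 1, 2, 3, 4 };

        BlobHandle first = AddBlobToday(builder, stackData);
        BlobHandle second = AddBlobToday(builder, stackData);
        return first == second;
    }
}
```

Even when the blob is already in the heap, the second call above still allocates a temporary array just to perform the lookup. That is the cost the span overloads are meant to remove.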

Alternative Designs

  • The APIs in ArrayShapeEncoder and PermissionSetEncoder are not strictly necessary (especially the latter) and could be omitted.
  • The TryGet*** APIs are not necessary either; a user can call GetBlobReader and have zero-copy access to the raw bytes.
    • However, when called on virtual blob and string handles (used for WinRT projections), GetBlobReader returns a reader over newly allocated and pinned memory. This is not ideal for performance, though there is room for improvement.
    • Namespace definition and document name blob handles don't have a raw byte representation, so the TryGet*** APIs will be needed to avoid allocations.
  • A previous iteration of this proposal suggested some additional APIs for BlobReader to read byte buffers and strings into spans, but I removed them because this is something users can already do by getting the reader's underlying pointer or span.
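
The pointer-based workaround mentioned above can be sketched as follows. This is a minimal example that assumes unsafe code is enabled (`AllowUnsafeBlocks`). It uses the existing `BlobReader(byte*, int)` constructor and the `StartPointer` and `Length` properties. The `BlobReaderSpanSketch` class and its methods are hypothetical names, and the reader wraps a pinned array rather than real metadata to keep the example self-contained.

```csharp
using System;
using System.Reflection.Metadata;

// Hypothetical helper class, for illustration only.
public static class BlobReaderSpanSketch
{
    // Sums the bytes of a blob without copying them into a new array.
    public static unsafe int SumBytes(BlobReader reader)
    {
        // Roughly what the proposed BlobReader.Span property would hand out.
        var span = new ReadOnlySpan<byte>(reader.StartPointer, reader.Length);

        int sum = 0;
        foreach (byte b in span)
        {
            sum += b;
        }
        return sum;
    }

    public static unsafe int Demo()
    {
        byte[] data = { 10, 20, 30 };
        fixed (byte* p = data)
        {
            // The reader and any span derived from it are valid only
            // while 'data' stays pinned inside this fixed block.
            var reader = new BlobReader(p, data.Length);
            return SumBytes(reader);
        }
    }
}
```

The span here does nothing to keep the underlying memory alive, so callers have to guarantee the lifetime of that memory themselves.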

Risks

APIs that get a span from a BlobReader might be considered unsafe, because BlobReader wraps unmanaged memory and the span does not keep that memory alive. However, this is a problem with SRM in general.