Support unknown fields #456

steven-sheehy · 2025-04-04T18:20:19Z

Problem

The mirror node needs to be both backwards and future compatible with all versions of the record and block stream protobufs. If any field is removed, we still need to parse historical versions of it. If any new transaction type is added but we have not updated the protobuf version to add support, we still store generic transaction info without failing.

Solution

Add optional support for unknown fields when using parse(). In protoc, this takes the form of Map<Integer, Field> fields. parseStrict() should remain unchanged and throw an error if unknown fields are present.

Alternatives

No response

The text was updated successfully, but these errors were encountered:

anthony-swirldslabs · 2025-04-15T20:17:02Z

As mentioned off-line previously, PBJ deliberately didn't implement support for unknown fields as a performance optimization, so as to avoid allocating objects (e.g. maps, byte arrays, and similar) to keep this information in the models. However, as long as this feature is optional, then PBJ could support unknown fields.

For example, it could be enabled via an additional boolean argument passed to the parse() method, e.g.:

@NonNull
T parse(@NonNull ReadableSequentialData input, boolean strictMode, int maxDepth, boolean preserveUnknownFields) throws ParseException;

In the generated model implementation, we'd add a private field of type Map<Integer, UnknownField> which would be initialized to null if preserveUnknownFields is false. However, if the parsing of unknown fields is enabled, then PBJ could create a map object when it encounters the first unknown field and store a reference to this map in that new field. The UnknownField could be defined similar to:

record UnknownField(ProtoConstants wireType, Bytes data){}

The model's copyBuilder() method would preserve the unknown fields map in the new builder.

Finally, the codec's write() method would serialize all the unknown fields so as to preserve them and mimic the Google Protobuf behavior.

However, it's unclear whether applications need to have access to the unknown fields data. Since they are unknown to the application, it's clear that the application is using a .proto model that differs from the model which was used to serialize the original object. Therefore, with the exception of a few edge cases, the application wouldn't be able to meaningfully interpret the unknown fields.

Could you please confirm if access to the unknown fields is necessary at the model public API level, and if so, please clarify the use case?

steven-sheehy · 2025-04-15T22:54:19Z

Yes, as already stated, the mirror node uses the data from the unknown fields. It is a requirement that mirror node not fail if it encounters a transaction type that it doesn't understand. But we still want to store generic transaction info so we extract the transaction type ordinal from the protobuf. We also extract the signature type and bytes for newly added signatures we're unaware of. And it's used in various tests. Here's a link to those areas: https://github.com/search?q=repo%3Ahiero-ledger%2Fhiero-mirror-node%20unknownfields&type=code

I might also suggest that the parse method is getting a bit unwieldy with the addition of more overloaded params. We might want to add an options class so it is future compatible with any new option. For example, ParseOptions.newBuilder().depth(1).strictMode(true).unknownFields(true).build() which apps can create once and reuse across requests.

anthony-swirldslabs · 2025-04-15T23:59:30Z

I agree that an API with an Options argument would be easier to extend. However, PBJ is focused on the performance and we try really hard to avoid allocating new objects, or requiring the client code to allocate new objects just for the purpose of calling a PBJ method. While I understand that the client code could cache and reuse frequently used options instances, I don't think this would establish the right pattern. For this reason, I think we'd go with an extra argument. The backward compatibility can trivially be achieved by adding a default overloaded version of this method to the Codec interface in PBJ that uses the default value of false for this new argument. So the old code wouldn't break, and the new code could start using the new method.

Thanks for elaborating on your use-case. That makes sense. Would an access to an unmodifiable Map<Integer, UnknownField> with record UnknownField(ProtoConstants wireType, Bytes data){} as proposed above be sufficient for you? The keys in the map would be the field numbers, and the values would tell you the Protobuf wire type and the actual bytes containing the data of the field. You'd be responsible for parsing the bytes using the wire type of the data if you need to actually interpret it as, say, an integer, or whatever the wire type of the data might be (see ProtoConstants in PBJ for more details.)

steven-sheehy · 2025-04-16T00:35:27Z

Yes, that works for us. Thanks for looking into this.

anthony-swirldslabs · 2025-04-17T23:48:12Z

@steven-sheehy : The fix is out for review at #471

The only notable change from what we've discussed above is the UnknownField which now looks like:

public record UnknownField(ProtoConstants wireType, List<Bytes> bytes) {}

You can see that bytes is now a list. This is necessary to properly support repeated fields. This list will always contain at least one entry. But if the same field is present on the wire multiple times, then the list may contain more entries.

steven-sheehy added the Feature Enhancement Enhancing an existing feature driven by business requirements. Typically backwards compatible. label Apr 4, 2025

anthony-swirldslabs linked a pull request Apr 17, 2025 that will close this issue

feat: support unknown fields #471

Open

2 tasks

anthony-swirldslabs moved this to 👀 In Review in Foundation Team Apr 17, 2025

anthony-swirldslabs added this to Foundation Team Apr 17, 2025

Jeffrey-morgan34 assigned anthony-swirldslabs Apr 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support unknown fields #456

Support unknown fields #456

steven-sheehy commented Apr 4, 2025

anthony-swirldslabs commented Apr 15, 2025

steven-sheehy commented Apr 15, 2025

anthony-swirldslabs commented Apr 15, 2025

steven-sheehy commented Apr 16, 2025

anthony-swirldslabs commented Apr 17, 2025

Support unknown fields #456

Support unknown fields #456

Comments

steven-sheehy commented Apr 4, 2025

Problem

Solution

Alternatives

anthony-swirldslabs commented Apr 15, 2025

steven-sheehy commented Apr 15, 2025

anthony-swirldslabs commented Apr 15, 2025

steven-sheehy commented Apr 16, 2025

anthony-swirldslabs commented Apr 17, 2025