Skip to content

BJData optimized binary array type #4513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 7, 2025
Merged

Conversation

nebkat
Copy link
Contributor

@nebkat nebkat commented Nov 25, 2024

See NeuroJSON/bjdata#6 for further information.

Introduces a dedicated B marker for bytes.

This is used as the strong type marker in optimized array format to encode binary data such that it can also be decoded back to binary data (instead of wrongly decoding as an integer array).

Draft while awaiting the release of BJData draft 3.

Would a legacy_binary flag be desirable to continue supporting draft 2 expectations of uint8 typed arrays for binary?


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

@coveralls
Copy link

coveralls commented Nov 25, 2024

Coverage Status

coverage: 99.639%. remained the same
when pulling 2397072 on nebkat:develop
into 60c4875 on nlohmann:develop.

Copy link

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @nebkat
Please read and follow the Contribution Guidelines.

@nlohmann
Copy link
Owner

Please run make amalgamate with AStyle 3.1.

@nebkat nebkat force-pushed the develop branch 6 times, most recently from d9b1035 to 7f12cd6 Compare December 5, 2024 02:16
@nebkat
Copy link
Contributor Author

nebkat commented Dec 5, 2024

draft3_binary parameter added to to_bjdata, defaulting to false, along with tests for both versions.

from_bjdata previously wasn't able to parse uint8 binary arrays so it has just gained the ability to parse byte arrays as binary with no parameter.

Would it make sense to introduce a to_bjdata_draft3() function and deprecate to_bjdata() to encourage use of the newer version?

@nlohmann
Copy link
Owner

nlohmann commented Dec 5, 2024

Thanks a lot! Feels much better without a breaking change!

Regarding the draft version - would it make sense to add an int parameter draft to the function and defaulting it to 2 and allow to set it to 3 instead of the boolean? Then we have some room for the future there.

Regarding deprecation - adding a new function to the API just for the sake of deprecating another feels odd. I think we should rather deprecate the default parameter for the mentioned draft parameter and force client to choose the value in the future.

What do you think?

@nebkat
Copy link
Contributor Author

nebkat commented Dec 5, 2024

I like it, didn't think of that, much better solution!

So perhaps something like this?

/// how to encode BJData
enum class bjdata_version_t
{
    draft2 JSON_HEDLEY_DEPRECATED_FOR(3.12.0, draft3),
    draft3,
};

...

static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
        const bool use_size = false,
        const bool use_type = false,
        const bjdata_version_t version = bjdata_version_t::draft2)
{
    std::vector<std::uint8_t> result;
    to_bjdata(j, result, use_size, use_type, version);
    return result;
}

// warning: 'nlohmann::json_abi_v3_11_3::detail::bjdata_version_t::draft2' is deprecated: Since 3.12.0; use draft3 [-Wdeprecated-declarations]
// 4315 |             const bjdata_version_t version = bjdata_version_t::draft2)

Could technically even put ubjson as a version as they are so close and use it internally.

@nlohmann
Copy link
Owner

Could technically even put ubjson as a version as they are so close and use it internally.

You mean using BJData as parameter for UBJSON output/input?

@nebkat
Copy link
Contributor Author

nebkat commented Dec 10, 2024

Yes, internally.

json/include/nlohmann/detail/output/binary_writer.hpp:

    // json/include/nlohmann/detail/output/binary_writer.hpp
    void write_ubjson(const BasicJsonType& j, const bool use_count,
                      const bool use_type, const bool add_prefix = true,
-                      const bool use_bjdata = false)
+                      const bjdata_version_t version = bjdata_version_t::ubjson) // or "draft0", which could be argued is equivalent
    {

Then we can do:

include/nlohmann/json.hpp:

    /// @brief create a UBJSON serialization of a given JSON value
    /// @sa https://json.nlohmann.me/api/basic_json/to_ubjson/
    static void to_ubjson(const basic_json& j, detail::output_adapter<std::uint8_t> o,
                          const bool use_size = false, const bool use_type = false)
    {
-        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type);
+         binary_writer<std::uint8_t>(o).write_bjdata(j, use_size, use_type, bjdata_version_t::ubjson); // Or defaulted as above
    }

    /// @brief create a BJData serialization of a given JSON value
    /// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
    static void to_bjdata(const basic_json& j, detail::output_adapter<std::uint8_t> o,
-                         const bool use_size = false, const bool use_type = false)
+                         const bool use_size = false, const bool use_type = false, const bjdata_version_t version = bjdata_version_t::draft2)
    {
-        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, true);
+        binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, version);
    }

Would cut down on the amount of if (use_bjdata && bjdata_version == ...) and instead just be if (version == ...).

@nlohmann
Copy link
Owner

Yes, what I meant: would this be something that could - in the long run - be exposed to the customer? As in: deprecating to_bjdata in favor of to_ubjson with a version parameter.

@nebkat
Copy link
Contributor Author

nebkat commented Dec 10, 2024

Ah right - yes, but if anything I would deprecate to_ubjson in favour of to_bjdata considering UBJSON has not seen any meaningful updates in ~8 years. If we can play our part in moving people towards BJData that seems like a positive thing in the long run! It is the closest thing we have to a 1-to-1 binary mapping of JSON without the extra fluff.

@nlohmann
Copy link
Owner

Well, I thought of BJData to be a dialect of UBJSON, so I don't think replacing the UBJSON with those of BJData is a good idea.

@nebkat
Copy link
Contributor Author

nebkat commented Jan 5, 2025

The change in endianness between UBJSON/BJData makes it feel like more than just a dialect to me so I'd fear labelling it as such (and having both use to_ubjson) would cause some confusion if we do not explicitly deprecate UBJSON as a legacy format.

I suppose that applies to my proposal of a ubjson in bjdata_version_t too, so in retrospect I would stick to this version - but let me know if you'd still like a common version enum.

A third option would be a public facing bjdata_version_t as above and a hidden ubjson_dialect_t for use in the common converter functions:

enum class ubjson_dialect_t
{
    ubjson_spec12,
    bjdata_draft2,
    bjdata_draft3,
};

@nlohmann
Copy link
Owner

nlohmann commented Jan 5, 2025

The change in endianness between UBJSON/BJData makes it feel like more than just a dialect to me so I'd fear labelling it as such (and having both use to_ubjson) would cause some confusion if we do not explicitly deprecate UBJSON as a legacy format.

You are right - let's keep the two formats separated in the public API (while they may of course share code in the detail namespace).

I think the idea of a version enum from #4513 (comment) makes sense here in order not to break existing code.

Copy link

github-actions bot commented Jan 5, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @nebkat
Please read and follow the Contribution Guidelines.

@nebkat nebkat force-pushed the develop branch 3 times, most recently from c9dc364 to 61d1385 Compare January 5, 2025 17:22
Copy link

github-actions bot commented Jan 5, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @nebkat
Please read and follow the Contribution Guidelines.

@nlohmann
Copy link
Owner

nlohmann commented Jan 5, 2025

Please see https://github.com/nlohmann/json/pull/4513/checks?check_run_id=35168472051 on how to sign off your commits.

Introduces a dedicated `B` marker for bytes. This is used as the strong
type marker in optimized array format to encode binary data such that
it can also be decoded back to binary data (instead of decoding as an
integer array).

See NeuroJSON/bjdata#6 for further information.

Signed-off-by: Nebojsa Cvetkovic <[email protected]>
@nebkat nebkat marked this pull request as ready for review January 5, 2025 19:13
@nebkat
Copy link
Contributor Author

nebkat commented Jan 7, 2025

That unfortunately leaves to_bjdata(j, true, true, nlohmann::json::bjdata_version_t::draft3) looking rather verbose.

Possible alternatives are:

  1. to_bjdata(j, bjdata_config_t { size, type, version } ), nice because we can default version to 3 if unspecified, and the config can be stored in a variable.
  2. to_bjdata(j, use_size, use_type, draft3_binary = false), less room for expansion, though I doubt there will be many more breaking versions.
  3. to_bjdata(version, j, use_size, use_type), at least don't have to specify use_size and use_type every time.
  4. to_bjdata_v3() with to_bjdata() deprecated.
  5. Shortened bjdata_version_t, maybe bjdata::draft3?

@nlohmann
Copy link
Owner

nlohmann commented Jan 7, 2025

That unfortunately leaves to_bjdata(j, true, true, nlohmann::json::bjdata_version_t::draft3) looking rather verbose.

ADL should allow you to write to_bjdata(j, true, true, bjdata_version_t::draft3). Still not nice, but at least shorter.

  1. to_bjdata(j, bjdata_config_t { size, type, version } ), nice because we can default version to 3 if unspecified, and the config can be stored in a variable.

This could be a way for the future - also the dump function currently suffers from too many parameters.

  1. to_bjdata(j, use_size, use_type, draft3_binary = false), less room for expansion, though I doubt there will be many more breaking versions.

I see. But adding yet another function to the public API is also not too nice.

  1. to_bjdata(version, j, use_size, use_type), at least don't have to specify use_size and use_type every time.

I see, though I wonder how many people this actually affects.

  1. to_bjdata_v3() with to_bjdata() deprecated.

No. The public API is already so large.

  1. Shortened bjdata_version_t, maybe bjdata::draft3?

I see your point, but bjdata_version_t::draft3 is expressive and to the point.

I think we should extend the current to_bson function. I will create a discussion to collect the respective options for each serialization function. Maybe some aspects can be shared between the formats. Once we have these in place, we can deprecate the existing APIs and use cleaner ones where all options are expressed in dedicated structs.

What do you think?

@nebkat
Copy link
Contributor Author

nebkat commented Jan 7, 2025

ADL should allow you to write to_bjdata(j, true, true, bjdata_version_t::draft3). Still not nice, but at least shorter.

Didn't realize ADL let you do this, that's not so bad.

I think we should extend the current to_bson function. I will create a discussion to collect the respective options for each serialization function. Maybe some aspects can be shared between the formats. Once we have these in place, we can deprecate the existing APIs and use cleaner ones where all options are expressed in dedicated structs.

Agreed, I think that is the best way forward.

I've reverted back to the commit where nothing is deprecated, so no current users will be affected, and power users (probably just me for the next while) can specify draft3 as last argument explicitly.

Copy link
Owner

@nlohmann nlohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@nlohmann nlohmann added the aspect: binary formats BSON, CBOR, MessagePack, UBJSON label Jan 7, 2025
@nlohmann nlohmann added this to the Release 3.11.4 milestone Jan 7, 2025
@nlohmann nlohmann merged commit 2e50d5b into nlohmann:develop Jan 7, 2025
244 checks passed
@nlohmann
Copy link
Owner

nlohmann commented Jan 7, 2025

Thanks!!

@nebkat
Copy link
Contributor Author

nebkat commented Jan 7, 2025

Thanks for the assistance!

@nlohmann
Copy link
Owner

nlohmann commented Jan 7, 2025

@nebkat Can you create another PR for the #4513 (review)?

@nebkat nebkat mentioned this pull request Jan 7, 2025
4 tasks
nebkat added a commit to nebkat/nlohmann-json that referenced this pull request Jan 8, 2025
codenut pushed a commit to codenut/json that referenced this pull request Jan 9, 2025
slowriot pushed a commit to slowriot/json that referenced this pull request Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants