Skip to content

Zarr data types refactor compatibility #618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

TomNicholas
Copy link
Member

Supercedes #545 now that zarr-developers/zarr-python#2874 has been merged upstream.

The upstream changes are already being tested (and unsurprisingly causing some issues - see also #617) in our upstream dev CI tests. But here we will x-fail stuff, start trying to fix things, and bump required dependencies.

  • Closes #xxxx
  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation

FYI @d-v-b

@TomNicholas TomNicholas added the zarr-python Relevant to zarr-python upstream label Jun 17, 2025
@TomNicholas TomNicholas added test-upstream Run the upstream tests on this PR and removed test-upstream Run the upstream tests on this PR labels Jun 17, 2025
@TomNicholas
Copy link
Member Author

@d-v-b what would be the preferred way to replace the usage of V3JsonEncoder (and the older dict_to_buffer that we apparently vendored in https://github.com/zarr-developers/VirtualiZarr/blob/develop/virtualizarr/vendor/zarr/core/metadata.py?

@d-v-b
Copy link

d-v-b commented Jun 17, 2025

The functionality for serializing Nan / Inf / -Inf to a fill value now sits in stand-alone functions . The inverse functions are here.

A few things to keep in mind:

  • Zarr v3 float serialization is a little different from v2. Zarr v3 allows floats to use a hex string to represent the float value, which is necessary for supporting different types of nans. In zarr-python, we only care about this on the from-json side, because as long as we are using numpy, we can't tell the difference between these different nan values, so there's no need for serialization logic to handle that.
  • Because your V3JSONEncoder is used as a json encoder, it's applied to the entire zarr.json document, which is not in general correct. The zarr specs only define special JSON encoding in the context of the fill_value field. There is no expectation that the same special encoding applies to user-defined attributes.
  • Instead of using a special JSON encoder to transform a potentially malformed dict, you could just create JSON-serializable, spec-compliant dict in advance. This is what ArrayV3Metadata.to_dict now does in zarr python, thanks to smarter dtypes. So I would just do that instead of using a custom JSON encoder that's potentially going to mangle the contents of user attributes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-upstream Run the upstream tests on this PR zarr-python Relevant to zarr-python upstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants