Docs and API follow-ups to #601 #619

TomNicholas · 2025-06-17T12:11:47Z

This started out as a targeted PR to address #616 and ended up as an attempt to address all the unchecked bullets from #601 (i.e. everything in the docs that touches the concept of parsers).

- Follow up to "Refactor codebase to support a new simplified Parser->ManifestStore model" (#601)
- ~~Tests added~~
- Tests passing
- Full type hint coverage
- Changes are documented in docs/releases.rst
- New functions/methods are listed in api.rst
- New functionality has documentation

fyi @sharkinsspatial @maxrjones @chuckwondo

TomNicholas · 2025-06-17T12:12:41Z

docs/custom_readers.md

Don't know why git mv didn't understand that I was just renaming this file. (It has a bunch of other changes too.)

TomNicholas · 2025-06-17T12:13:33Z

docs/custom_parsers.md

+def custom_parser(file_url: str, object_store: ObjectStore) -> ManifestStore:
+    # access the file's contents, e.g. using the ObjectStore instance
+    readable_file = obstore.open_reader(object_store, file_url)
+
+    # parse the file contents to extract its metadata
+    # this is generally where the format-specific logic lives
+    manifestgroup: ManifestGroup = extract_metadata(readable_file)
+
+    # optionally create an object store registry, used to actually load chunk data from file later
+    registry = ObjectStoreRegistry({store_prefix: object_store})
+
+    # construct the ManifestStore from the parsed metadata and the object store registry
+    return ManifestStore(group=manifestgroup, store_registry=registry)


Writing this out made me realize it's a bit weird that exactly one ObjectStore is required by the call signature, but not actually technically needed by the code...
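To make the shape of that signature concrete, here is a minimal, self-contained sketch of a parser as a plain callable checked against a structural protocol. Everything in it is a stand-in: `FakeObjectStore`, `FakeManifestStore`, `ParserLike`, and `open_with` are hypothetical names invented for illustration, not VirtualiZarr's or obstore's real classes. It only illustrates the observation above: the caller must hand over exactly one store, whether or not the parser body strictly needs it.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class FakeObjectStore:
    """Hypothetical stand-in for an object store; maps URLs to bytes."""

    files: dict[str, bytes] = field(default_factory=dict)

    def read(self, url: str) -> bytes:
        return self.files[url]


@dataclass
class FakeManifestStore:
    """Hypothetical stand-in for the object a parser returns."""

    metadata: dict[str, int]
    registry: dict[str, FakeObjectStore]


class ParserLike(Protocol):
    """Any callable with this signature counts as a parser (structural typing)."""

    def __call__(
        self, file_url: str, object_store: FakeObjectStore
    ) -> FakeManifestStore: ...


def custom_parser(file_url: str, object_store: FakeObjectStore) -> FakeManifestStore:
    # Read the raw bytes through the store passed in by the caller.
    raw = object_store.read(file_url)
    # "Parse" the contents; a real parser would hold format-specific logic here.
    metadata = {"nbytes": len(raw)}
    # Remember which store can serve chunk data for this URL later on.
    return FakeManifestStore(metadata=metadata, registry={file_url: object_store})


def open_with(
    parser: ParserLike, file_url: str, object_store: FakeObjectStore
) -> FakeManifestStore:
    # The caller always supplies exactly one store, mirroring the signature above.
    return parser(file_url, object_store)


store = FakeObjectStore(files={"memory://example.bin": b"\x00" * 16})
result = open_with(custom_parser, "memory://example.bin", store)
print(result.metadata)
```

Because the protocol is structural, a plain function and a class with `__call__` are interchangeable here, which is one reason a fixed call signature can feel stricter than the body requires.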

maxrjones · 2025-06-17T19:49:57Z

> This started out as a targeted PR to address #616 and ended up as an attempt to address all the unchecked bullets from #601 (i.e. everything in the docs that touches the concept of parsers).

IMO more targeted PRs are preferable because they are simpler to review, faster to merge, and keep the git log more descriptive. The number of files touched by this PR motivated my request for a faster turnaround for #615 in #615 (comment).

TomNicholas · 2025-06-17T20:06:38Z

That's fair - I can definitely split out the changes to where the Parser is defined from the rest of the changes. But the docs do just need altering on almost every page.

TomNicholas · 2025-06-18T05:34:00Z

Okay I've split that out in #621, which should be merged first. The rest of this PR basically does a few (related) things:

- grep for "reader" and replace with "parser"
- modify any language referring to parsers / ManifestStore to be up to date with the changes in "Refactor codebase to support a new simplified Parser->ManifestStore model" (#601)
- update code examples to use parser and obstore instead of reader

Those could be separated further if it would help, but it's now already (almost) down to being a pure docs (including docs examples) PR.

chuckwondo

Great documentation! This really helped me start to wrap my head around things.

Most of my suggestions are minor format/syntax/grammar suggestions, but there are also a few regarding the use of context managers in examples, and a naming question (which would be best addressed in a separate PR, if it makes sense).

chuckwondo · 2025-06-18T10:38:52Z

docs/custom_parsers.md

+vds = vz.open_virtual_dataset(
+    file_url,
+    object_store=object_store,
+    parser=custom_parser,
+)


I suggest we use context managers in examples to show recommended usage to ensure resources are properly managed to avoid leaks:

Suggested change

-vds = vz.open_virtual_dataset(
-    file_url,
-    object_store=object_store,
-    parser=custom_parser,
-)
+with vz.open_virtual_dataset(
+    file_url,
+    object_store=object_store,
+    parser=custom_parser,
+) as vds:
+    ...

Tangentially, can we rename the object_store parameter to simply store? That would be consistent with the names store_prefix and store_registry elsewhere.

However, would that then cause potential confusion with zarr.abc.Store? If so, then wouldn't store_prefix and store_registry also cause confusion about what type of store they are related to (obstore or zarr)?

On context managers: Do we really need to? It makes all the examples more complex to read...

On renaming: I agree this is potentially confusing. I think I would prefer everything be object_store, but then on the other hand we do have type hints to help disambiguate... Doesn't help that zarr.storage.ObjectStore is a zarr.abc.Store that wraps an obstore.Store 🙃 Is the word object redundant in any way? Might we want to generalize that later?

Adding context managers would certainly add a minor amount of complexity to the examples, but my fear is that most readers of any code examples (regardless of library) tend to repeat the same patterns, even if those patterns are likely not ideal for production code. How many context managers have I already had to add to the codebase itself to resolve problems (both in main code and test code)?

At the very least, I recommend a very obvious, bold warning in at least one place in the docs (ideally somewhere most readers are likely to see) that very clearly indicates that use of context managers is recommended for production code, but for brevity, code examples will not use them. And the callout should show an explicit example of the recommended practice, so that the syntax is visually imprinted in the reader's mind.

My preference is to make repeated use of context managers throughout the examples, so that the repetition is imprinted in the reader's mind, and will be the syntax they repeat, rather than repeatedly not using context managers.

Even with a big, bold warning somewhere in the docs, I suspect the reader will repeat what they see, not what the warning says, because that's what they would repeatedly see in the examples. I recommend repeating the recommended practice, not repeating the "poor" practice simply for saving a modicum of keystrokes/simplification.

Of course, if I'm outvoted, I won't block things.

That's totally reasonable. My only remaining concern is that it's trickier to do that in narrative documentation than in real code, because I need text between opening the virtual dataset and using the virtual dataset. But this isn't going to work if users copy it verbatim:

with open_virtual_dataset() as vds: ...

some explanatory text

vds.virtualize.to_kerchunk()

In the docs I can't really wrap all later uses of vds inside the context manager, unless I keep opening it again and again, which also wouldn't be very clear. It feels like a compromise either way.

FWIW all your arguments could apply to the xarray documentation too, but they don't use context managers there either:

https://docs.xarray.dev/en/stable/user-guide/io.html#reading-and-writing-files

Fair point about interleaving prose with code. Perhaps we can at least find a good place to put a callout explaining that use of context managers is strongly recommended to prevent memory/resource leaks in critical code (along with a code example), but that for convenience throughout the docs, context managers might be dropped.
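The trade-off discussed in this thread can be sketched with a toy resource. `FakeDataset`, `to_refs`, and `open_fake_dataset` below are hypothetical stand-ins, not the `open_virtual_dataset` API; the sketch assumes only that the returned object supports both the context-manager protocol and an explicit `close()`. The toy raises after closing to make the problem visible; whether a real virtual dataset fails after the block exits depends on what resources it holds.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset that holds an open resource."""

    def __init__(self) -> None:
        self.closed = False

    def __enter__(self) -> "FakeDataset":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    def close(self) -> None:
        self.closed = True

    def to_refs(self) -> dict:
        if self.closed:
            raise RuntimeError("dataset is closed")
        return {"version": 1}


def open_fake_dataset() -> FakeDataset:
    return FakeDataset()


# Recommended pattern: everything that uses the dataset lives inside the block.
with open_fake_dataset() as vds:
    refs_inside = vds.to_refs()

# The failure mode from the thread: using the dataset after the block has exited.
try:
    vds.to_refs()
    used_after_exit = True
except RuntimeError:
    used_after_exit = False

# One possible compromise for narrative docs: open once, close explicitly at the end.
vds2 = open_fake_dataset()
try:
    refs_later = vds2.to_refs()  # prose could sit between opening and this call
finally:
    vds2.close()
```

The explicit `close()` form lets explanatory text sit between opening and using the dataset, at the cost of being easier to forget than a `with` block.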

chuckwondo · 2025-06-18T10:40:19Z

docs/custom_parsers.md

+    manifestgroup: ManifestGroup = extract_metadata(readable_file)
+
+    # optionally create an object store registry, used to actually load chunk data from file later
+    registry = ObjectStoreRegistry({store_prefix: object_store})


store_prefix is undefined. Do we want to add a line or two of code/comment about it, or at least a comment referring users to a section of the docs covering registries?
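As one illustration of what such a line could look like, the sketch below derives a prefix from the file URL using only the standard library. The helper name `store_prefix_from_url` is hypothetical, and the assumption that the registry is keyed by a scheme-plus-bucket string such as `s3://my-bucket` is not confirmed anywhere in this thread.

```python
from urllib.parse import urlparse


def store_prefix_from_url(file_url: str) -> str:
    """Return the scheme-plus-bucket part of a URL, e.g. 's3://my-bucket'."""
    parsed = urlparse(file_url)
    if not parsed.scheme:
        raise ValueError(f"expected a URL with a scheme, got {file_url!r}")
    # netloc is the bucket/host for remote URLs and empty for file:/// URLs
    return f"{parsed.scheme}://{parsed.netloc}"


file_url = "s3://my-bucket/data/air.nc"
store_prefix = store_prefix_from_url(file_url)
print(store_prefix)
```

Under that assumption, the undefined `store_prefix` in the example would come from the same `file_url` the parser already receives, so no extra argument would be needed.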

chuckwondo · 2025-06-18T12:03:02Z

docs/usage.md


-vds = open_virtual_dataset('air.nc')
+vds = open_virtual_dataset('air.nc', object_store=LocalStore, parser=HDFParser())


context manager?

also, a few lines above, I suggest using a context manager for opening the air_temperature tutorial dataset

chuckwondo · 2025-06-18T12:03:53Z

docs/usage.md

 vds = open_virtual_dataset(
    'relative_refs.json',
-    filetype='kerchunk',
-    virtual_backend_kwargs={'fs_root': 'file:///some_directory/'}
+    object_store=LocalStore,
+    parser=KerchunkJSONParser(
+        fs_root='file:///data_directory/',
+    )
 )


context manager?

TomNicholas added 11 commits June 17, 2025 14:51

move Parser definition to new parser.typing module

e605177

add API docs for Parser protocol and parser classes

df58929

avoid extra .hdf namespace for only one parser

7278819

rename reader -> parser

af453e6

update custom parsers page

fb0ce2e

update usage docs

01c206b

update roadmap to reflect where we actually are

d953b94

update faq

120d223

note about the renaming of readers->parsers

8e0a0af

minor qualification

e828058

release notes

a55b9d2

TomNicholas added the documentation Improvements or additions to documentation label Jun 17, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

629f209

for more information, see https://pre-commit.ci

pre-commit-ci bot had a problem deploying to test-release June 17, 2025 12:12 Failure

TomNicholas commented Jun 17, 2025

maxrjones mentioned this pull request Jun 17, 2025

Use mkdocs-material for documentation #615

Merged

7 tasks

TomNicholas mentioned this pull request Jun 18, 2025

Move Parser definition to new parser.typing module #621

Merged

7 tasks

TomNicholas added 2 commits June 18, 2025 12:47

change import

2ddc1f9

Merge branch 'parsers.typing' of https://github.com/TomNicholas/Virtu…

53294e9

…aliZarr into parsers.typing

TomNicholas temporarily deployed to test-release June 18, 2025 05:47 — with GitHub Actions Inactive

ignore lint

c1e9fbf

TomNicholas temporarily deployed to test-release June 18, 2025 05:50 — with GitHub Actions Inactive

chuckwondo requested changes Jun 18, 2025

Merge branch 'develop' into parsers.typing

a414468

TomNicholas temporarily deployed to test-release June 20, 2025 05:40 — with GitHub Actions Inactive

[pre-commit.ci] auto fixes from pre-commit.com hooks

c4a4d16

pre-commit-ci bot temporarily deployed to test-release June 20, 2025 05:40 Inactive

Nits from Chunk's review

8b1cfb7

Co-authored-by: Chuck Daniels <[email protected]>

TomNicholas temporarily deployed to test-release June 20, 2025 07:23 — with GitHub Actions Inactive

