
(Nearly) Opaque datasets #185



Open
Lut99 opened this issue Oct 31, 2024 · 0 comments
Labels
C-Rework (Category: Something that requires updating or replacing existing systems.)
S-Future-work (Status: Things that aren't going to be implemented directly, nor have priority to do so.)


Lut99 commented Oct 31, 2024

Currently, in Brane and BraneScript, every dataset is treated as a single filesystem entry (a file or a folder). Brane does not know, at any level, which of the two it is, nor, if it is a folder, what it contains. All it knows is that the dataset is packed in a tar archive and that, once unpacked, it can be attached to a package container as a volume. Packages are responsible for correctly interpreting the file or folder they are given.
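Because Brane hands a package nothing more than an unpacked path, all knowledge of the dataset's shape has to live inside the package. A minimal sketch of what that discovery looks like from the package side, assuming the dataset is available at a known path; `DatasetShape` and `inspect_dataset` are hypothetical names for illustration, not part of Brane's API:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical classification of what a package received. Brane itself
/// does not track this, so the package has to find out at runtime.
#[derive(Debug, PartialEq)]
enum DatasetShape {
    File,
    /// A folder, with the (sorted) names of its top-level entries.
    Folder(Vec<String>),
}

/// Inspects the unpacked dataset at `path` and reports its shape.
fn inspect_dataset(path: &Path) -> io::Result<DatasetShape> {
    let meta = fs::metadata(path)?;
    if meta.is_dir() {
        let mut entries = Vec::new();
        for entry in fs::read_dir(path)? {
            entries.push(entry?.file_name().to_string_lossy().into_owned());
        }
        entries.sort();
        Ok(DatasetShape::Folder(entries))
    } else {
        Ok(DatasetShape::File)
    }
}

fn main() -> io::Result<()> {
    // Simulate two unpacked datasets in a scratch directory.
    let root = std::env::temp_dir().join("brane_opaque_demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("folder_dataset"))?;
    fs::write(root.join("folder_dataset").join("data.csv"), "a,b\n1,2\n")?;
    fs::write(root.join("file_dataset"), "a,b\n1,2\n")?;

    println!("{:?}", inspect_dataset(&root.join("file_dataset"))?);
    println!("{:?}", inspect_dataset(&root.join("folder_dataset"))?);
    Ok(())
}
```

Every package ends up repeating some variant of this check, and nothing at the workflow level can see its outcome.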

This is problematic mostly because of the strong coupling between datasets and containers, which is completely invisible at the workflow level. A container has to be tailored to a dataset's file structure to understand it, but BraneScript allows any dataset to be used at any point and gives only vague errors when the two do not match. As a result, it is very easy to make mistakes when pairing them.

The matter is made worse by the way Brane handles container outputs. For pragmatic reasons, if a package has a data output, Brane mounts a writable volume at /result in the container, which is a folder the package can write to. Crucially, this is always a folder. So even something as simple as copy_result() cannot faithfully copy a dataset if that dataset is a file rather than a folder: at best, the copy is a folder containing the file. This can easily lead to confusion.
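The mismatch can be reproduced outside Brane with a small simulation. This is a hypothetical sketch, not Brane's actual copy_result() implementation: a local directory stands in for the /result mount, and `copy_to_result` is an illustrative identity package whose only output channel is that folder:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical identity package. The only output channel is the folder at
/// `result_dir` (standing in for /result), so a file input can only be
/// placed *inside* it.
fn copy_to_result(input: &Path, result_dir: &Path) -> io::Result<()> {
    fs::create_dir_all(result_dir)?;
    if input.is_dir() {
        for entry in fs::read_dir(input)? {
            let entry = entry?;
            // Shallow copy of regular files only, to keep the sketch short.
            if entry.file_type()?.is_file() {
                fs::copy(entry.path(), result_dir.join(entry.file_name()))?;
            }
        }
    } else {
        let name = input.file_name().expect("input path has a file name");
        fs::copy(input, result_dir.join(name))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("brane_result_demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(&root)?;
    let input = root.join("input.csv");
    fs::write(&input, "a,b\n1,2\n")?;

    let result = root.join("result"); // plays the role of /result
    copy_to_result(&input, &result)?;

    // The input dataset was a file, but the output dataset is a folder.
    println!("input is a file: {}", input.is_file());
    println!("output is a folder: {}", result.is_dir());
    Ok(())
}
```

For folder inputs the round trip is faithful. For file inputs the output gains an extra directory level, which any downstream package then has to account for.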

In general, Brane should be able to neatly support different kinds of datasets (including non-file data, such as streams or API endpoints) in a way that lets it determine beforehand whether a dataset and a package are compatible. Further, outputs should also be appropriately typed and, ideally, support the same kinds as inputs.
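One possible shape for this is to make the dataset kind part of the type that both datasets and package parameters carry, so that a mismatch is reported before anything is scheduled. The sketch below is only an illustration under that assumption; `DataKind` and `check_compatible` are hypothetical and do not exist in Brane today:

```rust
/// Hypothetical dataset kinds that Brane could track at the workflow level.
#[derive(Debug, Clone, PartialEq)]
enum DataKind {
    File,
    /// For a dataset: the entries it contains. For a package parameter:
    /// the entries the package requires.
    Folder { entries: Vec<String> },
    Stream,
    ApiEndpoint,
}

/// Returns Ok if a dataset of kind `provided` may be passed where a package
/// expects `expected`, or a descriptive error otherwise.
fn check_compatible(provided: &DataKind, expected: &DataKind) -> Result<(), String> {
    match (provided, expected) {
        (DataKind::File, DataKind::File)
        | (DataKind::Stream, DataKind::Stream)
        | (DataKind::ApiEndpoint, DataKind::ApiEndpoint) => Ok(()),
        (DataKind::Folder { entries: have }, DataKind::Folder { entries: need }) => {
            // Every entry the package needs must be present in the dataset.
            let missing: Vec<&String> = need.iter().filter(|n| !have.contains(*n)).collect();
            if missing.is_empty() {
                Ok(())
            } else {
                Err(format!("folder dataset is missing {missing:?}"))
            }
        }
        (p, e) => Err(format!("expected {e:?}, got {p:?}")),
    }
}

fn main() {
    // What a (hypothetical) package declares it needs.
    let expects = DataKind::Folder { entries: vec!["data.csv".to_string()] };

    let good = DataKind::Folder {
        entries: vec!["data.csv".to_string(), "README".to_string()],
    };
    let bad = DataKind::File;

    println!("{:?}", check_compatible(&good, &expects));
    println!("{:?}", check_compatible(&bad, &expects));
}
```

The same `DataKind` could annotate package outputs, so that a package producing a single file would declare `DataKind::File` instead of always being forced into a folder.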

Fixing this problem is left as future work, since it is likely hard to tackle: it requires changes across the entire stack.

Lut99 added the C-Rework and S-Future-work labels on Oct 31, 2024.