(Nearly) Opaque datasets #185
Labels
C-Rework
Category: Something that requires updating or replacing existing systems.
S-Future-work
Status: Things that aren't going to be implemented directly, nor have priority to do so.
Currently, in Brane and BraneScript, all datasets are treated as single filesystem entries (file or folder). Brane doesn't know at any level which it is, and if it's a folder, what the contents of it are. All it knows is that it is something that is packed in a tar and that it can be attached to package containers as a volume when unpacked. Packages are responsible for correctly understanding the file or folder they are given.
This is problematic mostly because of the very strong link between datasets and containers, which is completely invisible at the workflow level. A container has to be tailored to the file structure of a dataset to understand it, but BraneScript allows any dataset to be used at any point and gives very vague errors when the two do not match. As a result, it is really easy to make mistakes when pairing them.
The matter is made worse by the way that Brane handles container outputs. For pragmatic reasons, if a package has a data output, Brane mounts a write volume at `/result` in the container, which is a folder to which the package can write. But this is always a folder. So even something as simple as `copy_result()` is incapable of perfectly copying a dataset if that dataset is a file instead of a folder. This can very easily lead to confusion.

In general, Brane should be able to neatly support different types of datasets (including non-files, such as streams or API endpoints) in a way that allows Brane to determine beforehand whether they are compatible. Further, outputs should also be appropriately typed and, ideally, support the same kinds as inputs.
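To make the idea of "determining compatibility beforehand" concrete, here is a minimal sketch of what a typed dataset check could look like. It is not how Brane works today and not a proposed design; `DatasetKind`, `PackageSignature`, and `check_compatible` are hypothetical names invented for illustration, and Python is used only to keep the sketch short.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class DatasetKind(Enum):
    """Hypothetical dataset kinds; the issue only names these as examples."""
    FILE = "file"
    FOLDER = "folder"
    STREAM = "stream"
    API_ENDPOINT = "api_endpoint"


@dataclass(frozen=True)
class PackageSignature:
    """Hypothetical declaration of what a package function consumes and produces."""
    name: str
    accepts: FrozenSet[DatasetKind]   # kinds this function can read as input
    produces: Optional[DatasetKind]   # kind of its data output, if any


def check_compatible(package: PackageSignature, dataset_kind: DatasetKind) -> None:
    """Raise a descriptive error before execution if the dataset kind is unsupported."""
    if dataset_kind not in package.accepts:
        expected = ", ".join(sorted(kind.value for kind in package.accepts))
        raise TypeError(
            f"package function '{package.name}' expects a dataset of kind "
            f"[{expected}], but got '{dataset_kind.value}'"
        )
```

With something along these lines, a workflow compiler could reject a mismatched dataset at planning time with a specific message, instead of letting the container fail on an unexpected file layout at runtime. Typing the output (`produces`) in the same way would also let a file-producing function be distinguished from a folder-producing one.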
Fixing this problem is future work, as it may be a hard one to tackle (cross-stack changes).