Add in-memory I/O #417

pkrusche · 2016-09-27T08:00:55Z

These changes allow to do in-memory I/O as described during the last GA4GH fileformats meeting.

The idea is to add a backend which can read/write in-memory buffers (e.g. for multi-threading or to work around file handle limits).

Here are slides some explaining how this works and what it is useful for:

https://docs.google.com/presentation/d/1xTcTFrWuDYqMgSIEk3qEucsRGH2JaElgpYeYd9IGvbE/edit?usp=sharing

It would be great to have this functionality bundled with htslib!

jmarshall · 2016-10-06T16:01:43Z

hfile_mem.c

+
+    if(!fp->buffer_is_mine)
+    {
+        fprintf(stderr, "[E::mem_file] Cannot write to %s -- I don't own the buffer and can only read.\n",


This low level of htslib reports errors via errno rather than polluting stderr. So use errno = EINVAL; or similar, and maybe log the explanation with if (hts_verbose >= 5) or so if that failure is going to be confusing to the programmer.

jmarshall · 2016-10-06T16:03:07Z

hfile_mem.c

+    {
+        new_buffer_size = (fp->offset + nbytes + 1023) & round_mask;
+        tmp = realloc(fp->buffer, new_buffer_size) ;
+        if(!tmp)


Similarly, realloc failure has already set errno to ENOMEM, so all this needs is if (tmp == NULL) return -1;.

jmarshall · 2016-10-06T16:04:17Z

hfile_mem.c

+}
+
+static const struct hFILE_backend mem_backend = {
+        mem_read, mem_write, mem_seek, mem_flush, mem_close


Can just put NULL in the flush slot instead of having a dummy mem_flush() function.

jmarshall · 2016-10-07T02:23:36Z

hfile_mem.c

+    size_t len;
+
+    const char *realfilename = strchr(filename, ':') + 1;
+    if(!realfilename)


Too late! You've already added 1 to NULL.

jmarshall · 2016-10-07T04:15:45Z

Thanks for sharing the code, the details make what you were describing at the formats meeting clearer 😄 I've made a few specific comments on the code, but here's some more high-level comments.

It appears that hopen_mem() adds support for three separate things:

The use-externally-provided-memory-buffer @filename functionality discussed at the meeting;
Reading in a local file as per the hFILE_fd backend, but loading the whole file into memory during hopen() rather than holding a file descriptor open while the resulting hFILE is open;
Creating a new file in a memory buffer (though this seems incomplete — I half-expected it to write the file out during mem_close(), but perhaps this is to be used with hfile_mem_get_buffer()?).

For another backend (data:), I'm considering adding support to base hFILE's buffering for a “fixed buffer” wherein the base hFILE's buffer/begin/end/limit contains the entire file contents rather than using backend read/write methods to fill and empty it. This simplifies some things and saves pointlessly memcopying between the backend in-memory buffer and the base buffer. So if hFILE_fd gained an option to do this, that would provide the (2) and possibly also (3) functionality, leaving a new hFILE_mem free to concentrate on (1).

pkrusche · 2016-10-07T19:40:41Z

Thanks for the feedback! I have added some changes based on your code review, please let me know if these are ok or need more work.

About the use cases:

(2) / reading a file into memory and closing the handle could probably be removed from this would duplicate the data backend functionality.

(3) The idea is to retrieve the buffer afterwards and not write things to disk -- the application for this is to have multiple threads write parts of a VCF/BCF file which stay in memory and can then be concatenated and written out in order as they become available.

jmarshall · 2017-06-06T14:57:42Z

The “fixed buffer” construct and a varargs form of hopen() have since landed in HTSlib. This means that the functionality desired in this PR could be implemented perhaps more elegantly and maintainably using these facilities. Thus:

Open a file using an externally-provided-memory-buffer via something like
```
 hopen("mem:", "r:", "buffer", ptr, len)
```
which would wrap a fixed-buffer hFILE around the provided buffer.
Preloading a file could be arranged with hopen(filename, "r:", "preload", 1) or so, which would tell hopen_fd (or a new function alongside it) to create a fixed buffer instead of the current mobile buffer.

These two I think provide a better interface to functionality similar to the PR's proposed lookup_function stuff.

Retrieving a buffer after using an hFILE to write to it: it seems like the best way to implement this would be to fully implement writing for fixed buffers. Then something like
```
 hopen("mem:", "w:", "buffer-ref", &ptr, &len)
```

would tell the fixed-buffer hFILE to (also) update these pointers to the backing buffer; or an ioctl-like function like hfile_mem_get_buffer() could report them after the fact — it's unclear which is uglier. There are buffer ownership issues to be worked out — should the buffer be observable all the time, or only after hflush(), or only after hclose()?

pkrusche · 2017-06-07T17:15:32Z

In terms of observing the buffer, I think that while ugly, a solution with hfile_mem_get_buffer might be a good solution -- this function could make sure the buffer is observable until the next write (when it might need to be reallocated).

Maybe having buffers as separate entities would solve the question of ownership, i.e. having something like hfile_mem_free_buffer to free the buffer after the file was closed (s.t. users can call hts_close, which might do non-trivial flushing / writing before calling hclose?). Another idea could be a hclose callback that defaults to free()?

jkbonfield · 2017-06-07T21:41:01Z

I haven't investigated this yet so maybe it's already there, but also consider the setvbuf interface from stdio. Normally fopen/fclose allocates and deallocates its own buffers, but it is possible to supply your own in which case memory management is up to you.

ThomasHickman · 2017-08-31T09:35:51Z

Hi @pkrusche, I'm working on something that could do with this in htslib. What's the status of this pull request and is there anything that I could do to help with getting this merged?

Implements what is proposed in samtools#417

jkbonfield · 2017-12-14T10:43:35Z

Closed as replaced by #590 (now merged).

jmarshall reviewed Oct 7, 2016

View reviewed changes

pkrusche force-pushed the with-inmemory-io branch from 0b0b743 to 3e6d65c Compare October 7, 2016 19:28

Add in-memory I/O

71a42f6

pkrusche force-pushed the with-inmemory-io branch from 3e6d65c to 71a42f6 Compare October 7, 2016 19:31

mp15 mentioned this pull request Mar 14, 2017

Read a Bam file from a ByteArray #489

Open

daviesrob added this to the 1.5 milestone May 23, 2017

daviesrob assigned jkbonfield Jun 5, 2017

jkbonfield mentioned this pull request Jun 6, 2017

Use BGZF* in cram_index_build() instead of zfp* #553

Merged

ThomasHickman added a commit to wtsi-hgi/htslib that referenced this pull request Sep 8, 2017

Add mem and preload scheme handlers

c3a1d8c

Implements what is proposed in samtools#417

ThomasHickman mentioned this pull request Sep 8, 2017

Abstract ref_to_m5 from cram_io #589

Open

ThomasHickman added a commit to wtsi-hgi/htslib that referenced this pull request Sep 11, 2017

Add mem and preload scheme handlers

9bbe101

Implements what is proposed in samtools#417

ThomasHickman mentioned this pull request Sep 11, 2017

Add in-memory I/O using hFILE fixed buffers #590

Closed

jkbonfield closed this Dec 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add in-memory I/O #417

Add in-memory I/O #417

Uh oh!

pkrusche commented Sep 27, 2016

Uh oh!

jmarshall Oct 6, 2016

Uh oh!

jmarshall Oct 6, 2016

Uh oh!

jmarshall Oct 6, 2016

Uh oh!

jmarshall Oct 7, 2016

Uh oh!

jmarshall commented Oct 7, 2016

Uh oh!

pkrusche commented Oct 7, 2016

Uh oh!

jmarshall commented Jun 6, 2017

Uh oh!

pkrusche commented Jun 7, 2017

Uh oh!

jkbonfield commented Jun 7, 2017

Uh oh!

ThomasHickman commented Aug 31, 2017

Uh oh!

jkbonfield commented Dec 14, 2017 •

edited

Loading

Uh oh!

Uh oh!

Add in-memory I/O #417

Add in-memory I/O #417

Uh oh!

Conversation

pkrusche commented Sep 27, 2016

Uh oh!

jmarshall Oct 6, 2016

Choose a reason for hiding this comment

Uh oh!

jmarshall Oct 6, 2016

Choose a reason for hiding this comment

Uh oh!

jmarshall Oct 6, 2016

Choose a reason for hiding this comment

Uh oh!

jmarshall Oct 7, 2016

Choose a reason for hiding this comment

Uh oh!

jmarshall commented Oct 7, 2016

Uh oh!

pkrusche commented Oct 7, 2016

Uh oh!

jmarshall commented Jun 6, 2017

Uh oh!

pkrusche commented Jun 7, 2017

Uh oh!

jkbonfield commented Jun 7, 2017

Uh oh!

ThomasHickman commented Aug 31, 2017

Uh oh!

jkbonfield commented Dec 14, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jkbonfield commented Dec 14, 2017 •

edited

Loading