Add in-memory I/O #417
```c
if(!fp->buffer_is_mine)
{
    fprintf(stderr, "[E::mem_file] Cannot write to %s -- I don't own the buffer and can only read.\n",
```
This low level of htslib reports errors via `errno` rather than polluting stderr. So use `errno = EINVAL;` or similar, and maybe log the explanation with `if (hts_verbose >= 5)` or so if that failure is going to be confusing to the programmer.
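A minimal sketch of that convention as standalone C: refuse the write, set `errno`, and only explain on stderr when verbosity is high. `struct mem_file_demo` and `hts_verbose_demo` are hypothetical stand-ins for the PR's `mem_file` state and htslib's `hts_verbose`, so this illustrates the pattern rather than reproducing the PR's code.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stand-in for htslib's hts_verbose global. */
static int hts_verbose_demo = 3;

/* Hypothetical stand-in for the PR's mem_file state. */
struct mem_file_demo {
    char  *buffer;
    size_t length;
    int    buffer_is_mine;   /* 0: caller's buffer, read-only for us */
};

/* Refuse writes to a buffer we do not own: set errno and return -1
   instead of unconditionally printing to stderr. */
static ssize_t mem_write_check(struct mem_file_demo *fp, size_t nbytes)
{
    if (!fp->buffer_is_mine) {
        if (hts_verbose_demo >= 5)
            fprintf(stderr, "[D::mem_write] refusing to write %zu bytes: "
                            "buffer is not owned by this handle\n", nbytes);
        errno = EINVAL;
        return -1;
    }
    return (ssize_t) nbytes;   /* a real backend would copy the bytes here */
}
```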
```c
{
    new_buffer_size = (fp->offset + nbytes + 1023) & round_mask;
    tmp = realloc(fp->buffer, new_buffer_size);
    if(!tmp)
```
Similarly, `realloc` failure has already set `errno` to `ENOMEM`, so all this needs is `if (tmp == NULL) return -1;`.
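A standalone sketch of the growth path with that simplification. It assumes `round_mask` is `~(size_t)1023` (the excerpt does not show its definition), and `struct grow_buf` / `grow_buf_write()` are hypothetical names; size-overflow checks are left out for brevity. Relying on `errno` here depends on POSIX, which requires `realloc` to set `ENOMEM` on failure; ISO C alone does not.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical growable in-memory buffer; field names echo the excerpt. */
struct grow_buf {
    char  *buffer;
    size_t size;     /* allocated bytes */
    size_t offset;   /* current write position */
};

/* Round up to a multiple of 1024, i.e. (x + 1023) & round_mask
   with round_mask assumed to be ~(size_t)1023. */
static size_t round_up_1k(size_t x)
{
    return (x + 1023) & ~(size_t)1023;
}

static ssize_t grow_buf_write(struct grow_buf *fp, const void *data, size_t nbytes)
{
    if (nbytes == 0) return 0;
    if (fp->offset + nbytes > fp->size) {
        size_t new_size = round_up_1k(fp->offset + nbytes);
        char *tmp = realloc(fp->buffer, new_size);
        if (tmp == NULL) return -1;   /* errno already ENOMEM (POSIX) */
        fp->buffer = tmp;
        fp->size = new_size;
    }
    memcpy(fp->buffer + fp->offset, data, nbytes);
    fp->offset += nbytes;
    return (ssize_t) nbytes;
}
```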
```c
static const struct hFILE_backend mem_backend = {
    mem_read, mem_write, mem_seek, mem_flush, mem_close
};
```
Can just put `NULL` in the flush slot instead of having a dummy `mem_flush()` function.
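To illustrate why a `NULL` flush slot is enough, here is a cut-down, hypothetical model of a backend table in which the generic flush routine only calls the backend hook when one is provided. `struct demo_backend`, `demo_flush()` and the `*_demo` functions are illustrative names, not htslib's definitions.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of an hFILE-style backend table; slot order
   (read, write, seek, flush, close) follows the excerpt above. */
typedef struct demo_file demo_file;

struct demo_backend {
    ssize_t (*read)(demo_file *fp, void *buf, size_t n);
    ssize_t (*write)(demo_file *fp, const void *buf, size_t n);
    off_t   (*seek)(demo_file *fp, off_t offset, int whence);
    int     (*flush)(demo_file *fp);   /* optional: may be NULL */
    int     (*close)(demo_file *fp);
};

struct demo_file {
    const struct demo_backend *backend;
};

/* Generic layer: call the backend's flush only if it has one. */
static int demo_flush(demo_file *fp)
{
    if (fp->backend->flush != NULL)
        return fp->backend->flush(fp);
    return 0;   /* nothing buffered below this layer */
}

/* Trivial stubs so the table below is complete. */
static ssize_t mem_read_demo(demo_file *fp, void *buf, size_t n)
{ (void) fp; (void) buf; (void) n; return 0; }      /* always EOF */
static ssize_t mem_write_demo(demo_file *fp, const void *buf, size_t n)
{ (void) fp; (void) buf; return (ssize_t) n; }      /* accepts everything */
static off_t mem_seek_demo(demo_file *fp, off_t offset, int whence)
{ (void) fp; (void) whence; return offset; }
static int mem_close_demo(demo_file *fp)
{ (void) fp; return 0; }

/* NULL in the flush slot replaces a dummy mem_flush(). */
static const struct demo_backend mem_backend_demo = {
    mem_read_demo, mem_write_demo, mem_seek_demo, NULL, mem_close_demo
};
```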
```c
size_t len;

const char *realfilename = strchr(filename, ':') + 1;
if(!realfilename)
```
Too late! You've already added 1 to `NULL`.
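A sketch of the fix: test the result of `strchr` before stepping past the `:`. `after_scheme()` is a hypothetical helper name.

```c
#include <string.h>

/* Return the part of `filename` after the first ':' ("mem:foo" -> "foo"),
   or NULL if there is no ':'. Checking strchr's result first avoids
   computing NULL + 1, which is undefined behaviour. */
static const char *after_scheme(const char *filename)
{
    const char *colon = strchr(filename, ':');
    if (colon == NULL) return NULL;
    return colon + 1;
}
```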
Thanks for sharing the code; the details make what you were describing at the formats meeting clearer 😄 I've made a few specific comments on the code, but here are some more high-level comments. It appears that …
For another backend (…
Branch updated from 0b0b743 to 3e6d65c, then from 3e6d65c to 71a42f6.
Thanks for the feedback! I have made some changes based on your code review; please let me know whether these are OK or need more work. About the use cases: (2), reading a file into memory and closing the handle, could probably be removed from this … would duplicate the … (3): the idea is to retrieve the buffer afterwards and not write anything to disk. The application for this is to have multiple threads write parts of a VCF/BCF file; the parts stay in memory and can then be concatenated and written out in order as they become available.
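One way to get the "written out in order as they become available" behaviour is a small reorder buffer: each worker hands over a finished in-memory chunk together with its position, and every run of consecutive chunks is written as soon as it is complete. This is a single-threaded sketch with hypothetical names (`struct reorder`, `reorder_submit()`, a fixed `MAX_CHUNKS`); a real multi-threaded version would need a mutex around `reorder_submit()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNKS 16   /* hypothetical fixed limit for the sketch */

struct reorder {
    char  *chunk[MAX_CHUNKS];   /* NULL until chunk i has arrived */
    size_t len[MAX_CHUNKS];
    int    next;                /* index of the next chunk to write */
    FILE  *out;
};

/* Take ownership of malloc'd `data` as chunk `idx`, then write and free
   every consecutive chunk that is now ready. Returns 0, or -1 on error
   (on a rejected idx the caller still owns `data`). */
static int reorder_submit(struct reorder *r, int idx, char *data, size_t len)
{
    if (idx < r->next || idx >= MAX_CHUNKS || r->chunk[idx] != NULL) return -1;
    r->chunk[idx] = data;
    r->len[idx] = len;
    while (r->next < MAX_CHUNKS && r->chunk[r->next] != NULL) {
        int i = r->next;
        if (fwrite(r->chunk[i], 1, r->len[i], r->out) != r->len[i]) return -1;
        free(r->chunk[i]);
        r->chunk[i] = NULL;
        r->next++;
    }
    return 0;
}
```

If chunk 1 arrives before chunk 0, nothing is written until chunk 0 is submitted, at which point both are written in order.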
The “fixed buffer” construct and a varargs form of …
These two, I think, provide a better interface to functionality similar to the PR's proposed …
… would tell the fixed-buffer hFILE to (also) update these pointers to the backing buffer; or an ioctl-like function like …
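Since the exact interface in this comment was cut off, the following is only a hypothetical illustration of a varargs opener in which the mode string decides which extra arguments are read: a fixed, caller-owned buffer for reading, or a handle-owned growable buffer for writing. `struct memh`, `memh_open()` and `memh_close()` are made-up names, not htslib API.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle over a memory buffer. */
struct memh {
    char  *buf;
    size_t len;
    int    owns_buf;   /* 1 if the handle must free buf on close */
};

/* Hypothetical varargs opener:
     memh_open(h, "r", buf, len)  wraps a caller-owned fixed buffer
                                  (len must be passed as a size_t);
     memh_open(h, "w")            starts an empty, handle-owned buffer.
   Returns 0 on success, -1 for an unknown mode. */
static int memh_open(struct memh *h, const char *mode, ...)
{
    int ret = 0;
    va_list ap;
    va_start(ap, mode);
    if (strcmp(mode, "r") == 0) {
        h->buf = va_arg(ap, char *);
        h->len = va_arg(ap, size_t);
        h->owns_buf = 0;
    } else if (strcmp(mode, "w") == 0) {
        h->buf = NULL;
        h->len = 0;
        h->owns_buf = 1;
    } else {
        ret = -1;
    }
    va_end(ap);
    return ret;
}

/* Free the buffer only if the handle owns it. */
static void memh_close(struct memh *h)
{
    if (h->owns_buf) free(h->buf);
    h->buf = NULL;
    h->len = 0;
}
```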
In terms of observing the buffer, I think that, while ugly, a solution with … Maybe having buffers as separate entities would solve the question of ownership, i.e. having something like …
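A sketch of the "buffers as separate entities" idea, assuming reference counting as the ownership rule (the comment does not specify one): the file handle and the application each hold a reference, and whoever drops the last one frees the storage. All names here are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone, reference-counted buffer object. */
struct membuf {
    char  *data;
    size_t len;
    int    refs;
};

/* Create a buffer holding a copy of src[0..len); starts with one reference. */
static struct membuf *membuf_new(const char *src, size_t len)
{
    struct membuf *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(len ? len : 1);
    if (b->data == NULL) { free(b); return NULL; }
    if (len) memcpy(b->data, src, len);
    b->len = len;
    b->refs = 1;
    return b;
}

/* Take an extra reference, e.g. for an observer of a handle's buffer. */
static struct membuf *membuf_ref(struct membuf *b)
{
    b->refs++;
    return b;
}

/* Drop one reference; the last one frees the storage.
   Returns the number of references left. */
static int membuf_unref(struct membuf *b)
{
    int left = --b->refs;
    if (left == 0) { free(b->data); free(b); }
    return left;
}
```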
I haven't investigated this yet, so maybe it's already there, but also consider the `setvbuf` interface from stdio. Normally `fopen`/`fclose` allocate and deallocate their own buffers, but it is possible to supply your own, in which case memory management is up to you.
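A small example of the stdio mechanism mentioned here, using only standard C calls (`tmpfile`, `setvbuf`, `fwrite`, `rewind`, `fread`, `fclose`). The helper name `roundtrip_with_own_buffer()` is hypothetical. The point is the ownership rule: stdio uses the supplied buffer but never frees it, so the stream must be closed before the buffer goes out of scope.

```c
#include <stdio.h>
#include <string.h>

/* Write `text` to a temporary file through a caller-supplied stdio buffer,
   then read it back into `out` (NUL-terminated; outsize must be > 0).
   Returns 0 on success, -1 on failure. */
static int roundtrip_with_own_buffer(const char *text, char *out, size_t outsize)
{
    char iobuf[BUFSIZ];            /* our buffer: stdio will not free it */
    if (outsize == 0) return -1;

    FILE *fp = tmpfile();
    if (fp == NULL) return -1;

    /* setvbuf must come before any other operation on the stream. */
    if (setvbuf(fp, iobuf, _IOFBF, sizeof iobuf) != 0) {
        fclose(fp);
        return -1;
    }

    size_t len = strlen(text);
    if (fwrite(text, 1, len, fp) != len) { fclose(fp); return -1; }

    rewind(fp);                    /* flushes pending output before reading */
    size_t n = fread(out, 1, outsize - 1, fp);
    out[n] = '\0';

    /* Close while iobuf is still alive; stdio must not touch it afterwards. */
    return fclose(fp) == 0 ? 0 : -1;
}
```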
Hi @pkrusche, I'm working on something that could do with this in htslib. What's the status of this pull request, and is there anything that I could do to help with getting this merged?
Implements what is proposed in samtools#417
Closed as replaced by #590 (now merged).
These changes make it possible to do in-memory I/O, as described during the last GA4GH file formats meeting.
The idea is to add a backend that can read and write in-memory buffers (e.g. for multi-threading or to work around file handle limits).
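A minimal sketch of the read side of such a backend, assuming a read-only buffer and `lseek`-style seek semantics; `struct mem_reader` and its functions are hypothetical names rather than the PR's `mem_read`/`mem_seek`.

```c
#include <errno.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>
#include <sys/types.h>

/* Hypothetical read-side state for an in-memory backend. */
struct mem_reader {
    const char *data;
    size_t      len;
    size_t      pos;    /* invariant: pos <= len */
};

/* read(): copy up to nbytes from the current position; 0 means EOF. */
static ssize_t mem_reader_read(struct mem_reader *m, void *dst, size_t nbytes)
{
    size_t avail = m->len - m->pos;
    size_t n = nbytes < avail ? nbytes : avail;
    if (n > 0) memcpy(dst, m->data + m->pos, n);
    m->pos += n;
    return (ssize_t) n;
}

/* seek(): lseek-style; positions outside [0, len] fail with EINVAL. */
static off_t mem_reader_seek(struct mem_reader *m, off_t offset, int whence)
{
    off_t base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (off_t) m->pos; break;
    case SEEK_END: base = (off_t) m->len; break;
    default: errno = EINVAL; return -1;
    }
    off_t target = base + offset;
    if (target < 0 || (size_t) target > m->len) { errno = EINVAL; return -1; }
    m->pos = (size_t) target;
    return target;
}
```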
Here are some slides explaining how this works and what it is useful for:
https://docs.google.com/presentation/d/1xTcTFrWuDYqMgSIEk3qEucsRGH2JaElgpYeYd9IGvbE/edit?usp=sharing
It would be great to have this functionality bundled with htslib!