-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Seeker can't seek
when referencing data-containing node
#4286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Note that the CLI appears to work fine:
|
The error is caused by https://github.com/ipfs/go-ipfs/blob/master/unixfs/io/pbdagreader.go#L67, which is called by https://github.com/ipfs/go-ipfs/blob/master/unixfs/io/pbdagreader.go#L222, which is called by gateway/http logic (That's why CLI works). It looks like your data has invalid size hints:
Can you share how you created those nodes? |
@magik6k I am working on a project that uses the IPFS infrastructure for retrieval/display purposes only. I am using a separate block generator written from scratch in another language, that does not have anything in common with the ipfs codebase (it is not in a state where it is easily open-sourceable). I use its output to feed to the API as per #3552 (comment) Back when I was working on my first large-scale exploratory test #3552 (comment), someone on IRC specifically told me that sizing is optional, and is essentially not enforced anywhere. In fact everything in that matrix does not have a size on the leaves at all. In all the blocks I generate ( as you can see in the graph) omit many of the fields. Namely this is never specified, neither is this except for directories, and where it is specified the size corresponds to the actual data bytes underneath, to provide actual user-friendly representation within a directory listing So far pretty much everything works as advertised: the size thing is treated as entirely optional, and is used for nothing but display purposes, except of course the bug raised in this particular ticket. Now - if go-ipfs did a 180 and declared retroactively that all these sizes are in fact mandatory, I would be sad ( because all these examples I have above will no longer be browseable ), but at least there will be clarity how one can move forward. Though keep in mind that there is no way to prevent "invalid" blocks like mine being placed into the DHT, so the UI needs to be better at erroring out if this is the direction you want to move. Additionally keep in mind that go-ipfs itself will happily create bogus values, e.g. #3440 and #4151 Additionally - the way my project chunks data, my average blocksize is around 4096 bytes ( with a hard-limit on block sizes at 64k ), so forgoing any optional metadata is critical for me in order to reduce the size of my objects ( as the overhead adds up incredibly fast when one deals with hundreds of millions of objects in a DAG ). |
@mib-kd743naq The sizes are optional, but seeking through a file without the sizes set will throw an error (as youre seeing). The issue here appears to be that go's http.ServeContent uses seeks to determine the size of the object its serving, which then triggers our issue. See: https://golang.org/pkg/net/http/#ServeContent This is somewhat troublesome, it would be nice to only call seek if a range header is set, instead of calling seek for every call. This actually might be causing performance issues when serving content from the gateway, as it has to call Seek (which requires fetching blocks) before it can serve the first byte. Looking at the code for ServeContent, its a wrapper around an internal I'm mostly out this week, but I can take a deeper look into resolving this when i'm back. Aside from fixing your issue, this will likely improve gateway perf quite nicely. |
So much this!!! I was actually looking into strangely odd performance issues with my HTTP requests but was never able to dig too far. I will be having a caching proxy in front of go-ipfs, so the inability to seek to a range is not a problem. However, given that you confirmed sizing is optional, perhaps a "dumb seek via skip" mode isn't a bad thing to have for DAGs like mine. |
The hard part about that is knowing when to switch to dumb seek mode. I
guess if there aren't *any* sizes set, then do that. And then if some bad
sizes are set, try to error out.
…On Thu, Oct 12, 2017, 5:52 PM Mib Kd743naq ***@***.***> wrote:
This is somewhat troublesome, it would be nice to only call seek if a
range header is set
So much this!!! I was actually looking into strangely odd performance
issues with my HTTP requests but was never able to dig too far.
I will be having a caching proxy in front of go-ipfs, so the inability to
seek to a range is not a problem. However, given that you confirmed sizing
is optional, perhaps a "dumb seek via skip" mode isn't a bad thing to have
for DAGs like mine.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4286 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABL4HLSHSobKE6Ivz6pX0PQF3T2razjwks5srkPEgaJpZM4PwVsu>
.
|
Which is precisely how I construct my File / Chunk ( UnixFS types 2 and 0 ) blocks: they entirely lack both the linksize field and the blocklist fields |
The
|
So looking at HTTP spec |
@whyrusleeping just a gentle poke/reminder that you "promised" to dig into this a bit Monday ;) |
@mib-kd743naq right you are! Investigating now |
its looking like we should implement our own variant of the http.ServeContent function that lets us skip certain things. I'll try and get a PR up for this soon. |
Version information:
All versions affected
Type:
Bug
Severity:
High
Description:
I put together a demonstration of the problem at this location: if you try to look at the last entry in that directory, the following is observed:
A complete diagram of all blocks involved and their contents is included at https://ipfs.io/ipfs/QmZL4wivfYBEUutxksKgbpPfFkj1tGNMESYLFF2MWcST8Y/block_dag.svg
The text was updated successfully, but these errors were encountered: