Description
Background
So recently in lnd we had an inadvertent regression in the performance of our gossip message processing (it never made it to a release, it was caught in an rc). A few releases back, we started to actually enforce ping responses: if a peer didn't respond to a ping in time with a pong response, we'd disconnect them. Initially we didn't realize there was a perf regression; we just saw that many peers were now failing the ping liveness check.
TCP Head of Line Blocking
What happened was that we had started to dramatically limit some internal message buffering. Combined with the regression in gossip ingestion pace, messages were processed more slowly and would then sit in an internal queue much longer. Once that queue filled up, the main thread that read messages off the wire would block.
If this thread blocks, then eventually the TCP buffers fill up, and the connected peer would in turn block as well. So even though a peer was still processing messages, it could still fail our ping check if: the ping message was stuck behind a flood of gossip messages, or it had already sent a pong response that we hadn't processed yet.
This phenomenon is typically referred to as head-of-line blocking. It can happen at the application level, but it's also inherent to the way TCP works, as TCP implements a reliable, in-order stream abstraction.
After a bit of whack-a-mole, we found the performance regression and reverted it. However, if you pick some arbitrary numbers for the size of the incoming gossip queue, the message arrival rate, and the processing latency, it's easy to come up with a combo that leads to a ping/pong message being "stuck" behind other messages, even at the kernel/TCP level.
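To make that concrete, here's a back-of-the-envelope sketch in Go. All the numbers (queue depth, per-message processing latency, ping timeout) are made up for illustration, they're not lnd's actual values, but they show how easily a live peer can blow past a liveness deadline when the ping/pong is stuck behind a backlog:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical numbers, purely for illustration.
	const (
		queueDepth    = 2000                  // gossip messages already buffered ahead of the ping
		perMsgLatency = 25 * time.Millisecond // time to validate/process one gossip message
		pingTimeout   = 30 * time.Second      // assumed "must see a pong by now" deadline
	)

	// With a single in-order stream, the ping (or the pong we need to
	// read) sits behind every message queued ahead of it.
	waitBehindBacklog := time.Duration(queueDepth) * perMsgLatency

	fmt.Printf("ping/pong delayed by %v behind the backlog\n", waitBehindBacklog)
	if waitBehindBacklog > pingTimeout {
		fmt.Println("liveness check fails even though the peer is alive")
	}
}
```

With these particular made-up numbers, the ping sits behind 50 seconds of backlog, well past the assumed 30 second deadline, even though the peer never stopped working.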
That got me thinking: what if we actually added a QUIC transport mode?
P2P QUIC Transports & Concurrent Streams
QUIC, unlike TCP, is UDP based. It moves the congestion control algorithm into user space, whereas for TCP all that logic lives firmly in the kernel. Part of the rationale here is that user space can evolve more quickly than kernel space, which helps stave off ossification and promotes experimentation.
In a p2p setting like ours, one interesting component of QUIC is the concept of streams. A single connection can have multiple streams, and each stream's flow control is handled separately from the others. This effectively isolates the QoS of a given stream: if one stream is moving slowly, that doesn't block any of the other streams. In the LN setting, this means that for a given channel/connection, we could have a stream for: gossip messages, ping/pongs, channel state machine updates, concurrent splices, etc.
A TCP connection is effectively a single logical stream/queue. If the other side stops ACK'ing your packets, then eventually you'll just stop and wait to send more. Here we see another manifestation of head-of-line blocking. Imagine we have an HTLC that we're ready to send out to complete a payment circuit: since the stream is processed linearly, that update_add_htlc message could get stuck behind a flood of channel_update messages. This directly impacts perceived payment latency on the network.
If a QUIC transport were to be added, each of those message classes (gossip, ping/pong, channel state machine updates, splices, etc.) could get its own stream. Returning to the scenario above, slow processing of gossip messages would no longer block sending out pings or normal channel update messages.
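As a rough illustration, here's what per-class streams over a single connection could look like using the third-party quic-go package (lnd doesn't depend on it today, and exact signatures vary between quic-go releases). The ALPN string, address, and encode helpers are hypothetical placeholders:

```go
package main

import (
	"context"
	"crypto/tls"
	"log"

	quic "github.com/quic-go/quic-go"
)

func main() {
	ctx := context.Background()

	tlsConf := &tls.Config{
		InsecureSkipVerify: true,                       // placeholder; see the BOLT 8 / TLS discussion below
		NextProtos:         []string{"lightning-quic"}, // hypothetical ALPN string
	}

	// One QUIC connection: shared path, handshake, and congestion control.
	conn, err := quic.DialAddr(ctx, "peer.example.com:9735", tlsConf, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Separate streams with independent flow control: a stalled gossip
	// stream doesn't stop writes on the ping stream.
	gossipStream, err := conn.OpenStreamSync(ctx)
	if err != nil {
		log.Fatal(err)
	}
	pingStream, err := conn.OpenStreamSync(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical encode helpers stand in for real BOLT 1/7 wire messages.
	if _, err := gossipStream.Write(encodeChannelUpdate()); err != nil {
		log.Fatal(err)
	}
	if _, err := pingStream.Write(encodePing()); err != nil {
		log.Fatal(err)
	}
}

func encodeChannelUpdate() []byte { return []byte{0x01, 0x02} } // placeholder bytes
func encodePing() []byte          { return []byte{0x00, 0x12} } // placeholder bytes
```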
Other advantages include a fast 1-RTT (and even 0-RTT) connection handshake, more seamless connection migration (a connection ID is the main identifier rather than the usual TCP 4-tuple), easier hole punching, etc.
Sketch of Change Across BOLT 8 & BOLT 7
If we wanted to add an option for a QUIC transport, one obvious area that would need extensions is the node_announcement message as defined in BOLT 7. We'd want to add the ability for a peer to signal that it supports a QUIC transport.
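Just to illustrate the shape of it, one possible (purely hypothetical) approach would be a new address descriptor type in the node_announcement addresses field, alongside the existing IPv4/IPv6/Tor descriptors. The descriptor value below is unassigned and made up; whether this ends up as a new descriptor, a feature bit, or a TLV extension is exactly the kind of thing the spec change would need to pin down:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// quicAddrDescriptor is a made-up descriptor type for this sketch; it is
// NOT assigned in BOLT 7.
const quicAddrDescriptor byte = 0xff

// encodeQUICAddr encodes an IPv4 address + UDP port in the same
// "type || addr || port" shape BOLT 7 uses for its existing descriptors.
func encodeQUICAddr(ip net.IP, port uint16) []byte {
	var b bytes.Buffer
	b.WriteByte(quicAddrDescriptor)
	b.Write(ip.To4())

	var portBytes [2]byte
	binary.BigEndian.PutUint16(portBytes[:], port)
	b.Write(portBytes[:])

	return b.Bytes()
}

func main() {
	fmt.Printf("%x\n", encodeQUICAddr(net.ParseIP("203.0.113.1"), 9735))
}
```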
One complication with QUIC is that, AFAICT, there's no way to run it without TLS 1.3. Today we run over plain TCP, then layer the Noise protocol on top. As a result, AFAICT we'd just have to layer BOLT 8 on top (double encryption) and come up with some sort of mock TLS cert/scheme.
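As a rough illustration of the "mock cert" idea: generate a throwaway self-signed certificate purely to satisfy QUIC's TLS 1.3 requirement, while BOLT 8 continues to provide the real authentication and encryption on top. The ALPN string here is a made-up placeholder:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"log"
	"math/big"
	"time"
)

// throwawayTLSConfig builds a TLS 1.3 config backed by an ephemeral
// self-signed cert. It provides no identity guarantees on its own; the
// Noise handshake layered on top would still do the real authentication.
func throwawayTLSConfig() (*tls.Config, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}

	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "ln-quic-placeholder"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}

	return &tls.Config{
		MinVersion:   tls.VersionTLS13,
		Certificates: []tls.Certificate{{Certificate: [][]byte{der}, PrivateKey: key}},
		NextProtos:   []string{"lightning-quic"}, // hypothetical ALPN string
	}, nil
}

func main() {
	if _, err := throwawayTLSConfig(); err != nil {
		log.Fatal(err)
	}
}
```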
We'd likely also add some guidance to the spec re: how an implementation should handle streams, etc.
Potential Roadblocks
After all these years (QUIC was first publicly announced by Google in 2013, and the IETF published RFC 9000 in 2021), first-class support amongst popular programming languages still doesn't seem to be very widespread. For example, Go has an implementation maintained by the Go team, but it lives under the /x prefix, which means it's experimental. There's also quite a lot of the underlying protocol that hasn't been implemented yet in that package.
There's also the whole TLS 1.3 requirement. I haven't dug super deep, but it appears that TLS is pretty intertwined w/ the protocol. It's possible that there are lower-level libraries that let you use QUIC w/o the TLS layer (e.g. run in an insecure/debug mode), but further investigation is needed. On the plus side, QUIC connection establishment takes fewer RTTs than TCP, so even with the Noise handshake layered on top we'd likely end up with fewer RTTs overall.
AFAICT, it should be possible to just use raw public keys in place of self-signed certs.
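I'm not sure how widely raw public keys (RFC 7250) are supported in practice; Go's crypto/tls doesn't expose them as far as I know. A rough approximation that works with stock libraries is to accept whatever self-signed cert the peer presents and pin the public key inside it against a key we already expect (how that expected key gets bound to the peer's node identity is hand-waved here):

```go
package main

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// pinnedKeyTLSConfig skips normal CA/hostname validation and instead
// checks that the public key inside the peer's (self-signed) cert matches
// a key we already expect.
func pinnedKeyTLSConfig(expectedPubKeyDER []byte) *tls.Config {
	return &tls.Config{
		MinVersion:         tls.VersionTLS13,
		InsecureSkipVerify: true, // we replace, not remove, verification below
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			if len(rawCerts) == 0 {
				return errors.New("peer presented no certificate")
			}
			cert, err := x509.ParseCertificate(rawCerts[0])
			if err != nil {
				return err
			}
			got, err := x509.MarshalPKIXPublicKey(cert.PublicKey)
			if err != nil {
				return err
			}
			if !bytes.Equal(got, expectedPubKeyDER) {
				return errors.New("peer certificate key does not match pinned key")
			}
			return nil
		},
	}
}

func main() {
	_ = pinnedKeyTLSConfig(nil) // placeholder usage; a real caller would pass the expected key
}
```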