-
Notifications
You must be signed in to change notification settings - Fork 2.2k
pathfinding: capacity-dependent apriori model probability #6857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pathfinding: capacity-dependent apriori model probability #6857
Conversation
5791f71
to
4d90dd8
Compare
4d90dd8
to
39ad143
Compare
Added a few more unit test cases, fixed a failing itest and made only the apriori hop probability capacity dependent. |
39ad143
to
9a7b003
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatively compact pr with a large potential impact. No major comments. Mainly that it might be possible to optimize the commit structure for fewer rewrites. Hold off the addition of the capacity
parameter until everything is in place. And I think some commits are more readable when squashed.
b9c3d90
to
0d33c19
Compare
Thank you for the quick review! I put the capacity calculation into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement with the commit structure 👍
0d33c19
to
05e2552
Compare
05e2552
to
60460ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @bitromortac
Thank you very much for taking the time of rewriting the commits in an easy way, this is my first time digging into this part of the code and it was pretty easy to follow 🥇
@@ -223,6 +224,18 @@ func (u *unifiedPolicy) getPolicyNetwork( | |||
continue | |||
} | |||
|
|||
// Track the maximal capacity for usable channels. If we don't | |||
// know the capacity, we fall back to MaxHTLC. | |||
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: when do we have no idea about the capacity of a channel? Unannounced channels from invoices with hop hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the case for hop hint channels, yes, as well as for neutrino nodes, which don't have the capacity for channels, as they would have to query each UTXO in the graph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a neutrino node is running w/ the default "assume chan valid", then they won't have this information.
|
||
// We relax the accuracy for the probability check because of the | ||
// capacity cutoff factor. | ||
require.InDelta( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 🎲
routing/graph.go
Outdated
func (g *CachedGraph) FetchAmountPairCapacity(nodeFrom, nodeTo route.Vertex, | ||
amount lnwire.MilliSatoshi) (btcutil.Amount, error) { | ||
|
||
// For the local node we assume no information on the channel capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? And is it the right point to do this check? From a distance it looks like a pair capacity could still be returned for local channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I removed it from the function, for this we already have the local probability calculation and it is a layer violation.
routing/graph.go
Outdated
|
||
// We may not have all policies available to describe the hop between | ||
// the nodes (in the case of hop hints), which is why we return 0 in | ||
// this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment requires slightly more explanation. Why is it necessary to return 0 in case of hop hints? Maybe the explanation belongs in a different layer, because isn't CachedGraph
purely about caching the graph without taking into account higher level usage patterns that might involve policies obtained from other sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it is not the right place to have it here. I added a fixup commit with a solution how this could be handled closer to the rpc level, please have a look. I'm not sure if this is the best approach of returning errors here.
60460ab
to
ca541fe
Compare
3a6c985
to
58d4deb
Compare
unifierFilled.addPolicy(fromNode, &p1, c1) | ||
unifierFilled.addPolicy(fromNode, &p2, c2) | ||
|
||
tests := []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -50,9 +50,10 @@ require ( | |||
go.etcd.io/etcd/client/pkg/v3 v3.5.0 | |||
go.etcd.io/etcd/client/v3 v3.5.0 | |||
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519 | |||
golang.org/x/net v0.0.0-20211015210444-4f30a5c0130f | |||
golang.org/x/exp v0.0.0-20221111094246-ab4555d3164f | |||
golang.org/x/net v0.1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: updating go modules should be in a dedicated commit.
import "golang.org/x/exp/constraints" | ||
|
||
// Number defines a type constraint for numbers. | ||
type Number interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
// know the capacity, we fall back to MaxHTLC. | ||
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity) | ||
if capMsat == 0 && edge.policy.MessageFlags.HasMaxHtlc() { | ||
capMsat = edge.policy.MaxHTLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a debug log here to show the fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would only like to do this if we are in assumeChannelValid=false
(normal) mode, otherwise this could become very spammy - every channel would fall back to MaxHTLC
. I could add a Trace
statement here. Alternatively we have to pass in assumeChannelValid
, which would perhaps be too much just for logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think this only happens in neutrino. Using a Trace
sounds good.
@@ -1809,7 +1808,7 @@ func (r *ChannelRouter) FindRoute(source, target route.Vertex, | |||
}), | |||
) | |||
|
|||
return route, nil | |||
return route, probability, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of returning probability here, we can instead add a field to route.Route
to keep the code clean and tight. Plus I think probability
belongs to the struct Route
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about that. Routes can exist without a probability. They are a foundational data structure in lightning, where as probability is just a field that a specific pathfinding implementation adds to it. It also isn't translated to data on the wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Routes can exist without a probability.
hmmm what do you mean? Just to be clear I'm not suggesting that we need to save probability to disk, nor do we need to send it over the wire. It can be added as an ephemeral field on Route
so we don't need to return numerous values while some callsites don't use them. If the definition of existence is whether it's defined in specs, I think we have many fundamental structs in channeldb that has non-existent fields.
On the other hand, I think Route always has a probability in lnd
's context, even when users use !use_mc
flag we still have a probability of 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand both points, would be grateful for more input.
58d4deb
to
734eca5
Compare
// know the capacity, we fall back to MaxHTLC. | ||
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity) | ||
if capMsat == 0 && edge.policy.MessageFlags.HasMaxHtlc() { | ||
capMsat = edge.policy.MaxHTLC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think this only happens in neutrino. Using a Trace
sounds good.
@@ -10,6 +10,43 @@ import ( | |||
"github.com/lightningnetwork/lnd/routing/route" | |||
) | |||
|
|||
const ( | |||
// capacityCutoffFraction and capacitySmearingFraction define how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -103,7 +103,7 @@ type MissionControl interface { | |||
// GetProbability is expected to return the success probability of a | |||
// payment from fromNode to toNode. | |||
GetProbability(fromNode, toNode route.Vertex, | |||
amt lnwire.MilliSatoshi) float64 | |||
amt lnwire.MilliSatoshi, capacity btcutil.Amount) float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably an edgy case, but what if the channel capacity is actually zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, is it even possible to have a channel with zero capacity, or do you refer to an error in capacity validation? In that case the capacity factor would not be active (it would be 1), so no change compared to the current situation. Either the channel has a maxHTLCMsat
set, then it must be less than or equal to the capacity per spec. In that scenario it would be caught by the amtInRange
check. Otherwise the amtInRange
check currently doesn't prohibit sending to a zero sat channel:
lnd/routing/unified_policies.go
Line 104 in e23c5dc
if u.capacity > 0 && |
In the worst-case scenario we would send to that route and would fail, avoiding the channel next time.
@Roasbeef: review reminder |
aba7ea7
to
51c7bb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM🎉 Still think it's better to put probability inside Route
, but it's a non-blocker. Again great work!
I think this is ready to land after a rebase! |
We encapsulate the capacity inside a unifiedPolicyEdge for later usage. The meaning of "policy" has changed now, which will be refactored in the next commmit.
This commit refactors the semantics of unified policies to unified edges. The main changes are the following renamings: * unifiedPolicies -> nodeEdgeUnifier * unifiedPolicy -> edgeUnifier * unifiedPolicyEdge -> unifiedEdge Comments and shortened variable names are changed to reflect the new semantics.
The test for unified edges is refactored into a table-driven test. It accomodates already a unifier per test for later expansion.
This commit adds experimental support for generic type constraints.
This commit adds the maximal capacity between two nodes to the unified edge data. We use MaxHTLC as a replacement if the channel capacity is not available. In tests we use larger maxHTLC values to be able to convert to a non-zero sat capacity.
Extends the pathfinder with a capacity argument for later usage. In tests, the inserted testCapacity has no effect, but will be used later to estimate reduced probabilities from it.
The returned probability can then be used in QueryRoutes to not having to reconstruct the probability.
FetchPairCapacity is used by the following endpoints to introduce the capacity for probability calculations: * QueryProbability * QueryRoutes
We multiply the apriori probability with a factor to take capacity into account: P *= 1 - 1 / [1 + exp(-(amount - cutoff)/smearing)] The factor is a function value between 1 (small amount) and 0 (high amount). The zero limit may not be reached exactly depending on the smearing and cutoff combination. The function is a logistic function mirrored about the y-axis. The cutoff determines the amount at which a significant reduction in probability takes place and the smearing parameter defines how smooth the transition from 1 to 0 is. Both, the cutoff and smearing parameters are defined in terms of fixed fractions of the capacity.
We deprecate `QueryProbability`, as it displays the same information as `QueryMissionControl` less the probability. `QueryRoutes` still contains the total probability of a route.
51c7bb6
to
2d7fda2
Compare
Changes the docstring of QueryProbability to reflect changes that were introduced in lightningnetwork#6857.
Change Description
Adds a limit for the pathfinding probability, namely that if the amount reaches the capacity of a channel, the success probability should decrease drastically. Fixes #5988 for the current pathfinding system.
To achieve this, we multiply the previous probability with a factor to take capacity into account:
P *= 1 - 0.5 / [1 + exp(-(amount - cutoffFactor*capacity)/(smearingFactor*capacity))]
graph for cap=1Msat, cutoff=0.75*cap, smearing=0.1*cap
This function has the effect that for small amounts we don't alter the current behavior, but for large amounts the probability is reduced. We still consider low-capacity channels and don't throw them away like a hard capacity reduction would do.
This PR introduces building blocks for an alternative description of a probability, see #6815.
Performance considerations:
I investigated the runtime for the capacity fetching and computation. This is interesting if we want to remove the capacity as a parameter of
Estimator.getPairProbability
and insert aroutingGraph
as a dependency instead, but this seems to add quite some latency.[DBG] CRTR: Pathfinding perf metrics: nodes=119, edges=10133, time=108.88252ms
[DBG] CRTR: Pathfinding perf metrics: nodes=250, edges=18004, time=13.57969395s
Questions:
Todo: