Skip to content

pathfinding: capacity-dependent apriori model probability #6857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 13, 2022

Conversation

bitromortac
Copy link
Collaborator

@bitromortac bitromortac commented Aug 24, 2022

Change Description

Adds a limit for the pathfinding probability, namely that if the amount reaches the capacity of a channel, the success probability should decrease drastically. Fixes #5988 for the current pathfinding system.

To achieve this, we multiply the previous probability with a factor to take capacity into account:

P *= 1 - 0.5 / [1 + exp(-(amount - cutoffFactor*capacity)/(smearingFactor*capacity))]
graph for cap=1Msat, cutoff=0.75*cap, smearing=0.1*cap

This function has the effect that for small amounts we don't alter the current behavior, but for large amounts the probability is reduced. We still consider low-capacity channels and don't throw them away like a hard capacity reduction would do.

This PR introduces building blocks for an alternative description of a probability, see #6815.

Performance considerations:

I investigated the runtime for the capacity fetching and computation. This is interesting if we want to remove the capacity as a parameter of Estimator.getPairProbability and insert a routingGraph as a dependency instead, but this seems to add quite some latency.

  • without extra calculation (without last commit):
    [DBG] CRTR: Pathfinding perf metrics: nodes=119, edges=10133, time=108.88252ms
  • with extra calculation (last commit):
    [DBG] CRTR: Pathfinding perf metrics: nodes=250, edges=18004, time=13.57969395s

Questions:

  • The cutoff and scaling parameters need discussion, also whether we want to expose them.
  • Capacity computation should be given another thought.

Todo:

  • fix itests

@bitromortac
Copy link
Collaborator Author

Added a few more unit test cases, fixed a failing itest and made only the apriori hop probability capacity dependent.

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively compact pr with a large potential impact. No major comments. Mainly that it might be possible to optimize the commit structure for fewer rewrites. Hold off the addition of the capacity parameter until everything is in place. And I think some commits are more readable when squashed.

@bitromortac bitromortac force-pushed the 2208-apriori-capacity branch 2 times, most recently from b9c3d90 to 0d33c19 Compare September 15, 2022 09:48
@bitromortac
Copy link
Collaborator Author

Relatively compact pr with a large potential impact. No major comments. Mainly that it might be possible to optimize the commit structure for fewer rewrites. Hold off the addition of the capacity parameter until everything is in place. And I think some commits are more readable when squashed.

Thank you for the quick review! I put the capacity calculation into getPolicy. Commits are now reordered to have less rewrites, some were squashed.

Copy link
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement with the commit structure 👍

@saubyk saubyk requested review from Roasbeef, yyforyongyu and positiveblue and removed request for Roasbeef October 13, 2022 16:36
@bitromortac bitromortac force-pushed the 2208-apriori-capacity branch from 05e2552 to 60460ab Compare October 17, 2022 14:57
Copy link
Contributor

@positiveblue positiveblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @bitromortac

Thank you very much for taking the time of rewriting the commits in an easy way, this is my first time digging into this part of the code and it was pretty easy to follow 🥇

@@ -223,6 +224,18 @@ func (u *unifiedPolicy) getPolicyNetwork(
continue
}

// Track the maximal capacity for usable channels. If we don't
// know the capacity, we fall back to MaxHTLC.
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: when do we have no idea about the capacity of a channel? Unannounced channels from invoices with hop hints?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the case for hop hint channels, yes, as well as for neutrino nodes, which don't have the capacity for channels, as they would have to query each UTXO in the graph

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a neutrino node is running w/ the default "assume chan valid", then they won't have this information.


// We relax the accuracy for the probability check because of the
// capacity cutoff factor.
require.InDelta(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 🎲

routing/graph.go Outdated
func (g *CachedGraph) FetchAmountPairCapacity(nodeFrom, nodeTo route.Vertex,
amount lnwire.MilliSatoshi) (btcutil.Amount, error) {

// For the local node we assume no information on the channel capacity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? And is it the right point to do this check? From a distance it looks like a pair capacity could still be returned for local channels.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I removed it from the function, for this we already have the local probability calculation and it is a layer violation.

routing/graph.go Outdated

// We may not have all policies available to describe the hop between
// the nodes (in the case of hop hints), which is why we return 0 in
// this case.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment requires slightly more explanation. Why is it necessary to return 0 in case of hop hints? Maybe the explanation belongs in a different layer, because isn't CachedGraph purely about caching the graph without taking into account higher level usage patterns that might involve policies obtained from other sources?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it is not the right place to have it here. I added a fixup commit with a solution how this could be handled closer to the rpc level, please have a look. I'm not sure if this is the best approach of returning errors here.

unifierFilled.addPolicy(fromNode, &p1, c1)
unifierFilled.addPolicy(fromNode, &p2, c2)

tests := []struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -50,9 +50,10 @@ require (
go.etcd.io/etcd/client/pkg/v3 v3.5.0
go.etcd.io/etcd/client/v3 v3.5.0
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519
golang.org/x/net v0.0.0-20211015210444-4f30a5c0130f
golang.org/x/exp v0.0.0-20221111094246-ab4555d3164f
golang.org/x/net v0.1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: updating go modules should be in a dedicated commit.

import "golang.org/x/exp/constraints"

// Number defines a type constraint for numbers.
type Number interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

// know the capacity, we fall back to MaxHTLC.
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity)
if capMsat == 0 && edge.policy.MessageFlags.HasMaxHtlc() {
capMsat = edge.policy.MaxHTLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a debug log here to show the fallback?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would only like to do this if we are in assumeChannelValid=false (normal) mode, otherwise this could become very spammy - every channel would fall back to MaxHTLC. I could add a Trace statement here. Alternatively we have to pass in assumeChannelValid, which would perhaps be too much just for logging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this only happens in neutrino. Using a Trace sounds good.

@@ -1809,7 +1808,7 @@ func (r *ChannelRouter) FindRoute(source, target route.Vertex,
}),
)

return route, nil
return route, probability, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of returning probability here, we can instead add a field to route.Route to keep the code clean and tight. Plus I think probability belongs to the struct Route anyway.

Copy link
Contributor

@joostjager joostjager Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about that. Routes can exist without a probability. They are a foundational data structure in lightning, where as probability is just a field that a specific pathfinding implementation adds to it. It also isn't translated to data on the wire.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routes can exist without a probability.

hmmm what do you mean? Just to be clear I'm not suggesting that we need to save probability to disk, nor do we need to send it over the wire. It can be added as an ephemeral field on Route so we don't need to return numerous values while some callsites don't use them. If the definition of existence is whether it's defined in specs, I think we have many fundamental structs in channeldb that has non-existent fields.

On the other hand, I think Route always has a probability in lnd's context, even when users use !use_mc flag we still have a probability of 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand both points, would be grateful for more input.

@bitromortac bitromortac force-pushed the 2208-apriori-capacity branch from 58d4deb to 734eca5 Compare November 24, 2022 15:56
// know the capacity, we fall back to MaxHTLC.
capMsat := lnwire.NewMSatFromSatoshis(edge.capacity)
if capMsat == 0 && edge.policy.MessageFlags.HasMaxHtlc() {
capMsat = edge.policy.MaxHTLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think this only happens in neutrino. Using a Trace sounds good.

@@ -10,6 +10,43 @@ import (
"github.com/lightningnetwork/lnd/routing/route"
)

const (
// capacityCutoffFraction and capacitySmearingFraction define how
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -103,7 +103,7 @@ type MissionControl interface {
// GetProbability is expected to return the success probability of a
// payment from fromNode to toNode.
GetProbability(fromNode, toNode route.Vertex,
amt lnwire.MilliSatoshi) float64
amt lnwire.MilliSatoshi, capacity btcutil.Amount) float64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably an edgy case, but what if the channel capacity is actually zero?

Copy link
Collaborator Author

@bitromortac bitromortac Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, is it even possible to have a channel with zero capacity, or do you refer to an error in capacity validation? In that case the capacity factor would not be active (it would be 1), so no change compared to the current situation. Either the channel has a maxHTLCMsat set, then it must be less than or equal to the capacity per spec. In that scenario it would be caught by the amtInRange check. Otherwise the amtInRange check currently doesn't prohibit sending to a zero sat channel:

if u.capacity > 0 &&

In the worst-case scenario we would send to that route and would fail, avoiding the channel next time.

@lightninglabs-deploy
Copy link

@Roasbeef: review reminder
@bitromortac, remember to re-request review from reviewers when ready

@bitromortac bitromortac force-pushed the 2208-apriori-capacity branch 2 times, most recently from aba7ea7 to 51c7bb6 Compare December 5, 2022 15:02
Copy link
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM🎉 Still think it's better to put probability inside Route, but it's a non-blocker. Again great work!

@Roasbeef
Copy link
Member

I think this is ready to land after a rebase!

We encapsulate the capacity inside a unifiedPolicyEdge for later usage.
The meaning of "policy" has changed now, which will be refactored in the
next commmit.
This commit refactors the semantics of unified policies to unified
edges. The main changes are the following renamings:

* unifiedPolicies -> nodeEdgeUnifier
* unifiedPolicy -> edgeUnifier
* unifiedPolicyEdge -> unifiedEdge

Comments and shortened variable names are changed to reflect the new
semantics.
The test for unified edges is refactored into a table-driven test. It
accomodates already a unifier per test for later expansion.
This commit adds experimental support for generic type constraints.
This commit adds the maximal capacity between two nodes to the unified
edge data. We use MaxHTLC as a replacement if the channel capacity is
not available.

In tests we use larger maxHTLC values to be able to convert to a
non-zero sat capacity.
Extends the pathfinder with a capacity argument for later usage.

In tests, the inserted testCapacity has no effect, but will be used
later to estimate reduced probabilities from it.
The returned probability can then be used in QueryRoutes to not having
to reconstruct the probability.
FetchPairCapacity is used by the following endpoints to introduce the
capacity for probability calculations:

* QueryProbability
* QueryRoutes
We multiply the apriori probability with a factor to take capacity into
account:

P *= 1 - 1 / [1 + exp(-(amount - cutoff)/smearing)]

The factor is a function value between 1 (small amount) and 0 (high
amount). The zero limit may not be reached exactly depending on the
smearing and cutoff combination. The function is a logistic function
mirrored about the y-axis. The cutoff determines the amount at which a
significant reduction in probability takes place and the smearing
parameter defines how smooth the transition from 1 to 0 is. Both, the
cutoff and smearing parameters are defined in terms of fixed fractions
of the capacity.
We deprecate `QueryProbability`, as it displays the same information as
`QueryMissionControl` less the probability. `QueryRoutes` still contains
the total probability of a route.
@bitromortac bitromortac force-pushed the 2208-apriori-capacity branch from 51c7bb6 to 2d7fda2 Compare December 12, 2022 13:02
@Roasbeef Roasbeef merged commit d468391 into lightningnetwork:master Dec 13, 2022
bitromortac added a commit to bitromortac/lnd that referenced this pull request Jan 11, 2023
Changes the docstring of QueryProbability to reflect changes that were
introduced in lightningnetwork#6857.
@bitromortac bitromortac deleted the 2208-apriori-capacity branch April 17, 2024 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pathfinding: take channel capacity into account
7 participants