Use balance estimates from past payments in path-finding #2071
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master    #2071      +/-   ##
==========================================
- Coverage   87.58%   87.50%   -0.08%
==========================================
  Files         165      166        +1
  Lines       12788    12875       +87
  Branches      563      530       -33
==========================================
+ Hits        11200    11266       +66
- Misses       1588     1609       +21
t-bast left a comment:
I think it's a really good direction to explore! I like the model; I put some comments, mostly on what functions it should expose, but the actual logic sounds good to me.
I would scope this PR to only creating the balance estimates though, without the path-finding changes. I expect that we will need to iterate on it after seeing how it behaves in practice, before we can start an A/B experiment to actually use it.
eclair-core/src/main/scala/fr/acinq/eclair/payment/send/PaymentLifecycle.scala (resolved)
def baseline(capacity: Satoshi, halfLife: FiniteDuration): BalanceEstimate = noChannels(halfLife).addChannel(capacity)
}

case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {
Instead of using a (PublicKey, PublicKey) pair and having to do the LexicographicalOrdering check in many places, why not create a case class BalanceEdge(a: PublicKey, b: PublicKey) that enforces this invariant (a < b)?
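A minimal sketch of what that could look like (package paths, the exact `LexicographicalOrdering.isLessThan` signature and the `of` constructor name are assumptions for illustration, not the PR's code):

```scala
// Package paths are assumptions and may differ between eclair versions.
import fr.acinq.bitcoin.Crypto.PublicKey
import fr.acinq.bitcoin.LexicographicalOrdering

// Hypothetical edge type that enforces the `a < b` invariant once, so callers
// no longer need to repeat the lexicographical check everywhere.
case class BalanceEdge(a: PublicKey, b: PublicKey) {
  require(LexicographicalOrdering.isLessThan(a.value, b.value), "edge keys must be ordered")
}

object BalanceEdge {
  // canonical constructor: sorts the two keys so the invariant always holds
  def of(x: PublicKey, y: PublicKey): BalanceEdge =
    if (LexicographicalOrdering.isLessThan(x.value, y.value)) BalanceEdge(x, y) else BalanceEdge(y, x)
}
```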
eclair-core/src/main/scala/fr/acinq/eclair/router/BalanceEstimate.scala (outdated, resolved)
otherSide.couldNotSend(totalCapacity - amount, timestamp).otherSide

def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
  copy(low = (low - amount) max MilliSatoshi(0), high = (high - amount) max MilliSatoshi(0))
It looks like you forgot to update the timestamps?
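For illustration, a sketch of the shape of the fix being suggested (an interpretation of the review comment, not the author's actual change):

```scala
// Same shift of the bounds as in the quoted code, but also refreshing both
// timestamps so that the decay restarts from the time of the successful send.
def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
  copy(
    low = (low - amount) max MilliSatoshi(0),
    lowTimestamp = timestamp,
    high = (high - amount) max MilliSatoshi(0),
    highTimestamp = timestamp
  )
```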
def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
  copy(low = (low - amount) max MilliSatoshi(0), high = (high - amount) max MilliSatoshi(0))

def addChannel(capacity: Satoshi): BalanceEstimate = copy(high = high + toMilliSatoshi(capacity), totalCapacity = totalCapacity + capacity)
I don't think we should sum the capacity of individual channels, since HTLCs cannot be dynamically split.
I think we should use the max instead.
Otherwise I'm afraid this will not work well for nodes that have a lot of small channels (we'd get a high value that may be N times the actual size of their biggest channel, which isn't actually usable).
It does make removeChannel harder though. Maybe we can simply replace it with a removeEdge function that is called when the last channel between a pair of nodes has been closed?
Or instead of computing the capacity ourselves, we can let external callers that have access to the graph object tell us what capacity to use for a given edge (i.e. we would expose an updateCapacity function and it's the caller's responsibility to take the max of all existing channels). If we do that, I think we can get rid of both addChannel and removeChannel and replace them with addOrUpdateCapacity (which creates the edge if it doesn't exist yet and sets its capacity) and removeEdge, as sketched below.
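A rough sketch of that alternative API, building on the `baseline` constructor quoted above (the `withCapacity` helper is hypothetical, and this is only one possible shape, not the PR's code; types are eclair's, as in the snippets above):

```scala
case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {
  // The caller, who can see the graph, tells us which capacity to use for this
  // edge (e.g. the max over all channels between the two nodes).
  def addOrUpdateCapacity(a: PublicKey, b: PublicKey, capacity: Satoshi): BalancesEstimates = {
    val updated = balances.get((a, b)) match {
      case Some(estimate) => estimate.withCapacity(capacity) // hypothetical helper that overrides totalCapacity
      case None => BalanceEstimate.baseline(capacity, defaultHalfLife)
    }
    copy(balances = balances + ((a, b) -> updated))
  }

  // Called once the last channel between the two nodes has been closed.
  def removeEdge(a: PublicKey, b: PublicKey): BalancesEstimates =
    copy(balances = balances - ((a, b)))
}
```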
For a very long time I was in favor of combining parallel channels for uncertainty estimates into one large virtual channel with the sum of the capacities. However, this becomes really tricky if these channels have different fee policies and one may wish to include those as an additional feature in the cost function of the min cost flow problem. I finally arrived at the point where I maintain those parallel channels in my data structure. While I believe it messes up the uncertainty prediction, in particular as node operators may choose to forward on one of the other channels, I think it is a "them" problem and not a "me" problem.
> However this becomes really tricky if these channels have different fee policies

Do note that IMO it doesn't make any sense to have different fee policies on different channels with the same peer (because of non-strict forwarding).
I aborted my review as I cannot make any sense of the semantics of the canSend method, which I guess is the main point where you estimate the actual probability of a payment being successful given your belief about the network.
Overall I can state that a lot of the correct ideas and approaches are in here, and some of my comments are very detailed and nuanced because of that. So please don't be offended if I sound overly critical, but rather take this as a form of respect.
In between the comments I have linked to my reference implementation of handling the probabilities. I strongly encourage you to read through the glossary of that notebook (especially terms like cost, uncertainty cost, linearized integer uncertainty unit cost, unit cost and features) and the UncertaintyChannel class itself. Maybe this can help resolve the current misunderstandings that I/we seem to have about the semantics of your computation.
Notable differences between your code and mine, which should probably be neglected when comparing our approaches:
- My code also accounts in the probability estimates for the in-flight HTLCs that we currently put on the network due to an ongoing PaymentSession.
- My code currently does not decay the belief about the network. Instead I reset the learnt information before every new payment session (which is obviously not preferable). However, as noted below, I would decay the low and high bounds instead of using a weighted average between prior and conditional probability.
- My code is mainly created for educational purposes and it does not yet update the belief in both directions of the channel when we have updates (you encouraged me to do this sooner rather than later, as I guess this may be confusing for others if it is missing).
- Also, as you can guess, instead of Yen's algorithm I use the more accurately modeled min cost flow problem for predicting candidate onions for the MPP problem. (For that reason I also linearize the routing cost, which is easiest when ignoring non-zero base fee channels.)
 */
private def decay(amount: MilliSatoshi, successProbabilityAtT: Double, t: TimestampSecond): Double = {
  val decayRatio = 1 / math.pow(2, (TimestampSecond.now() - t) / halfLife)
  val baseline = 1 - amount.toLong.toDouble / toMilliSatoshi(totalCapacity).toLong.toDouble
This assumes a uniform distribution of the liquidity in channels. Recent research indicates that this is only the case for about 70% of the channels; the other 30% are drained. Thus a probability of 0.3 * 1/2 + 0.7 * (1 - a/(c+1)) seems more accurate. Of course, while operating the node one may collect statistics and constantly re-estimate the 30/70 split. Alternatively one may estimate the split per channel, as the research also reported that there seems to be an active kernel in the network where one may safely assume a uniform distribution.
In particular, the success probability is not 1 - a/c but rather 1 - a/(c+1), though the difference is in most cases negligible.
btw: baseline was a very confusing term. Maybe call it priorSuccessProbability, as it encodes P(X > amt), and the other one conditionalSuccessProbability, as it should be the success probability under the conditional information that the random variable is larger than minLiquidity and smaller than maxLiquidity, which in math terms may be expressed as P(X > amt | minLiquidity < X < maxLiquidity).
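For readers, the adjustment proposed above written out as code (the 30/70 split is the figure quoted from that research; the function name is illustrative):

```scala
// Prior probability that `amount` can be routed through a channel of capacity
// `capacity`, mixing ~30% drained channels with ~70% uniformly distributed
// liquidity, as suggested in the comment above.
def priorSuccessProbability(amount: MilliSatoshi, capacity: MilliSatoshi): Double = {
  val a = amount.toLong.toDouble
  val c = capacity.toLong.toDouble
  val uniform = 1 - a / (c + 1) // P(X > amount) under a uniform liquidity assumption
  0.3 * 0.5 + 0.7 * uniform
}
```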
private def decay(amount: MilliSatoshi, successProbabilityAtT: Double, t: TimestampSecond): Double = {
  val decayRatio = 1 / math.pow(2, (TimestampSecond.now() - t) / halfLife)
  val baseline = 1 - amount.toLong.toDouble / toMilliSatoshi(totalCapacity).toLong.toDouble
  baseline * (1 - decayRatio) + successProbabilityAtT * decayRatio
I am a bit surprised that you do a weighted average here. Why not just decay minLiquidity towards 0 and maxLiquidity towards capacity with the given halfLife? I did not check how these two compare to each other, but it seems cleaner than using a combination of the prior knowledge and the conditional knowledge.
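To make the suggestion concrete, a sketch of decaying the bounds themselves (an illustration of this comment's alternative, not the PR's approach; the arithmetic on eclair's time types mirrors the quoted decay snippet):

```scala
// Let the bounds drift back towards their uninformed values (0 and capacity)
// with the given half-life, instead of mixing prior and conditional probabilities.
def decayedBounds(low: MilliSatoshi, lowTimestamp: TimestampSecond,
                  high: MilliSatoshi, highTimestamp: TimestampSecond,
                  capacity: MilliSatoshi, halfLife: FiniteDuration,
                  now: TimestampSecond): (MilliSatoshi, MilliSatoshi) = {
  def ratio(t: TimestampSecond): Double = math.pow(0.5, (now - t) / halfLife)
  val decayedLow = MilliSatoshi((low.toLong * ratio(lowTimestamp)).toLong)                                          // drifts towards 0
  val decayedHigh = MilliSatoshi(capacity.toLong - ((capacity.toLong - high.toLong) * ratio(highTimestamp)).toLong) // drifts towards capacity
  (decayedLow, decayedHigh)
}
```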
Decaying low towards 0 doesn't make sense. At a time t we have a given probability distribution for the balance, and decaying should bring us back to the prior (which is our uniform baseline); that's why I'm averaging the distribution at time t and the baseline.
def couldNotSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
  if (amount < high) {
    if (amount > low) {
      copy(high = amount, highTimestamp = timestamp)
I think the high estimate has to be updated independently of the low estimate.
high is updated regardless of the value of low.
if (amount > low) {
  copy(high = amount, highTimestamp = timestamp)
} else {
  // the balance is actually below `low`, we discard our lower bound
But I agree with the next line: if the amount is below low, one should discard the low estimate.
// the balance is actually below `low`, we discard our lower bound
copy(low = MilliSatoshi(0), lowTimestamp = timestamp, high = amount, highTimestamp = timestamp)
}
} else {
I don't understand what you are doing here. If you use the conditional probabilities properly, you would never come to the situation where you send something larger than your last estimate.
Sure, the decay may change this, but then I think the new amount should just be the new high estimate, as indicated two comments above.
val pLow = decay(low, 1, lowTimestamp)
val pHigh = decay(high, 0, highTimestamp)

if (amount < low) {
In this case the probability of sending the amount should be 1, as you are certain that you can send the amount!
However, taking your notation, a - x is the amount that can be sent. I do not understand the semantics of x * pLow, nor what is achieved by dividing everything by the lower bound of the channel a. It certainly does not give you the satoshis that may safely be sent, and it most certainly does not give you an accurate probability.
I have similar issues understanding the next 5 lines.
No.
low was the lower bound at some time t. If you call canSend at that time t, then yes, the result will be 1, but if you call it later it will be lower than 1.
 * @param totalCapacity total capacity of all the channels between the pair of nodes
 * @param halfLife time after which the certainty of the lower/upper bounds is halved
 */
case class BalanceEstimate private(low: MilliSatoshi, lowTimestamp: TimestampSecond, high: MilliSatoshi, highTimestamp: TimestampSecond, totalCapacity: Satoshi, halfLife: FiniteDuration) {
I think LiquidityEstimate would be a much better name, because it is the uncertainty in liquidity information that causes the problem. I know in our early work we also called it Balance, but this term turned out to be very confusing in many conversations, which is why I arrived at the point of consistently calling it Liquidity.
That being said, if you look at the reference implementation that I did for the stuff that I think you are trying to achieve here, I use the terms UncertaintyChannel, which quantifies my belief about the channel's liquidity and estimates probabilities and costs that can be used in routing decisions (in particular for the min cost flow solver), and UncertaintyNetwork for the entire network that maintains such information.
You can find a very well documented notebook with a glossary at https://github.com/renepickhardt/mpp-splitter/blob/pickhardt-payments-simulation-dev/Pickhardt-Payments-Simulation.ipynb which is currently up for review at renepickhardt/mpp-splitter#11.
/* Estimate the probability that we can successfully send `amount` through the channel
 *
 * We estimate this probability with a piecewise linear function:
I agree with the gist / goal here but I don't see this being reflected in the code or understand why this is the comment for the function removeChannel
This is not the comment for removeChannel, it's for canSend.
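For readers following the thread: based on the pLow/pHigh decay calls quoted earlier, the piecewise linear estimate presumably interpolates between the points (0, 1), (low, pLow), (high, pHigh) and (capacity, 0). A sketch of that interpretation (not necessarily the PR's exact code):

```scala
// Piecewise linear interpolation through (0, 1) -> (low, pLow) -> (high, pHigh) -> (capacity, 0).
// Degenerate segments (e.g. high == capacity) are ignored in this sketch.
def piecewiseLinearCanSend(amount: MilliSatoshi,
                           low: MilliSatoshi, pLow: Double,
                           high: MilliSatoshi, pHigh: Double,
                           capacity: MilliSatoshi): Double = {
  def interpolate(x: MilliSatoshi, x0: MilliSatoshi, y0: Double, x1: MilliSatoshi, y1: Double): Double =
    y0 + (y1 - y0) * (x - x0).toLong.toDouble / (x1 - x0).toLong.toDouble
  if (amount < low) interpolate(amount, MilliSatoshi(0), 1.0, low, pLow)
  else if (amount < high) interpolate(amount, low, pLow, high, pHigh)
  else interpolate(amount, high, pHigh, capacity, 0.0)
}
```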
def baseline(capacity: Satoshi, halfLife: FiniteDuration): BalanceEstimate = noChannels(halfLife).addChannel(capacity)
}

case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {
As mentioned above, I call this the UncertaintyNetwork.
 * @param includeLocalChannelCost if the path is for relaying and we need to include the cost of the local channel
 */
- private def addEdgeWeight(sender: PublicKey, edge: GraphEdge, prev: RichWeight, currentBlockHeight: BlockHeight, weightRatios: Either[WeightRatios, HeuristicsConstants], includeLocalChannelCost: Boolean): RichWeight = {
+ private def addEdgeWeight(sender: PublicKey, edge: GraphEdge, balance: BalanceEstimate, prev: RichWeight, currentBlockHeight: BlockHeight, weightRatios: Either[WeightRatios, HeuristicsConstants], includeLocalChannelCost: Boolean): RichWeight = {
Assuming the canSend function returns the actual conditional probability given your knowledge, the entire thing here is a bit weird to me. IIRC Dijkstra works by minimizing the sum of weights. However, an onion is successful only if it is successful at every hop, which means the probabilities along the path have to be multiplied.
So just factoring your probability estimates into the weights would again break the probabilistic model.
Without deeper understanding, I initially used the negative logarithm to transform the multiplication of probabilities into an addition that can be used in Dijkstra or for min cost flow problems with separable cost functions. However, while creating my reference implementation and tinkering with feature engineering, I realized that for a channel of capacity c we have -log(P(X >= c)) = log(c+1), which means that the result is just the entropy of the channel. This is what made me realize that the negative logarithms are not just a neat mathematical trick but rather something that I now call the uncertainty cost. This can btw easily be linearized and transferred to a unit cost that is comparable to the fee rate of routing. I strongly suggest looking at the glossary of https://github.com/renepickhardt/mpp-splitter/blob/pickhardt-payments-simulation-dev/Pickhardt-Payments-Simulation.ipynb where I explain this all in more detail. A lot of the choices in feature engineering become very natural and clean when thinking about the situation from the perspective of uncertainty costs (or linearized integer uncertainty unit costs).
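A small illustration of the transformation described above: success probabilities multiply along a path, so taking the negative logarithm turns the product into a sum that a shortest-path search can minimize (illustrative only):

```scala
// -log(p) is additive along a path, while p itself is multiplicative.
def uncertaintyCost(successProbability: Double): Double = -math.log(successProbability)

// A payment onion succeeds only if every hop succeeds.
val hopProbabilities = Seq(0.9, 0.8, 0.95)
val pathProbability = hopProbabilities.product             // 0.684
val pathCost = hopProbabilities.map(uncertaintyCost).sum   // == -log(0.684)

// For a uniform prior on a channel of capacity c, P(X >= c) = 1 / (c + 1),
// so the cost of fully saturating the channel is log(c + 1).
```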
This is unrelated to this PR, I've fixed it in #2257.
However, I'm not convinced the fix is actually useful in practice. By using 1/p we can give a cost to failed attempts, which is something concrete; when using -log(p) we fix Dijkstra, but the penalty factor does not correspond to anything we care about.
All of the new system is only used when `usePastRelaysData = true`.
lockedFundsRisk = config.getDouble("locked-funds-risk"),
failureCost = getRelayFees(config.getConfig("failure-cost")),
hopCost = getRelayFees(config.getConfig("hop-cost")),
usePastRelaysData = config.getBoolean("use-past-relay-data"),
That config value isn't defined in reference.conf
 * @param usePastRelaysData use data from past relays to estimate the balance of the channels
 */
- case class HeuristicsConstants(lockedFundsRisk: Double, failureCost: RelayFees, hopCost: RelayFees)
+ case class HeuristicsConstants(lockedFundsRisk: Double, failureCost: RelayFees, hopCost: RelayFees, usePastRelaysData: Boolean)
Why isn't this called useBalanceEstimates? It would be more coherent, wouldn't it?
Since we merged #2272, we can probably close this PR for now? We'll gather mainnet data from our node and will later submit a new PR to use the balance estimation once our model seems to be accurate.

No, this PR also contains the part using the data for the path-finding. There is no reason to throw that away.

That won't be lost, you just have to keep the branch...? All of the comments made on the balance estimation part will be obsolete once you rebase, making it unnecessarily harder for people to read through the PR. It feels better to me to start from scratch with a new PR focused only on using the balance estimation, to keep it focused?
Currently, when a payment fails, we ban the channel for some time (60 seconds by default). There are several problems with that approach.
In this PR, I introduce a probabilistic estimation of the channel balance to replace this binary behavior.
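To illustrate the intent, a hedged sketch of how a hop failure could update the estimate for that edge instead of banning the channel for a fixed time, using the couldNotSend method quoted above (the surrounding function and parameter names are illustrative only):

```scala
// When a node reports that it could not forward `amount` on the edge (from, to),
// tighten the balance estimate for that edge rather than excluding the channel.
def onCouldNotRelay(estimates: Map[(PublicKey, PublicKey), BalanceEstimate],
                    from: PublicKey, to: PublicKey,
                    amount: MilliSatoshi): Map[(PublicKey, PublicKey), BalanceEstimate] =
  estimates.get((from, to)) match {
    case Some(estimate) => estimates + ((from, to) -> estimate.couldNotSend(amount, TimestampSecond.now()))
    case None => estimates
  }
```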