
Conversation

@thomash-acinq (Contributor) commented Nov 19, 2021

Currently when a payment fails we ban the channel for some time (60 seconds by default). There are several problems with that:

  • If we try to relay a huge payment and it fails, we may still be able to relay a smaller payment.
  • If the ban duration is too short, the channel still won't be able to relay payments when we unban it. Many channels are unbalanced and inactive, and if our path-finding algorithm considers them attractive (because they're big and cheap), they'll keep coming back and make our payment attempts fail.

In this PR, I introduce a probabilistic estimation of the channel balance to replace this binary behavior.
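
Concretely, the gist of the approach can be sketched as follows (a deliberately simplified illustration, not the actual code: the BalanceEstimate introduced in this PR also tracks the total capacity, per-bound timestamps and a half-life so that confidence in the bounds decays over time):

import fr.acinq.eclair.MilliSatoshi

// Instead of banning a channel after a failure, keep a [low, high] range for its balance
// and narrow it with every payment result.
case class BalanceRange(low: MilliSatoshi, high: MilliSatoshi) {
  // a payment of `amount` failed on this channel => the balance is below `amount`
  def couldNotSend(amount: MilliSatoshi): BalanceRange =
    if (amount <= low) BalanceRange(MilliSatoshi(0), amount) // our lower bound was wrong, discard it
    else if (amount < high) copy(high = amount)
    else this
  // a payment of `amount` went through => the balance was at least `amount`
  def couldSend(amount: MilliSatoshi): BalanceRange =
    if (amount > low) copy(low = amount) else this
}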

@thomash-acinq thomash-acinq force-pushed the balance-estimate branch 3 times, most recently from dcff914 to 794ddf6 Compare November 24, 2021 15:28
@thomash-acinq thomash-acinq force-pushed the balance-estimate branch 2 times, most recently from 02542d7 to b6d57d9 Compare December 27, 2021 10:32
@codecov-commenter commented Dec 27, 2021

Codecov Report

Merging #2071 (333c0db) into master (0b807d2) will decrease coverage by 0.07%.
The diff coverage is 81.98%.

@@            Coverage Diff             @@
##           master    #2071      +/-   ##
==========================================
- Coverage   87.58%   87.50%   -0.08%     
==========================================
  Files         165      166       +1     
  Lines       12788    12875      +87     
  Branches      563      530      -33     
==========================================
+ Hits        11200    11266      +66     
- Misses       1588     1609      +21     
Impacted Files Coverage Δ
...r/acinq/eclair/payment/send/PaymentLifecycle.scala 86.39% <56.25%> (-0.49%) ⬇️
...src/main/scala/fr/acinq/eclair/router/Router.scala 93.29% <77.77%> (-1.65%) ⬇️
...scala/fr/acinq/eclair/router/BalanceEstimate.scala 83.92% <83.92%> (ø)
.../src/main/scala/fr/acinq/eclair/router/Graph.scala 97.19% <94.87%> (+0.03%) ⬆️
...re/src/main/scala/fr/acinq/eclair/NodeParams.scala 91.84% <100.00%> (+0.03%) ⬆️
...cinq/eclair/remote/EclairInternalsSerializer.scala 97.87% <100.00%> (ø)
...cala/fr/acinq/eclair/router/RouteCalculation.scala 95.10% <100.00%> (ø)
...main/scala/fr/acinq/eclair/router/Validation.scala 90.87% <100.00%> (-1.44%) ⬇️
...main/scala/fr/acinq/eclair/io/PeerConnection.scala 85.44% <0.00%> (-0.39%) ⬇️
... and 1 more

@thomash-acinq thomash-acinq force-pushed the balance-estimate branch 7 times, most recently from 18078b3 to 333c0db Compare December 28, 2021 14:53
@thomash-acinq thomash-acinq marked this pull request as ready for review December 28, 2021 15:08
@t-bast (Member) left a comment


I think it's a really good direction to explore! I like the model; I've left some comments, mostly on what functions it should expose, but the actual logic sounds good to me.

I would scope this PR to only create the balance estimates though, without the path-finding changes. I expect that we will need to iterate on it after seeing how it behaves in practice, before we can start an A/B experiment to actually use it.

def baseline(capacity: Satoshi, halfLife: FiniteDuration): BalanceEstimate = noChannels(halfLife).addChannel(capacity)
}

case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {
Member

Instead of using a (PublicKey, PublicKey) pair and having to do the LexicographicalOrdering check in many places, why not create a case class BalanceEdge(a: PublicKey, b: PublicKey) that enforces this invariant (a < b)?
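
A hedged sketch of what such a wrapper could look like (the ordering helper is the LexicographicalOrdering check mentioned above; the `between` constructor and the exact shape are only an illustration, not the final API):

import fr.acinq.bitcoin.Crypto.PublicKey
import fr.acinq.bitcoin.LexicographicalOrdering

// The smart constructor normalizes the argument order so the invariant a < b holds by construction
// and callers no longer need to repeat the lexicographical check at every call site.
case class BalanceEdge(a: PublicKey, b: PublicKey) {
  require(LexicographicalOrdering.isLessThan(a.value, b.value), "BalanceEdge nodes must be lexicographically ordered")
}

object BalanceEdge {
  def between(x: PublicKey, y: PublicKey): BalanceEdge =
    if (LexicographicalOrdering.isLessThan(x.value, y.value)) BalanceEdge(x, y) else BalanceEdge(y, x)
}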

otherSide.couldNotSend(totalCapacity - amount, timestamp).otherSide

def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
copy(low = (low - amount) max MilliSatoshi(0), high = (high - amount) max MilliSatoshi(0))
Member

It looks like you forgot to update the timestamps?
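
For reference, a possible version of didSend with the timestamps refreshed, as the comment suggests (just a sketch of the suggestion, not necessarily the final code):

def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
  copy(
    low = (low - amount) max MilliSatoshi(0), lowTimestamp = timestamp,
    high = (high - amount) max MilliSatoshi(0), highTimestamp = timestamp
  )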

def didSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
copy(low = (low - amount) max MilliSatoshi(0), high = (high - amount) max MilliSatoshi(0))

def addChannel(capacity: Satoshi): BalanceEstimate = copy(high = high + toMilliSatoshi(capacity), totalCapacity = totalCapacity + capacity)
Member

I don't think we should sum the capacity of individual channels, since HTLCs cannot be dynamically split.
I think we should use the max instead.
Otherwise I'm afraid this will not work well for nodes that have a lot of small channels (we'll have a high value that may be N times the actual size of their biggest channel, which isn't actually usable).

It does make removeChannel harder though. Maybe we can simply replace it with a removeEdge function that is called when the last channel between a pair of nodes has been closed?

Or instead of computing the capacity ourselves, we can let external callers that have access to the graph object tell us what capacity to use for a given edge (i.e. we would expose an updateCapacity function and it's the caller's responsibility to take the max of all existing channels)? If we do that I think we get rid of both addChannel and removeChannel and replace them with addOrUpdateCapacity (which creates the edge if not created before, and sets its capacity) and removeEdge.
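
A hedged sketch of the API shape being suggested (names come from the comment above, bodies are assumptions; `updateCapacity` is the hypothetical method on BalanceEstimate that the comment mentions, it does not exist in this diff, and the caller is responsible for computing the capacity, e.g. the max over parallel channels):

case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {
  // creates the edge if it did not exist yet, otherwise just overwrites its capacity
  def addOrUpdateCapacity(a: PublicKey, b: PublicKey, capacity: Satoshi): BalancesEstimates =
    copy(balances = balances.updated((a, b), balances.get((a, b)) match {
      case Some(balance) => balance.updateCapacity(capacity) // hypothetical helper, see above
      case None => BalanceEstimate.baseline(capacity, defaultHalfLife)
    }))

  // called when the last channel between the pair of nodes has been closed
  def removeEdge(a: PublicKey, b: PublicKey): BalancesEstimates =
    copy(balances = balances.removed((a, b)))
}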


For a very long time I was in favor of combining parallel channels for uncertainty estimates into one large virtual channel with the sum of the capacities. However, this becomes really tricky if these channels have different fee policies, and one may wish to include those as an additional feature in the cost function of the min-cost flow problem. I finally arrived at the point where I maintain those parallel channels in my data structure. While I believe it messes up the uncertainty prediction, in particular as node operators may choose to forward on one of the other channels, I think it is a "them" problem and not a "me" problem.

Member

However this becomes really tricky if these channels have different fee policies

Do note that it doesn't make any sense IMO to have different fee policies on different channels with the same peer (because of non-strict forwarding).

@renepickhardt renepickhardt left a comment

I aborted my review as I cannot make any sense of the semantics of the canSend method, which I guess is the main point where you estimate the actual probability of a payment being successful given your belief about the network.

Overall I can state that a lot of the correct ideas and approaches are in here, and some of my comments are very detailed and nuanced because of that. So please don't be offended if I sound overly critical, but rather take this as a form of respect.

In between the comments I have linked to my reference implementation of handling the probabilities. I strongly encourage you to read through the glossary of that notebook (especially terms like cost, uncertainty cost, linearized integer uncertainty unit cost, unit cost and features) and the UncertaintyChannel class itself. Maybe this can help resolve the current misunderstandings that I/we seem to have about the semantics of your computation.

Notable differences between your code and mine, which should probably be set aside when comparing our approaches:

  • My code also accounts, in the probability estimates, for the in-flight HTLCs that we currently put on the network due to an ongoing payment session.
  • My code currently does not decay the belief about the network. Instead I reset the learnt information before every new payment session (which is obviously not preferable).
    • However, as noted below, I would decay the low and high bound instead of using a weighted average between prior and conditional probability.
  • My code is mainly created for educational purposes and it does not yet update the belief in both directions of the channel when we have updates (you encouraged me to do this sooner rather than later, as I guess this may be confusing for others if it is missing).
  • Also, as you can guess, instead of Yen's algorithm I use the more accurately modeled min-cost flow problem for predicting candidate onions for the MPP problem. (For that reason I also linearize the routing cost, which is easiest when ignoring non-zero-base-fee channels.)

*/
private def decay(amount: MilliSatoshi, successProbabilityAtT: Double, t: TimestampSecond): Double = {
val decayRatio = 1 / math.pow(2, (TimestampSecond.now() - t) / halfLife)
val baseline = 1 - amount.toLong.toDouble / toMilliSatoshi(totalCapacity).toLong.toDouble


This assumes a uniform distribution of the liquidity in channels. Recent research indicates that this is only the case for about 70% of the channels. The other 30% are drained. Thus a probability of 0.3 * 1/2 + 0.7 * (1 - a/(c+1)) seems more accurate. Of course, while operating the node one may collect statistics and constantly estimate the 30/70 split. Alternatively one may estimate the split per channel, as the research also reported that there seems to be an active kernel in the network where one may safely assume a uniform distribution.

In particular, the success probability is not 1 - a/c but rather 1 - a/(c+1), though the difference is negligible in most cases.

By the way: baseline was a very confusing term. Maybe call it priorSuccessProbability, as it encodes P(X > amt), and the other one conditionalSuccessProbability, as it should be the success probability under the conditional information that the random variable is larger than minLiquidity and smaller than maxLiquidity, which in math terms may be expressed as P(X > amt | minLiquidity < X < maxLiquidity).
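
As a hedged illustration of the prior described above (the 30/70 split is the figure quoted in this comment, not something measured in this PR, and the function name follows the naming suggested here):

// Probability that a payment of `amount` fits in a channel of capacity `capacity`, before any
// observation: a mixture of "drained" channels (roughly a coin flip on which side holds all the
// liquidity) and channels with uniformly distributed liquidity, where P(X >= a) = 1 - a/(c+1).
def priorSuccessProbability(amount: MilliSatoshi, capacity: MilliSatoshi): Double = {
  val uniform = 1.0 - amount.toLong.toDouble / (capacity.toLong.toDouble + 1)
  0.3 * 0.5 + 0.7 * uniform
}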

private def decay(amount: MilliSatoshi, successProbabilityAtT: Double, t: TimestampSecond): Double = {
val decayRatio = 1 / math.pow(2, (TimestampSecond.now() - t) / halfLife)
val baseline = 1 - amount.toLong.toDouble / toMilliSatoshi(totalCapacity).toLong.toDouble
baseline * (1 - decayRatio) + successProbabilityAtT * decayRatio


I am a bit surprised that you do a weighted average here. Why not just decay minLiquidity towards 0 and maxLiquidity towards capacity with the given halfLife? I did not check how these two compare to each other, but it seems cleaner than using a combination of the prior knowledge and the conditional knowledge.

Contributor Author

Decaying low towards 0 doesn't make sense. At a time t we have a given probability distribution for the balance, and decaying means going back to the prior (which is our uniform baseline); that's why I'm averaging the distribution at time t and the baseline.
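
To illustrate the behaviour being defended here, the decay from the diff above can be read as a standalone blend between the fresh observation and the uniform prior (a sketch using the same formula as the code, with hypothetical parameter names):

import scala.concurrent.duration._

def decayed(prior: Double, observed: Double, elapsed: FiniteDuration, halfLife: FiniteDuration): Double = {
  val decayRatio = 1 / math.pow(2, elapsed / halfLife)
  prior * (1 - decayRatio) + observed * decayRatio
}

// elapsed = 0          => decayRatio = 1   => we fully trust the observation
// elapsed >> halfLife  => decayRatio -> 0  => we are back to the uniform prior, not at 0 or 1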

def couldNotSend(amount: MilliSatoshi, timestamp: TimestampSecond): BalanceEstimate =
if (amount < high) {
if (amount > low) {
copy(high = amount, highTimestamp = timestamp)


I think the high estimate has to be updated independently of the low estimate.

Contributor Author

high is updated regardless of the value of low.

if (amount > low) {
copy(high = amount, highTimestamp = timestamp)
} else {
// the balance is actually below `low`, we discard our lower bound


But I agree with the next line that if the amount is below low, one should discard the low estimate.

// the balance is actually below `low`, we discard our lower bound
copy(low = MilliSatoshi(0), lowTimestamp = timestamp, high = amount, highTimestamp = timestamp)
}
} else {


I don't understand what you are doing here. If you use the conditional probabilities properly, you would never come to the situation where you send something larger than your last estimate.

Sure the decay may change this but then I think the new amount should just be the new high estimate as indicated two comments above.

val pLow = decay(low, 1, lowTimestamp)
val pHigh = decay(high, 0, highTimestamp)

if (amount < low) {


In this case the probability of sending the amount should be 1 as you are certain that you can send the amount!

However, taking your notation, a - x is the amount that can be sent. But I do not understand the semantics of x * pLow and what is achieved by dividing everything by the lower bound of the channel a. It certainly does not give you the satoshis that may safely be sent, and it most certainly does not give you an accurate probability.

I have similar issues understanding the next 5 lines.

Contributor Author

No.
low was the lower bound at some time t. If you call canSend at that time t, then yes, the result will be 1, but if you call it later it will be lower than 1.

* @param totalCapacity total capacity of all the channels between the pair of nodes
* @param halfLife time after which the certainty of the lower/upper bounds is halved
*/
case class BalanceEstimate private(low: MilliSatoshi, lowTimestamp: TimestampSecond, high: MilliSatoshi, highTimestamp: TimestampSecond, totalCapacity: Satoshi, halfLife: FiniteDuration) {


I think LiquidityEstimate would be a much better name, because it is the uncertainty in liquidity information that yields a problem. I know in our early work we also called it Balance, but this term turned out to be very confusing in many conversations, which is why I arrived at the point of consistently calling it Liquidity.
That being said, if you look at the reference implementation that I did for the stuff that I think you try to achieve here, I use the term UncertaintyChannel for the class that quantifies my belief about the channel's liquidity and estimates probabilities and costs that can be used in routing decisions (in particular for the min-cost flow solver), and UncertaintyNetwork for the entire network that maintains such information.

You can find a very well documented notebook with a glossary at https://github.com/renepickhardt/mpp-splitter/blob/pickhardt-payments-simulation-dev/Pickhardt-Payments-Simulation.ipynb which is currently up for review at: renepickhardt/mpp-splitter#11


/* Estimate the probability that we can successfully send `amount` through the channel
*
* We estimate this probability with a piecewise linear function:


I agree with the gist / goal here, but I don't see this being reflected in the code, nor do I understand why this is the comment for the function removeChannel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the comment for removeChannel, it's for canSend.
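
For readers of this thread, here is a hedged reconstruction of the piecewise-linear estimate that the doc comment describes (the exact expressions in the PR may differ and degenerate cases such as low == high are ignored; pLow and pHigh are the decayed confidences in the two bounds, as computed by the decay calls shown earlier):

def canSendSketch(amount: MilliSatoshi, low: MilliSatoshi, high: MilliSatoshi, capacity: MilliSatoshi, pLow: Double, pHigh: Double): Double = {
  if (amount < low) {
    // between 0 and `low`: interpolate between 1 (we can always send 0) and pLow
    1.0 + (pLow - 1.0) * amount.toLong.toDouble / low.toLong.toDouble
  } else if (amount > high) {
    // between `high` and the capacity: interpolate between pHigh and 0 (we can never send more than the capacity)
    pHigh * (capacity - amount).toLong.toDouble / (capacity - high).toLong.toDouble
  } else {
    // between the two bounds: interpolate between pLow and pHigh
    pLow + (pHigh - pLow) * (amount - low).toLong.toDouble / (high - low).toLong.toDouble
  }
}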

def baseline(capacity: Satoshi, halfLife: FiniteDuration): BalanceEstimate = noChannels(halfLife).addChannel(capacity)
}

case class BalancesEstimates(balances: Map[(PublicKey, PublicKey), BalanceEstimate], defaultHalfLife: FiniteDuration) {


As mentioned above, I call this the UncertaintyNetwork.

* @param includeLocalChannelCost if the path is for relaying and we need to include the cost of the local channel
*/
private def addEdgeWeight(sender: PublicKey, edge: GraphEdge, prev: RichWeight, currentBlockHeight: BlockHeight, weightRatios: Either[WeightRatios, HeuristicsConstants], includeLocalChannelCost: Boolean): RichWeight = {
private def addEdgeWeight(sender: PublicKey, edge: GraphEdge, balance: BalanceEstimate, prev: RichWeight, currentBlockHeight: BlockHeight, weightRatios: Either[WeightRatios, HeuristicsConstants], includeLocalChannelCost: Boolean): RichWeight = {


Assuming the canSend function returns the actual conditional probability given your knowledge, the entire thing here is a bit weird to me. IIRC Dijkstra works by minimizing the sum of weights. However an onion is successful if it is successful at every hop, which means the probabilities along the path have to be multiplied.
So just factoring in your probability weights from your estimates would again break the probabilistic model.

Without deeper understanding I initially used the negative logarithm to transform the multiplication of probabilities into an addition that can be used in Dijkstra or for min-cost flow problems with separable cost functions. However, while creating my reference implementation and tinkering with feature engineering, I realized that for a channel of capacity c we have -log(P(X >= c)) = log(c + 1), which means that the result is just the entropy of the channel. This is what made me realize that the negative logarithms are not just a neat mathematical trick but rather something that I now call the uncertainty cost. This can, by the way, easily be linearized and transferred to a unit cost that is comparable to the fee rate of routing. I strongly suggest looking at the glossary of https://github.com/renepickhardt/mpp-splitter/blob/pickhardt-payments-simulation-dev/Pickhardt-Payments-Simulation.ipynb where I explain this all in more detail. A lot of the choices in feature engineering become very natural and clean when thinking about the situation from the perspective of uncertainty costs (or linearized integer uncertainty unit costs).
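
A hedged illustration of the log trick described above (just the mathematical identity, not how eclair actually weights edges):

// Dijkstra minimizes a sum of edge weights, while the success probability of a route is the
// product of per-hop probabilities. Using -log(p) as the additive weight makes the two agree:
// minimizing sum_i(-log(p_i)) = -log(prod_i(p_i)) is the same as maximizing the product.
def uncertaintyCost(successProbability: Double): Double = -math.log(successProbability)

// e.g. two hops with probabilities 0.9 and 0.5:
// uncertaintyCost(0.9) + uncertaintyCost(0.5) == uncertaintyCost(0.9 * 0.5)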

Contributor Author

This is unrelated to this PR, I've fixed it in #2257
However I'm not convinced the fix is actually useful in practice. By using 1/p we can give a cost to failed attempts, which is something concrete; when using -log(p) we fix Dijkstra, but the penalty factor does not correspond to anything we care about.

lockedFundsRisk = config.getDouble("locked-funds-risk"),
failureCost = getRelayFees(config.getConfig("failure-cost")),
hopCost = getRelayFees(config.getConfig("hop-cost")),
usePastRelaysData = config.getBoolean("use-past-relay-data"),
Member

That config value isn't defined in reference.conf

* @param usePastRelaysData use data from past relays to estimate the balance of the channels
*/
case class HeuristicsConstants(lockedFundsRisk: Double, failureCost: RelayFees, hopCost: RelayFees)
case class HeuristicsConstants(lockedFundsRisk: Double, failureCost: RelayFees, hopCost: RelayFees, usePastRelaysData: Boolean)
Member

Why isn't this called useBalanceEstimates? It would be more coherent, wouldn't it?

@thomash-acinq thomash-acinq marked this pull request as draft May 10, 2022 11:44
@t-bast (Member) commented Jun 8, 2022

Since we merged #2272, we can probably close this PR for now? We'll gather mainnet data from our node and will later submit a new PR to use the balance estimation once our model seems to be accurate.

@thomash-acinq (Contributor, Author)

No, this PR also contains the part that uses the data for path-finding. There is no reason to throw that away.

@t-bast (Member) commented Jun 8, 2022

That won't be lost, you just have to keep the branch...?

All of the comments made on the balance estimation part will be obsolete once you rebase, making it unnecessarily harder for people to read through the PR. It feels better to me to start from scratch and open a new PR that only deals with using the balance estimation, to keep it focused.
