
Notes on fixing graph sync through forks #522

@cducrest


Project to fix the graph sync

Current way it works

Events from filter

When the relay starts, it reads the list of addresses from the file addresses.json and calls _start_listen_network(address) for each address:

def _start_listen_network(self, address):

This starts a listener for each event type (trustline updates, transfers, balance updates, etc.).

The listeners are greenlets that fetch new entries from a filter every second:

def start_listen_on(

https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/blockchain/proxy.py#L26-37

The filter is a regular web3 filter that is notified by the blockchain node (parity) when an event of the selected type occurs for the selected address.
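The polling pattern described above can be sketched roughly as follows. This is a simplified illustration, not the relay's code: it uses a plain loop instead of greenlets, and FakeFilter, poll_filter and the event dictionaries are hypothetical names made up for the example. Only the get_new_entries() method name mirrors what web3.py filters expose.

```python
import time
from typing import Callable, List


class FakeFilter:
    """Hypothetical stand-in for a web3 filter; only get_new_entries() is modelled."""

    def __init__(self, batches: List[List[dict]]):
        self._batches = list(batches)

    def get_new_entries(self) -> List[dict]:
        # Return the entries that appeared since the previous call, if any.
        return self._batches.pop(0) if self._batches else []


def poll_filter(
    event_filter,
    on_event: Callable[[dict], None],
    rounds: int,
    interval: float = 1.0,
) -> None:
    """Poll the filter `rounds` times and hand every new entry to `on_event`."""
    for _ in range(rounds):
        for entry in event_filter.get_new_entries():
            on_event(entry)
        time.sleep(interval)


seen: List[dict] = []
fake = FakeFilter([[{"event": "Transfer"}], [], [{"event": "BalanceUpdate"}]])
poll_filter(fake, seen.append, rounds=3, interval=0.0)
```

Note that the loop only ever sees entries that are added. Nothing in this pattern tells the listener that a previously delivered entry has disappeared from the chain.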

When events are seen, they trigger changes in the graph and send push notifications to the user:

relay/src/relay/relay.py

Lines 807 to 820 in 3c7ac8e

def _process_balance_update(self, balance_update_event):
    graph = self.currency_network_graphs[balance_update_event.network_address]
    graph.update_balance(
        balance_update_event.from_,
        balance_update_event.to,
        balance_update_event.value,
        balance_update_event.timestamp,
    )
    self._publish_trustline_events(
        user1=balance_update_event.from_,
        user2=balance_update_event.to,
        network_address=balance_update_event.network_address,
        timestamp=balance_update_event.timestamp,
    )

State from querying node

The problem is that filters do not handle forks: the node does not notify a filter in any way when an event is no longer part of the chain, for example because of a reorg.

The way we handle that is by starting a sync process at the same time we start listening on events:

proxy.start_listen_on_full_sync(

This function starts a periodic process (by default every 5 min) that regenerates the graph by querying the node directly for the current state of the blockchain. It does not use events.

def gen_graph_representation(self) -> List[Trustline]:

This should allow us to be "eventually" correct on the graph.

Problems

  1. Events can arrive via the filters and update the graph while we are syncing it by querying the node. The graph regenerated from the state then replaces the previous graph, erasing the updates made by those events.

  2. When getting events from the filter, there is no guarantee, as far as I know, that they arrive in chronological blockchain order (blocknumber, logindex). Since we collect events every second, we could also receive the later event (blockchain-wise) in an earlier poll (relay-time-wise) and the earlier event (blockchain-wise) in a later poll, producing a wrong result.

  3. We have two sources of truth in the relay: the events from the node, and the ethindex. These might disagree with each other and produce ambiguous behaviour.

  4. Regenerating the graph every 5 min is probably not viable if the graph gets too big.
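Problem 1 can be illustrated with a small, self-contained sketch. The dictionaries below are hypothetical stand-ins for the on-chain state and the relay's graph, not the relay's real data structures. The three numbered steps show one interleaving in which an event update is lost.

```python
from typing import Dict, Tuple

# Hypothetical representation: (user_a, user_b) -> balance
Graph = Dict[Tuple[str, str], int]

# State of the chain at the moment the full sync reads it.
chain_state: Graph = {("A", "B"): 10}

# The relay's in-memory graph, currently in agreement with the chain.
graph: Graph = {("A", "B"): 10}

# 1. The full sync starts and reads the state from the node.
snapshot = dict(chain_state)

# 2. A new block changes the balance; the filter delivers the event
#    and the relay applies it to the graph.
chain_state[("A", "B")] = 25
graph[("A", "B")] = 25

# 3. The full sync finishes and replaces the graph with its stale snapshot.
graph = snapshot

# The graph now disagrees with the chain until the next full sync.
lost_update = graph[("A", "B")] != chain_state[("A", "B")]
```

In this interleaving the graph stays at the stale balance until the next periodic sync, which by default is 5 minutes later.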

Potentially Easy Solutions

  • For problem 1), instead of recreating the whole graph and applying it all at once, we could apply it trustline by trustline, considerably reducing the odds that an event modifies a trustline while it is being updated. However, during the update process the graph is a mix of different sources of information and might produce odd results, for example when someone asks for a path.

  • For problem 2), we can order the events we get from the filter. That does not solve the problem that events might arrive out of order across two successive queries of the filter.
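The ordering idea for problem 2, and its limitation, can be sketched as follows. BalanceUpdate and apply_batch are hypothetical names for this illustration, and the sketch assumes that a balance update carries the new absolute balance, so the event applied last wins.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BalanceUpdate:
    """Hypothetical event: the new absolute balance of one trustline."""

    block_number: int
    log_index: int
    trustline: Tuple[str, str]
    value: int


def apply_batch(graph: Dict[Tuple[str, str], int], batch: List[BalanceUpdate]) -> None:
    """Apply the events of one filter poll in chain order (block_number, log_index)."""
    for event in sorted(batch, key=lambda e: (e.block_number, e.log_index)):
        graph[event.trustline] = event.value


older = BalanceUpdate(block_number=100, log_index=0, trustline=("A", "B"), value=10)
newer = BalanceUpdate(block_number=100, log_index=1, trustline=("A", "B"), value=20)

# Both events arrive in the same poll, in the wrong order: sorting fixes it.
same_poll: Dict[Tuple[str, str], int] = {}
apply_batch(same_poll, [newer, older])

# The events arrive in two different polls, newer first: sorting each batch
# separately cannot help, and the older value ends up in the graph.
split_polls: Dict[Tuple[str, str], int] = {}
apply_batch(split_polls, [newer])
apply_batch(split_polls, [older])
```

Here same_poll ends with the correct balance of 20, while split_polls ends with the stale balance of 10.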
