Discovery splitout #595

@Savid

The problem

The current discovery command does a few things that are conflated and complicated:

  • p2p static mode:
    • discv4/v5 on set of static nodes
    • execution service blindly dials node records discovered to get status records
    • consensus service dials node records matching the upstream beacon node to get status records
  • p2p xatu mode:
    • execution service gets a list of previously discovered status records matching network ids and fork id hashes
      • discv4/v5 on the records
      • blindly dials node records found to get new status records
    • consensus service dials node records matching the upstream beacon node to get status records

As you can see, there is duplicated work and confusing logic about what is actually happening.

Ideally, we want to break this down and simplify:

node discovery:

  • discv4/v5 on set of static nodes
  • discv4/v5 on previously found execution/consensus records

execution service:

  • gets a list of node records to dial
  • only generates events based on optional filters:
    • network id(s)
    • fork id hash(es)

consensus service:

  • requires upstream beacon node(s) to target a network/fork digest
  • gets a list of node records to dial
  • generates events if successfully connected and fork digest matches

Role of the coordinator

The coordinator currently has a few jobs:

  • persistence in postgres:
    • stores all discovered node records (node_record table)
    • stores status records from dialed nodes (node_record_execution/node_record_consensus tables)
  • provides status records to the current discovery command to do discovery and status checking against previously dialed nodes
  • provides records for mimicry clients to dial, while recording which mimicry clients are dialing which nodes (node_record_activity table)
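
For illustration, the records these tables hold can be sketched as Go types. The field names here are hypothetical, not the coordinator's actual schema:

```go
package main

import (
	"fmt"
	"time"
)

// NodeRecord mirrors the node_record table: every record found by discovery.
type NodeRecord struct {
	Enr       string
	FirstSeen time.Time
	LastSeen  time.Time
}

// NodeRecordExecution mirrors node_record_execution: status from a dialed
// execution node.
type NodeRecordExecution struct {
	Enr        string
	NetworkID  uint64
	ForkIDHash string
	DialedAt   time.Time
}

// NodeRecordActivity mirrors node_record_activity: which mimicry client is
// dialing which node.
type NodeRecordActivity struct {
	Enr      string
	ClientID string
	DialedAt time.Time
}

func main() {
	rec := NodeRecordExecution{Enr: "enr:abc", NetworkID: 1, ForkIDHash: "0x12345678"}
	fmt.Println(rec.NetworkID, rec.ForkIDHash)
}
```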

Solution

Split the discovery command into multiple commands:

discovery

This command will do node discovery only. Its job is solely to find records and add them to the node_record table in the coordinator database; it will make no attempt to connect to nodes or verify their status. The command has two jobs:

Static discovery

Continuously iterate over a list of node records, in enode or ENR format, for discovery.

Related config:

```yaml
bootNodes:
  - enode:123
  - enr:abc
```
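
The "continuously iterate" behaviour can be sketched as a simple round-robin iterator over the configured boot nodes (a minimal sketch; the real command drives discv4/discv5 sessions rather than just yielding strings):

```go
package main

import "fmt"

// bootNodeCycle returns a function that yields the configured boot nodes
// round-robin, so discovery can loop over them indefinitely.
func bootNodeCycle(bootNodes []string) func() string {
	i := 0
	return func() string {
		n := bootNodes[i%len(bootNodes)]
		i++
		return n
	}
}

func main() {
	next := bootNodeCycle([]string{"enode:123", "enr:abc"})
	// Argument evaluation in Go is left to right, so this prints the
	// first three nodes in cycle order.
	fmt.Println(next(), next(), next()) // enode:123 enr:abc enode:123
}
```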

Dynamic discovery

It is beneficial to run discovery against known nodes to have greater success in finding similar nodes on the same network.

This job will get records from the node_record_execution/node_record_consensus tables filtered by the configured execution and beacon node upstream.

  • node_record_execution records will be filtered by the execution node fork ID hash (see Fork ID hash)
  • node_record_consensus records will be filtered by the beacon node fork digest
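
A minimal sketch of these two filters, using hypothetical record types in place of the coordinator's actual row types:

```go
package main

import "fmt"

// ExecutionRecord and ConsensusRecord are illustrative stand-ins for rows
// from node_record_execution and node_record_consensus.
type ExecutionRecord struct {
	Enr        string
	ForkIDHash string
}

type ConsensusRecord struct {
	Enr        string
	ForkDigest string
}

// filterExecution keeps only records whose fork ID hash matches the
// upstream execution node's fork ID hash.
func filterExecution(records []ExecutionRecord, upstreamForkIDHash string) []ExecutionRecord {
	var out []ExecutionRecord
	for _, r := range records {
		if r.ForkIDHash == upstreamForkIDHash {
			out = append(out, r)
		}
	}
	return out
}

// filterConsensus keeps only records whose fork digest matches the
// upstream beacon node's fork digest.
func filterConsensus(records []ConsensusRecord, upstreamForkDigest string) []ConsensusRecord {
	var out []ConsensusRecord
	for _, r := range records {
		if r.ForkDigest == upstreamForkDigest {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	recs := []ExecutionRecord{
		{Enr: "enr:aaa", ForkIDHash: "0x12345678"},
		{Enr: "enr:bbb", ForkIDHash: "0x87654321"},
	}
	fmt.Println(len(filterExecution(recs, "0x12345678"))) // 1
}
```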

Related config:

```yaml
ethereum:
  beaconNodeAddress: http://localhost:5052
  executionNodeAddress: http://localhost:8545 # must be erigon node for "erigon_forks" method
  # networkOverride: fusaka-devnet-2 # optional
```

Full config

```yaml
logging: "info" # panic,fatal,warn,info,debug,trace
metricsAddr: ":9090"
# pprofAddr: ":6060" # optional. if supplied it enables pprof server

coordinator:
  address: localhost:8080
  tls: false
  headers:
    authorization: Someb64Value
  maxQueueSize: 51200
  batchTimeout: 5s
  exportTimeout: 30s
  maxExportBatchSize: 512
  concurrentExecutionPeers: 100

# Note: both Node Discovery Protocol v4 and v5 can be enabled at the same time
# enable Node Discovery Protocol v4
discV4: true
# enable Node Discovery Protocol v5
discV5: true
# time between initiating discovery scans, will generate a fresh private key each time
restart: 2m

bootNodes:
  - enode:123
  - enr:abc

ethereum:
  beaconNodeAddress: http://localhost:5052
  executionNodeAddress: http://localhost:8545 # must be erigon node for "erigon_forks" method
  # networkOverride: fusaka-devnet-2 # optional
```

Changes needed from current discovery

We need to add a column to the node_record table to store the fork ID hash of the dynamic discovery upstream execution node (if it was used to find the record). This will be used later in the status command to better filter records to dial: records with a matching fork ID hash take priority over those without one.

status

This command will do only status checking against a node. Its job is to dial nodes, both execution and beacon nodes, to get their status.

It will get records from the node_record table and dial the nodes to get their status. It will then update the coordinator database (node_record_execution/node_record_consensus) with the status and also output the status to the configured outputs.

There are some differences between how execution and beacon nodes are handled:

Beacon nodes

Fortunately, ENRs contain the fork digest of the node, so we can instantly filter the node_record table for beacon nodes to dial.
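
Concretely, the fork digest is the first 4 bytes of the ENR's eth2 entry, which holds an SSZ-encoded ENRForkID (fork_digest, next_fork_version, next_fork_epoch). A minimal sketch of extracting it from the raw entry bytes (ENR decoding itself is omitted, and the example bytes are illustrative):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// forkDigestFromEth2Entry extracts the 4-byte fork digest from the raw value
// of an ENR's "eth2" key. The value is an SSZ-encoded ENRForkID:
// fork_digest (4 bytes) || next_fork_version (4 bytes) || next_fork_epoch (8 bytes).
func forkDigestFromEth2Entry(entry []byte) (string, error) {
	if len(entry) < 4 {
		return "", fmt.Errorf("eth2 entry too short: %d bytes", len(entry))
	}
	return "0x" + hex.EncodeToString(entry[:4]), nil
}

func main() {
	// Illustrative ENRForkID bytes: digest 6a95a1a9, then version and epoch.
	entry, _ := hex.DecodeString("6a95a1a9040000000000000000000000")
	digest, err := forkDigestFromEth2Entry(entry)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest) // 0x6a95a1a9
}
```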

Once we successfully dial a node, we confirm the fork digest matches and then:

  • update the node_record_consensus coordinator database table
  • output the status event to the configured outputs

Execution nodes

Unfortunately, execution nodes require the node to be dialed to get the fork ID hash (and network ID).

In the discovery command, we've added an additional column to the node_record table to store the fork ID hash of the dynamic discovery upstream execution node (if it was used to find the record). When this command gets a list of records to dial from the coordinator, it will prioritize records that have a fork ID hash match to what is configured.

Once we successfully dial a node, we:

  • update the node_record_execution coordinator database table, even if the fork ID hash doesn't match the configured one, as this can be used to filter future records to dial
  • if the fork ID hash matches, output the status event to the configured outputs
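
The prioritization above can be sketched as a stable sort that moves fork-ID-hash matches to the front (type and field names are illustrative, not the coordinator's actual API):

```go
package main

import (
	"fmt"
	"sort"
)

// DialCandidate is an illustrative stand-in for a node_record row, carrying
// the fork ID hash recorded at discovery time (empty if unknown).
type DialCandidate struct {
	Enr        string
	ForkIDHash string
}

// prioritize orders candidates so that records whose stored fork ID hash
// matches the configured one are dialed first; unknown or mismatched
// records follow in their original order.
func prioritize(candidates []DialCandidate, configuredForkIDHash string) []DialCandidate {
	out := make([]DialCandidate, len(candidates))
	copy(out, candidates)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].ForkIDHash == configuredForkIDHash && out[j].ForkIDHash != configuredForkIDHash
	})
	return out
}

func main() {
	cands := []DialCandidate{
		{Enr: "enr:aaa"},
		{Enr: "enr:bbb", ForkIDHash: "0x12345678"},
		{Enr: "enr:ccc", ForkIDHash: "0x87654321"},
	}
	for _, c := range prioritize(cands, "0x12345678") {
		fmt.Println(c.Enr) // enr:bbb first, then enr:aaa, enr:ccc
	}
}
```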

Full config

```yaml
logging: "info" # panic,fatal,warn,info,debug,trace
metricsAddr: ":9090"
# pprofAddr: ":6060" # optional. if supplied it enables pprof server

coordinator:
  address: localhost:8080
  tls: false
  headers:
    authorization: Someb64Value
  maxQueueSize: 51200
  batchTimeout: 5s
  exportTimeout: 30s
  maxExportBatchSize: 512
  concurrentExecutionPeers: 100

ethereum:
  beaconNodeAddress: http://localhost:5052
  executionNodeAddress: http://localhost:8545 # must be erigon node for "erigon_forks" method
  # networkOverride: fusaka-devnet-2 # optional

outputs:
# - name: local-stdout
#   type: stdout
- name: xatu-server
  type: xatu
  config:
    address: localhost:8080
    tls: false
    headers:
      authorization: Someb64Value
    maxQueueSize: 51200
    batchTimeout: 5s
    exportTimeout: 30s
```

Fork ID hash

To calculate the correct fork ID hash (per EIP-2124), you need the genesis hash and all past fork block heights and timestamps. Erigon provides the erigon_forks method to return this information. Here is a Go example that calculates the fork ID hash:

Golang example:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/hex"
	"encoding/json"
	"flag"
	"fmt"
	"hash/crc32"
	"io"
	"net/http"
	"strings"
)

type JSONRPCRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

type ForksResult struct {
	Genesis     string `json:"genesis"`
	HeightForks []int  `json:"heightForks"`
	TimeForks   []int  `json:"timeForks"`
}

type JSONRPCResponse struct {
	JSONRPC string      `json:"jsonrpc"`
	ID      int         `json:"id"`
	Result  ForksResult `json:"result"`
	Error   *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

func checksumUpdate(hash uint32, fork uint64) uint32 {
	var blob [8]byte
	binary.BigEndian.PutUint64(blob[:], fork)
	return crc32.Update(hash, crc32.IEEETable, blob[:])
}

func main() {
	elURL := flag.String("el-url", "http://localhost:8545", "Execution layer URL")
	flag.Parse()

	request := JSONRPCRequest{
		JSONRPC: "2.0",
		Method:  "erigon_forks",
		Params:  []interface{}{},
		ID:      1,
	}

	jsonData, err := json.Marshal(request)
	if err != nil {
		fmt.Printf("Error marshaling request: %v\n", err)
		return
	}

	resp, err := http.Post(*elURL, "application/json", bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Printf("Error making request: %v\n", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Error reading response: %v\n", err)
		return
	}

	var rpcResponse JSONRPCResponse
	err = json.Unmarshal(body, &rpcResponse)
	if err != nil {
		fmt.Printf("Error unmarshaling response: %v\n", err)
		return
	}

	if rpcResponse.Error != nil {
		fmt.Printf("RPC Error: %s (code: %d)\n", rpcResponse.Error.Message, rpcResponse.Error.Code)
		return
	}

	// Calculate CRC32 hash of genesis
	genesisHex := strings.TrimPrefix(rpcResponse.Result.Genesis, "0x")
	genesisBytes, err := hex.DecodeString(genesisHex)
	if err != nil {
		fmt.Printf("Error decoding genesis hex: %v\n", err)
		return
	}

	// Start with genesis hash
	hash := crc32.ChecksumIEEE(genesisBytes)

	// Iterate through all heightForks
	for _, fork := range rpcResponse.Result.HeightForks {
		hash = checksumUpdate(hash, uint64(fork))
	}

	// Iterate through all timeForks
	for _, fork := range rpcResponse.Result.TimeForks {
		hash = checksumUpdate(hash, uint64(fork))
	}

	// Output final hash
	fmt.Printf("0x%x\n", hash)
}
```
