Query Search Path by mrferris · Pull Request #482 · ethereum/trin

mrferris · 2022-11-24T04:02:07Z

What was wrong?

The portal_historyTraceRecursiveFindContent endpoint's output contains a list of the nodes involved in a content query, but it currently lacks a comprehensive trace of the query's search path.

How was it fixed?

Each content query trace is tracked by a QueryTracer struct that ultimately serializes to the following JSON format:

    {
        "found_content_at": "G",
        "origin": "A",
        "responses": {
            "A": {
                "timestamp_ms": 0,
                "responded_with": ["B", "C", "D"]
            },
            "B": {
                "timestamp_ms": 124,
                "responded_with": ["E", "F", "G"]
            },
            "C": {
                "timestamp_ms": 150,
                "responded_with": []
            },
            "G": {
                "timestamp_ms": 200,
                "responded_with": []
            }
        }
    }

where each letter is an ENR.
Each entry in responses maps a remote node's ENR to the list of ENRs that it responded with because it didn't have the content. origin is the local node, and the responses entry for the origin node shows the nodes that were closest to the content in its own routing table. The entry for the found_content_at node should have an empty responded_with field, as that entry is only used to mark the timestamp at which the content was received.

Each node's timestamp_ms field contains the number of milliseconds that elapsed between the query beginning and the response from the node being received.

Note that only the first node that responded with a given node X will have X in its responded_with list, so only the actual route taken is present in the data rather than other hypothetical routes. This means that many nodes will have empty responded_with arrays. These nodes did respond, but not with anything that hadn't already been seen in the query. This is the case for C in the example above.
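The recording rule described above can be sketched roughly as follows. This is an illustration only, not the QueryTracer from this PR: the type name `SearchPathSketch`, the `NodeKey` alias, and the choice to pass `timestamp_ms` in by hand are all assumptions made to keep the example small and deterministic.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for an ENR; the real trace keys on ENRs or node IDs.
type NodeKey = String;

/// One entry of the `responses` map.
#[derive(Debug, Default, Clone, PartialEq)]
struct NodeResponse {
    timestamp_ms: u64,
    responded_with: Vec<NodeKey>,
}

/// Illustrative tracer (hypothetical name), not the type added by this PR.
#[derive(Debug)]
struct SearchPathSketch {
    origin: NodeKey,
    found_content_at: Option<NodeKey>,
    responses: BTreeMap<NodeKey, NodeResponse>,
    seen: HashSet<NodeKey>,
}

impl SearchPathSketch {
    fn new(origin: &str) -> Self {
        Self {
            origin: origin.to_string(),
            found_content_at: None,
            responses: BTreeMap::new(),
            // The origin is known from the start, so it is never "discovered".
            seen: HashSet::from([origin.to_string()]),
        }
    }

    /// Record a response from `node`, keeping only peers not seen before.
    fn node_responded_with(&mut self, node: &str, peers: &[&str], timestamp_ms: u64) {
        let entry = self.responses.entry(node.to_string()).or_default();
        entry.timestamp_ms = timestamp_ms;
        for peer in peers {
            // `insert` returns true only the first time a peer is seen.
            if self.seen.insert(peer.to_string()) {
                entry.responded_with.push(peer.to_string());
            }
        }
    }

    /// Record that `node` returned the content; its list stays empty.
    fn node_responded_with_content(&mut self, node: &str, timestamp_ms: u64) {
        self.responses.entry(node.to_string()).or_default().timestamp_ms = timestamp_ms;
        self.found_content_at = Some(node.to_string());
    }
}
```

Feeding it the example above (A answers with B, C, D; B with E, F, G; C with nodes that are already known; G with the content) leaves C with an empty responded_with list, as described.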

A full visualization of a query using this data format is done here: https://github.com/pipermerriam/glados/pull/28

Here's a screenshot:

and here's the route highlighted on a successful query:

To-Do

Add timestamps
Use ENRs instead of Node IDs
Include data to distinguish between no response from a node and no progress toward content from a node
Unit tests
Add entry to the release notes
Clean up commit history

jacobkaufmann

Neither approving nor requesting changes because I want to get your thoughts on the comments.

I'm also not sure how I feel about changing the query callback response type for FindContent. An alternative would be to create a struct for that type and include a field for the trace, which could be wrapped in an Option. I don't have a problem with doing the trace for all queries, because it could be useful for metrics, but I'm not sure it should be included in the response for non-trace find content queries.
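A minimal sketch of that alternative, assuming hypothetical names throughout (`TraceSketch`, `FindContentOutcome`, and `finish_query` are not from the PR): the trace can be built for every query, but it only reaches the caller when a trace was requested.

```rust
/// Hypothetical trace payload; the PR's real trace type carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct TraceSketch {
    origin: String,
    found_content_at: Option<String>,
}

/// Hypothetical callback payload for a FindContent query.
#[derive(Debug, Clone, PartialEq)]
struct FindContentOutcome {
    content: Option<Vec<u8>>,
    /// `None` for ordinary (non-trace) queries.
    trace: Option<TraceSketch>,
}

/// Wrap the query result, attaching the trace only when it was asked for.
fn finish_query(content: Option<Vec<u8>>, trace: TraceSketch, is_trace: bool) -> FindContentOutcome {
    FindContentOutcome {
        content,
        trace: is_trace.then_some(trace),
    }
}
```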

I also find the structure of the response confusing, particularly the omission of previously returned ENRs from the arrays. If each response is timestamped, then you could infer the path if you assume that you only query a node the first time you see it. Alternatively, you could include a field in the response dedicated to representing the path.
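To make the inference concrete, here is a rough sketch under the stated assumption that a node is only queried the first time it is seen. The `Response` struct and `infer_parents` function are hypothetical names; the input is the full, unfiltered list of peers each node returned.

```rust
use std::collections::HashMap;

/// Hypothetical record of one full (unfiltered) response.
struct Response {
    node: String,
    timestamp_ms: u64,
    peers: Vec<String>,
}

/// For each discovered node, find the node whose response introduced it,
/// i.e. the earliest responder that listed it.
fn infer_parents(responses: &[Response]) -> HashMap<String, String> {
    let mut ordered: Vec<&Response> = responses.iter().collect();
    // Stable sort, so responses with equal timestamps keep their input order.
    ordered.sort_by_key(|r| r.timestamp_ms);

    let mut parents = HashMap::new();
    for response in ordered {
        for peer in &response.peers {
            // Only the first mention of a peer counts; later ones are ignored.
            parents
                .entry(peer.clone())
                .or_insert_with(|| response.node.clone());
        }
    }
    parents
}
```

With full responses, the filtering that the current format does on the back-end would instead be done by the consumer in a pass like this one.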

trin-core/src/portalnet/find/query_tracer.rs

trin-history/src/jsonrpc.rs

trin-core/src/portalnet/find/query_tracer.rs

perama-v

Very nice. Looks good to me. Tested by making a query (portal_historyTraceRecursiveFindContent) and inspecting the results.

I notice that the resulting "responses" includes the node's own ID and includes "responded_with": ["node_x", "node_y", ...] for itself. This seems ok/useful, just wanted to note it.

ethportal-api/src/types/portal.rs

ethportal-peertest/src/scenarios/find.rs

trin-core/src/portalnet/find/query_trace.rs

trin-history/src/jsonrpc.rs

ogenev

Blocking this until we agree on what to do with the Value type in TraceContentInfo.

mrferris · 2023-02-24T21:16:22Z

I'm also not sure how I feel about changing the query callback response type for FindContent. An alternative would be to create a struct for that type and include a field for the trace, which could be wrapped in an Option.

Agreed 👍

I also find the structure of the response confusing, particularly the omission of previously returned ENRs from the arrays. If each response is timestamped, then you could infer the path if you assume that you only query a node the first time you see it.

I originally had it that way but didn't see any useful, non-chaotic way of visualizing that info on the front-end so it was unnecessary data & front-end parsing complexity. I'm open to a second iteration down the line that adds all response data.

mrferris · 2023-02-24T21:21:58Z

Note that node_meta_data was added to the format, but will no longer be necessary once full ENR parsing is working on the front-end.

Also note that Node IDs are now being used as the primary ID for each node rather than ENRs. This was due to a bug caused by multiple ENRs for the same node ID making it into the QueryTrace.

jacobkaufmann

It looks like there are some unaddressed comments from other reviewers, and I have some comments of my own.

At a high level, the following aspects of the trace structure do not make much sense to me:

that the origin node has an entry in the responses field
that only the first node to respond with some node X will be the only entry with X in that list

I believe we can find a better representation for the trace that doesn't warp the information in these ways. I would be more in favor of a straightforward representation that requires the caller to do whatever necessary disentanglement of the data on their end.

trin-core/src/portalnet/find/query_trace.rs

trin-core/src/portalnet/find/query_info.rs

trin-history/src/jsonrpc.rs

mrferris · 2023-03-06T17:41:18Z

@jacobkaufmann

that the origin node has an entry in the responses field

Why would origin not have an entry? Where do we put the nodes that origin used to begin the query?

that only the first node to respond with some node X will be the only entry with X in that list

My previous response from above:

I originally had it that way (all responses) but didn't see any useful, non-chaotic way of visualizing that info on the front-end so it was unnecessary data & front-end parsing complexity. I'm open to a second iteration down the line that adds all response data.

Why add unused complexity to both the back-end and front-end? We can iterate this format as our needs progress.

jacobkaufmann · 2023-03-06T19:18:02Z

apologies for missing some of the original comments on rationale.

the origin would not have entry in responses because there is no notion of "response" for the originating node. that information (the bootstrap query peers) could exist in a separate field. it seems like we are trying to stuff information into this shape that does not really accommodate the info in a clean way, so we can explore a different structure.

my claim is that right now we are introducing complexity on the back-end where things could be much simpler, and it would be better for any complexity to reside on the front-end. the simplest thing to do on the back-end is to provide all the response data as we received it. as long as the front-end can tease out the structure from that data, then it is preferable to keep our logic simple at the expense of the consumer.

mrferris · 2023-03-10T20:33:44Z

@ogenev

Blocking this until we agree on what to do with the Value type in TraceContentInfo.

I've created QueryTrace, NodeInfo, and QueryResponse types to use in TraceContentInfo in ethportal-api, but kept the QueryTrace implementation that creates the QueryTrace in trin-core. Perhaps the one in trin-core should be named QueryTracer (with an r) and the one in ethportal-api named QueryTrace to denote that one creates the other. Or perhaps there should just be one type and it should live in ethportal-api.

I kept the QueryTrace implementation in trin-core because it didn't feel right to move impl'd functionality (and the imports that come with it) to ethportal-api when it wouldn't be used outside of Trin. It also allowed for the separation between internal trace representation and RPC trace representation that @jacobkaufmann was looking to see. Let me know if you disagree with that reasoning, or any feedback on the usage of ethportal-api types in the tests.

mrferris · 2023-03-10T21:46:50Z

@jacobkaufmann

there is no notion of "response" for the originating node

The format is generalized: each node (including the origin node) responds to the query by either returning the requested content or by looking in its own routing table for closest ENRs to continue the query. I see no need to make the origin node's behavior an edge case and add complexity to the format instead of just describing its behavior the exact same way that the behavior of every other node is described. Seems cleaner than any non-generalized format to me.

This is the same reason why I felt it was best to change the output of a trace where the content is found locally from {} to {origin: origin_id, found_content_at: origin_id}. The format describes what happened in a generalized way, rather than piecewise.

I suppose the semantics of the word responses could use improvement. Feel free to propose alternatives, maybe closest_nodes_unseen.

my claim is that right now we are introducing complexity on the back-end where things could be much simpler, and it would be better for any complexity to reside on the front-end.

Could you elaborate on where complexity is being introduced to trin?

Are you referencing nodes only being included in the trace the first time they're seen? The single if statement that handles it is already there for other purposes, and when we make a small modification to trin to add every response it will still be the same format, just with more items in each responses entry. I'm not seeing how there's any warping or complexity there.

I agree with adding all responses in the near future. It will allow us to answer/visualize interesting questions like the degree of routing table overlap across the network. Not sure if you're saying we should block this and come up with a new format before deploying the existing functionality.

carver · 2023-03-14T00:24:53Z

@jacobkaufmann

there is no notion of "response" for the originating node

The format is generalized: each node (including the origin node) responds to the query by either returning the requested content or by looking in its own routing table for closest ENRs to continue the query. I see no need to make the origin node's behavior an edge case and add complexity to the format instead of just describing its behavior the exact same way that the behavior of every other node is described. Seems cleaner than any non-generalized format to me.

This is the same reason why I felt it was best to change the output of a trace where the content is found locally from {} to {origin: origin_id, found_content_at: origin_id}. The format describes what happened in a generalized way, rather than piecewise.

I suppose the semantics of the word responses could use improvement. Feel free to propose alternatives, maybe closest_nodes_unseen.

Yeah, naming this as a DAGSearchPath instead of a query trace might help? This data structure is the nodes and directed edges that were followed in the path. That explains why we drop the duplicate responses, and why the origin node lists outbound edges just like all the intermediate nodes do.
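Under that framing, the highlighted route is a walk along the recorded edges. The sketch below assumes the de-duplicated format from the PR description, simplified to a map from each node to its responded_with list; `route_to_content` is a hypothetical helper, not part of trin or glados.

```rust
use std::collections::HashMap;

/// Walk from `found_content_at` back to `origin` along the recorded edges and
/// return the route in forward order, or `None` if no such route is recorded.
fn route_to_content(
    responses: &HashMap<String, Vec<String>>,
    origin: &str,
    found_content_at: &str,
) -> Option<Vec<String>> {
    // Invert the edges. Because duplicates are dropped when recording,
    // each node is expected to have at most one parent.
    let mut parent: HashMap<&str, &str> = HashMap::new();
    for (node, peers) in responses {
        for peer in peers {
            parent.insert(peer.as_str(), node.as_str());
        }
    }

    let mut route = vec![found_content_at.to_string()];
    let mut current = found_content_at;
    // A well-formed path visits each node at most once, so bound the walk.
    for _ in 0..=parent.len() {
        if current == origin {
            route.reverse();
            return Some(route);
        }
        current = *parent.get(current)?;
        route.push(current.to_string());
    }
    None
}
```

For the example in the PR description this yields the route A, B, G.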

The reason to favor something like Jacob is talking about would be if we want to use the query trace for something besides visualizing a search path. In which case, it makes sense to keep the data structure as clearly describing the series of events that occurred (nodes responded, etc), rather than overfitting to this particular usage.

I don't have a clear idea what else we would use this for right now. So maybe just "admitting" that we're building this data for a single purpose, by naming it as such, is the way to go. Until we have a specific idea of what else we would use a query trace for.

mrferris · 2023-03-16T15:24:36Z

Yeah, naming this as a DAGSearchPath instead of a query trace might help?

Sure.

The reason to favor something like Jacob is talking about would be if we want to use the query trace for something besides visualizing a search path. In which case, it makes sense to keep the data structure as clearly describing the series of events that occurred (nodes responded, etc), rather than overfitting to this particular usage.

I would agree if we were locking ourselves into anything. Adding all of the responses is a ~1 line code change in trin, and is backward compatible on the front-end.

jacobkaufmann · 2023-03-16T18:36:59Z

here is a concrete example of the sort of thing I had in mind:

struct QueryPeer {
  requested_at: Instant,
  responded_at: Instant,
  response: QueryResponse,
}

enum QueryResponse {
  Peers(HashMap<Enr, QueryPeer>),
  Content,
}

struct QueryTrace {
  // originator of query (i.e. local node)
  origin: Enr,
  // target content ID.
  target: ContentId,
  // UTC timestamp for query start.
  started_at: Instant,
  // UTC timestamp for query end (termination).
  ended_at: Instant,
  // first level of peers in the map are the bootstrap peers, or those initially queried.
  trace: HashMap<Enr, QueryPeer>,
}

the structure could be modified to include peers who do not respond or use nested arrays instead of maps, but I just want to get the general idea across.

having said that, I'm okay moving forward with the existing structure. like @carver said, we can move forward with this special-purpose design until something more general is required. there are still some outstanding comments that I would like to see addressed though.

ogenev

IMO, we don't need two QueryTrace types - one in ethportal-api and one in query_trace.rs.

We can leave only the QueryTrace type in query_trace.rs and use some serde primitives not to serialize the started_at and target_id fields. We can also use the NodeId type from ethportal-api which will make implementing Serialize/Deserialize for this QueryTrace type trivial. Then we should re-export this type via ethportal-api for external users to access it.

In summary, this would look something like this:

Remove the QueryTrace type from ethportal-api.
Move QueryTrace from trin-core to trin-types.
Use ethportal-api::discv5::NodeId type instead of discv5::enr::NodeId in QueryTrace.
Implement Serialize/Deserialize for QueryTrace and add #[serde(skip_serializing)] attributes for started_at and target_id fields.
Import QueryTrace in ethportal-api from trin-types and re-export it.

ethportal-peertest/src/scenarios/find.rs

ethportal-api/src/types/query_trace.rs

trin-core/src/portalnet/find/query_trace.rs

ethportal-api/src/types/discv5.rs

ethportal-api/src/types/query_trace.rs

portalnet/src/utils/mod.rs

perama-v · 2023-04-13T13:38:23Z

Tested, looks good. Three minor notes:

One note is that the content is returned twice in the response. Perhaps the second received_content_from_node field could be skipped (#[serde(skip_serializing)])?

{
  "jsonrpc": "2.0",
  "result": {
    "content": "0x", // here
    "trace": {
      "received_content_from_node": null, // here
      "origin":

Another note is that the timestamp at which the request was initialised is returned back to the caller. Perhaps this could also be skipped as the caller will always know when they started this request.

      "started_at": { // ?omit
        "secs_since_epoch": <unix secs>,
        "nanos_since_epoch": <nanos>
      },

The response differs from the (now outdated) response defined in book/src/developers/protocols/json_rpc.md.

mrferris · 2023-04-14T22:37:40Z

@perama-v

One note is that the content is returned twice in the response

The first is the content itself, whereas the second is the optional ID of the node that returned the content.

Perhaps this could also be skipped as the caller will always know when they started this request.

Good point, this was previously skipped, but with the way that QueryTrace is now defined, a default value is required to skip that field. The SystemTime type we're using doesn't implement Default, so I'm going to make a note to possibly remove this in the future when we decide whether to keep using relative timestamps or use absolute timestamps.
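For reference, the two timestamp styles under discussion can be sketched with the standard library alone. The helper names `since_epoch` and `millis_since_start` are hypothetical; the first mirrors the secs_since_epoch / nanos_since_epoch pair shown in the started_at object above, the second the relative per-node milliseconds.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Absolute form: whole seconds and leftover nanoseconds since the Unix epoch.
/// Returns `None` if `t` is before the epoch.
fn since_epoch(t: SystemTime) -> Option<(u64, u32)> {
    let elapsed = t.duration_since(UNIX_EPOCH).ok()?;
    Some((elapsed.as_secs(), elapsed.subsec_nanos()))
}

/// Relative form: milliseconds between the start of the query and a response.
/// Falls back to 0 if the clock went backwards between the two readings.
fn millis_since_start(started_at: SystemTime, responded_at: SystemTime) -> u64 {
    responded_at
        .duration_since(started_at)
        .map(|elapsed| elapsed.as_millis() as u64)
        .unwrap_or(0)
}
```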

carver

Just did a quick scan. Seems like we're well into the territory of this being an improvement from where we were, with no major setbacks. I think Ognyan's ❌ is outdated and can be disregarded now. I say we address any last things, and I saw you put issues in for follow-up work, so it's good to go. I'm excited to see the results in glados!

carver · 2023-04-15T00:31:34Z

ethportal-api/src/lib.rs

 // Re-exports jsonrpsee crate
 pub use jsonrpsee;
+
+pub use trin_types::discv5::*;


I'm personally not a fan of the flattening of this trin_types module structure, but that seems to already be the standard in this module, so it seems like the right choice, incrementally. Maybe it's a standard we can talk about in person at the upcoming summit.

carver · 2023-04-15T01:01:13Z

ethportal-peertest/src/scenarios/find.rs

+    let uniq_content_key =
+        "\"0x0015b11b918355b1ef9c5db810302ebad0bf2544255b530cdce90674d5887bb286\"";
+    let history_content_key: HistoryContentKey = serde_json::from_str(uniq_content_key).unwrap();


This feels like it has extra steps that I don't understand. Why go into a json-encoded string and back out?

I see some other places that do this same concept like this, like:

let content_key: HistoryContentKey = serde_json::from_value(json!(HISTORY_CONTENT_KEY)).unwrap();

which is a little better. But still, it seems like going in and out of json doesn't really do anything for us.

Something equivalent to this seems like the direct path that doesn't involve unnecessary json:

let key_bytes = "0015b11b918355b1ef9c5db810302ebad0bf2544255b530cdce90674d5887bb286";
let history_content_key = HistoryContentKey::from_bytes(hex::decode(key_bytes));
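As a rough, dependency-free illustration of the direct path, the hex string can be turned into raw bytes without touching JSON at all. `decode_hex` below is a hypothetical stand-in for a call like `hex::decode`; how the bytes are then turned into a HistoryContentKey depends on which constructors that type actually exposes, which is not shown in this thread.

```rust
/// Decode a hex string (without a "0x" prefix) into bytes.
/// Returns `None` for odd-length input or non-hex characters.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        // Each pair of hex digits is one byte; slicing is safe on ASCII input.
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}
```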

carver · 2023-04-15T01:17:29Z

portalnet/src/overlay_service.rs

        callback: Option<oneshot::Sender<FindContentResult>>,
+        is_trace: bool,
    ) -> Option<QueryId> {
+        info!("Starting query for content key: {}", target);


We might want to reduce this to debug or drop it altogether, if we notice it being a little too noisy, but I'm pretty happy to let that evolve over time, and be just a little generous with logs, especially with new features.

carver · 2023-04-15T01:27:34Z

trin-history/src/jsonrpc.rs

+                let mut trace = QueryTrace::new(
+                    &self.network.overlay.local_enr(),
+                    NodeId::new(&content_key.content_id()).into(),
+                );
+                trace.node_responded_with_content(&local_enr);
+                (Some(val), if is_trace { Some(trace) } else { None })


Seems like a bummer to do all the trace work every time, even when it wasn't requested. Maybe something like:

Suggested change

-                let mut trace = QueryTrace::new(
-                    &self.network.overlay.local_enr(),
-                    NodeId::new(&content_key.content_id()).into(),
-                );
-                trace.node_responded_with_content(&local_enr);
-                (Some(val), if is_trace { Some(trace) } else { None })
+                let trace_option = if is_trace {
+                    let mut trace = QueryTrace::new(
+                        &self.network.overlay.local_enr(),
+                        NodeId::new(&content_key.content_id()).into(),
+                    );
+                    trace.node_responded_with_content(&local_enr);
+                    Some(trace)
+                } else {
+                    None
+                };
+                (Some(val), trace_option)

trin-types/src/enr.rs

Adds response timestamps to tracing output
Adds comments
Adds timestamp for content found event
Adds ENRs and distinction between no response and no progress
Passes ENRs by reference
Adds unit test
Update peertest to parse trace output
Add release notes
Small cleanup of query_tracer.rs
Add node metadata to trace
De-duplicate ENRs and rename to QueryTrace
Refactors node_responded_with to take a vec of all ENRs
Adds test of node_metadata values
Update ms->millis
Update jsonrpc types & test
Do not create or manage a QueryTrace for queries which don't require one
Define and test ethportal-api QueryTrace type
Use NodeId within trin_core::QueryTrace instead of String

mrferris force-pushed the full-tracing branch from 567c076 to e2f19dc December 19, 2022 22:02

mrferris marked this pull request as ready for review December 19, 2022 22:33

mrferris force-pushed the full-tracing branch from 2896fcb to f6b8f5b December 19, 2022 22:39

mrferris requested review from njgheorghita and ogenev December 20, 2022 16:27

mrferris mentioned this pull request Dec 21, 2022

Add audit details page and audit tracing ethereum/glados#28

Merged

3 tasks

mrferris requested a review from jacobkaufmann December 27, 2022 16:49

jacobkaufmann reviewed Jan 6, 2023

mrferris self-assigned this Feb 20, 2023

mrferris force-pushed the full-tracing branch 3 times, most recently from 8b6155d to 649eedb February 21, 2023 23:02

perama-v approved these changes Feb 22, 2023

ogenev reviewed Feb 22, 2023

ethportal-api/src/types/portal.rs (outdated, resolved)

ogenev reviewed Feb 22, 2023

ethportal-peertest/src/scenarios/find.rs (outdated, resolved)

ogenev reviewed Feb 22, 2023

trin-core/src/portalnet/find/query_trace.rs (outdated, resolved)

ogenev reviewed Feb 22, 2023

trin-history/src/jsonrpc.rs (outdated, resolved)

ogenev suggested changes Feb 22, 2023

jacobkaufmann reviewed Mar 6, 2023

mrferris force-pushed the full-tracing branch from 2a3ab92 to d356eb7 March 9, 2023 07:05

mrferris requested a review from ogenev March 9, 2023 07:12

mrferris changed the title ~~Full Content Query Tracing~~ Query Search Path Tracing Mar 16, 2023

mrferris changed the title ~~Query Search Path Tracing~~ Query Search Path Mar 16, 2023

mrferris force-pushed the full-tracing branch 2 times, most recently from 80d6b81 to c53b856 March 25, 2023 01:51

ogenev reviewed Mar 27, 2023

ethportal-peertest/src/scenarios/find.rs (outdated, resolved)

ethportal-api/src/types/query_trace.rs (outdated, resolved)

trin-core/src/portalnet/find/query_trace.rs (resolved)

mrferris force-pushed the full-tracing branch 4 times, most recently from 812b062 to b4b9992 April 12, 2023 21:43

mrferris requested review from ogenev April 13, 2023 04:26

ogenev reviewed Apr 13, 2023

ethportal-api/src/types/discv5.rs (outdated, resolved)

ethportal-api/src/types/query_trace.rs (resolved)

ogenev reviewed Apr 13, 2023

portalnet/src/utils/mod.rs (resolved)

mrferris force-pushed the full-tracing branch from b4b9992 to 743a1c3 April 14, 2023 22:25

mrferris force-pushed the full-tracing branch from 743a1c3 to 6caab18 April 14, 2023 22:38

mrferris requested a review from carver April 14, 2023 22:51

carver approved these changes Apr 15, 2023

mrferris force-pushed the full-tracing branch from 6caab18 to 637e245 April 19, 2023 19:49

mrferris force-pushed the full-tracing branch from 637e245 to 0141bed April 19, 2023 20:04

mrferris merged commit 49e7dbe into ethereum:master Apr 19, 2023

kdeme mentioned this pull request Jul 13, 2023

Implement portal_historyTraceRecursiveFindContent JSON-RPC status-im/nimbus-eth1#1642

Closed

This was referenced Oct 12, 2023

implement portal_historyTraceRecursiveFindContent status-im/nimbus-eth1#1813

Merged

add spec for portal_historyTraceRecursiveFindContent ethereum/portal-network-specs#236

Merged
