Refactor the HeaderChain into BlockTree#324
Conversation
725b066 to
e895611
Compare
|
More details in case future documentation covers why things are done this way: #324 introduces a "tree" structure. The model itself is a directed acyclic graph, but the resulting graph would look like a tree with potentially multiple leaves but only a single terminating root. Thinking in terms of a graph, each header and associated metadata are a "node", and block hashes are edges. To traverse the graph, you may only follow the previous hash of whatever node you are at. In practice this can be stored as a HashMap, where each "edge" is a key, the block hash, and each "node" is the header, height, accumulated PoW, and other metadata. To determine the next edge to visit, one would just check the previous hash of the header. There is no natural order in a hashmap, so a "tip" structure is used to determine what edge to begin with for a particular chain. The chain with the most work is stored as the active tip. The tip also stores the expected work for the next header in the chain. Properties:
It is still nice to have to be able to index by height for: difficulty adjustments,
|
edc5cd8 to
bb9f965
Compare
| self.0 as u64 % params.as_ref().difficulty_adjustment_interval() == 0 | ||
| } | ||
|
|
||
| pub(crate) fn to_u32(self) -> u32 { |
There was a problem hiding this comment.
I am not sure I fully grasp the subtle API tradeoffs yet, but any reason to not use a standard From impl here like:
impl From<Height> for u32 {
fn from(height: Height) -> Self {
height.0
}
}
There was a problem hiding this comment.
If I'm not mistaken this wouldn't enable the impl Into<Height> message signature because it is not clear how to go from a u32 to a Height as far as the compiler is concerned
There was a problem hiding this comment.
Isn't impl Into<Height> handled by the existing From<u32> impl?
impl From<u32> for Height {
fn from(value: u32) -> Self {
Height(value)
}
}
There was a problem hiding this comment.
Oh I didn't read what snippet this was above. I like forcing the caller to know what they are converting to here. "into" could apply to both 32 and 64 bits at the callsite
There was a problem hiding this comment.
Ok, so just so I understand, you want to avoid a scenario where the caller by accident converts the Height into a u64? I am not sure I follow though, does impl From<Height> for u32 also work with u64? I assumed rust would still require that to be explicit?
There was a problem hiding this comment.
I extrapolated the suggestion into also implementing for From<Height> for u64, as there is a method to do that with to_u64. I would think it strange to have a to_u64 but no to_u32 because you are supposed to use into() to get your u32. I think you should be able to convert to either one, but the call should be explicit as to which one
I think after this PR there can be some discussion of either:
- promoting this type across the crate and making it public
- substituting this type with
absolute::Heightfrombitcoinand using an extension trait to do useful things toHeight.
If 1. is the conclusion, then we can consider what exactly the API should look like to public users
There was a problem hiding this comment.
Oh whoops, missed the other u64 method. I guess having both implemented would force the caller to be explicit, like with turbofish: let value = height.into::<u32>();, which kinda accomplishes the same goal for being explicit with the functions? Not a big deal, was just curios on the api choice.
There was a problem hiding this comment.
Good point, I want to do a bit of research on absolute::Height or see if we should iron this one out for public use. For this PR it does the job
| Height(height) | ||
| } | ||
|
|
||
| fn from_u64_checked(height: u64) -> Option<Self> { |
There was a problem hiding this comment.
Similar to my other API question, can this use a TryFrom<u64> for Height?
There was a problem hiding this comment.
Yeah, maybe a bit contrived, but since it would be mapped with try_into().ok() and it resembles the checked API I went with an optional.
I discussed with valuedmammal from the BDK team, and it might make sense to actually use the absolute::Height from bitcoin, but we would like have to add an extension trait to do the nice things this Height does. I am punting on it for a future PR where we've thought through what exactly makes sense here.
| canonical_hashes: BTreeMap::new(), | ||
| headers: HashMap::with_capacity(20_000), | ||
| active_tip: tip, | ||
| candidate_forks: Vec::with_capacity(2), |
There was a problem hiding this comment.
Is 2 just kinda a feel good number here? Just wondering if it was influenced by something.
There was a problem hiding this comment.
Yeah, somewhat of a guess because I think the default might be something like 8 or 16 which is more than I would expect to see. Maybe a bit pedantic of me here
| } else { | ||
| self.active_tip.next_work_required | ||
| }; | ||
| if let Some(work) = next_work { |
There was a problem hiding this comment.
Any thoughts on being defensive and validating the actual pow here? I see it happening in Chain's sanity check which also makes sense, just feels a little naked not seeing it here.
There was a problem hiding this comment.
I guess there would be no harm in it but it would introduce a new variant on the rejected reasoning. I think it is okay for the HeaderBatch to still have some basic role in sanity checking just to get the simple checks out of the way. The function body is already pretty large.
There was a problem hiding this comment.
I think this is kind of inline with my pub(crate) concerns. pub(crate) pokes a hole in the visibility red tape so that the item can be called from anywhere, but in this case, it probably should only be called by Chain since it is relying on the validation at that level.
There was a problem hiding this comment.
So you mean to restrict this to "mod" visibility? I think that makes sense if that's what you mean. Otherwise not sure if I follow
There was a problem hiding this comment.
Yea, the two-plane visibility rules of rust (and how pub(crate) overrides both) is confusing. I think this is a little extra confusing because we have the chain module with a child module also called chain.
But at a very high level, this is how I view the hierarchy:
+--> chain ---> graph (holding the BlockTree type)
|
|
node ---+
|
|
+--> network/peers
In my mind, the graph module should just default to private and have normal public items which only the parent chain module would have access too. And the chain module can be the interface for node.
In order to avoid some big visibility refactor though, pub(chain) (I think technically the parent chain module or the sibling would work...but haven't tested that) is perhaps kind of equivalent in the current layout?
There was a problem hiding this comment.
Sounds like there is a greater discussion to be had about visibility. For now I'm not sure its really a bleeding problem, but if we reach some sort of protocol on what to scope visibility at given a type's role, then we can apply that across the crate. As an aside I plan on changing the name of chain.rs to something more like manager.rs, which is ultimately what that file will amount to
There was a problem hiding this comment.
Probably makes it easier during refactoring, but the heavy use of pub(crate) makes it difficult to parse what exactly is the interface of the module. Given the module itself has pub(crate) visibility, might be nice to take a pass at this later and clean up what needs to be exposed to a minimum. Since this is wrapped by the chain module, feels like that is where more thought could be put into the api.
There was a problem hiding this comment.
I lean to the explicit pub(crate) because it makes it very clear this is not something one would have to worry about breaking. Just another crate preference that is up to interpretation I think
|
Concept ACK |
|
Found a bug. In This patch was doing that, but it was only re-initializing the The fix is simple, we just load in all the headers we can after that prev hash we found. Failures to do this load are reported but don't immediately kill the node. I think that should be up to the user, especially since the database just successfully fetched the header at the height we were looking for. --- a/src/chain/chain.rs
+++ b/src/chain/chain.rs
@@ -335,7 +335,15 @@ impl<H: HeaderStore> Chain<H> {
- self.header_chain.accept_header(header);
+ drop(db);
+ if let Err(e) = self.load_headers().await {
+ crate::log!(self.dialog,
+ "Failure when attempting to fetch previous headers while syncing"
+ );
+ self.dialog.send_warning(Warning::FailedPersistence {
+ warning: format!("Persistence failure: {e}"),
+ });
+ }Note that this branch of logic was almost tested by |
|
Going to self-ACK because I have reviewed the rest thoroughly and there is a lot to be done before next release |
Partially works towards #323
Indexing the chain of headers by height and evaluating new headers in a batch is convenient at first, but that approach has lead to poorly implemented audits for difficulty adjustments, finding headers by their hash, and poor interoperability with storage APIs. Of course, basic sanity checks can be ran on a headers message to assert all the headers are connected, pass their own proof of work, etc. This PR does a couple of things, but one of them is to evaluate each header individually for it's relationship to other headers in the chain and if it follows the difficulty adjustment. This approach is extended to the storage backend, where the trait now either 1. accepts a new header 2. handles a reorganization by taking the newly accepted blocks and reorged blocks.
These changes are made far easier with a more robust representation of the header chain. Right now only the chain of most work is stored in memory, and reorganizations only occur if a node provides a message with more work. This representation does not take advantage of the inherent properties of the header chain. If the data structure holds a directed graph of headers, it is trivial to navigate down the graph from the tip of the chain. By storing block data with its associated hash, the directed nature of the blockchain is preserved and accesses to data by hash are
O(1). TheBlockTreeis the implementation of this philosophy, storing both the active chain tip as well as candidate forks, and a hashmap of the block hash to the blocks header and other metadata. Accessing data by height is still valuable, but a chain may have multiple headers at a single height. In theBlockTreeonly the heights that are considered to be in the chain of most work are stored. When a fork accumulates more work than the current chain, the heights are updated accordingly. This is the most complicated function of theBlockTree,switch_to_fork. All other methods should be straightforward.This approach differs from the previous, where header chains of equal work are rejected. Due to that compromise, a few tests are removed temporarily. No integration tests are effected, however, which is a decently strong indication the implementation works as expected for most conditions
There will be a lot of follow up to this in terms of the
Chainstructure. Because each block hash now contains extra metadata, all of the logic within both theCFHeaderChainandFilterChaincan be consolidated into theBlockTree