diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 5bc2f6965a7..1717118e5e2 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -7,6 +7,4 @@ # global owners are only requested if there isn't a more specific # codeowner specified below. For this reason, the global codeowners # are often repeated in package-level definitions. -* @cometbft/engineering @cometbft/devrel @cometbft/interchain-inc - -/spec @cometbft/research @cometbft/engineering @cometbft/interchain-inc +* @celestiaorg/celestia-core \ No newline at end of file diff --git a/.github/auto_request.yml b/.github/auto_request.yml new file mode 100644 index 00000000000..0eb43a98e19 --- /dev/null +++ b/.github/auto_request.yml @@ -0,0 +1,15 @@ + +# More info at https://github.com/necojackarc/auto-request-review +reviewers: + defaults: + - cmwaters + - evan-forbes + - ninabarbakadze + - rach-id + - rootulp + +options: + ignore_draft: true + ignored_keywords: + - DO NOT REVIEW + enable_group_assignment: false diff --git a/README.md b/README.md index ca1b4462ad3..4fde4f71633 100644 --- a/README.md +++ b/README.md @@ -1,200 +1,86 @@ -# CometBFT +# celestia-core -[Byzantine-Fault Tolerant][bft] [State Machine Replication][smr]. Or -[Blockchain], for short. 
+[![Go Reference](https://img.shields.io/badge/godoc-reference-blue.svg)](https://pkg.go.dev/github.com/celestiaorg/celestia-core) +[![GitHub Release](https://img.shields.io/github/v/release/celestiaorg/celestia-core)](https://github.com/celestiaorg/celestia-core/releases/latest) +[![Go Report Card](https://goreportcard.com/badge/github.com/celestiaorg/celestia-core)](https://goreportcard.com/report/github.com/celestiaorg/celestia-core) +[![Lint](https://github.com/celestiaorg/celestia-core/actions/workflows/lint.yml/badge.svg)](https://github.com/celestiaorg/celestia-core/actions/workflows/lint.yml) +[![Tests](https://github.com/celestiaorg/celestia-core/actions/workflows/tests.yml/badge.svg)](https://github.com/celestiaorg/celestia-core/actions/workflows/tests.yml) -[![Version][version-badge]][version-url] -[![API Reference][api-badge]][api-url] -[![Go version][go-badge]][go-url] -[![Discord chat][discord-badge]][discord-url] -[![License][license-badge]][license-url] -[![Sourcegraph][sg-badge]][sg-url] +celestia-core is a fork of [cometbft/cometbft](https://github.com/cometbft/cometbft), an implementation of the Tendermint protocol, with the following changes: + -| Branch | Tests | Linting | -|---------|------------------------------------------------|---------------------------------------------| -| main | [![Tests][tests-badge]][tests-url] | [![Lint][lint-badge]][lint-url] | -| v0.38.x | [![Tests][tests-badge-v038x]][tests-url-v038x] | [![Lint][lint-badge-v038x]][lint-url-v038x] | -| v0.37.x | [![Tests][tests-badge-v037x]][tests-url-v037x] | [![Lint][lint-badge-v037x]][lint-url-v037x] | -| v0.34.x | [![Tests][tests-badge-v034x]][tests-url-v034x] | [![Lint][lint-badge-v034x]][lint-url-v034x] | +1. Modifications to how `DataHash` in the block header is determined. In CometBFT, `DataHash` is based on the transactions included in a block. In Celestia, block data (including transactions) are erasure coded into a data square to enable data availability sampling. 
In order for the header to contain a commitment to this data square, `DataHash` was modified to be the Merkle root of the row and column roots of the erasure coded data square. See [ADR 008](https://github.com/celestiaorg/celestia-core/blob/v0.34.x-celestia/docs/celestia-architecture/adr-008-updating-to-tendermint-v0.35.x.md?plain=1#L20) for the motivation or [celestia-app/pkg/da/data_availability_header.go](https://github.com/celestiaorg/celestia-app/blob/2f89956b22c4c3cfdec19b3b8601095af6f69804/pkg/da/data_availability_header.go) for the implementation. Note on the implementation: celestia-app computes the hash in prepare_proposal and returns it to CometBFT via [`blockData.Hash`](https://github.com/celestiaorg/celestia-app/blob/5bbdac2d3f46662a34b2111602b8f964d6e6fba5/app/prepare_proposal.go#L78) so a modification to celestia-core isn't strictly necessary but [comments](https://github.com/celestiaorg/celestia-core/blob/2ec23f804691afc196d0104616e6c880d4c1ca41/types/block.go#L1041-L1042) were added. -CometBFT is a Byzantine Fault Tolerant (BFT) middleware that takes a -state transition machine - written in any programming language - and securely -replicates it on many machines. -It is a fork of [Tendermint Core][tm-core] and implements the Tendermint -consensus algorithm. +See [./docs/celestia-architecture](./docs/celestia-architecture/) for architecture decision records (ADRs) on Celestia modifications. -For protocol details, refer to the [CometBFT Specification](./spec/README.md). +## Diagram -For detailed analysis of the consensus protocol, including safety and liveness -proofs, read our paper, "[The latest gossip on BFT -consensus](https://arxiv.org/abs/1807.04938)". 
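To make the `DataHash` modification above concrete, here is a minimal, hypothetical sketch of hashing row and column roots into a single commitment. It assumes an RFC 6962-style SHA-256 binary Merkle tree (the shape used by Tendermint's merkle package); the `merkleRoot` helper and the `"row0"`/`"col0"` placeholder roots are purely illustrative, not the real NMT roots or the celestia-core API:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot computes an RFC 6962-style binary Merkle root over the given
// leaves: leaf nodes are hashed with a 0x00 prefix and inner nodes with a
// 0x01 prefix to prevent second-preimage attacks.
func merkleRoot(leaves [][]byte) [32]byte {
	switch len(leaves) {
	case 0:
		return sha256.Sum256(nil)
	case 1:
		return sha256.Sum256(append([]byte{0x00}, leaves[0]...))
	}
	// Split at the largest power of two strictly smaller than len(leaves),
	// matching the tree shape used by Tendermint's merkle package.
	k := 1
	for k*2 < len(leaves) {
		k *= 2
	}
	l := merkleRoot(leaves[:k])
	r := merkleRoot(leaves[k:])
	return sha256.Sum256(append(append([]byte{0x01}, l[:]...), r[:]...))
}

func main() {
	// Hypothetical row and column roots of a 2x2 extended data square.
	rowRoots := [][]byte{[]byte("row0"), []byte("row1")}
	colRoots := [][]byte{[]byte("col0"), []byte("col1")}
	dataHash := merkleRoot(append(rowRoots, colRoots...))
	fmt.Printf("DataHash: %x\n", dataHash)
}
```

In the real implementation the leaves are namespaced Merkle tree roots produced by the erasure-coded square, but the commitment structure is the same: one root over all row roots followed by all column roots.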
+```ascii + ^ +-------------------------------+ ^ + | | | | + | | State-machine = Application | | + | | | | celestia-app (built with Cosmos SDK) + | | ^ + | | + | +----------- | ABCI | ----------+ v +Celestia | | + v | ^ +validator or | | | | +full consensus | | Consensus | | +node | | | | + | +-------------------------------+ | celestia-core (fork of CometBFT) + | | | | + | | Networking | | + | | | | + v +-------------------------------+ v +``` -## Documentation +## Install -Complete documentation can be found on the -[website](https://docs.cometbft.com/). +See -## Releases +## Usage -Please do not depend on `main` as your production branch. Use -[releases](https://github.com/cometbft/cometbft/releases) instead. +See -If you intend to run CometBFT in production, we're happy to help. To contact -us, in order of preference: - -- [Create a new discussion on - GitHub](https://github.com/cometbft/cometbft/discussions) -- Reach out to us via [Telegram](https://t.me/CometBFT) -- [Join the Cosmos Network Discord](https://discord.gg/interchain) and - discuss in - [`#cometbft`](https://discord.com/channels/669268347736686612/1069933855307472906) - -More on how releases are conducted can be found [here](./RELEASES.md). - -## Security +## Contributing -To report a security vulnerability, see our [bug bounty -program](https://hackerone.com/cosmos). For examples of the kinds of bugs we're -looking for, see [our security policy](SECURITY.md). +This repo intends on preserving the minimal possible diff with [cometbft/cometbft](https://github.com/cometbft/cometbft) to make fetching upstream changes easy. 
If the proposed contribution is -## Minimum requirements +- **specific to Celestia**: consider if [celestia-app](https://github.com/celestiaorg/celestia-app) is a better target +- **not specific to Celestia**: consider making the contribution upstream in CometBFT -| CometBFT version | Requirement | Notes | -|------------------|-------------|-------------------| -| main | Go version | Go 1.22 or higher | -| v0.38.x | Go version | Go 1.22 or higher | -| v0.37.x | Go version | Go 1.22 or higher | -| v0.34.x | Go version | Go 1.12 or higher | +1. [Install Go](https://go.dev/doc/install) 1.23.5+ +2. Fork this repo +3. Clone your fork +4. Find an issue to work on (see [good first issues](https://github.com/celestiaorg/celestia-core/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)) +5. Work on a change in a branch on your fork +6. When your change is ready, push your branch and create a PR that targets this repo -### Install +### Helpful Commands -See the [install guide](./docs/guides/install.md). +```sh +# Build a new CometBFT binary and output to build/comet +make build -### Quick Start +# Install CometBFT binary +make install -- [Single node](./docs/guides/quick-start.md) -- [Local cluster using docker-compose](./docs/networks/docker-compose.md) +# Run tests +make test -## Contributing +# If you modified any protobuf definitions in a `*.proto` file then +# you may need to lint, format, and generate updated `*.pb.go` files +make proto-lint +make proto-format +make proto-gen +``` -Please abide by the [Code of Conduct](CODE_OF_CONDUCT.md) in all interactions. +## Branches -Before contributing to the project, please take a look at the [contributing -guidelines](CONTRIBUTING.md) and the [style guide](STYLE_GUIDE.md). You may also -find it helpful to read the [specifications](./spec/README.md), and familiarize -yourself with our [Architectural Decision Records -(ADRs)](./docs/architecture/README.md) and [Request For Comments -(RFCs)](./docs/rfc/README.md). 
+The canonical branches in this repo are based on CometBFT releases. For example: [`v0.38.x-celestia`](https://github.com/celestiaorg/celestia-core/tree/v0.38.x-celestia) is based on the CometBFT `v0.38.x` release branch and contains Celestia-specific changes.

## Versioning

-### Semantic Versioning
-
-CometBFT uses [Semantic Versioning](http://semver.org/) to determine when and
-how the version changes. According to SemVer, anything in the public API can
-change at any time before version 1.0.0
-
-To provide some stability to users of 0.X.X versions of CometBFT, the MINOR
-version is used to signal breaking changes across CometBFT's API. This API
-includes all publicly exposed types, functions, and methods in non-internal Go
-packages as well as the types and methods accessible via the CometBFT RPC
-interface.
-
-Breaking changes to these public APIs will be documented in the CHANGELOG.
-
-### Upgrades
-
-In an effort to avoid accumulating technical debt prior to 1.0.0, we do not
-guarantee that breaking changes (i.e. bumps in the MINOR version) will work with
-existing CometBFT blockchains. In these cases you will have to start a new
-blockchain, or write something custom to get the old data into the new chain.
-However, any bump in the PATCH version should be compatible with existing
-blockchain histories.
-
-For more information on upgrading, see [UPGRADING.md](./UPGRADING.md).
-
-### Supported Versions
-
-Because we are a small core team, we have limited capacity to ship patch
-updates, including security updates. Consequently, we strongly recommend keeping
-CometBFT up-to-date. Upgrading instructions can be found in
-[UPGRADING.md](./UPGRADING.md).
- -Currently supported versions include: - -- v0.38.x: CometBFT v0.38 introduces ABCI 2.0, which implements the entirety of - ABCI++ -- v0.37.x: CometBFT v0.37 introduces ABCI 1.0, which is the first major step - towards the full ABCI++ implementation in ABCI 2.0 -- v0.34.x: The CometBFT v0.34 series is compatible with the Tendermint Core - v0.34 series - -## Resources - -### Libraries - -- [Cosmos SDK](http://github.com/cosmos/cosmos-sdk); A framework for building - applications in Golang -- [Tendermint in Rust](https://github.com/informalsystems/tendermint-rs) -- [ABCI Tower](https://github.com/penumbra-zone/tower-abci) - -### Applications - -- [Cosmos Hub](https://hub.cosmos.network/) -- [Terra](https://www.terra.money/) -- [Celestia](https://celestia.org/) -- [Anoma](https://anoma.network/) -- [Vocdoni](https://docs.vocdoni.io/) - -### Research - -Below are links to the original Tendermint consensus algorithm and relevant -whitepapers which CometBFT will continue to build on. - -- [The latest gossip on BFT consensus](https://arxiv.org/abs/1807.04938) -- [Master's Thesis on Tendermint](https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9769) -- [Original Whitepaper: "Tendermint: Consensus Without Mining"](https://tendermint.com/static/docs/tendermint.pdf) - -## Join us - -CometBFT is currently maintained by [Informal -Systems](https://informal.systems). If you'd like to work full-time on CometBFT, -[we're hiring](https://informal.systems/careers)! - -Funding for CometBFT development comes primarily from the [Interchain -Foundation](https://interchain.io), a Swiss non-profit. Informal Systems also -maintains [cometbft.com](https://cometbft.com). 
- -[bft]: https://en.wikipedia.org/wiki/Byzantine_fault_tolerance -[smr]: https://en.wikipedia.org/wiki/State_machine_replication -[Blockchain]: https://en.wikipedia.org/wiki/Blockchain -[version-badge]: https://img.shields.io/github/v/release/cometbft/cometbft.svg -[version-url]: https://github.com/cometbft/cometbft/releases/latest -[api-badge]: https://camo.githubusercontent.com/915b7be44ada53c290eb157634330494ebe3e30a/68747470733a2f2f676f646f632e6f72672f6769746875622e636f6d2f676f6c616e672f6764646f3f7374617475732e737667 -[api-url]: https://pkg.go.dev/github.com/cometbft/cometbft -[go-badge]: https://img.shields.io/badge/go-1.22-blue.svg -[go-url]: https://github.com/moovweb/gvm -[discord-badge]: https://img.shields.io/discord/669268347736686612.svg -[discord-url]: https://discord.gg/interchain -[license-badge]: https://img.shields.io/github/license/cometbft/cometbft.svg -[license-url]: https://github.com/cometbft/cometbft/blob/main/LICENSE -[sg-badge]: https://sourcegraph.com/github.com/cometbft/cometbft/-/badge.svg -[sg-url]: https://sourcegraph.com/github.com/cometbft/cometbft?badge -[tests-url]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml -[tests-url-v038x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml?query=branch%3Av0.38.x -[tests-url-v037x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml?query=branch%3Av0.37.x -[tests-url-v034x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml?query=branch%3Av0.34.x -[tests-badge]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml/badge.svg?branch=main -[tests-badge-v038x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml/badge.svg?branch=v0.38.x -[tests-badge-v037x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml/badge.svg?branch=v0.37.x -[tests-badge-v034x]: https://github.com/cometbft/cometbft/actions/workflows/tests.yml/badge.svg?branch=v0.34.x -[lint-badge]: 
https://github.com/cometbft/cometbft/actions/workflows/lint.yml/badge.svg?branch=main
-[lint-badge-v034x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml/badge.svg?branch=v0.34.x
-[lint-badge-v037x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml/badge.svg?branch=v0.37.x
-[lint-badge-v038x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml/badge.svg?branch=v0.38.x
-[lint-url]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml
-[lint-url-v034x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml?query=branch%3Av0.34.x
-[lint-url-v037x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml?query=branch%3Av0.37.x
-[lint-url-v038x]: https://github.com/cometbft/cometbft/actions/workflows/lint.yml?query=branch%3Av0.38.x
-[tm-core]: https://github.com/tendermint/tendermint
+Releases are formatted: `v<CELESTIA_CORE_VERSION>-tm-v<COMETBFT_VERSION>`
+For example: [`v1.4.0-tm-v0.34.20`](https://github.com/celestiaorg/celestia-core/releases/tag/v1.4.0-tm-v0.34.20) is celestia-core version `1.4.0` based on CometBFT `0.34.20`.
+`CELESTIA_CORE_VERSION` strives to adhere to [Semantic Versioning](http://semver.org/).
diff --git a/docs/celestia-architecture/README.md b/docs/celestia-architecture/README.md
new file mode 100644
index 00000000000..f48698d1fa1
--- /dev/null
+++ b/docs/celestia-architecture/README.md
@@ -0,0 +1,63 @@
+---
+order: 1
+parent:
+  order: false
+---
+
+# Tendermint and Celestia
+
+celestia-core is not meant to be used as a general purpose framework.
+Instead, its main purpose is to provide certain components (mainly consensus but also a p2p layer for Tx gossiping) for the Celestia main chain.
+Hence, we do not provide any extensive documentation here.
+
+Instead of keeping a copy of the Tendermint documentation, we refer to the existing extensive and maintained documentation and specification:
+
+- 
+- 
+- 
+
+Reading these will give you a lot of background and context on Tendermint which will also help you understand how celestia-core and [celestia-app](https://github.com/celestiaorg/celestia-app) interact with each other.
+
+## Celestia
+
+As mentioned above, celestia-core aims to be more focused on the Celestia use-case than vanilla Tendermint.
+Moving forward we might provide a clear overview of the changes we incorporated.
+For now, we refer to the Celestia specific ADRs in this repository as well as to the Celestia specification:
+
+- [celestia-specs](https://github.com/celestiaorg/celestia-specs)
+
+## Architecture Decision Records (ADR)
+
+This is a location to record all high-level architecture decisions in this repository.
+
+You can read more about the ADR concept in this [blog post](https://product.reverb.com/documenting-architecture-decisions-the-reverb-way-a3563bb24bd0#.78xhdix6t).
+
+An ADR should provide:
+
+- Context on the relevant goals and the current state
+- Proposed changes to achieve the goals
+- Summary of pros and cons
+- References
+- Changelog
+
+Note the distinction between an ADR and a spec. The ADR provides the context, intuition, reasoning, and
+justification for a change in architecture, or for the architecture of something
+new. The spec is a much more compressed and streamlined summary of everything as
+it stands today.
+
+If recorded decisions turned out to be lacking, convene a discussion, record the new decisions here, and then modify the code to match.
+
+Note the context/background should be written in the present tense.
+
+To start a new ADR, you can use this template: [adr-template.md](./adr-template.md)
+
+### Table of Contents
+
+- [ADR 001: Erasure Coding Block Propagation](./adr-001-block-propagation.md)
+- [ADR 002: Sampling erasure coded Block chunks](./adr-002-ipld-da-sampling.md)
+- [ADR 003: Retrieving Application messages](./adr-003-application-data-retrieval.md)
+- [ADR 004: Data Availability Sampling Light Client](./adr-004-mvp-light-client.md)
+- [ADR 005: Decouple BlockID and PartSetHeader](./adr-005-decouple-blockid-and-partsetheader.md)
+- [ADR 006: Row Propagation](./adr-006-row-propagation.md)
+- [ADR 007: Minimal Changes to Tendermint](./adr-007-minimal-changes-to-tendermint.md)
+- [ADR 008: Updating to Tendermint v0.35.x](./adr-008-updating-to-tendermint-v0.35.x.md)
diff --git a/docs/celestia-architecture/adr-001-block-propagation.md b/docs/celestia-architecture/adr-001-block-propagation.md
new file mode 100644
index 00000000000..80933fd1f63
--- /dev/null
+++ b/docs/celestia-architecture/adr-001-block-propagation.md
@@ -0,0 +1,124 @@
+# ADR 001: Erasure Coding Block Propagation
+
+## Changelog
+
+- 16-2-2021: Created
+
+## Context
+
+Block propagation is currently done by splitting the block into arbitrary chunks and gossiping them to validators via a gossip routine. While this approach works, it does not meet the needs of the Celestia chain. The Celestia chain requires blocks to be encoded in a different way and for the proposer to not propagate the chunks to peers.
+
+Celestia wants validators to pull the block from an IPFS network. What does this mean? As I touched on earlier, the proposer pushes the block to the network, which in turn means that each validator downloads and reconstructs the block each time to verify it. Instead, Celestia will encode and split up the block via erasure codes and store the result locally in the node's IPFS daemon. After the proposer has sent the block to IPFS and received the CIDs, it will include them in the proposal.
This proposal will be gossiped to other validators; once a validator receives the proposal, it will begin requesting the CIDs included in the proposal.
+
+There are two forms of validator: one that downloads the block and one that samples it. What does sampling mean? Sampling is the act of checking that a portion of the block, or the entire block, is available for download.
+
+## Detailed Design
+
+The proposed design is as follows.
+
+### Types
+
+The proposal and vote types have a BlockID; this will be replaced with a header hash. The proposal will contain additional fields.
+
+The current proposal will be updated to include required fields. The entirety of the message will be reworked at a later date. To see the extent of the needed changes you can visit the [spec repo](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/proto/consensus.proto#L19)
+
+```proto
+message Proposal {
+  SignedMsgType type = 1;
+  int64 height = 2;
+  int32 round = 3;
+  int32 pol_round = 4;
+
++++
+  // 32-byte hash
+  bytes last_header_hash = 5;
+  // 32-byte hash
+  bytes last_commit_hash = 6;
+  // 32-byte hash
+  bytes consensus_root = 7;
+  FeeHeader fee_header = 8;
+  // 32-byte hash
+  bytes state_commitment = 9;
+  uint64 available_data_original_shares_used = 10;
+  AvailableDataHeader available_data_header = 11;
++++
+
+  google.protobuf.Timestamp timestamp = 12
+      [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
+  bytes signature = 13;
+}
+```
+
+```proto
+// Vote represents a prevote, precommit, or commit vote from validators for
+// consensus.
+message Vote {
+  SignedMsgType type = 1;
+  int64 height = 2;
+  int32 round = 3;
++++
+  bytes header_hash = 4;
++++
+  google.protobuf.Timestamp timestamp = 5
+      [(gogoproto.nullable) = false, (gogoproto.stdtime) = true];
+  bytes validator_address = 6;
+  int32 validator_index = 7;
+  bytes signature = 8;
+}
+```
+
+See [specs](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/data_structures.md#vote) for more details on the vote.
+
+### Disk Storage
+
+Currently, celestia-core stores all blocks in its store. Going forward, only the headers of the blocks within the unbonding period will be stored. This will drastically reduce the amount of storage required by a celestia-core node. After the unbonding period, all headers will have the option of being pruned.
+
+Proposed amendment to the `BlockStore` interface:
+
+```go
+type BlockStore interface {
+	Base() int64
+	Height() int64
+	Size() int64
+
+	LoadBlockMeta(height int64) *types.BlockMeta
+	LoadHeader(height int64) *types.Header
+	LoadDAHeader(height int64) *types.DataAvailabilityHeader
+
+	SaveHeaders(header *types.Header, daHeader *types.DataAvailabilityHeader, seenCommit *types.Commit)
+
+	PruneHeaders(height int64) (uint64, error)
+
+	LoadBlockCommit(height int64) *types.Commit
+	LoadSeenCommit(height int64) *types.Commit
+}
+```
+
+Alongside these changes, the RPC layer will need to change. Instead of querying the LL-core store, the node will redirect the query through IPFS.
+
+Example:
+
+When a user requests a block from the LL node, the request will be sent to the IPLD plugin. If the IPLD plugin does not have the requested block, it will make a request to the celestia IPFS network for the required CIDs. If the full node does not have the DA header, it will not be able to request the block data.
+
+![user request flow](./assets/user-request.png)
+
+The goal is to not change the public interface for RPCs. It is yet to be seen if this is possible.
This means that CIDs will need to be set and loaded from the store in order to get all the related block information a user requires. + +## Status + +Proposed + + +### Positive + +- Minimal breakage to public interface +- Only store the block in a single place (IPFS) +- Reduce the public interface of the storage within Celestia. + +### Negative + +- User requests may take more time to process + +### Neutral + +## References diff --git a/docs/celestia-architecture/adr-002-ipld-da-sampling.md b/docs/celestia-architecture/adr-002-ipld-da-sampling.md new file mode 100644 index 00000000000..a2d6cf987f3 --- /dev/null +++ b/docs/celestia-architecture/adr-002-ipld-da-sampling.md @@ -0,0 +1,280 @@ +# ADR 002: Sampling erasure coded Block chunks + +## Changelog + +- 26-2-2021: Created + +## Context + +In Tendermint's block gossiping each peer gossips random parts of block data to peers. +For Celestia, we need nodes (from light-clients to validators) to be able to sample row-/column-chunks of the erasure coded +block (aka the extended data square) from the network. +This is necessary for Data Availability proofs. + +![extended_square.png](img/extended_square.png) + +A high-level, implementation-independent formalization of above mentioned sampling and Data Availability proofs can be found in: +[_Fraud and Data Availability Proofs: Detecting Invalid Blocks in Light Clients_](https://fc21.ifca.ai/papers/83.pdf). + +For the time being, besides the academic paper, no other formalization or specification of the protocol exists. +Currently, the Celestia specification itself only describes the [erasure coding](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/data_structures.md#erasure-coding) +and how to construct the extended data square from the block data. 
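The sampling argument from the paper cited above can be quantified. For a 2k×2k extended square, making the block unrecoverable requires withholding at least (k+1)² of the (2k)² shares, so a single uniform sample misses the withheld set with probability at most roughly 3/4. The following Go sketch is illustrative only; the function name is hypothetical and the asymptotic 3/4 bound is an approximation (the exact per-sample bound depends on k):

```go
package main

import (
	"fmt"
	"math"
)

// samplesForConfidence returns how many uniform random samples are needed so
// that an unavailable block goes undetected with probability below
// 1-confidence. Each sample fails to hit a withheld share with probability at
// most ~3/4 (the large-k limit of 1 - (k+1)^2/(2k)^2), so s independent
// samples all miss with probability at most (3/4)^s.
func samplesForConfidence(confidence float64) int {
	return int(math.Ceil(math.Log(1-confidence) / math.Log(0.75)))
}

func main() {
	fmt.Println(samplesForConfidence(0.99)) // 17 samples give 99% detection confidence
}
```

This is why a fairly small, constant number of samples per light client already gives strong availability guarantees, independent of the block size.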
+
+This ADR:
+- describes the high-level requirements
+- defines the API and how it can be used by different components of Celestia (block gossiping, block sync, DA proofs)
+- documents the decision on how to implement this.
+
+
+The core data structures and the erasure coding of the block are already implemented in celestia-core ([#17], [#19], [#83]).
+While there are no ADRs for these changes, we can refer to the Celestia specification in this case.
+For this aspect, the existing implementation and specification should already be on par for the most part.
+The exact arrangement of the data as described in this [rationale document](https://github.com/celestiaorg/celestia-specs/blob/master/src/rationale/message_block_layout.md)
+in the specification can happen on the app side of the ABCI boundary.
+The latter was implemented in [celestiaorg/celestia-app#21](https://github.com/celestiaorg/celestia-app/pull/21)
+leveraging a new ABCI method, added in [#110](https://github.com/celestiaorg/celestia-core/pull/110).
+This new method is a subset of the proposed ABCI changes aka [ABCI++](https://github.com/tendermint/spec/pull/254).
+
+Mustafa Al-Bassam (@musalbas) implemented a [prototype](https://github.com/celestiaorg/celestia-prototype)
+whose main purpose is to realistically analyse the protocol.
+Although the prototype does not make any network requests and only operates locally, it can partly serve as a reference implementation.
+It uses the [rsmt2d] library.
+
+The implementation will essentially use IPFS' APIs. For reading (and writing) chunks it
+will use the IPLD [`DagService`](https://github.com/ipfs/go-ipld-format/blob/d2e09424ddee0d7e696d01143318d32d0fb1ae63/merkledag.go#L54),
+more precisely the [`NodeGetter`](https://github.com/ipfs/go-ipld-format/blob/d2e09424ddee0d7e696d01143318d32d0fb1ae63/merkledag.go#L18-L27)
+and [`NodeAdder`](https://github.com/ipfs/go-ipld-format/blob/d2e09424ddee0d7e696d01143318d32d0fb1ae63/merkledag.go#L29-L39).
+As an optimization, we can also use a [`Batch`](https://github.com/ipfs/go-ipld-format/blob/d2e09424ddee0d7e696d01143318d32d0fb1ae63/batch.go#L29) +to batch adding and removing nodes. +This will be achieved by passing around a [CoreAPI](https://github.com/ipfs/interface-go-ipfs-core/blob/b935dfe5375eac7ea3c65b14b3f9a0242861d0b3/coreapi.go#L15) +object, which derives from the IPFS node which is created along with a tendermint node (see [#152]). +This code snippet does exactly that (see the [go-ipfs documentation] for more examples): +```go +// This constructs an IPFS node instance +node, _ := core.NewNode(ctx, nodeOptions) +// This attaches the Core API to the constructed node +coreApi := coreapi.NewCoreAPI(node) +``` + +The above mentioned IPLD methods operate on so called [ipld.Nodes]. +When computing the data root, we can pass in a [`NodeVisitor`](https://github.com/celestia/nmt/blob/b22170d6f23796a186c07e87e4ef9856282ffd1a/nmt.go#L22) +into the Namespaced Merkle Tree library to create these (each inner- and leaf-node in the tree becomes an ipld node). +As a peer that requests such an IPLD node, the Celestia IPLD plugin provides the [function](https://github.com/celestiaorg/celestia-core/blob/ceb881a177b6a4a7e456c7c4ab1dd0eb2b263066/p2p/ipld/plugin/nodes/nodes.go#L175) +`NmtNodeParser` to transform the retrieved raw data back into an `ipld.Node`. + +A more high-level description on the changes required to rip out the current block gossiping routine, +including changes to block storage-, RPC-layer, and potential changes to reactors is either handled in [ADR 001](./adr-001-block-propagation.md), +and/or in a few smaller, separate followup ADRs. + +## Alternative Approaches + +Instead of creating a full IPFS node object and passing it around as explained above + - use API (http) + - use ipld-light + - use alternative client + +Also, for better performance + - use [graph-sync], [IPLD selectors], e.g. 
via [ipld-prime]
+
+Also, there is the idea that nodes only receive the [Header] with the data root
+and, in an additional step/request, download the DA header using the library, too.
+While this feature is not considered here, and we assume each node that uses this library has the DA header, this assumption
+is likely to change when we flesh out other parts of the system in more detail.
+Note that this also means that light clients would still need to validate that the data root and merkleizing the DA header yield the same result.
+
+## Decision
+
+> This section records the decision that was made.
+> It is best to record as much info as possible from the discussion that happened. This aids in not having to go back to the Pull Request to get the needed information.
+
+> - TODO: briefly summarize github, discord, and slack discussions (?)
+> - also mention Mustafa's prototype and compare both apis briefly (RequestSamples, RespondSamples, ProcessSamplesResponse)
+> - mention [ipld experiments]
+
+
+
+## Detailed Design
+
+Add a package to the library that provides the following features:
+ 1. sample a given number of random row/col indices of the extended data square given a DA header and indicate if successful or timeout/other error occurred
+ 2. store the block in the network by adding it to the peer's local Merkle-DAG whose content is discoverable via a DHT
+ 3. store the sampled chunks in the network
+ 4. reconstruct the whole block from a given DA header
+ 5. get all messages of a particular namespace ID.
+
+We mention 5. here mostly for completeness. Its details will be described / implemented in a separate ADR / PR.
+
+Apart from the above mentioned features, we informally collect additional requirements:
+- where randomness is needed, the randomness source should be configurable
+- all replies by the network should be verified if this is not sufficiently covered by the used libraries already (IPFS)
+- where possible, the requests to the network should happen in parallel (without DoSing the proposer for instance).
+
+This library should be implemented as two new packages:
+
+First, a sub-package should be added to the lazyledger-core [p2p] package
+which does not know anything about the core data structures (Block, DA header etc).
+It handles the actual network requests to the IPFS network and operates on IPFS/IPLD objects
+directly and hence should live under [p2p/ipld].
+To some extent this part of the stack already exists.
+
+Second, a high-level API that can "live" closer to the actual types, e.g., in a sub-package in [celestia-core/types]
+or in a new sub-package `da`.
+
+We first describe the high-level library here and describe functions in
+more detail inline with their godoc comments below.
+
+### API that operates on celestia-core types
+
+As mentioned above, this part of the library has knowledge of the core types (and hence depends on them).
+It does not deal with IPFS internals.
+
+```go
+// ValidateAvailability implements the protocol described in https://fc21.ifca.ai/papers/83.pdf.
+// Specifically all steps of the protocol described in section
+// _5.2 Random Sampling and Network Block Recovery_ are carried out.
+//
+// In more detail it will first create numSamples random unique coordinates.
+// Then, it will ask the network for the leaf data corresponding to these coordinates.
+// In addition to the number of requests, the caller can pass in a callback,
+// which will be called for each retrieved leaf with a verified Merkle proof.
+//
+// Among other use-cases, the callback can be useful for monitoring (progress), or,
+// to process the leaf data the moment it was validated.
+// The context can be used to provide a timeout.
+// TODO: Should there be a constant = lower bound for #samples
+func ValidateAvailability(
+	ctx context.Context,
+	dah *DataAvailabilityHeader,
+	numSamples int,
+	onLeafValidity func(namespace.PrefixedData8),
+) error { /* ... */ }
+
+// RetrieveBlockData can be used to recover the block Data.
+// It will carry out a similar protocol as described for ValidateAvailability.
+// The key difference is that it will sample enough chunks until it can recover the
+// full extended data square, including the original data (e.g. by using rsmt2d.RepairExtendedDataSquare).
+func RetrieveBlockData(
+	ctx context.Context,
+	dah *DataAvailabilityHeader,
+	api coreiface.CoreAPI,
+	codec rsmt2d.Codec,
+) (types.Data, error) { /* ... */ }
+
+// PutBlock operates directly on the Block.
+// It first computes the erasure coding, aka the extended data square.
+// Row by row, it calls a lower-level library which handles adding the
+// row to the Merkle DAG, in our case a Namespaced Merkle Tree.
+// Note that this method could also fill the DA header.
+// The data will be pinned by default.
+func (b *Block) PutBlock(ctx context.Context, nodeAdder ipld.NodeAdder) error
+```
+
+We now describe the lower-level library that will be used by the above methods.
+Again, we provide more details inline in the godoc comments directly.
+
+`PutBlock` is a method on `Block` so that the erasure coding can be cached, e.g. in a private field
+in the block.
+
+### Changes to the lower level API closer to IPFS (p2p/ipld)
+
+```go
+// GetLeafData takes in a Namespaced Merkle tree root transformed into a Cid
+// and the leaf index to retrieve.
+// Callers also need to pass in the total number of leaves of that tree.
+// Internally, this will be translated to an IPLD path and corresponds to
+// an ipfs dag get request, e.g. namespacedCID/0/1/0/0/1.
+// The retrieved data should be pinned by default.
+func GetLeafData(
+	ctx context.Context,
+	rootCid cid.Cid,
+	leafIndex uint32,
+	totalLeafs uint32, // this corresponds to the extended square width
+	api coreiface.CoreAPI,
+) ([]byte, error)
+```
+
+`GetLeafData` can be used by the above `ValidateAvailability` and `RetrieveBlockData`, and
+`PutLeaves` by `PutBlock`.
+
+### A Note on IPFS/IPLD
+
+In IPFS all data is _content addressed_, which basically means the data is identified by its hash.
+In particular, in the Celestia case, the root CID identifies the Namespaced Merkle tree including all its contents (inner and leaf nodes).
+This means that if a `GetLeafData` request succeeds, the retrieved leaf data is in fact the leaf data in the tree.
+We do not need to additionally verify Merkle proofs per leaf as this will essentially be done via IPFS on each layer while
+resolving and getting to the leaf data.
+
+> TODO: validate this assumption and link to code that shows how this is done internally
+
+### Implementation plan
+
+As fully integrating Data Availability proofs into tendermint is a rather large change, we break up the work into the
+following stages (not mentioning the implementation work that was already done):
+
+1. Flesh out the changes in the consensus messages ([celestia-specs#126], [celestia-specs#127])
+2. Flesh out the changes that would be necessary to replace the current block gossiping ([ADR 001](./adr-001-block-propagation.md))
+3. Add the possibility of storing and retrieving block data (samples or whole block) to celestia-core (this ADR and related PRs).
+4. Integrate the above API (3.) as an addition into celestia-core without directly replacing the tendermint counterparts (block gossip etc).
+5. 
Rip out each component that will be redundant with the above integration in one or even several smaller PRs:
+    - block gossiping (see ADR 001)
+    - modify block store (see ADR 001)
+    - make downloading full Blocks optional (flag/config)
+    - route some RPC requests to IPFS (see ADR 001)
+
+
+## Status
+
+Proposed
+
+## Consequences
+
+### Positive
+
+- simplicity & ease of implementation
+- can re-use an existing networking and p2p stack (go-ipfs)
+- potential support of a large, cool, and helpful community
+- high-level API definitions independent of the used stack
+
+### Negative
+
+- latency
+- being connected to the public IPFS network might be overkill if peers should in fact only care about a subset that participates in the Celestia protocol
+- dependency on a large code-base with lots of features and options of which we only need a small subset
+
+### Neutral
+- two different p2p layers exist in celestia-core
+
+## References
+
+- https://github.com/celestiaorg/celestia-core/issues/85
+- https://github.com/celestiaorg/celestia-core/issues/167
+
+- https://docs.ipld.io/#nodes
+- https://arxiv.org/abs/1809.09044
+- https://fc21.ifca.ai/papers/83.pdf
+- https://github.com/tendermint/spec/pull/254
+
+
+[#17]: https://github.com/celestiaorg/celestia-core/pull/17
+[#19]: https://github.com/celestiaorg/celestia-core/pull/19
+[#83]: https://github.com/celestiaorg/celestia-core/pull/83
+
+[#152]: https://github.com/celestiaorg/celestia-core/pull/152
+
+[celestia-specs#126]: https://github.com/celestiaorg/celestia-specs/issues/126
+[celestia-specs#127]: https://github.com/celestiaorg/celestia-specs/pull/127
+[Header]: https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/data_structures.md#header
+
+[go-ipfs documentation]: https://github.com/ipfs/go-ipfs/tree/master/docs/examples/go-ipfs-as-a-library#use-go-ipfs-as-a-library-to-spawn-a-node-and-add-a-file
+[ipld experiments]: https://github.com/celestia/ipld-plugin-experiments
+[ipld.Nodes]: 
https://github.com/ipfs/go-ipld-format/blob/d2e09424ddee0d7e696d01143318d32d0fb1ae63/format.go#L22-L45
+[graph-sync]: https://github.com/ipld/specs/blob/master/block-layer/graphsync/graphsync.md
+[IPLD selectors]: https://github.com/ipld/specs/blob/master/selectors/selectors.md
+[ipld-prime]: https://github.com/ipld/go-ipld-prime
+
+[rsmt2d]: https://github.com/celestiaorg/rsmt2d
+
+
+[p2p]: https://github.com/celestiaorg/celestia-core/tree/0eccfb24e2aa1bb9c4428e20dd7828c93f300e60/p2p
+[p2p/ipld]: https://github.com/celestiaorg/celestia-core/tree/0eccfb24e2aa1bb9c4428e20dd7828c93f300e60/p2p/ipld
+[celestia-core/types]: https://github.com/celestiaorg/celestia-core/tree/0eccfb24e2aa1bb9c4428e20dd7828c93f300e60/types
diff --git a/docs/celestia-architecture/adr-003-application-data-retrieval.md b/docs/celestia-architecture/adr-003-application-data-retrieval.md
new file mode 100644
index 00000000000..689fdfdab24
--- /dev/null
+++ b/docs/celestia-architecture/adr-003-application-data-retrieval.md
@@ -0,0 +1,141 @@
+# ADR 003: Retrieving Application messages
+
+## Changelog
+
+- 2021-04-25: initial draft
+
+## Context
+
+This ADR builds on top of [ADR 002](adr-002-ipld-da-sampling.md) and will use the implemented APIs described there.
+The reader should familiarize themselves at least with the high-level concepts as well as with the [specs](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/data_structures.md#2d-reed-solomon-encoding-scheme).
+
+The academic [paper](https://arxiv.org/abs/1905.09274) describes the motivation and context for this API.
+The main motivation can be quoted from section 3.3 of that paper:
+
+> (Property1) **Application message retrieval partitioning.** Client nodes must be able to download all of the messages relevant to the applications they use [...], without needing to download any messages for other applications.
+
+> (Property2) **Application message retrieval completeness.** When client nodes download messages relevant to the applications they use [...], they must be able to verify that the messages they received are the complete set of messages relevant to their applications, for specific
+blocks, and that there are no omitted messages.
+
+
+
+The main data structure that enables the above properties is called a Namespaced Merkle Tree (NMT), an ordered binary Merkle tree where:
+1. each node in the tree includes the range of namespaces of the messages in all of its descendants
+2. leaves in the tree are ordered by the namespace identifiers of the leaf messages
+
+A more formal description can be found in the [specification](https://github.com/celestiaorg/celestia-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#namespace-merkle-tree).
+An implementation can be found in [this repository](https://github.com/celestiaorg/nmt).
+
+This ADR basically describes a version of the [`GetWithProof`](https://github.com/celestiaorg/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196) API of the NMT that leverages the fact that IPFS uses content addressing and that we have implemented an [IPLD plugin](https://github.com/celestiaorg/celestia-core/tree/37502aac69d755c189df37642b87327772f4ac2a/p2p/ipld) for an NMT.
+
+**Note**: The APIs defined here will be particularly relevant for Optimistic Rollup (full) nodes that want to download their Rollup's data (see [celestiaorg/optimint#48](https://github.com/celestiaorg/optimint/issues/48)).
+Another potential use-case of this API could be for so-called [light validator nodes](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/node_types.md#node-type-definitions) that want to download and replay the state-relevant portion of the block data, i.e. transactions with [reserved namespace IDs](https://github.com/celestiaorg/celestia-specs/blob/master/src/specs/consensus.md#reserved-namespace-ids).
+
+## Alternative Approaches
+
+The approach described below will rely on IPFS' block exchange protocol (bitswap) and DHT; IPFS' implementation will be used as a black box to find peers that can serve the requested data.
+This will likely be much slower than it potentially could be, and for a first implementation we intentionally do not incorporate the optimizations that we could.
+
+We briefly mention potential optimizations for the future here:
+- Use of [graphsync](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/block-layer/graphsync/graphsync.md) instead of [bitswap](https://docs.ipfs.io/concepts/bitswap/) and use of [IPLD selectors](https://github.com/ipld/specs/blob/5d3a3485c5fe2863d613cd9d6e18f96e5e568d16/design/history/exploration-reports/2018.10-selectors-design-goals.md)
+- expose an API to be able to download application specific data by namespace (including proofs) with the minimal number of round-trips (e.g. finding nodes that expose an RPC endpoint like [`GetWithProof`](https://github.com/celestiaorg/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L193-L196))
+
+## Decision
+
+Most discussions on this particular API happened either on calls or through other non-documented channels.
+We only describe the decision in this section.
+
+We decide to implement the simplest approach first.
+We first describe the protocol informally here and explain why this fulfils (Property1) and (Property2) in the [Context](#context) section above.
+
+In the case that leaves with the requested namespace exist, this basically boils down to the following: traverse the tree starting from the root until finding the first leaf (start) with the namespace in question, then directly request and download all leaves coming after the start until the namespace changes to one greater than the requested one again.
+In the case that no leaves with the requested namespace exist in the tree, we traverse the tree to find the leaf in the position in the tree where the namespace would have been and download the neighbouring leaves.
+
+This is pretty much what the [`ProveNamespace`](https://github.com/celestiaorg/nmt/blob/ddcc72040149c115f83b2199eafabf3127ae12ac/nmt.go#L132-L146) method does, but using IPFS we can simply locate and then request the leaves, and the corresponding inner proof nodes will automatically be downloaded on the way, too.
+
+## Detailed Design
+
+We define one function that returns all shares of a block belonging to a requested namespace and block (via the block's data availability header).
+See [`ComputeShares`](https://github.com/celestiaorg/celestia-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1371) for reference on how to encode the block data into namespaced shares.
+
+```go
+// RetrieveShares returns all raw data (raw shares) of the passed-in
+// namespace ID nID included in the block with the DataAvailabilityHeader dah.
+func RetrieveShares(
+	ctx context.Context,
+	nID namespace.ID,
+	dah *types.DataAvailabilityHeader,
+	api coreiface.CoreAPI,
+) ([][]byte, error) {
+	// 1. Find the row root(s) that contains the namespace ID nID
+	// 2. Traverse the corresponding tree(s) according to the
+	//    above informally described algorithm and get the corresponding
+	//    leaves (if any)
+	// 3. Return all (raw) shares corresponding to the nID
+}
+
+```
+
+Additionally, we define two functions that use the first one above to:
+1. return all the parsed (non-padding) data with [reserved namespace IDs](https://github.com/celestiaorg/celestia-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/consensus.md#reserved-namespace-ids): transactions, intermediate state roots, evidence.
+2. 
return all application-specific blobs (shares) belonging to one namespace ID, parsed as a slice of Messages ([specification](https://github.com/celestiaorg/celestia-specs/blob/de5f4f74f56922e9fa735ef79d9e6e6492a2bad1/specs/data_structures.md#message) and [code](https://github.com/celestiaorg/celestia-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/block.go#L1336)).
+
+The latter two methods might require moving or exporting a few currently unexported functions that live in [share_merging.go](https://github.com/celestiaorg/celestia-core/blob/1a08b430a8885654b6e020ac588b1080e999170c/types/share_merging.go#L57-L76) and could be implemented in a separate pull request.
+
+```go
+// RetrieveStateRelevantMessages returns all state-relevant data
+// (transactions, intermediate state roots, and evidence) included in a block
+// with the DataAvailabilityHeader dah.
+func RetrieveStateRelevantMessages(
+	ctx context.Context,
+	nID namespace.ID,
+	dah *types.DataAvailabilityHeader,
+	api coreiface.CoreAPI,
+) (Txs, IntermediateStateRoots, EvidenceData, error) {
+	// like RetrieveShares but for all reserved namespaces;
+	// additionally the shares are parsed (merged) into the
+	// corresponding types in the return arguments
+}
+```
+
+```go
+// RetrieveMessages returns all Messages of the passed-in
+// namespace ID included in the block with the DataAvailabilityHeader dah.
+func RetrieveMessages(
+	ctx context.Context,
+	nID namespace.ID,
+	dah *types.DataAvailabilityHeader,
+	api coreiface.CoreAPI,
+) (Messages, error) {
+	// like RetrieveShares but additionally parses the shares
+	// into the Messages type
+}
+```
+
+## Status
+
+Proposed
+
+## Consequences
+
+This API will most likely be used by Rollups too.
+We should document it properly and move it together with relevant parts from ADR 002 into a separate go-package.
+ +### Positive + +- easy to implement with the existing code (see [ADR 002](https://github.com/celestiaorg/celestia-core/blob/47d6c965704e102ae877b2f4e10aeab782d9c648/docs/adr/adr-002-ipld-da-sampling.md#detailed-design)) +- resilient data retrieval via a p2p network +- dependence on a mature and well-tested code-base with a large and welcoming community + +### Negative + +- with IPFS, we inherit the fact that potentially a lot of round-trips are done until the data is fully downloaded; in other words: this could end up way slower than potentially possible +- anyone interacting with that API needs to run an IPFS node + +### Neutral + +- optimizations can happen incrementally once we have an initial working version + +## References + +We've linked to all references throughout the ADR. diff --git a/docs/celestia-architecture/adr-004-mvp-light-client.md b/docs/celestia-architecture/adr-004-mvp-light-client.md new file mode 100644 index 00000000000..cbba0921ba2 --- /dev/null +++ b/docs/celestia-architecture/adr-004-mvp-light-client.md @@ -0,0 +1,292 @@ +# ADR 004: Data Availability Sampling Light Client + +## Changelog + +- 2021-05-03: Initial Draft + +## Context + +We decided to augment the existing [RPC-based Tendermint light client](https://github.com/tendermint/tendermint/blob/bc643b19c48495077e0394d3e21e1d2a52c99548/light/doc.go#L2-L126) by adding the possibility to additionally validate blocks by doing Data Availability Sampling (DAS). +In general, DAS gives light clients assurance that the data behind the block header they validated is actually available in the network and hence, that state fraud proofs could be generated. +See [ADR 002](adr-002-ipld-da-sampling.md) for more context on DAS. 
+
+A great introduction to the Tendermint light client (and light clients in general) can be found in this series of [blog posts](https://medium.com/tendermint/everything-you-need-to-know-about-the-tendermint-light-client-f80d03856f98) as well as this [paper](https://arxiv.org/abs/2010.07031).
+
+This ADR describes the changes necessary to augment the existing Tendermint light client implementation with DAS from a UX as well as from a protocol perspective.
+
+## Alternative Approaches
+
+Ideally, the light client should not just request [signed headers](https://github.com/tendermint/tendermint/blob/bc643b19c48495077e0394d3e21e1d2a52c99548/light/doc.go#L35-L52) from [a few pre-configured peers](https://github.com/tendermint/tendermint/blob/bc643b19c48495077e0394d3e21e1d2a52c99548/light/setup.go#L51-L52) but instead also discover peers from a p2p network.
+We will eventually implement this. For more context, we refer to this [issue](https://github.com/celestiaorg/celestia-core/issues/86).
+This would require that the (signed) headers are provided via other means than the RPC.
+See this [abandoned pull request](https://github.com/tendermint/tendermint/pull/4508) and [issue](https://github.com/tendermint/tendermint/issues/4456) in the Tendermint repository and also this [suggestion](https://github.com/celestiaorg/celestia-core/issues/86#issuecomment-831182564) by [@Wondertan](https://github.com/Wondertan) in this repository.
+
+For some use-cases (like DAS light validator nodes, or the light clients of a Data Availability Layer that are run by full nodes of an Optimistic Rollup) it would even make sense that the light client (passively) participates in the consensus protocol to some extent; i.e. runs a subset of the consensus reactor so that Consensus messages ([Votes](https://github.com/tendermint/tendermint/blob/bc643b19c48495077e0394d3e21e1d2a52c99548/types/vote.go#L48-L59) etc.) come in as early as possible.
+Then, light clients would not need to wait for the canonical commit to be included in the next [block](https://github.com/tendermint/tendermint/blob/bc643b19c48495077e0394d3e21e1d2a52c99548/types/block.go#L48).
+
+For the RPC-based light client, it could also make sense to add a new RPC endpoint to tendermint for clients to retrieve the [`DataAvailabilityHeader`](https://github.com/celestiaorg/celestia-core/blob/50f722a510dd2ba8e3d31931c9d83132d6318d4b/types/block.go#L52-L69) (DAHeader), or to embed the DAHeader in an existing endpoint.
+The [Commit](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/rpc/core/routes.go#L25) currently only contains the [SignedHeader](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/rpc/core/types/responses.go#L32-L36) (Header and Commit signatures).
+Not all light clients will need the full DAHeader though (e.g. super-light-clients do not).
+
+
+## Decision
+
+For our MVP, we [decide](https://github.com/celestiaorg/celestia-core/issues/307) to only modify the existing RPC-endpoint based light client.
+This is mostly because we want to ship our MVP as quickly as possible, but independently of this it makes sense to provide a familiar experience for engineers coming from the Cosmos ecosystem.
+
+We will later implement the above mentioned variants.
+Exactly how will be described in separate ADRs.
+
+## Detailed Design
+
+From a user perspective, very little changes:
+the existing light client command gets an additional flag that indicates whether to run DAS or not.
+Additionally, the light client operator can decide the number of successful samples required to deem the block available (and hence valid).
+
+In case DAS is enabled, the light client will need to:
+1. retrieve the DAHeader corresponding to the data root in the Header
+2. request a parameterizable number of random samples.
+
+If all the sampling requests succeed, the whole block is available ([with some high enough probability](https://arxiv.org/abs/1809.09044)).
+
+### UX
+
+The main change to the light client [command](https://github.com/celestiaorg/celestia-core/blob/master/cmd/tendermint/commands/light.go#L32-L104) is to add a new flag to indicate whether it should run DAS or not.
+Additionally, the user can choose the number of successful samples required for a block to be considered available.
+
+```diff
+===================================================================
+diff --git a/cmd/tendermint/commands/light.go b/cmd/tendermint/commands/light.go
+--- a/cmd/tendermint/commands/light.go	(revision 48b043014f0243edd1e8ebad8cd0564ab9100407)
++++ b/cmd/tendermint/commands/light.go	(date 1620546761822)
+@@ -64,6 +64,8 @@
+ 	dir                string
+ 	maxOpenConnections int
+
++	daSampling bool
++	numSamples uint32
+ 	sequential     bool
+ 	trustingPeriod time.Duration
+ 	trustedHeight  int64
+@@ -101,6 +103,10 @@
+ 	LightCmd.Flags().BoolVar(&sequential, "sequential", false,
+ 		"sequential verification. Verify all headers sequentially as opposed to using skipping verification",
+ 	)
++	LightCmd.Flags().BoolVar(&daSampling, "da-sampling", false,
++		"data availability sampling. Verify each header (sequential verification), additionally verify data availability via data availability sampling",
++	)
++	LightCmd.Flags().Uint32Var(&numSamples, "num-samples", 15, "Number of data availability samples until block data deemed available.")
+ }
+```
+
+For the Data Availability sampling, the light client will have to run an IPFS node.
+It makes sense to make this mostly opaque to the user, as everything around IPFS can be [configured](https://github.com/ipfs/go-ipfs/blob/d6322f485af222e319c893eeac51c44a9859e901/docs/config.md) in the `$IPFS_PATH`.
+This IPFS path should simply be a sub-directory inside the light client's [directory](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/cmd/tendermint/commands/light.go#L86-L87).
+We can later add the ability to let users configure the IPFS setup more granularly.
+
+**Note:** DAS is only compatible with sequential verification.
+In case a light client is parametrized to run DAS and skipping verification, the CLI should return an easy-to-understand warning or even an error explaining why this does not make sense.
+
+### Light Client Protocol with DAS
+
+#### Light Store
+
+The light client stores data in its own [badgerdb instance](https://github.com/celestiaorg/celestia-core/blob/50f722a510dd2ba8e3d31931c9d83132d6318d4b/cmd/tendermint/commands/light.go#L125) in the given directory:
+
+```go
+db, err := badgerdb.NewDB("light-client-db", dir)
+```
+
+While it is not critical for this feature, we should at least try to re-use that same DB instance for the local ipld store.
+Otherwise, we introduce yet another DB instance; something we want to avoid, especially in the long run (see [#283](https://github.com/celestiaorg/celestia-core/issues/283)).
+For the first implementation, it might still be simpler to create a separate DB instance and tackle cleaning this up in a separate pull request, e.g. together with other [instances](https://github.com/celestiaorg/celestia-core/issues/283).
+
+#### RPC
+
+No changes to the RPC endpoints are absolutely required.
+Although, for convenience and ease of use, we should either add the `DAHeader` to the existing [Commit](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/rpc/core/routes.go#L25) endpoint, or introduce a new endpoint to retrieve the `DAHeader` on demand for a certain height or block hash.
+
+The first has the downside that not every light client needs the DAHeader.
+The second explicitly reveals to full-nodes which clients are doing DAS and which are not.
+
+**Implementation Note:** The additional (or modified) RPC endpoint could work as a simple first step until we implement downloading the DAHeader from a given data root in the header.
+Also, the light client uses a so-called [`Provider`](https://github.com/tendermint/tendermint/blob/7f30bc96f014b27fbe74a546ea912740eabdda74/light/provider/provider.go#L9-L26) to retrieve [LightBlocks](https://github.com/tendermint/tendermint/blob/7f30bc96f014b27fbe74a546ea912740eabdda74/types/light.go#L11-L16), i.e. signed headers and validator sets.
+Currently, only the [`http` provider](https://github.com/tendermint/tendermint/blob/7f30bc96f014b27fbe74a546ea912740eabdda74/light/provider/http/http.go#L1) is implemented.
+Hence, as _a first implementation step_, we should augment the `Provider` and the `LightBlock` to optionally include the DAHeader (details below).
+In parallel, but in a separate pull request, we add a separate RPC endpoint to download the DAHeader for a certain height.
+
+#### Store DataAvailabilityHeader
+
+For full nodes to be able to serve the `DataAvailabilityHeader` without having to recompute it each time, it needs to be stored somewhere.
+While this is independent of the concrete serving mechanism, it is more relevant for the RPC endpoint.
+There is ongoing work to make the Tendermint Store only store Headers and the DataAvailabilityHeader in [#218](https://github.com/celestiaorg/celestia-core/pull/218/) / [#182](https://github.com/celestiaorg/celestia-core/issues/182).
+
+At the time of writing this ADR, another pull request ([#312](https://github.com/celestiaorg/celestia-core/pull/312)) is in the works with a more isolated change that adds the `DataAvailabilityHeader` to the `BlockID`.
+Hence, the DAHeader is [stored](https://github.com/celestiaorg/celestia-core/blob/50f722a510dd2ba8e3d31931c9d83132d6318d4b/store/store.go#L355-L367) along with the [`BlockMeta`](https://github.com/celestiaorg/celestia-core/blob/50f722a510dd2ba8e3d31931c9d83132d6318d4b/types/block_meta.go#L11-L17) there.
+
+For a first implementation, we could build on top of #312 and adapt to the changed storage API where only headers and the DAHeader are stored inside tendermint's store (as drafted in #218).
+A major downside of storing block data inside of tendermint's store as well as in the IPFS block store is that the data is not only stored redundantly, but the additional IO-intensive work will also slow down full nodes.
+
+
+#### DAS
+
+The changes for DAS are very simple from a high-level perspective, assuming that the light client has the ability to download the DAHeader along with the required data (signed header + validator set) of a given height:
+
+Every time the light client validates a retrieved light-block, it additionally starts DAS in the background (once).
+For a DAS light client it is important to use [sequential](https://github.com/tendermint/tendermint/blob/f366ae3c875a4f4f61f37f4b39383558ac5a58cc/light/client.go#L46-L53) verification and not [skipping](https://github.com/tendermint/tendermint/blob/f366ae3c875a4f4f61f37f4b39383558ac5a58cc/light/client.go#L55-L69) verification.
+Skipping verification only works under the assumption that 2/3+1 of the voting power is honest.
+The whole point of doing DAS (and state fraud proofs) is to remove that assumption.
+See also this related issue in the LL specification: [#159](https://github.com/celestiaorg/celestia-specs/issues/159).
+
+Independent of the existing implementation, there are three ways this could be implemented:
+1. 
the DAS light client only accepts a header as valid and trusts it after DAS succeeds (in addition to the tendermint verification), and it waits until DAS succeeds (or there was an error or timeout on the way)
+2. (aka 1.5) the DAS light client stages headers where the tendermint verification passes as valid and spins up DAS sampling routines in the background; the staged headers are committed as valid iff all routines successfully return in time
+3. the DAS light client optimistically accepts a header as valid and trusts it if the regular tendermint verification succeeds; the DAS is run in the background (with potentially much longer timeouts than in 1.) and after the background routine returns (or errs or times out), the already trusted headers are marked as unavailable; this might require rolling back the already trusted headers
+
+We note that from an implementation point of view 1. is not only the simplest approach, but it would also work best with the currently implemented light client design.
+It is the approach that should be implemented first.
+
+The 2. approach can be seen as an optimization where the higher-latency DAS can be conducted in parallel for various heights.
+This could speed up catching up (sequentially) if the light client went offline (for shorter than the weak subjectivity time window).
+
+The 3. approach is the most general of all, but it moves the responsibility to wait or to roll back headers to the caller and hence is undesirable as it offers too much flexibility.
+
+
+#### Data Structures
+
+##### LightBlock
+
+As mentioned above, the LightBlock should optionally contain the DataAvailabilityHeader.
+```diff
+Index: types/light.go
+===================================================================
+diff --git a/types/light.go b/types/light.go
+--- a/types/light.go	(revision 64044aa2f2f2266d1476013595aa33bb274ba161)
++++ b/types/light.go	(date 1620481205049)
+@@ -13,6 +13,9 @@
+ type LightBlock struct {
+ 	*SignedHeader `json:"signed_header"`
+ 	ValidatorSet  *ValidatorSet `json:"validator_set"`
++
++	// DataAvailabilityHeader is only populated for DAS light clients; for others it can be nil.
++	DataAvailabilityHeader *DataAvailabilityHeader `json:"data_availability_header"`
+ }
+```
+
+Alternatively, we could introduce a `DASLightBlock` that embeds a `LightBlock` and has the `DataAvailabilityHeader` as the only (non-optional) additional field.
+This would be more explicit, as it is a new type.
+On the other hand, adding a field to the existing `LightBlock` is backwards compatible and does not require any further code changes; the new type would at least require `To`- and `FromProto` functions.
+
+##### Provider
+
+The [`Provider`](https://github.com/tendermint/tendermint/blob/7f30bc96f014b27fbe74a546ea912740eabdda74/light/provider/provider.go#L9-L26) should be changed to additionally provide the `DataAvailabilityHeader` to enable DAS light clients.
+Implementations of the interface need to additionally retrieve the `DataAvailabilityHeader` for the [modified LightBlock](#lightblock).
+Users of the provider need to indicate this to the provider.
+
+We could either augment the `LightBlock` method with a flag, add a new method solely for providing the `DataAvailabilityHeader`, or introduce a new method for DAS light clients.
+
+The latter is preferable because it is the most explicit and clear, and it still leaves places where DAS is not used without any code changes.
+
+Hence:
+
+```diff
+Index: light/provider/provider.go
+===================================================================
+diff --git a/light/provider/provider.go b/light/provider/provider.go
+--- a/light/provider/provider.go	(revision 7d06ae28196e8765c9747aca9db7d2732f56cfc3)
++++ b/light/provider/provider.go	(date 1620298115962)
+@@ -21,6 +21,14 @@
+ 	// error is returned.
+ 	LightBlock(ctx context.Context, height int64) (*types.LightBlock, error)
+
++	// DASLightBlock returns the LightBlock containing the DataAvailabilityHeader.
++	// Other than including the DataAvailabilityHeader it behaves exactly the same
++	// as LightBlock.
++	//
++	// It can be used by DAS light clients.
++	DASLightBlock(ctx context.Context, height int64) (*types.LightBlock, error)
++
++
+ 	// ReportEvidence reports an evidence of misbehavior.
+ 	ReportEvidence(context.Context, types.Evidence) error
+ }
+```
+Alternatively, with the exact same result, we could embed the existing `Provider` into a new interface, e.g. a `DASProvider` that adds this method.
+This is completely equivalent to the above, and which approach is better will become clearer when we spend more time on the implementation.
+
+Regular light clients will call `LightBlock` and DAS light clients will call `DASLightBlock`.
+In the first case, the result will be the same as for vanilla Tendermint, and in the second case the returned `LightBlock` will additionally contain the `DataAvailabilityHeader` of the requested height.
+
+#### Running an IPFS node
+
+We already have methods to [initialize](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/cmd/tendermint/commands/init.go#L116-L157) and [run](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/node/node.go#L1449-L1488) an IPFS node in place.
+These need to be refactored such that they can effectively be used for the light client as well.
+This means:
+1. 
these methods need to be exported and available in a place that does not introduce interdependence of go packages
+2. users should be able to run a light client with a single command and hence most of the initialization logic should be coupled with creating the actual IPFS node and [made independent](https://github.com/celestiaorg/celestia-core/blob/cbf1f1a4a0472373289a9834b0d33e0918237b7f/cmd/tendermint/commands/init.go#L119-L120) of the `tendermint init` command
+
+An example for 2. can be found in the IPFS [code](https://github.com/ipfs/go-ipfs/blob/cd72589cfd41a5397bb8fc9765392bca904b596a/cmd/ipfs/daemon.go#L239) itself.
+We might want to provide a slightly different default initialization though (see how this is [overridable](https://github.com/ipfs/go-ipfs/blob/cd72589cfd41a5397bb8fc9765392bca904b596a/cmd/ipfs/daemon.go#L164-L165) in the ipfs daemon cmd).
+
+We note that for operating a fully functional light client the IPFS node could run in client mode [`dht.ModeClient`](https://github.com/libp2p/go-libp2p-kad-dht/blob/09d923fcf68218181b5cd329bf5199e767bd33c3/dht_options.go#L29-L30), but we actually want light clients to also respond to incoming queries, e.g. from other light clients.
+Hence, they should by default run in [`dht.ModeServer`](https://github.com/libp2p/go-libp2p-kad-dht/blob/09d923fcf68218181b5cd329bf5199e767bd33c3/dht_options.go#L31-L32).
+In an environment where bandwidth must be saved, or where the network conditions do not allow the server mode, we make it easy to change the default behavior.
+
+##### Client
+
+We add another [`Option`](https://github.com/tendermint/tendermint/blob/a91680efee3653e3de620f24eb8ddca1c95ce8f9/light/client.go#L43-L117) to the [`Client`](https://github.com/tendermint/tendermint/blob/a91680efee3653e3de620f24eb8ddca1c95ce8f9/light/client.go#L173) that indicates that this client does DAS.
+
+This option indicates:
+1. to do sequential verification and
+2. 
to request [`DASLightBlocks`](#lightblock) from the [provider](#provider).
+
+All other changes should affect unexported methods only.
+
+##### ValidateAvailability
+
+In order for light clients to perform DAS to validate availability, they do not need to be aware of the fact that an IPFS node is run.
+Instead, we can use the existing [`ValidateAvailability`](https://github.com/celestiaorg/celestia-core/blame/master/p2p/ipld/validate.go#L23-L28) function (as defined in [ADR 002](adr-002-ipld-da-sampling.md) and implemented in [#270](https://github.com/celestiaorg/celestia-core/pull/270)).
+Note that this expects an ipfs core API object `CoreAPI` to be passed in.
+Using that interface has the major benefit that we could even change the requirement that the light client itself runs the IPFS node without changing most of the validation logic.
+E.g., the IPFS node (with our custom IPLD plugin) could run in a different process (or on a different machine), and we could still just pass in that same `CoreAPI` interface.
+
+Orthogonal to this ADR, we also note that we could change all IPFS readonly methods to accept the minimal interface they actually use, namely something that implements `ResolveNode` (and maybe additionally a `NodeGetter`).
+
+`ValidateAvailability` needs to be called each time a header is validated.
+A DAS light client will have to request the `DASLightBlock` for this, as per above, to be able to pass in a `DataAvailabilityHeader`.
+
+#### Testing
+
+Ideally, we add the DAS light client to the existing e2e tests.
+It might be worth catching up with some relevant changes from tendermint upstream.
+In particular, [tendermint/tendermint#6196](https://github.com/tendermint/tendermint/pull/6196) and previous changes that it depends on.
+
+Additionally, we should provide a simple example in the documentation that walks through using the DAS light client.
+It would be good if the light client logs some (info) output related to DAS to provide feedback to the user.
+
+## Status
+
+Proposed
+
+## Consequences
+
+### Positive
+
+- simple to implement and understand
+- familiar to tendermint / Cosmos devs
+- allows trying out the MVP without relying on the [celestia-app](https://github.com/celestiaorg/celestia-app) (instead, a simple abci app like a modified [KVStore](https://github.com/celestiaorg/celestia-core/blob/42e4e8b58ebc58ebd663c114d2bcd7ab045b1c55/abci/example/kvstore/README.md) app could be used to demo the DAS light client)
+
+### Negative
+
+- light client does not discover peers
+- requires the light client, which currently only makes simple RPC requests, to run an IPFS node
+- RPC makes it extremely easy to infer which light clients are doing DAS and which are not
+- the initial light client implementation might still be confusing to devs familiar with tendermint/Cosmos, for the reason that it does DAS (and state fraud proofs) to get rid of the underlying honest majority assumption, but it will still do all checks related to that same honest majority assumption (e.g. download validator sets and Commits, and validate that > 2/3 of them signed the header)
+
+### Neutral
+
+DAS light clients need to additionally obtain the DAHeader from the data root in the header to be able to actually do DAS.
+
+## References
+
+We have linked all references above inside the text already.
diff --git a/docs/celestia-architecture/adr-005-decouple-blockid-and-partsetheader.md b/docs/celestia-architecture/adr-005-decouple-blockid-and-partsetheader.md
new file mode 100644
index 00000000000..1bf8fa74164
--- /dev/null
+++ b/docs/celestia-architecture/adr-005-decouple-blockid-and-partsetheader.md
@@ -0,0 +1,47 @@
+# ADR 005: Decouple the PartSetHeader from the BlockID
+
+## Changelog
+
+- 2021-08-01: Initial Draft
+
+## Context
+
+Celestia has multiple commits to the block data, via the `DataHash` and the `PartSetHeader` in the `BlockID`. 
As stated in [#184](https://github.com/celestiaorg/lazyledger-core/issues/184), we no longer need the `PartSetHeader` for this additional commitment to the block's data. However, we are still planning to use the `PartSetHeader` for block propagation during consensus in the short-medium term. This means that we will remove the `PartSetHeader` from as many places as possible, but keep it in the `Proposal` struct.
+
+## Alternative Approaches
+
+It’s worth noting that there are proposed changes to remove the `PartSetHeader` entirely, and instead use the already existing commitment to block data, the `DataAvailabilityHeader`, to propagate blocks in parallel during consensus. Discussions regarding the detailed differences entailed in each approach are documented in that ADR's PR. The current direction that is described in this ADR is significantly more conservative in its approach, but it is not strictly an alternative to other designs. This is because other designs would also require removal of the `PartSetHeader`, which is a project in and of itself due to the `BlockID`'s widespread usage throughout tendermint and the bugs that pop up when attempting to remove it.
+
+## Decision
+
+While we build other better designs to experiment with, we will continue to implement the design specified here, as it is not orthogonal to them. 
https://github.com/celestiaorg/lazyledger-core/pull/434#issuecomment-869158788
+
+## Detailed Design
+
+- [X] Decouple the BlockID and the PartSetHeader [#441](https://github.com/celestiaorg/lazyledger-core/pull/441)
+- [ ] Remove the `PartSetHeader` from every possible struct other than the `Proposal`
+  - [X] Stop signing over the `PartSetHeader` while voting [#457](https://github.com/celestiaorg/lazyledger-core/pull/457)
+  - [X] Remove the `PartSetHeader` from the Header [#457](https://github.com/celestiaorg/lazyledger-core/pull/457)
+  - [X] Remove the `PartSetHeader` from `VoteSetBits`, `VoteSetMaj23`, and `state.State` [#479](https://github.com/celestiaorg/lazyledger-core/pull/479)
+  - [ ] Remove the `PartSetHeader` from other structs
+
+
+## Status
+
+Proposed
+
+## Consequences
+
+### Positive
+
+- Conservative and easy to implement
+- Acts as a stepping stone for other better designs
+- Allows us to use 64KiB-sized chunks, which are well tested
+
+### Negative
+
+- Not an ideal design, as we still have to include an extra commitment to the block's data in the proposal
+
+## References
+
+Alternative ADR [#434](https://github.com/celestiaorg/lazyledger-core/pull/434)
+Alternative implementation [#427](https://github.com/celestiaorg/lazyledger-core/pull/427) and [#443](https://github.com/celestiaorg/lazyledger-core/pull/443)
+[Comment](https://github.com/celestiaorg/lazyledger-core/pull/434#issuecomment-869158788) that summarizes the decision
\ No newline at end of file
diff --git a/docs/celestia-architecture/adr-006-row-propagation.md b/docs/celestia-architecture/adr-006-row-propagation.md
new file mode 100644
index 00000000000..a31686dd314
--- /dev/null
+++ b/docs/celestia-architecture/adr-006-row-propagation.md
@@ -0,0 +1,202 @@
+# ADR 006: Consensus Block Gossiping with Rows
+
+## Changelog
+* 24.06.2021 - Initial description
+* 07.07.2021 - More important details were added
+* 18.08.2021 - Mention alternative approaches briefly
+
+## Context
+It's a long story of relations between 
Celestia, Tendermint, and consensus block gossiping. Celestia's team discussed
+multiple ideas, several ADRs were made, and nothing has been finalized yet. This ADR is another attempt to bring valuable
+changes into block gossiping and, hopefully, a successful one.
+
+Currently, we inherit the following from Tendermint. Our codebase relies on the notion of block Parts. Each Part is a
+piece of an entire serialized block. Those Parts are gossiped between nodes in consensus and committed to with a
+`PartSetHeader` containing a Merkle Root of the Parts. However, Parts gossiping wasn't designed for Celestia blocks.
+
+Celestia comes with a different block representation from Tendermint. It lays out Blocks as a table of data shares,
+where Rows or Columns can and should be gossiped instead of Parts, keeping only one system-wide commitment to data.
+
+## Alternative Approaches
+### ["nah it works just don't touch it"](https://ahseeit.com//king-include/uploads/2020/11/121269295_375504380484919_2997236194077828589_n-6586327691.jpg) approach
+
+It turns out that we could fully treat the Tendermint consensus as a black box, keeping two data commitments: one for
+consensus with the `PartSetHeader` and another for the world outside the consensus with the `DAHeader`. 
+
+#### Pros
+* Less work
+
+#### Cons
+* Pulls two data commitments into Celestia's specs
+* Brings ambiguity to data integrity verification
+* Controversial from a software design perspective
+* Brings a DoS vector for big Blocks, as every Block would need to be represented in two formats in RAM
+* Wastes more resources on building and verifying the additional commitment
+
+### Others
+* get rid of the PartSetHeader from the BlockID without changing block propagation at all (see [ADR 005](https://github.com/celestiaorg/celestia-core/blob/58a3901827afbf97852d807de34a2b66f93e0eb6/docs/lazy-adr/adr-005-decouple-blockid-and-partsetheader.md#adr-005-decouple-the-partsetheader-from-the-blockid))
+* change block propagation to fixed-sized chunks, but based on the ODS instead of how Parts are built currently (for this we have empirical evidence of how it performs in practice)
+* send the block as a whole (only works with smaller blocks)
+* block propagation based on sending the header and Tx-IDs, and then requesting the Tx/Messages that are missing from the local mempool of a node on demand
+
+## Decision
+The decision is to still treat Tendermint's consensus as a black box, but with a few amendments to the gossiping mechanism:
+* Introduce a `RowSet` that mimics `PartSet`.
+
+  `RowSet` is a helper structure that wraps the DAHeader and tracks received Rows with their integrity against the DAHeader, and
+  tells its user when the block is complete and/or can be recovered. Mostly it is a helper and is not a high-level
+  concept.
+* Replace `PartSet` with `RowSet` within consensus.
+* Keep the `DAHeader` in the `Proposal`
+* Remove the `PartSetHeader` from the `Proposal`
+
+The changes above are required to implement the decision. 
At a later point, other changes listed below are
+likely to be implemented as a clean-up:
+* Entirely removing the `PartSetHeader`, as a redundant data commitment
+* Removing `PartSet`
+* Relying on the `DAHeader` instead of the `PartSetHeader`
+
+## Detailed Design
+The detailed design section demonstrates the design and supporting changes package by package. Fortunately, the
+design does not affect any public API and the changes are solely internal.
+
+### `types`
+#### RowSet and Row
+The first and essential part is to implement `RowSet` and `Row`, fully mimicking the semantics of `PartSet` and `Part` to
+decrease the number of required changes. Below, the implementation semantics are presented:
+
+```go
+// Row represents a blob of multiple ExtendedDataSquare shares.
+// Practically, it is half of an extended row, as the other half can be recomputed.
+type Row struct {
+	// Index is a top-to-bottom index of a Row in ExtendedDataSquare.
+	// NOTE: The Row Index is unnecessary, as we can determine its Index by hash from the DAHeader. However,
+	// Index removal would bring more changes to the Consensus Reactor with the arguable benefit of less bandwidth usage.
+	Index int
+	// The actual share blob.
+	Data []byte
+}
+
+// NewRow creates a new Row from flattened shares and an index.
+func NewRow(idx int, row [][]byte) *Row
+
+// RowSet wraps the DAHeader and tracks added Rows with their integrity against the DAHeader.
+// It allows the user to check whether the rsmt2d.ExtendedDataSquare can be recovered.
+//
+// RowSet tracks the whole ExtendedDataSquare, where Q0 is the original block data:
+// ---- ----
+// | Q0 || Q1 |
+// ---- ----
+// | Q2 || Q3 |
+// ---- ----
+//
+// But its AddRow and GetRow methods accept and return only half of the Rows - Q0 and Q2. Q1 and Q3 are recomputed.
+// ----
+// | Q0 |
+// ----
+// | Q2 |
+// ----
+//
+type RowSet interface {
+	// NOTE: The RowSet is defined as an interface for simplicity. In practice it should be a struct with one and
+	// only one implementation.
+
+	// AddRow adds a Row to the set. 
It returns true with a nil error if the Row was successfully added.
+	// The logic for a Row is:
+	// * Check if it was already added
+	// * Verify that its size corresponds to the DAHeader
+	// * Extend it with erasure coding and compute an NMT Root over it
+	// * Verify that the NMT Root corresponds to the DAHeader Root under its Index
+	// * Finally, add it to the set and mark it as added.
+	//
+	AddRow(*Row) (bool, error)
+
+	// GetRow returns a Row by its index, if it exists.
+	GetRow(i int) *Row
+
+	// Square checks if enough Rows were added and, if so, returns the recomputed ExtendedDataSquare.
+	Square() (*rsmt2d.ExtendedDataSquare, error)
+
+	// other helper methods are omitted
+}
+
+// NewRowSet creates a full RowSet from an rsmt2d.ExtendedDataSquare to gossip it to others through GetRow.
+func NewRowSet(eds *rsmt2d.ExtendedDataSquare) *RowSet
+
+// NewRowSetFromHeader creates an empty RowSet from a DAHeader to receive and verify gossiped Rows against the DAHeader
+// with AddRow.
+func NewRowSetFromHeader(dah *ipld.DataAvailabilityHeader) *RowSet
+```
+
+#### Vote
+`Vote` should include a commitment to data. Previously, it relied on the `PartSetHeader` in the `BlockID`; instead, it now relies on
+the added `DAHeader`. The Protobuf schema is updated accordingly.
+
+#### Proposal
+`Proposal` is extended with `NumOriginalDataShares`. This is an optimization that
+helps Validators populate the Header without counting original data shares themselves from a block received from a
+Proposer. Potentially, that introduces a vulnerability by which a Proposer can send a wrong value, leaving Validators with a
+wrongly populated Header. This part of the decision is optional.
+
+### `consensus`
+#### Reactor
+##### Messages
+The decision affects two messages on the consensus reactor:
+* `BlockPartMessage` -> `BlockRowMessage`
+  * Instead of a `Part`, it carries a `Row` as defined above. 
+* `NewValidBlockMessage`
+  * Instead of the `PartSetHeader`, it carries the `DAHeader`
+  * A `BitArray` of the `RowSet` instead of the `PartSet`
+  The Protobuf schema for both is updated accordingly.
+
+##### PeerRoundState
+`PeerRoundState` tracks the state of each known peer in a round, specifically what commitment it has for a Block and what
+chunks the peer holds. The decision changes it to track the `DAHeader` instead of the `PartSetHeader`, along with a `BitArray` of
+the `RowSet` instead of the `PartSet`.
+
+##### BlockCatchup
+The Reactor helps its peers to catch up if they go out of sync. Instead of sending a random `Part`, it now sends a random
+`Row` via `BlockRowMessage`. Unfortunately, that requires the Reactor to load the whole Block from the store. As an optimization,
+the ability to load only a Row from the store could be introduced at a later point.
+
+#### State
+##### RoundState
+The RoundState keeps the Proposal, Valid, and Locked Block's data. Along with an entire Block and its Parts, the RoundState
+also keeps Rows using a `RowSet`. At a later point, the `PartSet` that tracks Parts can be removed.
+
+##### Proposal Stage
+Previously, the State in the proposal stage waited for all Parts to assemble the entire Block. Instead, the State waits for
+half of all Rows from a proposer and/or peers to recompute the Block's data, and notifies them that no more
+need to be sent. Also, through Rows, only the minimally required amount of information is gossiped. Everything else needed to
+assemble the full Block is collected from the node's own chain State and the Proposal.
+
+## Status
+Proposed
+
+## Consequences
+### Positive
+* Hardening of consensus gossiping with erasure coding
+* Blocks exceeding the size limit are immediately rejected on Proposal, without the need to download the entire Block.
+* More control over the Row message size during consensus, compared to the Part message, as the last Part of a block always has
+  an unpredictable size. The `DAHeader`, on the other hand, allows knowing the size of Row messages precisely. 
+* Less bandwidth usage
+  * Only the required Block data is gossiped.
+  * Merkle proofs of Parts are not sent on the wire
+* Only one system-wide block data commitment schema
+* We don't abandon the work we have been doing for months and can take profit from it
+  * PR [#287](https://github.com/celestiaorg/lazyledger-core/pull/287)
+  * PR [#312](https://github.com/celestiaorg/lazyledger-core/pull/312)
+  * PR [#427](https://github.com/celestiaorg/lazyledger-core/pull/427)
+  * and other merged PRs
+
+### Negative
+* We invest some more time (~1.5 weeks).
+  * Most of the work is done. Only a few changes are left in the implementation, along with peer reviews.
+
+### Neutral
+* Rows vs Parts on the wire
+  * Previously, Parts were propagated with a max size of 64KiB. Let's now take a Row of the largest 128x128 block in
+    comparison. The actual data size in such a case for the Row would be 128x256 (shares_per_row * share_size) = 32KiB, which
+    is exactly half the size of a Part.
+* Gossiped chunks are no longer of constant size. Instead, their size is proportional to the size of the Block's data.
+* Another step away from the original Tendermint codebase
diff --git a/docs/celestia-architecture/adr-007-minimal-changes-to-tendermint.md b/docs/celestia-architecture/adr-007-minimal-changes-to-tendermint.md
new file mode 100644
index 00000000000..67f07d8a42b
--- /dev/null
+++ b/docs/celestia-architecture/adr-007-minimal-changes-to-tendermint.md
@@ -0,0 +1,237 @@
+# ADR 007: From Ukraine, with Love
+
+## Changelog
+
+- 2021-08-20: Initial Description
+- 2022-05-03: Update pointing to ADR 008
+
+## Context
+
+Currently, our fork of tendermint includes changes to how block data is erasure coded, minor changes to the header to commit
+to that data, additions to serve data availability sampling, along with some miscellaneous modifications to adhere to the
+spec. 
Instead of incorporating all of these changes into our fork of tendermint, we will only make the strictly
+necessary changes and move the other services and their code to the new celestia-node repo. Notably, we will also refactor
+some of the remaining necessary changes to be more isolated from the rest of the tendermint codebase. Both of these
+strategies should significantly streamline pulling updates from upstream, and allow us to iterate faster since most
+changes will be isolated to celestia-node.
+
+Update: many of the changes described below have since been minimized or removed. Please see ADR 008 for a summarized list of changes. Notably, we removed intermediate state roots, adopted two new methods from ABCI++ instead of PreprocessTxs, and are still signing over the PartSetHeader.
+
+## Decision
+
+Treat tendermint more as a "black box".
+
+## Detailed Design
+
+### Overview
+
+We keep the bare-minimum changes to tendermint in our fork, celestia-core. Where necessary and possible, we augment the
+tendermint node in a separate process, via celestia-node, which communicates with the tendermint node via RPC. All data
+availability sampling logic, including all Celestia-specific networking logic not already provided by tendermint, is
+moved into celestia-node:
+
+![core node relation](./img/core-node-relation.jpg)
+
+The detailed design of celestia-node will be defined in the repository itself. 
+
+### Necessary changes to tendermint
+
+#### Changing the repo import names to celestiaorg
+
+- Rebrand (https://github.com/celestiaorg/celestia-core/pull/476)
+
+#### Changes to the README.md and other basic things
+
+- update github templates (https://github.com/celestiaorg/celestia-core/pull/405)
+- update README.md (https://github.com/celestiaorg/celestia-core/pull/10)
+
+#### Adding the extra types of block data
+
+- Update core data types (https://github.com/celestiaorg/celestia-core/pull/17)
+  - Create the Message/Messages types
+    - Proto and the tendermint version
+  - Create the IntermediateStateRoots type
+    - Proto and the tendermint version
+- Data availability for evidence (https://github.com/celestiaorg/celestia-core/pull/19)
+  - Add both types to `types.Data`
+    - Modify proto
+  - Add `EvidenceData` to `types.Data`
+
+#### Add the HeaderHash to the Commit
+
+- Add header hash to commit (https://github.com/celestiaorg/celestia-core/pull/198)
+
+#### Adding the consts package in types
+
+#### Remove iavl as a dependency
+
+- remove iavl as a dependency (https://github.com/celestiaorg/celestia-core/pull/129)
+
+#### Using the `DataAvailabilityHeader` to calculate the DataHash
+
+The `DataAvailabilityHeader` struct will be used by celestia-core as well as by the celestia-node. It might make sense
+to (eventually) move the struct together with all the DA-related code into a separate repository and go-module.
+@Wondertan explored this as part of [#427](https://github.com/celestiaorg/celestia-core/pull/427#issue-674512464). This
+way all client implementations can depend on that module without running into circular dependencies. 
Hence, we only
+describe how to hash the block data here:
+
+- Update core types (https://github.com/celestiaorg/celestia-core/pull/17)
+  - Replace the `Data.Hash()` with `DAH.Hash()`
+  - Use the DAH to fill the DataHash when filling the header
+  - Fill the DAH when making a block to generate the data hash
+
+#### Add availableDataOriginalSharesUsed to the header
+
+- Add availableDataOriginalSharesUsed to the header (https://github.com/celestiaorg/celestia-core/pull/262)
+
+#### Reap some number of transactions probably using the app or some other mech
+
+- Enforce a minimum square size (https://github.com/celestiaorg/celestia-core/pull/282)
+- Use squares with a width that is a power of two (https://github.com/celestiaorg/celestia-core/pull/331)
+- Adopt reaping from the mempool to the max square size (https://github.com/celestiaorg/celestia-core/issues/77)
+- Proposal: Decide on a mech to pick the square size and communicate that to the
+  app (https://github.com/celestiaorg/celestia-core/issues/454)
+- Also see ABCI++ for a less hacky solution
+
+#### Filling the DAH using share merging and splitting
+
+- Compute Shares (not merged) (https://github.com/celestiaorg/celestia-core/pull/60)
+  - part II (not merged) (https://github.com/celestiaorg/celestia-core/pull/63)
+  - while this was not merged, we will need some function to compute the shares that make up the block data
+- Share Splitting (https://github.com/celestiaorg/celestia-core/pull/246)
+  - Serialize each constituent of block data
+  - Split into shares
+    - Txs (contiguous)
+    - Messages (not contiguous)
+    - Evidence (contiguous)
+    - IntermediateStateRoots (contiguous)
+- Combine shares into original square
+- ExtendBlockData
+- Generate nmt root of each row and col
+- Use those roots to generate the DataHash
+- Share Merging (https://github.com/celestiaorg/celestia-core/pull/261)
+  - Sort by namespace
+  - Parse each reserved type
+  - Parse remaining messages
+
+#### Add the wrapper around nmt to erasure namespaces
+
+- 
Implement rsmt tree wrapper for nmt (https://github.com/celestiaorg/celestia-core/pull/238)
+
+#### Add PreprocessTxs to ABCI
+
+- Add PreprocessTxs method to ABCI (https://github.com/celestiaorg/celestia-core/pull/110)
+- Add method to ABCI interface
+- Create sync and async versions
+- Add sync version to the CreateProposalBlock method of BlockExecutor
+
+#### Fill the DAH while making the block
+
+- Basic DA functionality (https://github.com/celestiaorg/celestia-core/pull/83)
+
+#### Only produce blocks on some interval
+
+- Control block times (https://github.com/tendermint/tendermint/issues/5911)
+
+#### Stop signing over the PartSetHeader
+
+- Replace the canonical blockID with just a hash in the CanonicalVote
+- Replace the LastBlockID in the header with just a hash
+
+#### Optionally remove some unused code
+
+- Removing misc unused code (https://github.com/celestiaorg/celestia-core/pull/208)
+- Remove docs deployment (https://github.com/celestiaorg/celestia-core/pull/134)
+- Start deleting docs (https://github.com/celestiaorg/celestia-core/pull/209)
+- Remove tendermint-db in favor of badgerdb (https://github.com/celestiaorg/celestia-core/pull/241)
+- Delete blockchain 2 until further notice (https://github.com/celestiaorg/celestia-core/pull/309)
+- We don’t need to support using out-of-process apps
+
+#### Nice to Haves
+
+- More efficient hashing (https://github.com/celestiaorg/celestia-core/pull/351)
+
+We should also take this opportunity to refactor as many additions to tendermint into their own packages as possible.
+This will hopefully make updating to future versions of tendermint easier. For example, when we fill the data
+availability header, instead of using a method on `Block`, it could be handled by a function that takes `types.Data` as
+input and returns the DAH, the number of shares used in the square, along with the obligatory error. 
+
+```go
+func FillDataAvailabilityHeader(data types.Data) (dah types.DataAvailabilityHeader, numOrigDataShares int, err error)
+```
+
+We could perform a similar treatment to the `splitIntoShares` methods and their helper method `ComputeShares`. Instead
+of performing the share splitting logic in those methods, we could keep it in a different package and instead call the
+equivalent function to compute the shares.
+
+Beyond refactoring and some minor additions, we will also have to remove and revert quite a few changes to get to the
+minimum desired changes specified above.
+
+### Changes that will need to be reverted
+
+#### IPLD Plugin
+
+- Introduction (https://github.com/celestiaorg/celestia-core/pull/144)
+- Initial integration (https://github.com/celestiaorg/celestia-core/pull/152)
+- Custom Multihash (https://github.com/celestiaorg/celestia-core/pull/155)
+- Putting data during proposal (https://github.com/celestiaorg/celestia-core/pull/178)
+- Module name (https://github.com/celestiaorg/celestia-core/pull/151)
+- Update rsmt2d (https://github.com/celestiaorg/celestia-core/pull/290)
+- Make plugin a package (https://github.com/celestiaorg/celestia-core/pull/294)
+
+#### Adding DAH to Stuff
+
+- Adding DAH to Proposal (https://github.com/celestiaorg/celestia-core/pull/248/files)
+- Blockmeta (https://github.com/celestiaorg/celestia-core/pull/372)
+
+#### Embedding DAS
+
+- GetLeafData (https://github.com/celestiaorg/celestia-core/pull/212)
+- RetrieveBlockData (https://github.com/celestiaorg/celestia-core/pull/232)
+- ValidateAvailability (https://github.com/celestiaorg/celestia-core/pull/270)
+- Prevent double writes to IPFS (https://github.com/celestiaorg/celestia-core/pull/271)
+- Stop Pinning (https://github.com/celestiaorg/celestia-core/pull/276)
+- Rework IPFS Node (https://github.com/celestiaorg/celestia-core/pull/334)
+- Refactor for putting the block (https://github.com/celestiaorg/celestia-core/pull/338)
+- Config for IPFS node 
(https://github.com/celestiaorg/celestia-core/pull/340)
+- IPLD Dag instead of CoreAPI (https://github.com/celestiaorg/celestia-core/pull/352)
+- Adding the DAG to the blockstore (https://github.com/celestiaorg/celestia-core/pull/356)
+- Saving and Loading using IPFS (https://github.com/celestiaorg/celestia-core/pull/374)
+- Manual Providing (https://github.com/celestiaorg/celestia-core/pull/375)
+- Refactor node provider (https://github.com/celestiaorg/celestia-core/pull/400)
+- DAS in light client workaround (https://github.com/celestiaorg/celestia-core/pull/413)
+
+#### BlockID and PartSetHeader
+
+- Decouple PartSetHeader from BlockID (https://github.com/celestiaorg/celestia-core/pull/441)
+- Stop Signing over the PartSetHeader (https://github.com/celestiaorg/celestia-core/pull/457)
+- We still don’t want to sign over the PartSetHeader, but we will not be able to use the same mechanism used in the
+  linked PR, as that way requires decoupling of the PSH from the BlockID
+- Remove PSH from some consensus messages (https://github.com/celestiaorg/celestia-core/pull/479)
+
+Note: This ADR overrides ADR 005 Decouple BlockID and the PartSetHeader. The PartSetHeader and the BlockID will mostly
+remain the same. 
This will make pulling changes from upstream much easier.
+
+## Status
+
+Accepted
+
+## Consequences
+
+### Positive
+
+- Pulling changes from upstream is streamlined
+- Separation of functionality will help us iterate faster
+- Creates a great opportunity for reconsidering past design choices without fully starting from scratch
+- Prepare for future designs
+- Don’t have to have two p2p stacks in a single repo
+
+### Negative
+
+- Perform some computation multiple times
+- Running multiple nodes instead of a single node is less convenient for node operators (but only in the case that the full
+  celestia-node wants to participate in the consensus protocol)
+
+## References
+
+Tracking Issue #491
diff --git a/docs/celestia-architecture/adr-008-updating-to-tendermint-v0.35.x.md b/docs/celestia-architecture/adr-008-updating-to-tendermint-v0.35.x.md
new file mode 100644
index 00000000000..276358c418d
--- /dev/null
+++ b/docs/celestia-architecture/adr-008-updating-to-tendermint-v0.35.x.md
@@ -0,0 +1,53 @@
+# ADR 008: Updating to tendermint v0.35.x
+
+## Changelog
+
+- 2022-05-03: Initial document describing changes to tendermint v0.35.x
+
+## Context
+
+Building off of ADR 007, we have further distilled the necessary changes to tendermint and continued to move added logic to other repos. Specifically, we have moved generation of the data hash, efficient construction of the data square, and a message inclusion check to celestia-app via adopting two new methods from ABCI++. This document serves as a guide for the remaining changes made on top of tendermint v0.35.4. 
+
+### Changes to tendermint
+
+#### Misc
+
+- [update github templates](https://github.com/celestiaorg/celestia-core/pull/405)
+- [update README.md](https://github.com/celestiaorg/celestia-core/pull/737/commits/be9039d4e0f5d876ec3d8d4521be3374172d7992)
+- [updating to go 1.17](https://github.com/celestiaorg/celestia-core/pull/737/commits/6094b7338082d106f81da987dffa56eb540a675e)
+- [adding the consts package](https://github.com/celestiaorg/celestia-core/pull/737/commits/fea8528b0177230b7e75396ae05f7a9b5da23741)
+
+#### Changing the way the data hash is calculated
+
+To enable data availability sampling, Celestia uses a proprietary data square format to encode its block data. The data hash is generated from this data square by calculating a namespaced merkle tree root over each row and column. In the following changes, we implement encoding and decoding of block data to the data square format, along with tooling to generate the data hash. More details on this design can be found in our (archived but still very useful) [specs repo](https://github.com/celestiaorg/celestia-specs).
+
+- [Adding the Data Availability Header](https://github.com/celestiaorg/celestia-core/pull/737/commits/116b7af4000920030a373363487ef9a9f084e066)
+- [Adding a wrapper for namespaced merkle trees](https://github.com/celestiaorg/celestia-core/pull/737/commits/eee8f352cb6a1687a9f6b470abe28bbd4eb66413)
+- [Adding Messages and Evidence to the block data](https://github.com/celestiaorg/celestia-core/pull/737/commits/86df6529a7c0cc1112c34b6bf1b5364aa0518dec)
+- [Adding share splitting and merging for block encoding](https://github.com/celestiaorg/celestia-core/pull/737/commits/bf2d8b46c1caf1fed52e7db9bf8aa6a9847d84ab)
+- [Modifying MakeBlock to also accept Messages](https://github.com/celestiaorg/celestia-core/pull/737/commits/bb970a417356ab030c934ccd2bd39c9641af45f8)
+
+#### Adding PrepareProposal and ProcessProposal ABCI methods from ABCI++
+
+- 
[PrepareProposal](https://github.com/celestiaorg/celestia-core/pull/737/commits/07f9a05444db763c44ff81f564e7350ddf57e5a4) +- [ProcessProposal](https://github.com/celestiaorg/celestia-core/pull/737/commits/2c9552db09841f2bbebc1ec34653b2441def9f13) + +More details on how we use these new methods in the app can be found in the [ABCI++ Adoption ADR](https://github.com/celestiaorg/celestia-app/blob/master/docs/architecture/ADR-001-ABCI%2B%2B.md). + +#### Wrapping Malleated Transactions + +Tendermint and the cosmos-sdk were not built to handle malleated transactions (txs that are submitted by the user, but modified by the block producer before being included in a block). While not a final solution, we have resorted to adding the hash of the original transaction (the one that is not modified by the block producer) to the modified one. This allows us to track the transaction in the event system and mempool. + +- [Index malleated Txs](https://github.com/celestiaorg/celestia-core/pull/737/commits/a54e3599a5ef6b2ba8b63f586aed8185a3f59e4d) + +#### Create NMT Inclusion Proofs for Transactions + +Since the block data that is committed over is encoded as a data square and we use namespaced Merkle trees to generate the row and column roots of that square, we have to create transaction inclusion proofs using NMTs and a data square as well. The problem is that the block data isn't stored as a square, so in order to generate the inclusion proofs, we have to regenerate a portion of the square. We do that here. + +- [Create namespace merkle tree inclusion proofs for transactions included in the block](https://github.com/celestiaorg/celestia-core/pull/737/commits/01051aa5fef0693bf3bda801e39c80e5746b9c33) + +#### Adding the DataCommitment RPC endpoint + +This RPC endpoint is used by Quantum Gravity Bridge orchestrators to create a commitment over the block data of a range of blocks.
+ +- [Adding the DataCommitment RPC endpoint](https://github.com/celestiaorg/celestia-core/pull/737/commits/134eeefb7af41afe760d4adc5b22a9d55e36bc3e) \ No newline at end of file diff --git a/docs/celestia-architecture/adr-009-cat-pool.md b/docs/celestia-architecture/adr-009-cat-pool.md new file mode 100644 index 00000000000..44772e152f0 --- /dev/null +++ b/docs/celestia-architecture/adr-009-cat-pool.md @@ -0,0 +1,96 @@ +# ADR 009: Content addressable transaction pool + +## Changelog + +- 2023-01-11: Initial Draft (@cmwaters) + +## Context + +One of the criteria of success for Celestia as a reliable data availability layer is the ability to handle large transactional throughput. A component that plays a significant role in this is the mempool. Its purpose is to receive transactions from clients and broadcast them to all other nodes, eventually reaching the next block proposer, who includes them in their block. Given Celestia's aggregator-like role whereby larger transactions, i.e. blobs, are expected to dominate network traffic, a content-addressable algorithm, common in many other [peer-to-peer file sharing protocols](https://en.wikipedia.org/wiki/InterPlanetary_File_System), could be far more beneficial than the transaction-flooding protocol that Tendermint currently uses. + +This ADR describes the content addressable transaction protocol and, through a comparative analysis with the existing gossip protocol, presents the case for its adoption in Celestia. + +## Decision + +Use a content addressable transaction pool for disseminating transactions to nodes within the Celestia network. + +## Detailed Design + +The core idea is that each transaction can be referenced by a key, generated through a cryptographic hash function that reflects the content of the transaction. Nodes signal to one another which transactions they have via this key, and can request transactions they are missing through the key.
This reduces the amount of duplicated transmission compared to a system which blindly sends received transactions to all other connected peers (as we will see in the consequences section). + +Full details on the exact protocol can be found in the [spec](../../mempool/cat/spec.md). Here, the document focuses on the main deciding points around the architecture: + +- It is assumed that clients submit transactions to a single node in the network. Thus a node that receives a transaction through RPC will immediately broadcast it to all connected peers. +- The new messages, `SeenTx` and `WantTx`, are broadcast over a new mempool channel `byte(0x31)` for backwards compatibility and to distinguish priorities. Nodes running the other mempools will not receive these messages and will be able to operate normally. Similarly, the interfaces used by Tendermint are not modified in any way, thus a node operator can easily switch between mempool versions. +- Transaction gossiping takes priority over these "state" messages so as to avoid situations where we receive a `SeenTx` and respond with a `WantTx` while the transaction is still queued in the node's p2p buffer. +- The node only sends `SeenTx` to nodes that haven't yet seen the transaction, using jitter (with an upper bound of 100ms) to stagger when `SeenTx`s are broadcast and avoid all messages being sent at once. +- `WantTx`s are sent to one peer at a time. A timeout is used to deem when a peer is unresponsive and the `WantTx` should be sent to another peer. This is currently set to 200ms (an estimation of network round trip time). It is not yet configurable but we may want to change that in the future. +- A channel has been added to allow the `TxPool` to feed validated txs to the `Reactor` to be sent to all other peers.
+ +A series of new metrics have been added to monitor effectiveness: + +- SuccessfulTxs: number of transactions committed in a block (to be used as a baseline) +- AlreadySeenTxs: transactions that are received more than once +- RequestedTxs: the number of initial requests for a transaction +- RerequestedTxs: the number of follow-up requests for a transaction. If this is high, it may indicate that the request timeout is too short. + +The CAT pool has had numerous unit tests added. It has been tested in the local e2e networks and put under strain in large, geographically dispersed 100-node networks. + +## Alternative Approaches + +A few variations on the design were prototyped and tested. An early implementation experimented with just `SeenTx`s. All nodes would gossip `SeenTx` upon receiving a valid tx. Nodes would not relay received transactions to peers that had sent them a `SeenTx`. However, in many cases this would lead to a node sending a tx to a peer before it was able to receive the `SeenTx` that the node had just sent. Even with a higher priority, a large amount of duplication still occurred. + +Another trick was tested, which involved adding a `From` field to the `SeenTx`. Nodes receiving the `SeenTx` would use the `NodeID` in `From` to check if they were already connected to that peer and thus could expect a transaction from them soon instead of immediately issuing a `WantTx`. In large-scale tests, this proved to be, surprisingly, less efficient. This might be because a `SeenTx` rarely arrives from another node before the initial sender has broadcast to everyone. It may also be because in the testnets, each node was only connected to 10 other nodes, decreasing the chance that the node was actually connected to the original sender. The `From` field also added an extra 40 bytes to the `SeenTx` message. In the tables below, this experiment is shown as CAT2.
+ +## Status + +Implemented + +## Consequences + +To validate its effectiveness, the protocol was benchmarked against existing mempool implementations. This was done in close-to-real network environments which used [testground](https://github.com/testground/testground) and the celestia-app binary (@ v0.11.0) to create 100-validator networks. The network would then be subjected to PFBs from 1100 light nodes at 4KB per transaction. The network followed 15-second block times with a maximum block size of roughly 8MB (these were being filled). This was run for 10 minutes before being torn down. The collected data was aggregated across the 100 nodes and is as follows: + +| Version | Average Bandwidth | Standard Deviation | Finalized Bandwidth | +|-----|-----|------|------| +| v0 | 982.66MB/s | 113.91MB/s | 11MB/s | +| v1 | 999.89MB/s | 133.24MB/s | 11MB/s | +| v2 (CAT) | 98.90MB/s | 18.95MB/s | 11MB/s | +| v2 (CAT2) | 110.28MB/s | 33.49MB/s | 11MB/s | + +> Finalized bandwidth is the number of bytes finalized by consensus per second, whereas the other measurements are per node. + +Rather than just expressing the difference in bytes, this can also be viewed by the factor of duplication (i.e. the number of times a transaction is received by a node). + +| Version | v0 | v1 | v2 (CAT) | v2 (CAT2) | +| --------|----|----|----------|-----------| +| Duplication | 17.61x | 17.21x | 1.75x | 1.85x | + + +This, of course, comes at the cost of additional message overhead and there comes a point where the transactions are small enough that the reduction in duplication doesn't outweigh the extra state messages. + + +### Positive + +- Reduction in network bandwidth. +- Cross-compatible and therefore easily reversible. + +### Negative + +- Extra network round trip when not directly receiving a transaction. +- Greater complexity than a simple flooding mechanism. + +### Neutral + +- Allows compact blocks to be implemented, as they depend on the push-pull functionality.
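The duplication factor reported above can be derived from the metrics introduced earlier. A minimal sketch of the arithmetic (the metric names follow this ADR; the function itself is illustrative, not part of the codebase):

```go
package main

// duplicationFactor estimates how many times the average transaction is
// received by a node: total receipts (first receipt plus repeats) divided
// by unique transactions. The inputs correspond to the SuccessfulTxs and
// AlreadySeenTxs metrics described in this ADR.
func duplicationFactor(uniqueTxs, alreadySeenTxs float64) float64 {
	if uniqueTxs == 0 {
		return 0
	}
	return (uniqueTxs + alreadySeenTxs) / uniqueTxs
}
```

For example, a node that committed 100 unique transactions while counting 1,661 repeat receipts has a duplication factor of (100 + 1661) / 100 = 17.61, matching the v0 row above.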
+ +## Ongoing work + +This section describes further work that may be subsequently undertaken in this area. The first is transaction bundling. If a node receives a lot of transactions from clients, instead of sending them off immediately one-by-one, it may wait for a fixed period (~100ms) and bundle them all together. The set of transactions can now be represented as a single key. This increases the content to key ratio and thus improves the performance of the protocol. + +An area of further exploration is the concept of neighborhoods. Variations of this idea are present in both [GossipSub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md#gossipsub-the-gossiping-mesh-router) and Solana's Turbine. The concept entails shaping the network topology into many neighborhoods or sections where a node can be seen as strongly connected to nodes in their neighborhood and weakly connected to peers in other neighborhoods. The idea behind a more structured topology is to make the broadcasting more directed. + +Outside of protocol development, work can be done to more accurately measure the performance. Both protocols managed to sustain 15-second block times with mostly full blocks, i.e. the same output throughput. This indicates that the network was being artificially constrained. One of these constraints needs to be lifted (ideally the max square size) so we are able to measure the underlying network speed. + +## References + +- [Content-addressable transaction pool spec](../../mempool/cat/spec.md) diff --git a/docs/celestia-architecture/adr-template.md b/docs/celestia-architecture/adr-template.md new file mode 100644 index 00000000000..adbac8f4d4b --- /dev/null +++ b/docs/celestia-architecture/adr-template.md @@ -0,0 +1,72 @@ +# ADR {ADR-NUMBER}: {TITLE} + +## Changelog + +- {date}: {changelog} + +## Context + +> This section contains all the context one needs to understand the current state, and why there is a problem.
It should be as succinct as possible and introduce the high level idea behind the solution. + +## Alternative Approaches + +> This section contains information around alternative options that are considered before making a decision. It should contain an explanation on why the alternative approach(es) were not chosen. + +## Decision + +> This section records the decision that was made. +> It is best to record as much info as possible from the discussion that happened. This aids in not having to go back to the Pull Request to get the needed information. + +## Detailed Design + +> This section does not need to be filled in at the start of the ADR, but must be completed prior to the merging of the implementation. +> +> Here are some common questions that get answered as part of the detailed design: +> +> - What are the user requirements? +> +> - What systems will be affected? +> +> - What new data structures are needed, what data structures will be changed? +> +> - What new APIs will be needed, what APIs will be changed? +> +> - What are the efficiency considerations (time/space)? +> +> - What are the expected access patterns (load/throughput)? +> +> - Are there any logging, monitoring or observability needs? +> +> - Are there any security considerations? +> +> - Are there any privacy considerations? +> +> - How will the changes be tested? +> +> - If the change is large, how will the changes be broken up for ease of review? +> +> - Will these changes require a breaking (major) release? +> +> - Does this change require coordination with the Celestia fork of the SDK or celestia-app? + +## Status + +> A decision may be "proposed" if it hasn't been agreed upon yet, or "accepted" once it is agreed upon. Once the ADR has been implemented mark the ADR as "implemented". If a later ADR changes or reverses a decision, it may be marked as "deprecated" or "superseded" with a reference to its replacement. 
+ +{Deprecated|Proposed|Accepted|Declined} + +## Consequences + +> This section describes the consequences, after applying the decision. All consequences should be summarized here, not just the "positive" ones. + +### Positive + +### Negative + +### Neutral + +## References + +> Are there any relevant PR comments, issues that led up to this, or articles referenced for why we made the given design choice? If so link them here! + +- {reference link} diff --git a/docs/celestia-architecture/assets/user-request.png b/docs/celestia-architecture/assets/user-request.png new file mode 100644 index 00000000000..3d04fad7349 Binary files /dev/null and b/docs/celestia-architecture/assets/user-request.png differ diff --git a/docs/celestia-architecture/img/core-node-relation.jpg b/docs/celestia-architecture/img/core-node-relation.jpg new file mode 100644 index 00000000000..8c93640633c Binary files /dev/null and b/docs/celestia-architecture/img/core-node-relation.jpg differ diff --git a/docs/celestia-architecture/img/extended_square.png b/docs/celestia-architecture/img/extended_square.png new file mode 100644 index 00000000000..8bbf4695053 Binary files /dev/null and b/docs/celestia-architecture/img/extended_square.png differ diff --git a/libs/trace/README.md b/libs/trace/README.md new file mode 100644 index 00000000000..09baeb9039a --- /dev/null +++ b/libs/trace/README.md @@ -0,0 +1,102 @@ +# trace package + +The `trace` package provides a decently fast way to store traces locally. + +## Usage + +To enable the local tracer, add the following to the config.toml file: + +```toml +# The tracer to use for collecting trace data. +trace_type = "local" + +# The size of the batches that are sent to the database. +trace_push_batch_size = 1000 + +# The list of tables that are updated when tracing. All available tables and +# their schema can be found in the pkg/trace/schema package. It is represented as a +# comma separated string. For example: "consensus_round_state,mempool_tx". 
+tracing_tables = "consensus_round_state,mempool_tx" +``` + +Trace data will now be stored in the `.celestia-app/data/traces` directory, with each table saved as a separate file in the `table_name.jsonl` format. + +To read the contents of a file, open it and pass it to the `DecodeFile` function. This returns all of the events in that file as a slice. + +```go +events, err := DecodeFile[schema.MempoolTx](file) +if err != nil { + return err +} +``` + +### Pull Based Event Collection + +Pull-based event collection is where external servers connect to and pull trace data from the consensus node. + +To use this, add the pull address to the config.toml: + +```toml +# The tracer pull address specifies which address will be used for pull based +# event collection. If empty, the pull based server will not be started. +trace_pull_address = ":26661" +``` + +To retrieve a table remotely using the pull-based server, call the following +function: + +```go +err := GetTable("http://1.2.3.4:26661", "mempool_tx", "directory to store the file") +if err != nil { + return err +} +``` + +This stores the data locally in the specified directory. + + +### Push Based Event Collection + +Push-based event collection is where the consensus node pushes trace data to an +external server. At the moment, this is just an S3 bucket. To use this, two options are available: +#### Using a push config file + +Add the following to the config.toml file: + +```toml +# TracePushConfig is the relative path of the push config. +# This second config contains credentials for where and how often to +# push trace data to. For example, if the config is next to this config, +# it would be "push_config.json".
+trace_push_config = "{{ .Instrumentation.TracePushConfig }}" +``` + +The push config file is a JSON file that should look like this: + +```json +{ + "bucket": "bucket-name", + "region": "region", + "access_key": "", + "secret_key": "", + "push_delay": 60 +} +``` + +`push_delay` is the number of seconds to wait between intervals of pushing all files. + +#### Using environment variables for the S3 bucket + +Alternatively, you can set the following environment variables: + +```bash +export TRACE_PUSH_BUCKET_NAME=bucket-name +export TRACE_PUSH_REGION=region +export TRACE_PUSH_ACCESS_KEY=access-key +export TRACE_PUSH_SECRET_KEY=secret-key +export TRACE_PUSH_DELAY=push-delay +``` + +`bucket_name`, `region`, `access_key`, `secret_key`, and `push_delay` are the S3 bucket name, region, access key, secret key, and the delay between pushes, respectively. diff --git a/mempool/cat/README.md b/mempool/cat/README.md new file mode 100644 index 00000000000..fec54796aa1 --- /dev/null +++ b/mempool/cat/README.md @@ -0,0 +1,101 @@ +# Content Addressable Transaction Pool Specification + +- 01.12.2022 | Initial specification (@cmwaters) +- 09.12.2022 | Add Push/Pull mechanics (@cmwaters) + +### Outline + +This document specifies the properties, design and implementation of a content addressable transaction pool (CAT). This protocol is intended as an alternative to the FIFO and Priority mempools currently built into the Tendermint consensus protocol. The term content-addressable here indicates that each transaction is identified by a smaller, unique tag (in this case a sha256 hash). These tags are broadcast among peers as a means of more compactly indicating which peers have which transactions. Tracking what each peer has aims at reducing the amount of duplication. In a network without content tracking, a peer may receive as many duplicate copies of a transaction as it has connected peers.
The tradeoff therefore requires that transactions be significantly larger than their tags, so that the data saved by not sending duplicate transactions outweighs the cost of sending each peer a tag. + +### Purpose + +The objective of such a protocol is to transport transactions from the author (usually a client) to a proposed block, optimizing both latency and throughput, i.e. how quickly a transaction can be proposed (and committed) and how many transactions can be transported into a block at once. + +Typically the mempool serves to receive inbound transactions via an RPC endpoint, gossip them to all nodes in the network (regardless of whether they are capable of proposing a block or not), and stage groups of transactions to both consensus and the application to be included in a block. + +### Assumptions + +The following are assumptions inherited from existing Tendermint mempool protocols: + +- `CheckTx` should be seen as a simple gatekeeper to what transactions enter the pool to be gossiped and staged. It is non-deterministic: one node may reject a transaction that another node keeps. +- Applications implementing `CheckTx` are responsible for replay protection (i.e. the same transaction being present in multiple blocks). The mempool ensures that within the same block, no duplicate transactions can exist. +- The underlying p2p layer guarantees eventually reliable broadcast. A transaction need only be sent once to eventually reach the target peer. + +### Messages + +The CAT protocol extends the existing mempool implementations by introducing two new protobuf messages: + +```protobuf +message SeenTx { + bytes tx_key = 1; + optional string from = 2; +} + +message WantTx { + bytes tx_key = 1; +} +``` + +Both `SeenTx` and `WantTx` contain the sha256 hash of the raw transaction bytes. `SeenTx` also contains an optional `p2p.ID` that corresponds to the peer that the node received the tx from.
The only validation for both is that the byte slice of the `tx_key` MUST have a length of 32. + +Both messages are sent across a new channel with ID `byte(0x31)`. This enables cross-compatibility as discussed in greater detail below. + +> **Note:** +> The term `SeenTx` is used over the more common `HasTx` because the transaction pool contains sophisticated eviction logic. TTLs, higher-priority transactions and reCheckTx may mean that a transaction pool *had* a transaction but does not have it any more. Semantically it's more appropriate to use `SeenTx` to imply not the presence of a transaction but that the node has seen it and dealt with it accordingly. + +### Outbound logic + +A node in the protocol has two distinct modes: "broadcast" and "request/response". When a node receives a transaction via RPC (or specifically through `CheckTx`), it is assumed that it is the only recipient from that client and thus will immediately send that transaction, after validation, to all connected peers. Afterwards, only "request/response" is used to disseminate that transaction to everyone else. + +> **Note:** +> Given that one can configure a mempool to switch off broadcast, there are no guarantees when a client submits a transaction via RPC and no error is returned that it will find its way into a proposer's transaction pool. + +A `SeenTx` is broadcast to ALL nodes upon receiving a "new" transaction from a peer. The transaction pool does not need to track every unique inbound transaction, therefore "new" is identified as: + +- The node does not currently have the transaction +- The node did not recently reject the transaction or has recently seen the same transaction committed (subject to the size of the cache) +- The node did not recently evict the transaction (subject to the size of the cache) + +Given these criteria, it is feasible, yet unlikely, that a node receives two `SeenTx` messages from the same peer for the same transaction.
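As a concrete reference, the content addressing behind these messages can be sketched in Go. This is a minimal illustration under the spec's stated rules; the names here are not the pool's actual API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// TxKey is the content address of a transaction: the sha256 hash of its
// raw bytes, carried in the tx_key field of SeenTx and WantTx.
type TxKey [sha256.Size]byte

// computeTxKey derives the key from the raw transaction bytes.
func computeTxKey(tx []byte) TxKey {
	return sha256.Sum256(tx)
}

// validateTxKey mirrors the only wire validation the spec requires:
// the tx_key byte slice MUST be exactly 32 bytes long.
func validateTxKey(key []byte) error {
	if len(key) != sha256.Size {
		return fmt.Errorf("invalid tx key length: got %d, want %d", len(key), sha256.Size)
	}
	return nil
}
```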
+ +A `SeenTx` MAY be sent for each transaction currently in the transaction pool when a connection with a peer is first established. This acts as a mechanism for syncing pool state across peers. + +The `SeenTx` message MUST only be broadcast after validation and storage. Although it is possible that a node later drops a transaction under load shedding, a `SeenTx` should give as strong guarantees as possible that the node can be relied upon by others that don't yet have the transaction to obtain it. + +> **Note:** +> Inbound transactions submitted via the RPC do not trigger a `SeenTx` message as it is assumed that the node is the first to see the transaction and by gossiping it to others it is implied that the node has seen the transaction. + +A `WantTx` message is always sent point-to-point and never broadcast. A `WantTx` MUST only be sent after receiving a `SeenTx` message from that peer. There is one exception: a `WantTx` MAY also be sent by a node after receiving an identical `WantTx` message from a peer that had previously received the node's `SeenTx` but, after the lapse in time, no longer had the transaction in its pool. This provides an optional synchronous method for communicating that a node no longer has a transaction rather than relying on the default asynchronous approach, which is to wait for a period of time and try again with a new peer. + +`WantTx` must be tracked. A node SHOULD NOT send multiple `WantTx`s to multiple peers for the same transaction at once but wait for a period that matches the expected network latency before rerequesting the transaction from another peer. + +### Inbound logic + +Transaction pools run solely in memory; thus, when a node stops, all transactions are discarded.
To avoid the scenario where a node restarts and does not receive transactions because other nodes recorded a `SeenTx` message from their previous run, each transaction pool should track peer state **per connection** and not per `NodeID`. + +Upon receiving a `Txs` message: + +- Check whether it is in response to a request or simply an unsolicited broadcast +- Validate the tx against current resources and the application's `CheckTx` +- If rejected or evicted, mark accordingly +- If successful, send a `SeenTx` message to all connected peers excluding the original sender. If it was from an initial broadcast, the `SeenTx` should populate the `From` field with the `p2p.ID` of the peer the transaction was received from; if it is in response to a request, `From` should remain empty. + +Upon receiving a `SeenTx` message: + +- It should mark the peer as having seen the transaction. +- If the node has recently rejected that transaction, it SHOULD ignore the message. +- If the node already has the transaction, it SHOULD ignore the message. +- If the node does not have the transaction but recently evicted it, it MAY choose to rerequest the transaction if it has adequate resources now to process it. +- If the node has not seen the transaction or does not have any pending requests for that transaction, it can do one of two things: + - It MAY immediately request the tx from the peer with a `WantTx`. + - If the node is connected to the peer specified in `From`, it is likely, from a non-Byzantine peer, that the node will also shortly receive the transaction from that peer. It MAY wait for a `Txs` message for a bounded amount of time but MUST eventually send a `WantTx` message to either the original peer or any other peer that *has* the specified transaction. + +Upon receiving a `WantTx` message: + +- If it has the transaction, it MUST respond with a `Txs` message containing that transaction.
- If it does not have the transaction, it MAY respond with an identical `WantTx` or rely on the timeout of the peer that requested the transaction to eventually ask another peer. + +### Compatibility + +CAT is Go-API compatible with the two existing mempool implementations. It implements both the `Reactor` interface required by Tendermint's P2P layer and the `Mempool` interface used by `consensus` and `rpc`. CAT is currently network compatible with existing implementations (by using another channel), but the protocol is unaware that it is communicating with a different mempool and that `SeenTx` and `WantTx` messages aren't reaching those peers; it is therefore recommended that the entire network use CAT.
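The channel layout described above might be sketched as follows. The legacy mempool channel ID `0x30` is taken from Tendermint; the constant names here are illustrative rather than the package's actual identifiers.

```go
package main

const (
	// MempoolChannel is the existing channel used for Txs messages,
	// shared with the legacy FIFO and priority mempools.
	MempoolChannel = byte(0x30)
	// MempoolStateChannel carries the new SeenTx and WantTx messages.
	// Peers running a legacy mempool never open this channel, which is
	// what makes CAT network compatible with them.
	MempoolStateChannel = byte(0x31)
)
```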