Description
Currently: `addPieces(uint256 setId, Cids.Cid[] calldata pieceData, bytes calldata extraData)`
Curio has support for building a single piece from multiple "subpieces", similar to the Filecoin miner actor sector onboarding that takes multiple pieces and combines them into one root CommD. With a manifest of subpiece PieceCIDs (including size, so CommPv2), you can verify that they resolve to a parent PieceCID, but obviously you can't go the other way.
Now that we use CommPv2 everywhere, we have a problem with this subpiece mechanism: the aggregate piece size is going to be at least as large as (>=) the sum of the raw sizes of the subpieces, because we have to arrange them so that they align properly into the tree needed to form the aggregate piece. This has downstream effects because CommP is a source of piece size:
- PDPVerifier thinks it's verifying the full aggregate piece size, not the combined raw size of the subpieces
- PDP Explorer will be seeing this size too and reporting it as the raw size of the piece
- SDK & user will have a CommP that will report that it references a piece that is larger (potentially significantly larger) than the actual combined raw sizes of the pieces
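To make the size gap concrete, here is a minimal sketch of how an aggregate's padded size can exceed the combined raw size of its subpieces. It assumes the usual Filecoin rules (fr32 expands every 127 bytes to 128, padded piece sizes are powers of two with a 128-byte minimum) and a simple in-order layout in which each subpiece starts at an offset aligned to its own padded size. The function names are made up for illustration, and Curio's actual arrangement may differ.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def padded_piece_size(raw_size: int) -> int:
    """Padded (post-fr32, power-of-two) size of a piece holding raw_size bytes."""
    if raw_size <= 0:
        raise ValueError("raw_size must be positive")
    # fr32 turns every 127 bytes of payload into 128 bytes; round up.
    fr32_size = (raw_size * 128 + 126) // 127
    return max(128, next_pow2(fr32_size))


def aggregate_layout(raw_sizes: list[int]) -> tuple[list[int], int]:
    """Place subpieces in order, each aligned to its own padded size.

    Returns (offsets within the padded aggregate, padded aggregate size).
    Assumption: simple in-order placement, not necessarily what Curio does.
    """
    offsets = []
    cursor = 0
    for raw in raw_sizes:
        size = padded_piece_size(raw)
        cursor = (cursor + size - 1) // size * size  # align up to `size`
        offsets.append(cursor)
        cursor += size
    return offsets, max(128, next_pow2(cursor))


raw_sizes = [1000, 200, 5000]
offsets, aggregate_size = aggregate_layout(raw_sizes)
print(sum(raw_sizes), offsets, aggregate_size)
```

With these three subpieces the combined raw size is 6200 bytes, but the aggregate tree spans 16384 padded bytes, which is the size a consumer would infer from the aggregate CommP alone.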
Curio should be able to deal with this, I think. I haven't verified but Magik built this and I trust his ability to get this right.
I think this should be OK as long as we're clear with users and limit its use to where it's necessary. In order to have nice overlap with PoRep, I think we should treat 32G as the boundary (minus fr32), and have both Curio and the SDK refuse to accept piece uploads larger than this. If you want to onboard a PDP piece larger than this, then you have to use the subpiece mechanism up to whatever size you need. But we should record it somewhere so this isn't just ephemeral data between the SDK and Curio.
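As a rough sketch of what that boundary check could look like, assuming "32G (minus fr32)" means a 32 GiB padded piece, which carries 127/128 of that as raw payload. The constant and function names are hypothetical and not taken from Curio or the SDK.

```python
SECTOR_SIZE = 32 * 1024**3  # 32 GiB padded piece size (assumed boundary)

# fr32 stores 127 payload bytes in every 128 padded bytes.
MAX_RAW_PIECE_SIZE = SECTOR_SIZE // 128 * 127


def requires_subpieces(raw_size: int) -> bool:
    """True if a payload is too large for a single piece under this boundary."""
    return raw_size > MAX_RAW_PIECE_SIZE


print(MAX_RAW_PIECE_SIZE)  # 34091302912
```

Anything above that limit would be rejected as a single upload and would have to go through the subpiece mechanism instead.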
So my proposal is:
- Extend `addPieces` to take a "PieceManifest", similarly mapping to the concept in the miner actor, except in this case we will use `(SubPieceCID, Offset)` pairs. By using offsets, we give ourselves additional flexibility to arrange content and potentially insert meta content (e.g. PoDSI data). These pieces combined will produce the aggregate PieceCID. I'm not sure what the call signature for this should be; maybe it needs a separate one for when you want to do this.
- For now, due to the gas cost of verifying this in the contract, it will just trust that this information is correct. All we know here is that Curio reported this, so we're essentially trusting Curio. We also have the option of passing this manifest through `extraData` into the service contract and verifying the client signature there, in which case we have assurance that (a) Curio reported this and thinks it's correct and, more importantly, (b) the user is asserting this is what they want recorded in the deal. We make these trust assumptions very clear in the contract.
- We emit this information as an event, don't store it, but we now have it on record and an indexer could pick it up and use it as needed.
- The SDK has a mechanism for uploading very large data that breaks the 32G boundary, but in return you don't just get a single CommP (with the wrong raw size in it), you get that and a PieceManifest to go along with it that lists your subpieces that made up the entire piece.
- The PDP Explorer is going to report the wrong size, but it can pick up the piece manifest and report that along with the piece.
- Linking with PoRep (`archive()`) will happen either at the sub-32G level (putting together multiple pieces into a larger manifest for a single sector) or at the above-32G level, by committing subpieces separately and using the piece manifest to do that and to verify that your aggregate piece is available.
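The proposal leaves the shape of a PieceManifest open, so the following is only an illustrative sketch of the `(SubPieceCID, Offset)` idea and of the cheap off-chain sanity checks an SDK or indexer could run on one. All type and field names are hypothetical. `padded_size` is passed explicitly here, whereas in practice it would be derived from the CommPv2 CID, and the check does not recompute the aggregate CommP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubPiece:
    cid: str          # SubPieceCID (CommPv2), treated as an opaque string here
    offset: int       # byte offset within the padded aggregate piece
    padded_size: int  # hypothetical field; would be decoded from the CID


@dataclass(frozen=True)
class PieceManifest:
    aggregate_cid: str
    aggregate_padded_size: int
    sub_pieces: tuple[SubPiece, ...]


def validate_layout(manifest: PieceManifest) -> list[str]:
    """Return a list of layout problems; an empty list means the layout is plausible."""
    errors = []
    end = 0
    for sp in sorted(manifest.sub_pieces, key=lambda s: s.offset):
        if sp.padded_size < 128 or sp.padded_size & (sp.padded_size - 1):
            errors.append(f"{sp.cid}: padded size is not a power of two >= 128")
        elif sp.offset % sp.padded_size:
            errors.append(f"{sp.cid}: offset not aligned to padded size")
        if sp.offset < end:
            errors.append(f"{sp.cid}: overlaps the previous subpiece")
        end = max(end, sp.offset + sp.padded_size)
    if end > manifest.aggregate_padded_size:
        errors.append("subpieces extend past the aggregate piece")
    return errors


good = PieceManifest(
    aggregate_cid="aggregate-cid-placeholder",
    aggregate_padded_size=16384,
    sub_pieces=(
        SubPiece("sub-a", 0, 1024),
        SubPiece("sub-b", 1024, 256),
        SubPiece("sub-c", 8192, 8192),
    ),
)
print(validate_layout(good))
```

Gaps between subpieces are allowed on purpose, since the offsets are meant to leave room for meta content such as PoDSI data. A check like this only covers the layout; trust in the CIDs themselves still rests on Curio or on the client signature.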
This assumes that Curio can (a) prove these large aggregate pieces (probably can now), and (b) can allow /piece/ retrieval of either aggregate or subpieces (apparently mkv2 is supposed to support this).