Description
In the SDK we currently rely on a call to the PDP server to get the full data set info, which gives us the list of pieces; we then iterate over them to find the pieceId for a given CID. https://github.com/FilOzone/synapse-sdk/blob/4be67a838c5b602608b3ac33f9702c1343f87e09/packages/synapse-sdk/src/storage/context.ts#L1348-L1395
This is both inefficient and trusts the server to provide us with accurate information. The alternative is to eth_call getActivePieces on PDPVerifier, iterate over the results and find the piece. That is also inefficient, and it suffers from the problems with very large data sets described in #243. The fix in #246 is yet to be deployed; it would make this approach more practical, but still very inefficient.
We need the pieceId for a number of operations, particularly delete, where we call the PDP server and tell it which ID to delete. We don't give the client the ID; we give them the CID and translate internally. So we need a CID-to-ID mapping that is available via eth_call:
We need: function findPieceIdsByCid(uint256 setId, Cids.Cid calldata pieceCid) public view returns (uint256[] memory pieceIds) - returns an array because we don't have a unique CID constraint.
But I'm also conscious of the very-large-data-set problem we encountered in #243: we could have thousands of the same pieceCid in a data set, and returning them all in one call becomes a problem. So here's my proposal for a way to get all of them, or just the first one if that's all you care about, ignoring the deleted ones:
```solidity
/// @notice Finds piece IDs matching a given piece CID, with cursor-based pagination
/// @param setId The data set ID
/// @param pieceCid The piece CID to search for
/// @param startPieceId Piece ID to start scanning from (0 for first call)
/// @param limit Maximum number of matching piece IDs to return
/// @return pieceIds Array of matching piece IDs (up to limit)
function findPieceIdsByCid(uint256 setId, Cids.Cid calldata pieceCid, uint256 startPieceId, uint256 limit)
    public
    view
    returns (uint256[] memory pieceIds)
{
    require(dataSetLive(setId), "Data set not live");
    require(limit > 0, "Limit must be greater than 0");

    bytes32 targetHash = keccak256(pieceCid.data);
    uint256 maxPieceId = nextPieceId[setId];

    // Allocate for the worst case; the length is trimmed to `count` below.
    pieceIds = new uint256[](limit);
    uint256 count = 0;

    for (uint256 i = startPieceId; i < maxPieceId && count < limit; i++) {
        // Skip deleted pieces (zero leaf count).
        if (pieceLeafCounts[setId][i] == 0) continue;
        if (keccak256(pieceCids[setId][i].data) == targetHash) {
            pieceIds[count++] = i;
        }
    }

    // Shrink the array length in place to the number of matches found.
    assembly {
        mstore(pieceIds, count)
    }
}
```

If you want all pieces matching that CID, call it repeatedly, passing the last returned piece ID plus one as startPieceId, until the returned array is shorter than the limit you asked for.
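On the client side, the repeated-call pattern could look roughly like the sketch below. It is illustrative only: `FindPage`, `findAllPieceIds`, `findFirstPieceId` and `makeFakeFindPage` are hypothetical names, not existing SDK functions, and the page fetcher is assumed to wrap an eth_call to the proposed findPieceIdsByCid. Piece IDs are typed as `number` for brevity; a real client would more likely use `bigint` for uint256 values.

```typescript
// Hypothetical page fetcher: assumed to wrap an eth_call to the proposed
// findPieceIdsByCid(setId, pieceCid, startPieceId, limit) for a fixed set and CID.
type FindPage = (startPieceId: number, limit: number) => Promise<number[]>;

// Collect every matching piece ID by following the cursor.
async function findAllPieceIds(findPage: FindPage, limit: number = 100): Promise<number[]> {
  const all: number[] = [];
  let cursor = 0;
  for (;;) {
    const page = await findPage(cursor, limit);
    all.push(...page);
    // A short page means the scan reached the end of the data set.
    if (page.length < limit) break;
    const last = page[page.length - 1];
    if (last === undefined) break;
    // Resume scanning just past the last match returned.
    cursor = last + 1;
  }
  return all;
}

// When only one ID is needed (e.g. for delete), ask for a single match.
async function findFirstPieceId(findPage: FindPage): Promise<number | undefined> {
  const page = await findPage(0, 1);
  return page[0];
}

// In-memory stand-in for the contract call, for illustration and testing only.
// `matchingIds` must be sorted ascending, as the contract scan would produce.
function makeFakeFindPage(matchingIds: number[]): FindPage {
  return async (start, limit) =>
    matchingIds.filter((id) => id >= start).slice(0, limit);
}
```

Note that when the number of matches is an exact multiple of the limit, this loop makes one extra call that returns an empty array before it stops.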
Implementation needs tests too of course.