From 55be85d956b81d3d19a4d3a8a7bd7f1e1b3b7493 Mon Sep 17 00:00:00 2001
From: Rod Vagg
Date: Thu, 8 Jan 2026 11:53:56 +1100
Subject: [PATCH 1/4] docs(pdp): add per-piece security guarantees section

---
 docs/design.md | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/docs/design.md b/docs/design.md
index 4786f1b..1c2b6b3 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -153,6 +153,52 @@ Detailed description of key workflows.
 - Merkle proofs must be sound
 - Randomness cannot be biased through grinding or chain forking

+### Per-Piece Security Guarantees
+
+A common concern is: "My specific piece wasn't challenged in the last X days—how do I know it's still safe?"
+
+The key insight is that **successful data set proofs provide strong statistical guarantees for all pieces in the data set**, regardless of which specific pieces were challenged. Random challenge selection means any missing data will be detected with high probability over time.
+
+**How detection works:**
+
+The system issues K random challenges per proving period across the entire data set. If a storage provider has lost any portion of the data, each challenge has a chance of hitting the missing data and causing proof failure.
+
+Let:
+- α = fraction of data missing (e.g., 0.05 = 5%)
+- K = number of challenges per proving period
+
+The probability that a dishonest prover evades detection in a single proving period is:
+
+```
+p = (1-α)^K
+```
+
+**Detection probability over time:**
+
+With one proof per day containing K challenges, the evasion probability after T days is:
+
+```
+p_T = (1-α)^(K×T)
+```
+
+**Example detection rates (K=5 challenges per day):**
+
+| Data Lost (α) | Daily Detection | 30-Day Detection |
+|---------------|-----------------|------------------|
+| 1%            | 4.9%            | 77.9%            |
+| 5%            | 22.6%           | 99.95%           |
+| 20%           | 67.2%           | ~100%            |
+
+**What this means for individual pieces:**
+
+If a storage provider has lost any significant fraction of a data set, they will be caught with high probability regardless of which specific pieces are missing. The random challenge selection ensures that:
+
+1. A provider cannot selectively discard "unchallengeable" pieces—all pieces have equal probability of being challenged
+2. Even if your specific piece hasn't been challenged recently, the successful proofs on other parts of the data set provide statistical confidence that the entire data set (including your piece) remains intact
+3. The longer a data set is proven without faults, the higher the confidence that all pieces are present
+
+This is fundamentally different from per-piece proving (where each piece would need individual challenges) and is more efficient while providing equivalent security guarantees for detecting any meaningful data loss.
+
 ### Completeness
 - Proving always works if providing Merkle proofs to the randomly sampled leaves

From c7624c8fc57f479e217bf694a37f2e61ae12ae28 Mon Sep 17 00:00:00 2001
From: Rod Vagg
Date: Fri, 9 Jan 2026 10:39:09 +1100
Subject: [PATCH 2/4] fixup!
docs(pdp): add per-piece security guarantees section

---
 docs/design.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/design.md b/docs/design.md
index 1c2b6b3..2615bbd 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -157,7 +157,7 @@ Detailed description of key workflows.

 A common concern is: "My specific piece wasn't challenged in the last X days—how do I know it's still safe?"

-The key insight is that **successful data set proofs provide strong statistical guarantees for all pieces in the data set**, regardless of which specific pieces were challenged. Random challenge selection means any missing data will be detected with high probability over time.
+The key insight is that **successful data set proofs provide strong probabilistic guarantees for all pieces in the data set**, regardless of which specific pieces were challenged. Because challenges are selected at random, a provider cannot know in advance which piece will be challenged, so any data loss is very likely to be detected over time.

 **How detection works:**

@@ -194,10 +194,10 @@ p_T = (1-α)^(K×T)
 If a storage provider has lost any significant fraction of a data set, they will be caught with high probability regardless of which specific pieces are missing. The random challenge selection ensures that:

 1. A provider cannot selectively discard "unchallengeable" pieces—all pieces have equal probability of being challenged
-2. Even if your specific piece hasn't been challenged recently, the successful proofs on other parts of the data set provide statistical confidence that the entire data set (including your piece) remains intact
+2. Even if your specific piece hasn't been challenged recently, the successful proofs on other parts of the data set provide a probabilistic guarantee that the entire data set (including your piece) remains intact
 3.
The longer a data set is proven without faults, the higher the confidence that all pieces are present

-This is fundamentally different from per-piece proving (where each piece would need individual challenges) and is more efficient while providing equivalent security guarantees for detecting any meaningful data loss.
+This is fundamentally different from per-piece proving (where each piece would need individual challenges) and is more efficient while providing strong security guarantees for detecting any meaningful data loss.

 ### Completeness
 - Proving always works if providing Merkle proofs to the randomly sampled leaves

From 7d004b3ed376837c65cc1f0f6439021cfd6f67a6 Mon Sep 17 00:00:00 2001
From: Rod Vagg
Date: Mon, 12 Jan 2026 15:14:14 +1100
Subject: [PATCH 3/4] fixup! docs(pdp): add per-piece security guarantees section

---
 docs/design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design.md b/docs/design.md
index 2615bbd..a714e9c 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -191,7 +191,7 @@ p_T = (1-α)^(K×T)

 **What this means for individual pieces:**

-If a storage provider has lost any significant fraction of a data set, they will be caught with high probability regardless of which specific pieces are missing. The random challenge selection ensures that:
+As shown in the table above, detection confidence depends on the fraction of data lost and the proving period. For a 1% data loss at K=5 challenges per day, detection reaches 77.9% confidence within 30 days and about 98.9% within 90 days (it passes 99% at 92 days). Larger losses are caught faster: a 5% loss reaches 99.95% detection in just 30 days. The random challenge selection ensures that:

 1. A provider cannot selectively discard "unchallengeable" pieces—all pieces have equal probability of being challenged
 2.
Even if your specific piece hasn't been challenged recently, the successful proofs on other parts of the data set provide a probabilistic guarantee that the entire data set (including your piece) remains intact

From 695f64c8e2105b3fdf3e33b8f53613ef94c83c93 Mon Sep 17 00:00:00 2001
From: Rod Vagg
Date: Wed, 21 Jan 2026 11:26:26 +1100
Subject: [PATCH 4/4] fixup! docs(pdp): add per-piece security guarantees section

---
 docs/design.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/design.md b/docs/design.md
index a714e9c..e4ddd12 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -199,6 +199,16 @@ As shown in the table above, detection confidence depends on the fraction of dat

 This is fundamentally different from per-piece proving (where each piece would need individual challenges) and is more efficient while providing strong security guarantees for detecting any meaningful data loss.

+**Using detection history for trust decisions:**
+
+PDP provides detection confidence, not failure prevention. It answers "if data is lost, how likely are we to catch it?" rather than "will data be lost?" However, a provider's historical proof record serves as a practical indicator of operational reliability. A provider that has successfully proven a data set for 30+ days demonstrates:
+
+1. Functional storage infrastructure
+2. Operational consistency
+3. No detected data loss during that period
+
+A clean proof record is strong evidence of operational reliability, though not a guarantee of future performance.
+
 ### Completeness
 - Proving always works if providing Merkle proofs to the randomly sampled leaves
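
Reviewer note: the figures quoted in this series can be checked with a short sketch of the formula `p_T = (1-α)^(K×T)`. It is only an illustration, not part of PDP or of these patches. The function names `detection_probability` and `days_to_confidence` are hypothetical, and the sketch assumes, as the formula does, that every challenge is independent and uniformly random over the data set, and that there is one proving period per day.

```python
import math


def detection_probability(alpha: float, k: int, days: int = 1) -> float:
    """Chance that at least one of k*days challenges hits missing data.

    alpha: fraction of the data set that is missing (0.05 means 5%).
    k:     challenges per proving period (assumed: one period per day).
    days:  number of proving periods observed.
    Assumes independent, uniformly random challenges.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    evasion = (1.0 - alpha) ** (k * days)  # p_T = (1-α)^(K×T)
    return 1.0 - evasion


def days_to_confidence(alpha: float, k: int, target: float) -> int:
    """Smallest whole number of days whose detection probability >= target."""
    if not 0.0 < alpha < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("alpha and target must be strictly between 0 and 1")
    # Solve (1-α)^(K×T) <= 1 - target for T, then round up to whole days.
    challenges_needed = math.log(1.0 - target) / math.log(1.0 - alpha)
    return math.ceil(challenges_needed / k)


if __name__ == "__main__":
    K = 5  # challenges per day, as in the example table
    for alpha in (0.01, 0.05, 0.20):
        daily = detection_probability(alpha, K)
        monthly = detection_probability(alpha, K, days=30)
        print(f"alpha={alpha:.0%}  daily={daily:.1%}  30-day={monthly:.2%}")
    print("days to 99% at 1% loss:", days_to_confidence(0.01, K, 0.99))
```

Under these assumptions, `detection_probability(0.01, 5, 90)` comes out just under 0.99 and `days_to_confidence(0.01, 5, 0.99)` returns 92, so a 1% loss crosses 99% detection shortly after the 90-day mark rather than within it.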