56 changes: 56 additions & 0 deletions docs/design.md
@@ -153,6 +153,62 @@ Detailed description of key workflows.
- Merkle proofs must be sound
- Randomness cannot be biased through grinding or chain forking

### Per-Piece Security Guarantees

A common concern is: "My specific piece wasn't challenged in the last X days—how do I know it's still safe?"

The key insight is that **successful data set proofs provide strong probabilistic guarantees for all pieces in the data set**, regardless of which specific pieces were challenged. Because challenges are selected at random, a provider cannot know in advance which piece will be challenged, so any data loss is very likely to be detected over time.

**How detection works:**

The system issues K random challenges per proving period across the entire data set. If a storage provider has lost any portion of the data, each challenge has a chance of landing on the missing data and causing a proof failure.

Let:
- α = fraction of data missing (e.g., 0.05 = 5%)
- K = number of challenges per proving period

The probability that a dishonest prover evades detection in a single proving period is:

```
p = (1-α)^K
```

**Detection probability over time:**

With one proof per day containing K challenges, the evasion probability after T days is:

```
p_T = (1-α)^(K×T)
```
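
A minimal sketch of both formulas (the Python helpers and their names are illustrative assumptions, not part of the protocol code):

```python
def evasion_probability(alpha: float, k: int) -> float:
    """Probability that a prover missing a fraction `alpha` of the data
    passes all `k` random challenges in a single proving period."""
    return (1 - alpha) ** k


def detection_probability(alpha: float, k: int, days: int) -> float:
    """Probability that the loss is detected at least once over `days`
    proving periods, each with `k` independent random challenges."""
    return 1 - (1 - alpha) ** (k * days)
```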

**Example detection rates (K=5 challenges per day):**

| Data Lost (α) | Daily Detection | 30-Day Detection |
|---------------|-----------------|------------------|
| 1% | 4.9% | 77.9% |
| 5% | 22.6% | 99.95% |
| 20% | 67.2% | ~100% |
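
As a usage example, the sketch above reproduces the table (again assuming K=5 challenges per day; the helper name is illustrative):

```python
for alpha in (0.01, 0.05, 0.20):
    daily = detection_probability(alpha, k=5, days=1)
    monthly = detection_probability(alpha, k=5, days=30)
    print(f"{alpha:.0%} lost: {daily:.1%} daily, {monthly:.2%} over 30 days")

# 1% lost: 4.9% daily, 77.85% over 30 days
# 5% lost: 22.6% daily, 99.95% over 30 days
# 20% lost: 67.2% daily, 100.00% over 30 days
```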

**What this means for individual pieces:**

As shown in the table above, detection confidence depends on the fraction of data lost and on how long the data set has been proven. For a 1% data loss, detection reaches 77.9% confidence within 30 days and approaches 99% within 90 days. Larger losses are caught faster: a 5% loss reaches 99.95% detection within just 30 days. The random challenge selection ensures that:

1. A provider cannot selectively discard "unchallengeable" pieces—all pieces have equal probability of being challenged
2. Even if your specific piece hasn't been challenged recently, the successful proofs on other parts of the data set provide a probabilistic guarantee that the entire data set (including your piece) remains intact
3. The longer a data set is proven without faults, the higher the confidence that all pieces are present

This approach is fundamentally different from per-piece proving (where each piece would need its own challenges): it is more efficient, yet still provides strong guarantees for detecting any meaningful data loss.

**Using detection history for trust decisions:**

PDP provides detection confidence, not failure prevention. It answers "if data is lost, how likely are we to catch it?" rather than "will data be lost?" However, a provider's historical proof record serves as a practical indicator of operational reliability. A provider that has successfully proven a data set for 30+ days demonstrates:

1. Functional storage infrastructure
2. Operational consistency
3. No detected data loss during that period

A clean proof record is strong evidence of operational reliability, though not a guarantee of future performance.
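
One way to make that evidence quantitative (a derivation from the formulas above, offered as a sketch rather than something this document specifies): solving (1-α)^(K×T) ≤ p for α gives the largest loss fraction that could have evaded detection with probability p after T clean days.

```python
def undetected_loss_bound(k: int, days: int, residual: float = 0.01) -> float:
    """Largest data-loss fraction alpha that could still have evaded
    detection with probability `residual` after `days` clean proving
    periods of `k` challenges each: alpha = 1 - residual**(1/(k*days)).
    Illustrative helper, not protocol code."""
    return 1 - residual ** (1 / (k * days))

# With K=5 and 30 days of clean proofs, any loss of roughly 3% or more
# would have been detected with probability >= 99%.
print(f"{undetected_loss_bound(k=5, days=30):.1%}")  # ~3.0%
```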

### Completeness
- Proving always succeeds when the prover supplies valid Merkle proofs for the randomly sampled leaves
