Sweep transactions (1-in-1-out) are incorrectly scored in isolation — privacy is inherited, not generated #72
Summary
A 1-input, 1-output transaction (sweep) is currently evaluated as an independent event. The tool reports zero entropy and a deterministic link (severity: low, impact: 0). This is technically accurate but analytically meaningless — it describes a property shared by every spend, not a privacy deficiency specific to sweeps.
This issue argues that:
- A sweep does not degrade the privacy of the spent UTXO
- A sweep introduces ownership ambiguity that did not exist before
- The correct evaluation requires inspecting the parent transaction
On-chain properties of a 1-in-1-out sweep
These are observable facts, not interpretations:
| Property | Applies? | Why |
|---|---|---|
| Common Input Ownership (CIOH) | No | Single input — no address linkage possible |
| Change detection | No | Single output — no change to identify |
| Consolidation | No | No UTXO set or balance revealed |
| Round amount heuristic | No | Only one output — nothing to compare against |
| Entropy | 0 bits | One possible interpretation of fund flow |
| Script type fingerprint | Observable | Input and output script types are visible |
Every heuristic that the tool uses to penalize transactions requires either multiple inputs, multiple outputs, or both. A sweep has neither. The only observable fact is: funds moved from address A to address B.
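The table above can be restated as a small applicability check. This is a minimal sketch, not the tool's actual code: the `Tx` shape and the names `isSweep` and `heuristicApplicability` are hypothetical and exist only for illustration.

```typescript
// Hypothetical minimal transaction shape (assumption, not the tool's real model).
interface TxInput { txid: string; vout: number; address: string; scriptType: string; }
interface TxOutput { address: string; scriptType: string; value: number; }
interface Tx { txid: string; vin: TxInput[]; vout: TxOutput[]; }

// A sweep in the sense of this issue: exactly one input and one output.
function isSweep(tx: Tx): boolean {
  return tx.vin.length === 1 && tx.vout.length === 1;
}

interface Applicability {
  cioh: boolean;              // needs at least two inputs
  consolidation: boolean;     // needs at least two inputs
  changeDetection: boolean;   // needs at least two outputs
  roundAmount: boolean;       // needs at least two outputs to compare
  entropyBits: number | null; // only filled in for the trivial sweep case
}

function heuristicApplicability(tx: Tx): Applicability {
  return {
    cioh: tx.vin.length >= 2,
    consolidation: tx.vin.length >= 2,
    changeDetection: tx.vout.length >= 2,
    roundAmount: tx.vout.length >= 2,
    // One input and one output admit exactly one interpretation: log2(1) = 0 bits.
    entropyBits: isSweep(tx) ? Math.log2(1) : null,
  };
}
```

For a sweep, every boolean comes back `false` and the entropy is 0 bits, which is the point of the table: nothing in the usual penalty set has anything to act on.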
The sweep does not degrade privacy
The UTXO at address B carries exactly the same history, taint, and cluster associations as it had at address A. No new information about the owner's UTXO set, spending patterns, or balance is revealed by the sweep itself.
This is verifiable: take any 1-in-1-out transaction and compare the cluster graph before and after. The sweep adds one edge (A → B) but does not expand the cluster — because CIOH cannot fire on a single input.
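The cluster claim can be checked with a toy union-find. The sketch below assumes CIOH is implemented as "merge all input addresses of a transaction"; the class `AddressClusters` and its methods are hypothetical names, not the tool's API.

```typescript
// Minimal union-find over addresses. CIOH merges the input addresses of one tx.
class AddressClusters {
  private parent = new Map<string, string>();

  find(addr: string): string {
    if (!this.parent.has(addr)) this.parent.set(addr, addr);
    let root = addr;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    return root;
  }

  union(a: string, b: string): void {
    this.parent.set(this.find(a), this.find(b));
  }

  // Applies CIOH to one transaction's input addresses and
  // returns the number of cluster merges it caused.
  applyCioh(inputAddresses: string[]): number {
    let merges = 0;
    for (let i = 1; i < inputAddresses.length; i++) {
      if (this.find(inputAddresses[0]) !== this.find(inputAddresses[i])) {
        this.union(inputAddresses[0], inputAddresses[i]);
        merges++;
      }
    }
    return merges;
  }

  sameCluster(a: string, b: string): boolean {
    return this.find(a) === this.find(b);
  }
}
```

With a single input address the loop body never runs, so `applyCioh(["A"])` performs zero merges and A and B stay in separate clusters. The A → B edge exists in the transaction graph but not in the ownership clustering.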
The sweep introduces ownership ambiguity
Before the sweep, the UTXO is provably locked to address A, so there is no ambiguity about which address controls it.
After the sweep, an observer knows A sent to B, but cannot determine whether:
- B is the same entity as A (self-transfer, wallet migration)
- B is a different entity (payment)
This is a form of plausible deniability that did not exist while the UTXO sat unspent. The ambiguity is irresolvable using only on-chain data, unless:
- B belongs to a known entity (exchange, merchant, tagged address)
- B already exists in a cluster linked to a different entity via prior CIOH or consolidation
- The subsequent spend from B consolidates with UTXOs from a known different cluster
In the absence of these signals, the observer cannot assign ownership of B with certainty.
The ambiguity is real even though most sweeps are statistically self-transfers
In practice, exact payments without change are infrequent — most 1-in-1-out sweeps correspond to self-transfers, wallet migrations, or service deposits. A chain analysis professional knows this and will use it as a statistical prior.
However, the possibility that it is a payment exists, and that is enough to prevent the question from being resolved with certainty using on-chain data alone.
An analyst can combine weak signals (wallet fingerprint, script type, subsequent behavior of B) to reinforce a suspicion, but none of them, individually or combined, constitutes deterministic proof. They guide the investigation; they do not conclude it.
Combinable signals that guide suspicion
These signals are implementable with data the tool already has or can obtain with one extra API call:
Suggests self-transfer:
- Same script type between input and output (bc1q → bc1q, bc1p → bc1p)
- B subsequently consolidates with UTXOs that were already in A's cluster
- Subsequent spend from B shows the same wallet fingerprint
Suggests change of ownership:
- B belongs to a known entity
- B already existed in a different entity's cluster
- B consolidates with UTXOs from a different cluster than A's
Ambiguous (does not resolve):
- Different wallet fingerprint in subsequent spend — could be a different owner or simply a software change
- Different script type (bc1q → bc1p) — could be Taproot migration or payment
The tool should present these signals for what they are: probabilistic indicators that guide, not conclusions.
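One way to keep the signals separate from any verdict is to emit them as a labeled list with no aggregation step. This is a sketch under assumptions: `SweepObservations` is a hypothetical record of what the tool could derive, and the signal ids are made up for illustration.

```typescript
type Direction = "self-transfer" | "ownership-change" | "ambiguous";

interface SweepSignal { id: string; direction: Direction; }

// Hypothetical inputs; optional fields are undefined when B is unspent
// or the child transaction has not been analyzed.
interface SweepObservations {
  inputScriptType: string;
  outputScriptType: string;
  outputIsKnownEntity: boolean;
  outputInForeignCluster: boolean; // B already sat in a different entity's cluster
  childConsolidatesWith?: "input-cluster" | "other-cluster";
  childFingerprintMatches?: boolean;
}

function collectSignals(o: SweepObservations): SweepSignal[] {
  const signals: SweepSignal[] = [];
  signals.push(
    o.inputScriptType === o.outputScriptType
      ? { id: "same-script-type", direction: "self-transfer" }
      : { id: "different-script-type", direction: "ambiguous" },
  );
  if (o.outputIsKnownEntity) {
    signals.push({ id: "known-entity-output", direction: "ownership-change" });
  }
  if (o.outputInForeignCluster) {
    signals.push({ id: "output-in-foreign-cluster", direction: "ownership-change" });
  }
  if (o.childConsolidatesWith === "input-cluster") {
    signals.push({ id: "child-consolidates-input-cluster", direction: "self-transfer" });
  } else if (o.childConsolidatesWith === "other-cluster") {
    signals.push({ id: "child-consolidates-other-cluster", direction: "ownership-change" });
  }
  if (o.childFingerprintMatches === true) {
    signals.push({ id: "same-wallet-fingerprint", direction: "self-transfer" });
  } else if (o.childFingerprintMatches === false) {
    signals.push({ id: "different-wallet-fingerprint", direction: "ambiguous" });
  }
  return signals;
}
```

The function deliberately returns every signal it finds and never collapses them into a single answer, which matches the "indicators, not conclusions" framing.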
Proposal: inherit privacy from the parent transaction
Since the sweep neither generates nor destroys privacy, the score should reflect the privacy context of the UTXO being spent — i.e., the parent transaction that created it.
| Parent transaction type | Inherited evaluation | Rationale |
|---|---|---|
| CoinJoin equal-output (Whirlpool, JoinMarket) | High score | Anon set reduces from N to 1 upon spending — note this, but the UTXO retains strong privacy |
| CoinJoin variable-output (WabiSabi) | High score | No anon set based on amount equality to degrade |
| Tx with detected change or round amount | Inherits parent penalties | These weaknesses pre-existed; the sweep did not cause them |
| Known entity origin (KYC exchange) | Low score (inherited) | The taint was present before the sweep |
| Escrow release (HodlHodl 2-of-3, Bisq 2-of-2) | Inherits escrow detection | The tool already identifies these via multisig + fee address patterns |
For sweep chains (A→B→C→D), walk back through the parents recursively until a non-sweep transaction is found or a defined hop limit is reached.
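A rough sketch of the backward walk, assuming a hypothetical `Lookup` function that returns a parent transaction by txid. It is synchronous here to keep the example short; a real lookup against an HTTP API would be asynchronous. The names `resolveInheritedSource` and `Inherited` are illustrative only.

```typescript
interface TxIn { txid: string; vout: number; }
interface Tx { txid: string; vin: TxIn[]; vout: { value: number }[]; }

// Hypothetical lookup; returns undefined if the parent cannot be fetched.
type Lookup = (txid: string) => Tx | undefined;

interface Inherited {
  source: Tx | null;        // first non-sweep ancestor, or null if none was reached
  spentVout: number | null; // which output of `source` the sweep chain spends
  hops: number;             // number of parent lookups performed
  truncated: boolean;       // true if the hop limit was hit or a parent was missing
}

function isSweep(tx: Tx): boolean {
  return tx.vin.length === 1 && tx.vout.length === 1;
}

// Walks back from a sweep through its single input until a non-sweep
// ancestor is found or `maxHops` lookups have been spent.
function resolveInheritedSource(sweep: Tx, lookup: Lookup, maxHops = 5): Inherited {
  let current = sweep;
  let hops = 0;
  while (hops < maxHops) {
    const spent = current.vin[0];
    const parent = lookup(spent.txid);
    hops++;
    if (!parent) return { source: null, spentVout: null, hops, truncated: true };
    if (!isSweep(parent)) {
      return { source: parent, spentVout: spent.vout, hops, truncated: false };
    }
    current = parent;
  }
  return { source: null, spentVout: null, hops, truncated: true };
}
```

The result carries `spentVout` as well as the source transaction because the inherited evaluation may depend on which output of the ancestor is being spent. When `truncated` is true, the caller has no parent context and would presumably fall back to the current behavior.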
Known limitations
Transaction-level vs UTXO-level scoring
This is the principal implementation challenge. The tool currently scores whole transactions. If the parent transaction has two outputs — one identified as payment, one as change — their privacy properties differ:
- Sweep spends the change output → inherits sender-side linkage
- Sweep spends the payment output → inherits receiver-side privacy
Without distinguishing which output the sweep spends, the inherited score may be inaccurate. The existing change detection heuristic could serve as a first approximation, with the caveat that change detection itself is probabilistic, not deterministic.
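As a first approximation, the spent output index can be compared against whatever the change heuristic flagged in the parent. The sketch below assumes a hypothetical `ParentAnalysis` record with a `likelyChangeIndex` field; the "unknown" fallback text is a suggestion, not existing behavior.

```typescript
type OutputRole = "change" | "payment" | "unknown";

// Hypothetical summary of the parent transaction's analysis.
interface ParentAnalysis {
  txid: string;
  outputCount: number;
  // Output index flagged by change detection, or null if none was flagged.
  // Probabilistic: the heuristic can be wrong.
  likelyChangeIndex: number | null;
}

// Classifies the parent output that the sweep spends.
function roleOfSpentOutput(parent: ParentAnalysis, spentVout: number): OutputRole {
  if (!Number.isInteger(spentVout) || spentVout < 0 || spentVout >= parent.outputCount) {
    throw new RangeError(`vout ${spentVout} is out of range for ${parent.txid}`);
  }
  if (parent.likelyChangeIndex === null) return "unknown";
  return spentVout === parent.likelyChangeIndex ? "change" : "payment";
}

// Maps the role to the side of the parent whose properties are inherited.
function inheritedSide(role: OutputRole): string {
  switch (role) {
    case "change":
      return "sender-side linkage";
    case "payment":
      return "receiver-side privacy";
    case "unknown":
      return "undetermined (fall back to the transaction-level score)";
  }
}
```

Because `likelyChangeIndex` comes from a probabilistic heuristic, any label derived from it should be presented with the same caveat rather than as a fact.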
API cost and recursion depth
Each parent transaction lookup requires one API call to mempool.space. Sweep chains require recursive resolution. A reasonable limit (e.g., 5 hops) bounds the cost while covering the vast majority of real-world cases.
Ownership ambiguity is not quantifiable
The plausible deniability benefit is real but difficult to express as a numeric score modifier. It may be better represented as a qualitative finding ("ownership ambiguity: irresolvable from on-chain data alone") rather than a numeric score bonus.
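A qualitative finding of this kind might look like the following. The `Finding` shape is an assumption based on the severity and impact fields the tool already reports for sweeps; the id and the `"info"` severity level are hypothetical.

```typescript
// Hypothetical finding shape; field names are assumptions.
interface Finding {
  id: string;
  severity: "info" | "low" | "medium" | "high";
  scoreImpact: number;
  description: string;
}

// Emits the ambiguity as information only, with no effect on the score.
// Returns null when the output is attributed to a known entity, since the
// ambiguity is then resolved by that attribution.
function ownershipAmbiguityFinding(outputIsKnownEntity: boolean): Finding | null {
  if (outputIsKnownEntity) return null;
  return {
    id: "sweep-ownership-ambiguity",
    severity: "info",
    scoreImpact: 0,
    description: "ownership ambiguity: irresolvable from on-chain data alone",
  };
}
```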
Future improvement: child transaction analysis
Analysis of the subsequent spend from B (wallet fingerprint, consolidation with other clusters) is not proposed as mandatory in this iteration. It requires an extra API call and cross-transaction fingerprint comparison. It is mentioned as a future improvement that would enrich the informational context without modifying the score.
Verification
A chain analysis professional can verify the core claims:
- The sweep does not expand clusters: Take any 1-in-1-out tx. Run CIOH. Confirm no new address linkage is produced.
- Privacy is inherited: Compare the cluster graph of the input UTXO before and after the sweep. Confirm no new information is added.
- Ownership ambiguity exists: Take a 1-in-1-out tx where neither address belongs to a known entity. Attempt to determine ownership of the output using only on-chain data. Confirm it is indeterminate.
References
- Current implementation: src/lib/analysis/heuristics/change-detection.ts (L40-68), src/lib/analysis/heuristics/entropy.ts (L42-62)
- Existing backward tracing: src/lib/analysis/chain/
- Meiklejohn et al., "A Fistful of Bitcoins" (2013): CIOH requires multiple inputs
- Nick, "Data-Driven De-Anonymization in Bitcoin" (2015): change detection requires multiple outputs