A framework for automated market maker microstructure analysis and behavioral on-chain forensics
Decentralized exchange (DEX) microstructure remains under-studied in the systems literature relative to its traditional finance counterpart. Existing open-source tooling for on-chain market analysis lacks production-grade precision in swap simulation, streaming feature extraction, and behavioral forensics. Most implementations rely on floating-point arithmetic that introduces rounding errors at the lamport level, batch-mode processing that precludes real-time use, or approximate methods that sacrifice accuracy for convenience.
This repository contributes six Rust crates that address these gaps. The swap simulation achieves sub-basis-point precision validated against 910,000+ on-chain trades. The microstructure pipeline extracts 40 features in O(1) amortized time per trade using online statistics with bounded memory. The behavioral analysis modules detect Sybil attacks, copy-trading relationships, and wallet fingerprints from streaming trade data without requiring batch reprocessing. The execution layer provides AMM-curve-aware position management with formal exit condition priority ordering. The MEV detection library identifies sandwich attacks, arbitrage, and failed arb patterns from streaming trade data. The replay harness enables offline backtesting of the full pipeline against historical CSV trade data with timestamp-aware windowed aggregation.
- Deterministic swap simulation with integer arithmetic (u64/u128), validated against 910K+ on-chain trades at median error 0.000000% and P99 error 0.000005% — covering both constant-product AMMs and bonding curve models
- Streaming 40-feature microstructure pipeline with O(1) amortized updates per trade via online EMA and variance estimates (Welford 1962), CUSUM change-point detection (Page 1954), and streaming Gini concentration indices
- Multi-signal Sybil detection combining five orthogonal indicators (temporal bundling, size uniformity, capital fragmentation, slot coordination, ghost selling) into a weighted composite score
- Directed-graph copy-trading discovery from trade co-occurrence with four-way relationship classification and transitive chain resolution to root operators
- Behavioral wallet fingerprinting via streaming feature vector aggregation with cosine similarity for cross-token operator identification
- AMM-curve-aware exit engine with priority-ordered condition checking, trailing stop activation state machine, and PnL caps that guard against phantom profits from stale reserve data
- Streaming MEV detection for sandwich attacks (slot-windowed frontrunner/victim/backrun pattern matching), cross-DEX arbitrage discovery, and failed arb classification via bincode TransactionError decoding with per-DEX slippage code mapping
- Historical replay harness with timestamp-aware windowed aggregation that bypasses the Utc::now() limitation in streaming modules, enabling offline backtesting of the full pipeline (market state → features → scores → sybil → copy detection → position simulation) against CSV trade data
Every module in this framework adheres to three constraints derived from the requirements of real-time trading systems:
Streaming-first. All algorithms operate on unbounded event streams with bounded memory. State is maintained via online statistics (exponential moving averages, cumulative sums) rather than windowed aggregates over raw data. No module requires batch reprocessing or historical lookback beyond its configured buffer depth.
Deterministic arithmetic. The swap simulation core path uses exclusively integer arithmetic (u64/u128). Floating-point is confined to human-readable outputs and heuristic scores where exact reproducibility is not required. This eliminates the class of bugs where rounding divergence between simulation and on-chain execution causes position mispricing.
Bounded resource consumption. Hash maps are capped at configurable maxima. Ring buffers enforce fixed memory per tracked entity. Time-windowed structures self-evict stale entries. The system can track thousands of concurrent tokens and wallets without unbounded growth.
                      Trade Stream (gRPC / WebSocket)
                                     │
       ┌──────────────┬──────────────┼────────────────┐
       ▼              ▼              ▼                ▼
┌─────────────┐ ┌────────────┐ ┌────────────┐ ┌───────────────┐
│ dex-math    │ │ behavioral-│ │ micro-     │ │ mev-detection │
│             │ │ analysis   │ │ structure  │ │               │
│ Swap sim    │ │            │ │            │ │ Sandwich      │
│ CPMM + BC   │ │ Sybil      │ │ MarketState│ │ Arbitrage     │
│ Spot price  │ │ Copy-trade │ │ 40 features│ │ Failed arb    │
│ Slippage    │ │ Fingerprint│ │ Scoring    │ │ Competition   │
│ Exit value  │ │            │ │ Quoting    │ │               │
└──────┬──────┘ └─────┬──────┘ └─────┬──────┘ └───────────────┘
       │              │              │
       │      ┌───────┴───────┐      │
       └─────▶│ execution     │◀─────┘
              │               │
              │ Exit engine   │
              │ Position mgmt │
              │ Strategy FSM  │
              │ AMM-aware PnL │
              └───────┬───────┘
                      │
              ┌───────┴───────┐
              │ replay        │
              │               │
              │ CSV ingestion │
              │ Volume tracker│
              │ Full pipeline │
              │ Backtesting   │
              └───────────────┘
Dependency graph: dex-math is standalone (zero runtime dependencies beyond serde). microstructure and execution depend on dex-math for curve math and quoting. behavioral-analysis is fully independent. mev-detection is fully independent (zero workspace deps). execution consumes outputs from both microstructure and behavioral-analysis to inform position management. replay depends on all four core crates to provide offline backtesting of the full pipeline.
dex-math — Deterministic AMM Swap Simulation
Pure-math swap simulation for constant-product AMMs and bonding curves. Integer arithmetic throughout the core path (u64/u128) with f64 only for human-readable outputs. Validated against 910K+ on-chain PumpFun trades with 99.9%+ exact match rate.
- Constant-product AMM (Raydium V4/CPMM, Uniswap V2-style)
- Bonding curve model (PumpFun-style virtual + real reserves)
- Exact-in and exact-out swap modes
- Slippage bounds via binary search
- 34 tests including property-based roundtrip validation
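To illustrate what integer-only swap math looks like, here is a minimal sketch of an exact-in constant-product swap. It is not the crate's API: the function name `cpmm_exact_in`, the basis-point fee parameter, and the choice to deduct the fee from the input before pricing are all assumptions made for this example. A bonding curve with virtual reserves can be priced with the same formula by passing the virtual reserves.

```rust
/// Hypothetical helper (not part of dex-math): exact-in swap on a
/// constant-product pool using only integer arithmetic.
/// Assumption: the fee is charged on the input, in basis points, and
/// every division rounds down.
pub fn cpmm_exact_in(
    reserve_in: u64,
    reserve_out: u64,
    amount_in: u64,
    fee_bps: u64,
) -> Option<u64> {
    if reserve_in == 0 || reserve_out == 0 || fee_bps >= 10_000 {
        return None;
    }
    // Widen to u128 before multiplying so no intermediate product overflows.
    let net_in = (amount_in as u128) * ((10_000 - fee_bps) as u128) / 10_000;
    let numerator = net_in * (reserve_out as u128);
    let denominator = (reserve_in as u128) + net_in;
    // The quotient is strictly less than reserve_out, so it fits in u64.
    Some((numerator / denominator) as u64)
}
```

Flooring the output keeps the product of reserves from decreasing, which is the usual reason on-chain programs round in the pool's favor.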
microstructure — Streaming Market Microstructure
Real-time market state tracking with O(1) amortized updates via online statistics:
- MarketState: Full bonding curve tracker with reserves, prices, volumes, holder demographics, and whale concentration metrics
- 40-feature extraction pipeline: Price, volume, fill progress, holder demographics, concentration indices (Gini, Lorenz top-k), inter-arrival times, CUSUM change-point signals
- Heuristic scoring: Pump score (0-1), rug score (0-1), momentum score (0-1) from orthogonal microstructure signals
- Fee-inclusive quoting: Buy/sell quotes with price impact, fee breakdown, and slippage bounds
- 15 tests covering state tracking, feature extraction, scoring, and quote roundtrips
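The online statistics named above can be sketched in a few lines. The following is an illustrative implementation of Welford's running mean/variance and a one-sided CUSUM detector; the type names, the `slack` and `threshold` parameters, and the reset-on-alarm behavior are assumptions for this sketch rather than the crate's actual types.

```rust
/// Welford's online mean and variance: O(1) time and memory per observation.
#[derive(Default)]
pub struct Welford {
    n: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Welford {
    pub fn update(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Sample variance; undefined for fewer than two observations.
    pub fn variance(&self) -> Option<f64> {
        if self.n < 2 {
            None
        } else {
            Some(self.m2 / (self.n - 1) as f64)
        }
    }
}

/// One-sided (upward) CUSUM detector in the style of Page (1954).
pub struct Cusum {
    target: f64,    // expected level of the monitored series
    slack: f64,     // drift allowance subtracted on every step
    threshold: f64, // alarm when the cumulative sum exceeds this
    sum: f64,
}

impl Cusum {
    pub fn new(target: f64, slack: f64, threshold: f64) -> Self {
        Cusum { target, slack, threshold, sum: 0.0 }
    }

    /// Feeds one observation; returns true on an alarm and resets the sum.
    pub fn update(&mut self, x: f64) -> bool {
        self.sum = (self.sum + x - self.target - self.slack).max(0.0);
        if self.sum > self.threshold {
            self.sum = 0.0;
            true
        } else {
            false
        }
    }
}
```

Both structures hold a fixed number of scalars regardless of stream length, which is what makes per-trade updates constant-time.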
behavioral-analysis — On-Chain Behavioral Forensics
Streaming detection of coordinated, manipulative, and automated on-chain behavior:
- Sybil detection: 5-signal composite (same-slot buys, size uniformity, wallets-per-SOL, slot bundling, ghost sellers) with configurable thresholds and per-wallet exposure scoring
- Copy-trading graph: Directed edge construction from trade co-occurrence with 4-way relationship classification (direct copy, dust signal, Sybil cluster, signal relay) and transitive chain resolution to root operators
- Wallet fingerprinting: 40-dimensional behavioral profiles from streaming feature vector aggregation with cosine similarity for cross-token operator identification
- 12 tests covering detection accuracy, graph construction, and similarity search
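As a rough illustration of the two scoring ideas above, the sketch below shows cosine similarity between two behavioral feature vectors and a weighted composite of five signals. The function names, the clamping of each signal to [0, 1], and the normalization by total weight are assumptions for this example; the crate's actual weights and thresholds are not reproduced here.

```rust
/// Cosine similarity between two equal-length feature vectors.
/// Returns None for mismatched lengths, empty input, or a zero vector.
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f64 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        None
    } else {
        Some(dot / (norm_a * norm_b))
    }
}

/// Weighted composite of five signals, each clamped to [0, 1].
/// Assumption: the result is normalized by the sum of the weights.
pub fn composite_score(signals: &[f64; 5], weights: &[f64; 5]) -> f64 {
    let total_weight: f64 = weights.iter().sum();
    if total_weight <= 0.0 {
        return 0.0;
    }
    let weighted: f64 = signals
        .iter()
        .zip(weights.iter())
        .map(|(s, w)| s.clamp(0.0, 1.0) * w)
        .sum();
    weighted / total_weight
}
```

Cosine similarity ignores vector magnitude, so two wallets with the same trading pattern at different volumes still score close to 1.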
execution — Position Management & Strategy Framework
Core abstractions for managing trading positions against AMMs:
- Exit engine: Priority-ordered state machine for trailing stops, take profit, stop loss, stale timeout, and max hold with configurable activation thresholds
- AMM-aware PnL: Curve-based exit value calculation via dex-math with naive fallback, cross-pool contamination rejection (>10x divergence filter), and configurable gain/loss caps
- Position lifecycle: Open/update/close with embedded price tracking and trailing stop state
- Strategy signals: Typed signal framework with metadata, confidence scoring, and JSON configuration merge
- 14 tests covering exit conditions, PnL caps, trailing stop activation, and position state
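A priority-ordered exit check with trailing stop activation might look like the sketch below. The specific order used here (stop loss, take profit, trailing stop, max hold, stale) is an assumption, since the priority is not spelled out above, and all type and field names are hypothetical.

```rust
/// Why a position was closed. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ExitReason {
    StopLoss,
    TakeProfit,
    TrailingStop,
    MaxHold,
    Stale,
}

/// Thresholds are fractions of the entry price (0.2 means 20%).
pub struct ExitConfig {
    pub stop_loss: f64,
    pub take_profit: f64,
    pub trail_activate: f64, // gain at which the trailing stop arms
    pub trail_drop: f64,     // allowed drop from the peak once armed
    pub max_hold_secs: u64,
    pub stale_secs: u64,
}

pub struct Position {
    pub entry_price: f64,
    pub peak_price: f64,
    pub opened_at: u64,
    pub last_trade_at: u64, // the caller updates this on each observed trade
    pub trailing_active: bool,
}

impl Position {
    pub fn open(entry_price: f64, now: u64) -> Self {
        Position {
            entry_price,
            peak_price: entry_price,
            opened_at: now,
            last_trade_at: now,
            trailing_active: false,
        }
    }

    /// Updates peak and trailing state, then returns the first condition
    /// that fires in the assumed priority order.
    pub fn check_exit(&mut self, price: f64, now: u64, cfg: &ExitConfig) -> Option<ExitReason> {
        if price > self.peak_price {
            self.peak_price = price;
        }
        let gain = price / self.entry_price - 1.0;
        if gain >= cfg.trail_activate {
            self.trailing_active = true;
        }
        if gain <= -cfg.stop_loss {
            return Some(ExitReason::StopLoss);
        }
        if gain >= cfg.take_profit {
            return Some(ExitReason::TakeProfit);
        }
        if self.trailing_active && price <= self.peak_price * (1.0 - cfg.trail_drop) {
            return Some(ExitReason::TrailingStop);
        }
        if now.saturating_sub(self.opened_at) >= cfg.max_hold_secs {
            return Some(ExitReason::MaxHold);
        }
        if now.saturating_sub(self.last_trade_at) >= cfg.stale_secs {
            return Some(ExitReason::Stale);
        }
        None
    }
}
```

Returning on the first match means a position that has both breached its stop loss and exceeded its hold time is reported as a stop loss, which keeps exit attribution unambiguous.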
mev-detection — Streaming MEV Detection
Streaming detection of MEV patterns from on-chain trade events. Zero workspace dependencies — fully standalone:
- Sandwich detection: Slot-windowed pattern matching for frontrunner/victim/backrun triplets with profit computation
- Arbitrage detection: Cross-DEX buy/sell detection within slot windows for same-trader circular routes
- Failed arb classification: Bincode TransactionError decoding with per-DEX slippage code mapping (Jupiter 6001, Raydium 12/30, Orca 6034/0xFADED, Meteora 6024/6038)
- Competition metrics: Per-pool niche scoring based on unique trader counts for finding low-competition opportunities
- Unified facade: MevDetector combines all detectors with single process_trade and process_failed entry points
- 17 tests covering sandwich triplets, cross-DEX arb, error decoding, slippage codes, and slot eviction
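The slot-windowed sandwich pattern can be illustrated with a small streaming matcher. This is a simplified sketch, not the MevDetector implementation: the `Trade` fields, the numeric pool and trader identifiers, and the profit estimate (SOL received on the back-run minus SOL spent on the front-run) are assumptions made for the example.

```rust
use std::collections::VecDeque;

/// Simplified trade event; field names are hypothetical.
#[derive(Clone, Debug, PartialEq)]
pub struct Trade {
    pub slot: u64,
    pub pool: u32,
    pub trader: u32,
    pub is_buy: bool,
    pub sol_amount: u64, // lamports spent (buy) or received (sell)
}

#[derive(Debug, PartialEq)]
pub struct Sandwich {
    pub attacker: u32,
    pub victim: u32,
    pub profit_lamports: i64,
}

pub struct SandwichDetector {
    window_slots: u64,
    recent: VecDeque<Trade>, // trades still inside the slot window
}

impl SandwichDetector {
    pub fn new(window_slots: u64) -> Self {
        SandwichDetector { window_slots, recent: VecDeque::new() }
    }

    /// Evicts trades older than the window, then checks whether this trade
    /// completes a front-run / victim / back-run triplet on the same pool.
    pub fn process_trade(&mut self, t: Trade) -> Option<Sandwich> {
        let window = self.window_slots;
        while self.recent.front().map_or(false, |f| f.slot + window < t.slot) {
            self.recent.pop_front();
        }
        let mut found = None;
        if !t.is_buy {
            // Front-run: the seller's own earlier buy on this pool.
            let front_idx = self
                .recent
                .iter()
                .position(|p| p.pool == t.pool && p.trader == t.trader && p.is_buy);
            if let Some(i) = front_idx {
                // Victim: a later buy on the same pool by a different trader.
                let victim = self
                    .recent
                    .iter()
                    .skip(i + 1)
                    .find(|p| p.pool == t.pool && p.trader != t.trader && p.is_buy);
                if let Some(v) = victim {
                    found = Some(Sandwich {
                        attacker: t.trader,
                        victim: v.trader,
                        profit_lamports: t.sol_amount as i64 - self.recent[i].sol_amount as i64,
                    });
                }
            }
        }
        self.recent.push_back(t);
        found
    }
}
```

Memory here is bounded only by the number of trades per window; a production detector would likely also cap the queue length and key it by pool.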
replay — Historical Trade Replay & Backtesting
Offline replay harness that pipes CSV trade data through the full pipeline:
- CSV ingestion: Deserializes
TradeRecordrows with bonding curve reserves, timestamps, and trader identifiers - Volume tracker: Timestamp-aware windowed aggregation that bypasses
Utc::now()inMarketState, enabling accurate 1m/5m/15m windows during historical replay - Full pipeline: Market state → feature extraction → scoring → Sybil detection → copy detection → position simulation
- Position simulation: Manual exit checks (TP/SL/trailing stop/max hold) using trade timestamps instead of wall-clock time
- CLI binary: replay <CSV_FILE> [--no-features] [--no-scoring] [--positions] [--output FILE]
- 8 tests covering replay correctness, feature dimensions, scoring bounds, Sybil detection, and position tracking
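The idea behind timestamp-aware windowing is that the window boundary is computed from the trade's own timestamp rather than the wall clock. A minimal sketch follows, assuming trades arrive in non-decreasing timestamp order; `VolumeWindow` and its method names are hypothetical and not the replay crate's types.

```rust
use std::collections::VecDeque;

/// Rolling volume over a fixed time window, driven by trade timestamps.
pub struct VolumeWindow {
    window_secs: i64,
    entries: VecDeque<(i64, u64)>, // (unix timestamp, lamports)
    total: u128,
}

impl VolumeWindow {
    pub fn new(window_secs: i64) -> Self {
        VolumeWindow { window_secs, entries: VecDeque::new(), total: 0 }
    }

    /// Records a trade and returns the volume inside the window ending at `ts`.
    /// Assumption: `ts` never decreases between calls.
    pub fn record(&mut self, ts: i64, lamports: u64) -> u128 {
        self.entries.push_back((ts, lamports));
        self.total += lamports as u128;
        // Evict entries that are a full window or more older than this trade.
        while let Some(&(old_ts, old_amt)) = self.entries.front() {
            if ts - old_ts >= self.window_secs {
                self.entries.pop_front();
                self.total -= old_amt as u128;
            } else {
                break;
            }
        }
        self.total
    }
}
```

Because eviction depends only on the incoming timestamp, replaying a CSV produces the same window contents no matter when or how fast the replay runs.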
$ cargo test --workspace
100 tests across 6 crates — 0 failures
| Crate | Tests | Coverage Focus |
|---|---|---|
| dex-math | 34 | Swap precision, roundtrip properties, edge cases, 910K trade validation |
| microstructure | 15 | State tracking, feature dimensions, score bounds, quote symmetry |
| behavioral-analysis | 12 | Sybil signal accuracy, graph construction, fingerprint similarity |
| execution | 14 | Exit priority, PnL caps, trailing stop FSM, position lifecycle |
| mev-detection | 17 | Sandwich triplets, cross-DEX arb, error decoding, slot eviction |
| replay | 8 | Pipeline integration, volume tracker, position simulation, CSV fixtures |
The dex-math crate was validated against 910,000+ on-chain PumpFun trade events:
| Metric | Value |
|---|---|
| Total trades validated | 910,000+ |
| Median error | 0.000000% |
| P99 error | 0.000005% |
| Exact match rate | 99.9%+ |
Validation was performed by replaying historical trade events, computing expected outputs from on-chain reserve states, and comparing against the crate's deterministic simulation. The sub-basis-point error at P99 is attributable to integer rounding in the final lamport — a consequence of the discrete nature of on-chain token accounting where fractional lamports do not exist.
Property-based testing (via proptest) supplements the empirical validation with roundtrip invariants: for any valid pool state and swap input, buy(sell(x)) <= x (fees are monotonically consumed) and reserve conservation holds across all tested configurations.
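The roundtrip invariant can be demonstrated without any external crate. The sketch below does not use proptest; it substitutes a hand-rolled linear congruential generator for the random inputs and checks a zero-fee constant-product pool, which is the tightest case since fees only widen the gap. All function names are hypothetical.

```rust
/// Zero-fee constant-product output, rounded down. Hypothetical helper.
fn swap_out(reserve_in: u64, reserve_out: u64, amount_in: u64) -> u64 {
    let num = (amount_in as u128) * (reserve_out as u128);
    let den = (reserve_in as u128) + (amount_in as u128);
    (num / den) as u64
}

/// Minimal LCG yielding values in [1, 2^40]; stands in for a real generator.
fn next(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 24) % (1u64 << 40) + 1
}

/// Checks, for `cases` pseudo-random pools, that selling x tokens and buying
/// back with the proceeds never returns more than x, and that the product of
/// reserves does not decrease after the sell.
pub fn roundtrip_holds(seed: u64, cases: u32) -> bool {
    let mut s = seed;
    for _ in 0..cases {
        let (tokens, sol, x) = (next(&mut s), next(&mut s), next(&mut s));
        let sol_out = swap_out(tokens, sol, x);
        let (tokens_after, sol_after) = (tokens + x, sol - sol_out);
        let tokens_back = swap_out(sol_after, tokens_after, sol_out);
        let k_before = (tokens as u128) * (sol as u128);
        let k_after = (tokens_after as u128) * (sol_after as u128);
        if tokens_back > x || k_after < k_before {
            return false;
        }
    }
    true
}
```

A property-testing library adds shrinking and smarter input generation on top of this loop, which is why it is preferable for a real test suite.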
use dex_math::{PoolState, SwapDirection, SwapInput, simulate_swap, spot_price};

// Create a bonding curve pool state
let pool = PoolState::PumpFun {
    virtual_sol: 30_000_000_000,          // 30 SOL
    virtual_token: 1_073_000_000_000_000, // ~1.073B tokens
    real_sol: 5_000_000_000,              // 5 SOL deposited
    real_token: 700_000_000_000_000,
    token_decimals: 6,
};

// Get spot price
let price = spot_price(&pool).unwrap();
println!("Spot price: {:.10} SOL/token", price);

// Simulate a 1 SOL buy
let result = simulate_swap(
    &pool,
    SwapDirection::BuyBase,
    SwapInput::ExactIn(1_000_000_000),
).unwrap();
println!("Tokens out: {}", result.amount_out);
println!("Price impact: {:.4}%", result.price_impact * 100.0);

See examples/ for complete working examples including streaming feature extraction and Sybil detection.
cargo test --workspace

This work builds on and extends several lines of research:
- AMM mechanism design: Moallemi & Roughgarden (2024), "Automated Market Making and Arbitrage Profits in the Presence of Fees"
- DEX microstructure: Capponi & Jia (2021), "The Adoption of Blockchain-based Decentralized Exchanges"
- MEV and transaction ordering: Park (2023), "The Conceptual Flaws of Decentralized Automated Market Making"; Daian et al. (2020), "Flash Boys 2.0: Frontrunning in Decentralized Exchanges"
- Online statistics: Welford (1962), "Note on a method for calculating corrected sums of squares and products" — basis for the streaming EMA with variance
- Change-point detection: Page (1954), "Continuous inspection schemes" — basis for the CUSUM detector
- Concentration measures: Lorenz (1905) and Gini (1912) — basis for whale concentration and volume inequality metrics
- Sybil resistance: Douceur (2002), "The Sybil Attack" — foundational threat model for the multi-signal detection approach
MIT