feat: add DatumCache for memory optimization of inline datums #18
Introduce DatumCache module to reduce memory footprint of Hydra nodes by storing datum hashes instead of full inline datums in memory.

- Add DatumCache type (strict Map from datum hash to full datum)
- Add HasDatumCache type class for abstracting datum operations
- Implement stripDatums/restoreDatums for UTxO manipulation
- Export emptyCache, insertDatum, lookupDatum utilities
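To make the design concrete, here is a minimal sketch of the caching scheme this commit describes. It uses placeholder Datum, DatumHash, and OutputDatum types and a dummy hashDatum instead of the real cardano-ledger types, so it illustrates the shape of stripDatums/restoreDatums rather than the actual module:

```haskell
{-# LANGUAGE LambdaCase #-}

-- Simplified stand-ins: the real module works on cardano-ledger datum
-- and UTxO types; 'hashDatum' is a dummy hashing function.
module DatumCacheSketch where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type DatumHash = String -- placeholder for the real hash type
type Datum = String     -- placeholder for the real datum type

-- | Strict map from datum hash to the full datum.
newtype DatumCache = DatumCache (Map DatumHash Datum)

-- | A transaction output carries its inline datum, only a hash, or nothing.
data OutputDatum
  = InlineDatum Datum
  | DatumHashOnly DatumHash
  | NoDatum
  deriving (Eq, Show)

hashDatum :: Datum -> DatumHash
hashDatum = ("hash:" <>) -- dummy; the real code hashes the datum bytes

emptyCache :: DatumCache
emptyCache = DatumCache Map.empty

lookupDatum :: DatumHash -> DatumCache -> Maybe Datum
lookupDatum h (DatumCache m) = Map.lookup h m

-- | Replace inline datums with their hashes, collecting the stripped
-- datums into the given cache.
stripDatums :: [OutputDatum] -> DatumCache -> ([OutputDatum], DatumCache)
stripDatums outs (DatumCache m) = (map strip outs, DatumCache m')
 where
  m' = foldr (\d -> Map.insert (hashDatum d) d) m [d | InlineDatum d <- outs]
  strip = \case
    InlineDatum d -> DatumHashOnly (hashDatum d)
    other -> other

-- | Put the full inline datums back from the cache.
restoreDatums :: DatumCache -> [OutputDatum] -> [OutputDatum]
restoreDatums cache = map $ \case
  DatumHashOnly h | Just d <- lookupDatum h cache -> InlineDatum d
  other -> other
```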
- Add datumCache field to OpenState for storing stripped datums
- Add HasDatumCache constraint to StateChanged and Outcome types
- Implement no-op HasDatumCache instance for SimpleTx test type
Strip inline datums from UTxOs to reduce memory footprint:

- HeadOpened: strip datums from initialUTxO
- TransactionAppliedToLocalUTxO: strip after tx validation
- CommitFinalized: strip datums from deposited UTxO
- DecommitRecorded: strip datums from remaining UTxO

Restore datums before on-chain transactions to preserve hash consistency:

- HeadClosed: restore datums before storing in ClosedState
- onOpenClientClose: restore datums before emitting CloseTx
- onOpenChainCloseTx: restore datums before emitting ContestTx

This fixes a critical bug where stripped datums would produce different hashes than original inline datums, causing on-chain validation failures.
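The hash-consistency bug mentioned here boils down to a round-trip requirement. Using the simplified types from the sketch above, the invariant could be expressed roughly like this (assuming pre-existing datum-hash-only outputs don't collide with hashes of stripped datums):

```haskell
-- Round-trip check over the simplified types above: stripping and then
-- restoring must reproduce the original outputs exactly, otherwise a
-- hash computed over the restored UTxO would differ from the hash the
-- snapshot signatures were produced over.
roundTripPreservesOutputs :: [OutputDatum] -> Bool
roundTripPreservesOutputs outs =
  let (stripped, cache) = stripDatums outs emptyCache
   in restoreDatums cache stripped == outs
```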
Add HasDatumCache constraint to:

- HydraNode type in Node.hs
- runHydraNode and related functions in Node/Run.hs
- Server functions in API/Server.hs
- Add datumCache = emptyCache to OpenState in HeadLogicSpec
- Add HasDatumCache constraint in NodeSpec and RotationSpec
- Add DatumCache schema to api.yaml
- Update golden test files to include datumCache field in OpenState
Transaction cost differences

No cost or size differences found.
Transaction costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters are currently using …
Script summary

Init transaction costs
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 5836 | 10.64 | 3.38 | 0.52 |
| 2 | 6038 | 12.34 | 3.90 | 0.54 |
| 3 | 6239 | 14.72 | 4.66 | 0.58 |
| 5 | 6640 | 18.41 | 5.80 | 0.63 |
| 10 | 7646 | 29.00 | 9.14 | 0.79 |
| 43 | 14281 | 98.97 | 30.93 | 1.80 |
Commit transaction costs
This uses ada-only outputs for better comparability.
| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 561 | 2.44 | 1.16 | 0.20 |
| 2 | 743 | 3.38 | 1.73 | 0.22 |
| 3 | 920 | 4.36 | 2.33 | 0.24 |
| 5 | 1280 | 6.41 | 3.60 | 0.28 |
| 10 | 2176 | 12.13 | 7.25 | 0.40 |
| 54 | 10068 | 98.61 | 68.52 | 1.88 |
CollectCom transaction costs
| Parties | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|---|
| 1 | 57 | 525 | 24.46 | 7.13 | 0.42 |
| 2 | 114 | 636 | 33.18 | 9.60 | 0.52 |
| 3 | 171 | 747 | 43.73 | 12.51 | 0.63 |
| 4 | 224 | 858 | 50.85 | 14.62 | 0.70 |
| 5 | 282 | 969 | 64.32 | 18.24 | 0.84 |
| 6 | 338 | 1081 | 65.18 | 18.95 | 0.86 |
| 7 | 395 | 1192 | 74.44 | 21.41 | 0.96 |
| 8 | 449 | 1303 | 87.15 | 24.94 | 1.09 |
Cost of Increment Transaction
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 1785 | 24.29 | 7.69 | 0.48 |
| 2 | 1954 | 25.85 | 8.78 | 0.51 |
| 3 | 2068 | 27.40 | 9.88 | 0.53 |
| 5 | 2391 | 31.30 | 12.31 | 0.60 |
| 10 | 3175 | 41.04 | 18.38 | 0.75 |
| 40 | 7667 | 96.38 | 53.75 | 1.65 |
Cost of Decrement Transaction
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 645 | 22.50 | 7.30 | 0.41 |
| 2 | 768 | 24.35 | 8.48 | 0.44 |
| 3 | 829 | 24.09 | 9.03 | 0.45 |
| 5 | 1268 | 30.15 | 12.07 | 0.54 |
| 10 | 2230 | 42.44 | 18.85 | 0.73 |
| 38 | 6082 | 93.84 | 51.77 | 1.55 |
Close transaction costs
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 673 | 27.47 | 8.46 | 0.46 |
| 2 | 863 | 29.90 | 9.82 | 0.50 |
| 3 | 944 | 30.94 | 10.75 | 0.52 |
| 5 | 1249 | 35.04 | 13.25 | 0.58 |
| 10 | 2067 | 45.13 | 19.42 | 0.75 |
| 37 | 5891 | 95.71 | 51.56 | 1.55 |
Contest transaction costs
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 666 | 33.83 | 10.16 | 0.53 |
| 2 | 825 | 35.85 | 11.38 | 0.56 |
| 3 | 979 | 38.59 | 12.82 | 0.60 |
| 5 | 1273 | 42.64 | 15.28 | 0.66 |
| 10 | 2127 | 55.97 | 22.38 | 0.86 |
| 28 | 4673 | 95.60 | 45.33 | 1.46 |
Abort transaction costs
There is some variation due to the random mixture of initial and already committed outputs.
| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|
| 1 | 5795 | 27.13 | 9.11 | 0.69 |
| 2 | 5918 | 35.80 | 12.04 | 0.79 |
| 3 | 6107 | 44.89 | 15.06 | 0.89 |
| 4 | 6191 | 51.09 | 17.19 | 0.96 |
| 5 | 6604 | 66.70 | 22.65 | 1.14 |
| 6 | 6569 | 72.80 | 24.57 | 1.20 |
| 7 | 6689 | 79.54 | 26.73 | 1.28 |
| 8 | 6771 | 88.22 | 29.60 | 1.37 |
FanOut transaction costs
Involves spending head output and burning head tokens. Uses ada-only UTXO for better comparability.
| Parties | UTxO | UTxO (bytes) | Tx size | % max Mem | % max CPU | Min fee ₳ |
|---|---|---|---|---|---|---|
| 10 | 0 | 0 | 5834 | 18.30 | 6.11 | 0.60 |
| 10 | 1 | 57 | 5869 | 21.41 | 7.28 | 0.63 |
| 10 | 10 | 569 | 6173 | 38.18 | 14.00 | 0.83 |
| 10 | 20 | 1138 | 6512 | 58.66 | 22.07 | 1.07 |
| 10 | 30 | 1704 | 6851 | 80.92 | 30.76 | 1.33 |
| 10 | 40 | 2277 | 7193 | 99.84 | 38.30 | 1.55 |
| 10 | 39 | 2221 | 7161 | 99.12 | 37.95 | 1.54 |
End-to-end benchmark results
This page is intended to collect the latest end-to-end benchmark results produced by Hydra's continuous integration (CI) system from the latest master code.
Please note that these results are approximate as they are currently produced from limited cloud VMs and not controlled hardware. Rather than focusing on the absolute results, the emphasis should be on relative results, such as how the timings for a scenario evolve as the code changes.
Generated at 2026-01-05 15:25:36.626686903 UTC
Baseline Scenario
| Number of nodes | 1 |
|---|---|
| Number of txs | 300 |
| Avg. Confirmation Time (ms) | 5.510206623 |
| P99 | 9.466960909999965ms |
| P95 | 6.93772895ms |
| P50 | 5.2393695000000005ms |
| Number of Invalid txs | 0 |
Three local nodes
| Number of nodes | 3 |
|---|---|
| Number of txs | 900 |
| Avg. Confirmation Time (ms) | 32.543640974 |
| P99 | 50.21502475999999ms |
| P95 | 42.639221299999996ms |
| P50 | 31.453317ms |
| Number of Invalid txs | 0 |
- Remove unused Monoid instance for DatumCache
- Remove unused functions: insertDatum, deleteDatum, cacheSize
- Remove unused extractDatumsFromUTxO and extractInlineDatum helpers
- Clean up redundant imports in Simple.hs and Node.hs
GitHub Actions runners have limited disk space (~14GB available). When building uncached Nix derivations (like our modified hydra-node), the build can exhaust disk space during compilation. This adds a cleanup step that removes unused tools before the build:

- .NET SDK (~1.8GB)
- Android SDK (~9GB)
- GHC (~5GB)
- CodeQL (~2.5GB)
- Unused Docker images

This frees up ~20GB of disk space, ensuring builds complete successfully.
- Add pull_request trigger for PRs targeting master branch
- Tag PR builds as pr-<number> for easy identification
- Use PR head SHA as version for traceability
The datum cache feature strips inline datums from UTxO sets to save memory, storing them in a separate cache. However, two StateChanged event handlers were missing the stripDatums call, causing inconsistency between localUTxO and datumCache:

- SnapshotRequested: was assigning newLocalUTxO directly without stripping
- LocalStateCleared (ConfirmedSnapshot case): was assigning snapshot.utxo directly without stripping

Both handlers now:

1. Call stripDatums on the UTxO to extract inline datums
2. Merge the extracted datums with the existing datumCache
3. Store only the stripped UTxO in localUTxO

This fixes the 'chain out of sync' runtime error that occurred after the datum cache memory optimization was implemented.
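A sketch of the handler pattern this commit describes, again built on the simplified types from the first sketch (OpenLikeState and applyNewLocalUTxO are illustrative names, not the real OpenState code): every handler that installs a new local UTxO strips it against the existing cache so the two stay consistent.

```haskell
-- Illustrative state record; the real OpenState carries many more fields.
data OpenLikeState = OpenLikeState
  { localUTxO :: [OutputDatum]
  , datumCache :: DatumCache
  }

-- Strip the incoming UTxO against the existing cache so that newly seen
-- inline datums are merged into it, then store only the stripped UTxO.
applyNewLocalUTxO :: [OutputDatum] -> OpenLikeState -> OpenLikeState
applyNewLocalUTxO newUTxO st =
  let (strippedUTxO, mergedCache) = stripDatums newUTxO (datumCache st)
   in st { localUTxO = strippedUTxO, datumCache = mergedCache }
```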
When processing ReqSn (snapshot requests), the confirmedUTxO from the confirmed snapshot has inline datums stripped (due to the DatumCache optimization). Before applying transactions via ledger validation, we must restore the datums so that:

1. Script validation works correctly (scripts need inline datums)
2. The resulting UTxO hash matches what other parties compute
3. Subsequent SnapshotConfirmed events are emitted correctly

This fixes an issue where only the first SnapshotConfirmed event was being emitted because transaction application was failing silently due to missing datums in the UTxO set passed to applyTransactions.

The fix:

- Added HasDatumCache constraint to onOpenNetworkReqSn
- Extract datumCache from OpenState
- Create restoredConfirmedUTxO using restoreDatums before passing to requireApplicableDecommitTx and subsequently requireApplyTxs
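The ordering matters: restore first, then validate. A sketch under the same simplified types, with a placeholder Tx and applyTransactions standing in for the real ledger call:

```haskell
-- Placeholder transaction type and ledger step; the real code validates
-- against the Cardano ledger rules.
data Tx = Tx

applyTransactions :: [OutputDatum] -> [Tx] -> Either String [OutputDatum]
applyTransactions utxo _txs = Right utxo -- stand-in for ledger validation

-- The confirmed UTxO is stored stripped, so inline datums must be put
-- back before any validation that needs them.
applyOnConfirmedSnapshot ::
  DatumCache -> [OutputDatum] -> [Tx] -> Either String [OutputDatum]
applyOnConfirmedSnapshot cache confirmedUTxO txs =
  let restoredConfirmedUTxO = restoreDatums cache confirmedUTxO
   in applyTransactions restoredConfirmedUTxO txs
```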
L2 transactions don't require L1 chain awareness, so they should be processed even when the node is temporarily behind on observing L1 blocks. Other L1-dependent operations (Init, Close, Contest, etc.) remain blocked. This fixes the 'chain out of sync' error that was rejecting all client inputs when the node was briefly behind on L1, even though L2 transactions operate independently of L1 state.
This reverts commit 51bd18c.
The handleSubmitL2Tx function was returning a plain JSON string for request parsing errors, which caused clients (like Tonic/Go) to fail parsing the response with 'cannot unmarshal string into Go value'. Now returns SubmitTxRejectedResponse object with proper 'tag' and 'reason' fields, consistent with other error responses. This prevents client-side parsing failures that led to transaction retries, which in turn caused BadInputsUTxO errors when the original transaction had already been confirmed in a snapshot.
Under high load with concurrent TX submissions, the HTTP handler for POST /transaction was matching ANY RejectedInput for NewTx, not checking if it was the specific transaction submitted. This caused false-negative responses where a successful TX was reported as rejected because another concurrent TX was rejected. Now the handler checks txId transaction == txid before returning SubmitTxRejected, preventing the race condition.
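A sketch of the check described here, with hypothetical event and transaction types (TxStub, Event, and waitForSubmittedTx are illustrative, not the hydra-node API): a rejection only fails the request when it carries the id of the transaction that was actually submitted.

```haskell
-- Illustrative types; the real handler consumes server output events.
newtype TxStub = TxStub { txId :: Int } deriving (Eq, Show)

data Event
  = TxValid TxStub
  | RejectedInput TxStub String
  deriving (Show)

-- Scan events until the submitted transaction is either confirmed or
-- rejected; rejections of *other* transactions are ignored.
waitForSubmittedTx :: Int -> [Event] -> Maybe (Either String TxStub)
waitForSubmittedTx submittedId events = case events of
  [] -> Nothing
  TxValid tx : _ | txId tx == submittedId -> Just (Right tx)
  RejectedInput tx reason : _
    | txId tx == submittedId -> Just (Left reason)
  _ : rest -> waitForSubmittedTx submittedId rest -- keep waiting
```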
Adds a test that verifies the HTTP handler correctly ignores RejectedInput events for different transactions. This ensures that when TX_A is submitted and a RejectedInput for TX_B appears, the handler for TX_A continues waiting and correctly returns success when TX_A is confirmed.
Add --datum-hot-cache-size CLI option to control datum cache memory usage. This threads the configuration from RunOptions through Environment to HeadLogic, where pruneCacheWithLimit applies size-based eviction after each snapshot confirmation.

- Add datumHotCacheSize field to RunOptions (default: 100)
- Add datumHotCacheSize field to Environment
- Add pruneCacheWithLimit function to DatumCache module
- Update aggregate functions in HeadLogic to accept cache size config
- Pass cache size through processNextInput and aggregateState
- Update all test fixtures with datumHotCacheSize = 0 (unlimited)

Behavior:

- 0 = unlimited (UTxO-aligned pruning only)
- N > 0 = evict oldest entries (by hash order) when cache exceeds N
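A sketch of what such size-based pruning could look like over the simplified DatumCache above (this pruneCacheWithLimit is an illustration, not the shipped code): a limit of 0 disables eviction, otherwise entries are dropped in key (hash) order until the cache fits.

```haskell
import qualified Data.Map.Strict as Map
import Numeric.Natural (Natural)

-- Evict entries in ascending hash order until at most 'limit' remain;
-- a limit of 0 means unlimited.
pruneCacheWithLimit :: Natural -> DatumCache -> DatumCache
pruneCacheWithLimit 0 cache = cache
pruneCacheWithLimit limit (DatumCache m)
  | excess <= 0 = DatumCache m
  | otherwise =
      DatumCache (Map.fromDistinctAscList (drop excess (Map.toAscList m)))
 where
  excess = Map.size m - fromIntegral limit
```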
- Fuse mapMaybe/map in DatumCache.hs (HLint warning)
- Add missing datumHotCacheSize field in hydra-cluster HydraNode.hs
- Reorder import in HeadLogic.hs (move Numeric.Natural after Hydra.Tx.Snapshot)
- Add datumHotCacheSize to Greetings/Greetings.json golden file
- Add datumHotCacheSize property to Environment schema in api.yaml
- Increase persistent broadcast queue capacity from 100 to 1000
- Increase gRPC put message timeout from 3s to 10s

These changes help prevent snapshot confirmation failures under high transaction load by allowing more messages to queue and giving more time for gRPC operations to complete.
- Add logCritical helper that always logs to stderr regardless of verbosity
- Log QueueNearCapacity when broadcast queue reaches 80% capacity
- Log ConsecutiveBroadcastFailures after 5+ consecutive failures
- Track consecutive broadcast failures with counter reset on success
- Add withCriticalTracer function for future use

This helps diagnose snapshot confirmation issues under high load where one node may not be sending AckSn due to network problems.
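A sketch of the failure-counter behaviour described in this commit, using only base and stm; logCritical here is just a stderr writer and the threshold of 5 mirrors the message, nothing more.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', readTVar, writeTVar)
import Control.Monad (when)
import System.IO (hPutStrLn, stderr)

-- Always goes to stderr, regardless of configured verbosity.
logCritical :: String -> IO ()
logCritical = hPutStrLn stderr

-- Run one broadcast attempt, resetting the failure counter on success
-- and logging once the number of consecutive failures reaches 5.
trackBroadcast :: TVar Int -> IO Bool -> IO ()
trackBroadcast failures broadcast = do
  ok <- broadcast
  if ok
    then atomically (writeTVar failures 0)
    else do
      n <- atomically $ do
        modifyTVar' failures (+ 1)
        readTVar failures
      when (n >= 5) $
        logCritical ("ConsecutiveBroadcastFailures: " <> show n)
```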
…oss under high load

Under high transaction load, snapshot signatures were being lost because:

- All messages shared a single FIFO queue
- ReqSn/AckSn protocol messages got buried behind ReqTx transactions
- When AckSn arrived before local ReqSn was processed, it got re-enqueued to the back of the queue, causing signature collection to fail

Solution: dual-queue system that processes protocol messages before transactions

- HighPriority: ReqSn, AckSn, ChainInput, ClientInput, ConnectivityEvent
- LowPriority: ReqTx, ReqDec

This ensures protocol state machine messages are never starved by transaction load.
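A minimal sketch of such a dual-priority inbox using stm's TQueue (the names are illustrative): the high-priority queue is always drained before the low-priority one, so protocol messages cannot be starved by transaction traffic.

```haskell
import Control.Concurrent.STM (STM, TQueue, newTQueueIO, orElse, readTQueue, writeTQueue)

data Priority = HighPriority | LowPriority

data DualQueue a = DualQueue
  { highQ :: TQueue a -- ReqSn, AckSn, chain/client inputs, connectivity
  , lowQ :: TQueue a  -- ReqTx, ReqDec
  }

newDualQueue :: IO (DualQueue a)
newDualQueue = DualQueue <$> newTQueueIO <*> newTQueueIO

enqueue :: DualQueue a -> Priority -> a -> STM ()
enqueue q HighPriority = writeTQueue (highQ q)
enqueue q LowPriority = writeTQueue (lowQ q)

-- Take from the high-priority queue first; fall back to the
-- low-priority queue only when it is empty, and block if both are.
dequeue :: DualQueue a -> STM a
dequeue q = readTQueue (highQ q) `orElse` readTQueue (lowQ q)
```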
- Remove unused ToJSON/FromJSON instances from MessagePriority
- Remove unused withCriticalTracer function from Logging module
- Remove unused Natural import from HeadLogic
- Add type signature for local binding to satisfy -Wmissing-local-signatures
Previously, pruneCacheWithLimit could evict datums that are still referenced by the current UTxO set when the cache size exceeded datumHotCacheSize. This caused MissingRequiredDatums errors when validating transactions that consume UTxOs with inline datums. The fix removes the evictToLimit logic entirely because after pruneCache restricts the cache to only datums in the current UTxO set, all remaining datums are required for transaction validation. Evicting any of them would break the system. The datumHotCacheSize parameter is kept for API compatibility but now only serves as a monitoring hint - the actual cache size will be equal to the number of UTxOs with inline datums in the current state.
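The UTxO-aligned pruning this commit keeps could look roughly like the sketch below (again over the simplified types from the first sketch): retain exactly the datums whose hashes are still referenced by the current stripped UTxO, and evict nothing else.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Keep only the cache entries whose hashes are still referenced by the
-- current (stripped) UTxO set; everything kept may be needed for
-- transaction validation, so nothing further is evicted.
pruneCache :: [OutputDatum] -> DatumCache -> DatumCache
pruneCache currentUTxO (DatumCache m) =
  DatumCache (Map.restrictKeys m referenced)
 where
  referenced = Set.fromList [h | DatumHashOnly h <- currentUTxO]
```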
Summary
Reduces memory footprint of Hydra nodes by storing datum hashes instead of full inline datums in memory, with a separate cache to restore them when needed for on-chain transactions.
Problem
High memory usage in Hydra nodes is caused by UTxOs with inline datums that remain unspent in the head. These datums consume memory even when they are not actively needed.
Solution
Implement a datum caching mechanism that:

- Strips inline datums from in-memory UTxO sets and stores them in a DatumCache (Map from hash to datum)
- Restores the datums when they are needed for on-chain transactions

Key Changes
New Module:
Hydra.DatumCache

- DatumCache type - strict Map from datum hash to full datum
- HasDatumCache type class for abstracting datum operations
- stripDatums / restoreDatums functions for UTxO manipulation

Datum Stripping (reduces memory)
- HeadOpened: strip datums from initialUTxO
- TransactionAppliedToLocalUTxO: strip after tx validation
- CommitFinalized: strip datums from deposited UTxO
- DecommitRecorded: strip datums from remaining UTxO

Datum Restoration (preserves on-chain correctness)
- HeadClosed: restore before storing in ClosedState
- onOpenClientClose: restore before emitting CloseTx
- onOpenChainCloseTx: restore before emitting ContestTx

Critical Bug Fixed
Discovered that hashUTxO produces different hashes for stripped datums vs full inline datums. This would cause on-chain validation failures because snapshots are signed with full datums. Solution: restore datums BEFORE emitting Close/Contest transactions.

Testing
Files Changed
- Hydra/DatumCache.hs
- HeadLogic.hs, HeadLogic/State.hs, HeadLogic/Outcome.hs
- Node.hs, Node/Run.hs, API/Server.hs
- Ledger/Simple.hs, HeadLogicSpec.hs, NodeSpec.hs, RotationSpec.hs
- api.yaml, golden files