
feat: add DatumCache for memory optimization of inline datums #18

Open

awcjack wants to merge 28 commits into master from feature/datum-cache-memory-optimization

Conversation

awcjack (Owner) commented Jan 3, 2026

Summary

Reduces the memory footprint of Hydra nodes by storing datum hashes instead of full inline datums in memory, with a separate cache to restore them when they are needed for on-chain transactions.

Problem

Hydra nodes exhibit high memory usage caused by UTxOs with inline datums that remain unspent in the head. These datums consume memory even when they are not actively needed.

Solution

Implement a datum caching mechanism that:

  1. Strips inline datums from UTxOs after they enter the head, storing only the datum hash
  2. Caches the full datums in a separate DatumCache (a Map from hash to datum)
  3. Restores the datums before on-chain transactions (Close/Contest) to preserve hash consistency

Key Changes

New Module: Hydra.DatumCache

  • DatumCache type - strict Map from datum hash to full datum
  • HasDatumCache type class for abstracting datum operations
  • stripDatums / restoreDatums functions for UTxO manipulation
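For orientation, a minimal, self-contained sketch of what such a cache looks like is shown below. The key and value types are kept abstract; the real Hydra.DatumCache works with Cardano's datum hash and datum types, and its exact signatures may differ.

```haskell
-- Minimal sketch of the DatumCache idea (illustrative types, not the
-- actual Hydra.DatumCache API).
module DatumCacheSketch (DatumCache, emptyCache, insertDatum, lookupDatum) where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Strict map from datum hash to full datum.
newtype DatumCache hash datum = DatumCache (Map hash datum)

emptyCache :: DatumCache hash datum
emptyCache = DatumCache Map.empty

-- Record a datum so it can later be restored from its hash.
insertDatum :: Ord hash => hash -> datum -> DatumCache hash datum -> DatumCache hash datum
insertDatum h d (DatumCache m) = DatumCache (Map.insert h d m)

-- Look up a previously stripped datum by its hash.
lookupDatum :: Ord hash => hash -> DatumCache hash datum -> Maybe datum
lookupDatum h (DatumCache m) = Map.lookup h m
```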

Datum Stripping (reduces memory)

  • HeadOpened: strip datums from initialUTxO
  • TransactionAppliedToLocalUTxO: strip after tx validation
  • CommitFinalized: strip datums from deposited UTxO
  • DecommitRecorded: strip datums from remaining UTxO

Datum Restoration (preserves on-chain correctness)

  • HeadClosed: restore before storing in ClosedState
  • onOpenClientClose: restore before emitting CloseTx
  • onOpenChainCloseTx: restore before emitting ContestTx

Critical Bug Fixed

Discovered that hashUTxO produces a different hash for a UTxO with stripped datums than for the same UTxO with full inline datums. This would cause on-chain validation failures, because snapshots are signed over the full datums. Solution: restore the datums BEFORE emitting Close/Contest transactions.
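A toy round trip (not Hydra code) illustrates the point: an output whose inline datum has been stripped is a different value than the original output, so a UTxO hash computed over stripped outputs cannot match the hash signed in a snapshot over the full outputs. Restoring from the cache first makes the two coincide. All types and names below are illustrative stand-ins.

```haskell
module StripRestoreSketch where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

type DatumHash = Int     -- stand-in for the real datum hash type
type Datum     = String  -- stand-in for the real datum type

data ToyOut = ToyOut
  { datumHash   :: DatumHash
  , inlineDatum :: Maybe Datum  -- Nothing once stripped
  } deriving (Eq, Show)

-- Strip inline datums, collecting them into a cache keyed by hash.
stripDatums :: [ToyOut] -> ([ToyOut], Map DatumHash Datum)
stripDatums = foldr go ([], Map.empty)
 where
  go o (outs, cache) = case inlineDatum o of
    Just d  -> (o { inlineDatum = Nothing } : outs, Map.insert (datumHash o) d cache)
    Nothing -> (o : outs, cache)

-- Restore inline datums from the cache before hashing or building Close/Contest.
restoreDatums :: Map DatumHash Datum -> [ToyOut] -> [ToyOut]
restoreDatums cache = map restoreOne
 where
  restoreOne o = case inlineDatum o of
    Nothing -> o { inlineDatum = Map.lookup (datumHash o) cache }
    Just _  -> o

-- Assuming each datum hash identifies a single datum, restoring what was
-- stripped yields the original outputs again, so hashing the restored UTxO
-- agrees with hashing the original one.
roundTrips :: [ToyOut] -> Bool
roundTrips outs =
  let (stripped, cache) = stripDatums outs
   in restoreDatums cache stripped == outs
```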

Testing

  • ✅ Library builds without warnings
  • ✅ HeadLogic tests pass
  • ✅ 543/547 tests pass (4 failures are unrelated network infrastructure tests)

Files Changed

Category     Files
New          Hydra/DatumCache.hs
Core Logic   HeadLogic.hs, HeadLogic/State.hs, HeadLogic/Outcome.hs
Node Layer   Node.hs, Node/Run.hs, API/Server.hs
Tests        Ledger/Simple.hs, HeadLogicSpec.hs, NodeSpec.hs, RotationSpec.hs
Schema       api.yaml, golden files

Introduce DatumCache module to reduce memory footprint of Hydra nodes
by storing datum hashes instead of full inline datums in memory.

- Add DatumCache type (strict Map from datum hash to full datum)
- Add HasDatumCache type class for abstracting datum operations
- Implement stripDatums/restoreDatums for UTxO manipulation
- Export emptyCache, insertDatum, lookupDatum utilities
- Add datumCache field to OpenState for storing stripped datums
- Add HasDatumCache constraint to StateChanged and Outcome types
- Implement no-op HasDatumCache instance for SimpleTx test type

Strip inline datums from UTxOs to reduce memory footprint:
- HeadOpened: strip datums from initialUTxO
- TransactionAppliedToLocalUTxO: strip after tx validation
- CommitFinalized: strip datums from deposited UTxO
- DecommitRecorded: strip datums from remaining UTxO

Restore datums before on-chain transactions to preserve hash consistency:
- HeadClosed: restore datums before storing in ClosedState
- onOpenClientClose: restore datums before emitting CloseTx
- onOpenChainCloseTx: restore datums before emitting ContestTx

This fixes a critical bug where stripped datums would produce different
hashes than original inline datums, causing on-chain validation failures.

Add HasDatumCache constraint to:
- HydraNode type in Node.hs
- runHydraNode and related functions in Node/Run.hs
- Server functions in API/Server.hs

- Add datumCache = emptyCache to OpenState in HeadLogicSpec
- Add HasDatumCache constraint in NodeSpec and RotationSpec
- Add DatumCache schema to api.yaml
- Update golden test files to include datumCache field in OpenState

github-actions bot commented Jan 3, 2026

Transaction cost differences

No cost or size differences found


github-actions bot commented Jan 3, 2026

Transaction costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters currently use arbitrary values, so results are not fully deterministic or directly comparable to previous runs.

Metadata
Generated at 2026-01-05 15:22:37.88171796 UTC
Max. memory units 14000000
Max. CPU units 10000000000
Max. tx size (bytes) 16384

Script summary

Name Hash Size (Bytes)
νInitial c8a101a5c8ac4816b0dceb59ce31fc2258e387de828f02961d2f2045 2652
νCommit 61458bc2f297fff3cc5df6ac7ab57cefd87763b0b7bd722146a1035c 685
νHead a1442faf26d4ec409e2f62a685c1d4893f8d6bcbaf7bcb59d6fa1340 14599
μHead fd173b993e12103cd734ca6710d364e17120a5eb37a224c64ab2b188* 5284
νDeposit ae01dade3a9c346d5c93ae3ce339412b90a0b8f83f94ec6baa24e30c 1102
  • The minting policy hash is only usable for comparison. As the script is parameterized, the actual script is unique per head.

Init transaction costs

Parties Tx size % max Mem % max CPU Min fee ₳
1 5836 10.64 3.38 0.52
2 6038 12.34 3.90 0.54
3 6239 14.72 4.66 0.58
5 6640 18.41 5.80 0.63
10 7646 29.00 9.14 0.79
43 14281 98.97 30.93 1.80

Commit transaction costs

This uses ada-only outputs for better comparability.

UTxO Tx size % max Mem % max CPU Min fee ₳
1 561 2.44 1.16 0.20
2 743 3.38 1.73 0.22
3 920 4.36 2.33 0.24
5 1280 6.41 3.60 0.28
10 2176 12.13 7.25 0.40
54 10068 98.61 68.52 1.88

CollectCom transaction costs

Parties UTxO (bytes) Tx size % max Mem % max CPU Min fee ₳
1 57 525 24.46 7.13 0.42
2 114 636 33.18 9.60 0.52
3 171 747 43.73 12.51 0.63
4 224 858 50.85 14.62 0.70
5 282 969 64.32 18.24 0.84
6 338 1081 65.18 18.95 0.86
7 395 1192 74.44 21.41 0.96
8 449 1303 87.15 24.94 1.09

Cost of Increment Transaction

Parties Tx size % max Mem % max CPU Min fee ₳
1 1785 24.29 7.69 0.48
2 1954 25.85 8.78 0.51
3 2068 27.40 9.88 0.53
5 2391 31.30 12.31 0.60
10 3175 41.04 18.38 0.75
40 7667 96.38 53.75 1.65

Cost of Decrement Transaction

Parties Tx size % max Mem % max CPU Min fee ₳
1 645 22.50 7.30 0.41
2 768 24.35 8.48 0.44
3 829 24.09 9.03 0.45
5 1268 30.15 12.07 0.54
10 2230 42.44 18.85 0.73
38 6082 93.84 51.77 1.55

Close transaction costs

Parties Tx size % max Mem % max CPU Min fee ₳
1 673 27.47 8.46 0.46
2 863 29.90 9.82 0.50
3 944 30.94 10.75 0.52
5 1249 35.04 13.25 0.58
10 2067 45.13 19.42 0.75
37 5891 95.71 51.56 1.55

Contest transaction costs

Parties Tx size % max Mem % max CPU Min fee ₳
1 666 33.83 10.16 0.53
2 825 35.85 11.38 0.56
3 979 38.59 12.82 0.60
5 1273 42.64 15.28 0.66
10 2127 55.97 22.38 0.86
28 4673 95.60 45.33 1.46

Abort transaction costs

There is some variation due to the random mixture of initial and already committed outputs.

Parties Tx size % max Mem % max CPU Min fee ₳
1 5795 27.13 9.11 0.69
2 5918 35.80 12.04 0.79
3 6107 44.89 15.06 0.89
4 6191 51.09 17.19 0.96
5 6604 66.70 22.65 1.14
6 6569 72.80 24.57 1.20
7 6689 79.54 26.73 1.28
8 6771 88.22 29.60 1.37

FanOut transaction costs

Involves spending head output and burning head tokens. Uses ada-only UTXO for better comparability.

Parties UTxO UTxO (bytes) Tx size % max Mem % max CPU Min fee ₳
10 0 0 5834 18.30 6.11 0.60
10 1 57 5869 21.41 7.28 0.63
10 10 569 6173 38.18 14.00 0.83
10 20 1138 6512 58.66 22.07 1.07
10 30 1704 6851 80.92 30.76 1.33
10 40 2277 7193 99.84 38.30 1.55
10 39 2221 7161 99.12 37.95 1.54

End-to-end benchmark results

This page is intended to collect the latest end-to-end benchmark results produced by Hydra's continuous integration (CI) system from the latest master code.

Please note that these results are approximate as they are currently produced from limited cloud VMs and not controlled hardware. Rather than focusing on the absolute results, the emphasis should be on relative results, such as how the timings for a scenario evolve as the code changes.

Generated at 2026-01-05 15:25:36.626686903 UTC

Baseline Scenario

Number of nodes 1
Number of txs 300
Avg. Confirmation Time (ms) 5.510206623
P99 9.466960909999965ms
P95 6.93772895ms
P50 5.2393695000000005ms
Number of Invalid txs 0

Three local nodes

Number of nodes 3
Number of txs 900
Avg. Confirmation Time (ms) 32.543640974
P99 50.21502475999999ms
P95 42.639221299999996ms
P50 31.453317ms
Number of Invalid txs 0

awcjack added 20 commits January 3, 2026 17:29
- Remove unused Monoid instance for DatumCache
- Remove unused functions: insertDatum, deleteDatum, cacheSize
- Remove unused extractDatumsFromUTxO and extractInlineDatum helpers
- Clean up redundant imports in Simple.hs and Node.hs

GitHub Actions runners have limited disk space (~14GB available).
When building uncached Nix derivations (like our modified hydra-node),
the build can exhaust disk space during compilation.

This adds a cleanup step that removes unused tools before the build:
- .NET SDK (~1.8GB)
- Android SDK (~9GB)
- GHC (~5GB)
- CodeQL (~2.5GB)
- Unused Docker images

This frees up ~20GB of disk space, ensuring builds complete successfully.

- Add pull_request trigger for PRs targeting master branch
- Tag PR builds as pr-<number> for easy identification
- Use PR head SHA as version for traceability

The datum cache feature strips inline datums from UTxO sets to save memory,
storing them in a separate cache. However, two StateChanged event handlers
were missing the stripDatums call, causing inconsistency between localUTxO
and datumCache:

- SnapshotRequested: Was assigning newLocalUTxO directly without stripping
- LocalStateCleared (ConfirmedSnapshot case): Was assigning snapshot.utxo
  directly without stripping

Both handlers now:
1. Call stripDatums on the UTxO to extract inline datums
2. Merge the extracted datums with the existing datumCache
3. Store only the stripped UTxO in localUTxO

This fixes the 'chain out of sync' runtime error that occurred after the
datum cache memory optimization was implemented.
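The strip-and-merge pattern described in this commit could look roughly like the generic sketch below. The stripping function and cache type are taken as parameters because the real ones live in Hydra.DatumCache; stripAndMerge is a name invented here for illustration.

```haskell
module StripAndMergeSketch where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Strip an incoming UTxO and merge the extracted datums into the existing
-- cache; only the stripped UTxO is kept in local state. 'Map.union' is
-- left-biased, so freshly extracted datums win on hash collisions.
stripAndMerge
  :: Ord hash
  => (utxo -> (utxo, Map hash datum))  -- a stripDatums-like function (assumed)
  -> Map hash datum                    -- existing datum cache
  -> utxo                              -- incoming UTxO (e.g. snapshot.utxo)
  -> (utxo, Map hash datum)            -- (stripped UTxO, merged cache)
stripAndMerge stripFn cache u =
  let (stripped, extracted) = stripFn u
   in (stripped, Map.union extracted cache)
```
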
When processing ReqSn (snapshot requests), the confirmedUTxO from the
confirmed snapshot has inline datums stripped (due to DatumCache optimization).
Before applying transactions via ledger validation, we must restore the
datums so that:

1. Script validation works correctly (scripts need inline datums)
2. The resulting UTxO hash matches what other parties compute
3. Subsequent SnapshotConfirmed events are emitted correctly

This fixes an issue where only the first SnapshotConfirmed event was
being emitted because transaction application was failing silently due
to missing datums in the UTxO set passed to applyTransactions.

The fix:
- Added HasDatumCache constraint to onOpenNetworkReqSn
- Extract datumCache from OpenState
- Create restoredConfirmedUTxO using restoreDatums before passing to
  requireApplicableDecommitTx and subsequently requireApplyTxs

L2 transactions don't require L1 chain awareness, so they should be
processed even when the node is temporarily behind on observing L1 blocks.
Other L1-dependent operations (Init, Close, Contest, etc.) remain blocked.

This fixes the 'chain out of sync' error that was rejecting all client
inputs when the node was briefly behind on L1, even though L2 transactions
operate independently of L1 state.

The handleSubmitL2Tx function was returning a plain JSON string for
request parsing errors, which caused clients (like Tonic/Go) to fail
parsing the response with 'cannot unmarshal string into Go value'.

Now returns SubmitTxRejectedResponse object with proper 'tag' and
'reason' fields, consistent with other error responses.

This prevents client-side parsing failures that led to transaction
retries, which in turn caused BadInputsUTxO errors when the original
transaction had already been confirmed in a snapshot.

Under high load with concurrent TX submissions, the HTTP handler for
POST /transaction was matching ANY RejectedInput for NewTx, not checking
if it was the specific transaction submitted. This caused false-negative
responses where a successful TX was reported as rejected because another
concurrent TX was rejected.

Now the handler checks txId transaction == txid before returning
SubmitTxRejected, preventing the race condition.
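The guarded match can be illustrated with a small self-contained example: while waiting for the outcome of one submitted transaction, rejections that carry a different transaction id are simply skipped. The Outcome type and waitForOutcome are hypothetical; the real handler works on Hydra's server output events.

```haskell
module SubmitGuardSketch where

import Data.List (find)

-- Possible outcomes observed on the event stream (illustrative).
data Outcome txid = Confirmed txid | Rejected txid String
  deriving (Eq, Show)

-- Return the first outcome that refers to the transaction we submitted,
-- ignoring confirmations/rejections for other concurrent transactions.
waitForOutcome :: Eq txid => txid -> [Outcome txid] -> Maybe (Outcome txid)
waitForOutcome wanted = find relevant
 where
  relevant (Confirmed t)  = t == wanted
  relevant (Rejected t _) = t == wanted
```
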
Adds a test that verifies the HTTP handler correctly ignores RejectedInput
events for different transactions. This ensures that when TX_A is submitted
and a RejectedInput for TX_B appears, the handler for TX_A continues waiting
and correctly returns success when TX_A is confirmed.

Add --datum-hot-cache-size CLI option to control datum cache memory usage.
This threads the configuration from RunOptions through Environment to
HeadLogic, where pruneCacheWithLimit applies size-based eviction after
each snapshot confirmation.

- Add datumHotCacheSize field to RunOptions (default: 100)
- Add datumHotCacheSize field to Environment
- Add pruneCacheWithLimit function to DatumCache module
- Update aggregate functions in HeadLogic to accept cache size config
- Pass cache size through processNextInput and aggregateState
- Update all test fixtures with datumHotCacheSize = 0 (unlimited)

Behavior:
- 0 = unlimited (UTxO-aligned pruning only)
- N > 0 = evict oldest entries (by hash order) when cache exceeds N
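A sketch of the size-bounded eviction described here is shown below (note that a later commit in this PR removes this eviction again, because it can drop datums still referenced by the current UTxO set). Data.Map keys are ordered, so "oldest by hash order" is modelled by dropping the smallest keys first; the function name is illustrative.

```haskell
module PruneLimitSketch where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- 0 means unlimited; otherwise keep at most n entries, evicting entries
-- with the smallest hashes first.
pruneWithLimit :: Int -> Map hash datum -> Map hash datum
pruneWithLimit 0 cache = cache
pruneWithLimit n cache
  | Map.size cache <= n = cache
  | otherwise           = Map.drop (Map.size cache - n) cache
```
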
- Fuse mapMaybe/map in DatumCache.hs (HLint warning)
- Add missing datumHotCacheSize field in hydra-cluster HydraNode.hs
- Reorder import in HeadLogic.hs (move Numeric.Natural after Hydra.Tx.Snapshot)
- Add datumHotCacheSize to Greetings/Greetings.json golden file
- Add datumHotCacheSize property to Environment schema in api.yaml

- Increase persistent broadcast queue capacity from 100 to 1000
- Increase gRPC put message timeout from 3s to 10s

These changes help prevent snapshot confirmation failures under high
transaction load by allowing more messages to queue and giving more
time for gRPC operations to complete.

- Add logCritical helper that always logs to stderr regardless of verbosity
- Log QueueNearCapacity when broadcast queue reaches 80% capacity
- Log ConsecutiveBroadcastFailures after 5+ consecutive failures
- Track consecutive broadcast failures with counter reset on success
- Add withCriticalTracer function for future use

This helps diagnose snapshot confirmation issues under high load
where one node may not be sending AckSn due to network problems.

…oss under high load

Under high transaction load, snapshot signatures were being lost because:
- All messages shared a single FIFO queue
- ReqSn/AckSn protocol messages got buried behind ReqTx transactions
- When AckSn arrived before local ReqSn was processed, it got re-enqueued
  to the back of the queue, causing signature collection to fail

Solution: Dual-queue system that processes protocol messages before transactions
- HighPriority: ReqSn, AckSn, ChainInput, ClientInput, ConnectivityEvent
- LowPriority: ReqTx, ReqDec

This ensures protocol state machine messages are never starved by transaction load.
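The dual-queue idea can be sketched with STM: drain the high-priority queue whenever it has something, and only fall back to the low-priority queue otherwise. The types and function names below are illustrative, not Hydra's actual network layer.

```haskell
module PriorityQueueSketch where

import Control.Concurrent.STM
  (STM, TQueue, orElse, readTQueue, writeTQueue)

-- Two queues: protocol messages vs. bulk transactions.
data PrioQueues a = PrioQueues
  { highQ :: TQueue a  -- e.g. ReqSn, AckSn, chain/client inputs
  , lowQ  :: TQueue a  -- e.g. ReqTx, ReqDec
  }

-- Enqueue according to a caller-supplied priority predicate.
enqueue :: (a -> Bool) -> PrioQueues a -> a -> STM ()
enqueue isHighPriority qs msg
  | isHighPriority msg = writeTQueue (highQ qs) msg
  | otherwise          = writeTQueue (lowQ qs) msg

-- Block until a message is available, always preferring the high-priority
-- queue so protocol messages are never starved by transaction load.
dequeue :: PrioQueues a -> STM a
dequeue qs = readTQueue (highQ qs) `orElse` readTQueue (lowQ qs)
```
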
- Remove unused ToJSON/FromJSON instances from MessagePriority
- Remove unused withCriticalTracer function from Logging module
- Remove unused Natural import from HeadLogic
- Add type signature for local binding to satisfy -Wmissing-local-signatures

Previously, pruneCacheWithLimit could evict datums that are still
referenced by the current UTxO set when the cache size exceeded
datumHotCacheSize. This caused MissingRequiredDatums errors when
validating transactions that consume UTxOs with inline datums.

The fix removes the evictToLimit logic entirely because after
pruneCache restricts the cache to only datums in the current UTxO
set, all remaining datums are required for transaction validation.
Evicting any of them would break the system.

The datumHotCacheSize parameter is kept for API compatibility but
now only serves as a monitoring hint - the actual cache size will
be equal to the number of UTxOs with inline datums in the current
state.
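The resulting behaviour, UTxO-aligned pruning only, can be sketched as restricting the cache to the datum hashes still referenced by the current UTxO set; every surviving entry is then guaranteed to be restorable when a transaction needs it. Names below are illustrative.

```haskell
module PruneAlignedSketch where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)

-- Keep only cache entries whose hash is still referenced by the current
-- UTxO set, so no datum needed for transaction validation can be evicted.
pruneToReferenced :: Ord hash => Set hash -> Map hash datum -> Map hash datum
pruneToReferenced referenced cache = Map.restrictKeys cache referenced
```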