Conversation
…tegrationnet # Conflicts: # project/Dependencies.scala # src/main/scala/org/constellation/snapshotstreaming/Configuration.scala
…ge-downstream-changes
ryle-ai
left a comment
There was a problem hiding this comment.
Review: Merge downstream changes
The code changes in this PR (batch insert for delegated staking rewards, Set -> Seq for reward ordering, tessellation version bump to 3.4.0, SQL consolidation) all look correct and well-motivated. A few concerns about the non-code files being committed:
Large data files committed directly to git
mainnet-streaming-cfg/lastSnapshot.json is 3.5MB / 67,589 lines, and mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz is 5.3MB. These are runtime snapshot data files being committed directly to the repository without Git LFS. Once merged, they permanently inflate the repo clone size for every contributor. These files will also likely need to be updated periodically as the network advances, compounding the bloat.
Consider either:
- Using Git LFS for these binary/large data files
- Adding them to
.gitignoreand documenting how to obtain them (e.g., from S3 or a bootstrap script) - Storing them in a separate artifact repository
reference.conf default paths changed
In src/main/resources/reference.conf, the defaults changed from:
lastSnapshotPath = "lastSnapshotTest.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshotTest1.json.gz"
to:
lastSnapshotPath = "lastSnapshot.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshot.json.gz"
reference.conf provides the fallback defaults when no application.conf override is provided. This is fine if all deployment environments use an explicit application.conf (the new mainnet-streaming-cfg/application.conf does override these). But if anyone runs the app without a config override (e.g., local dev), they'll now need files named lastSnapshot.json and lastIncrementalSnapshot.json.gz instead of the old test-suffixed names. Minor, but worth noting for developer experience.
Code changes look good
-
Batch insert for delegated staking rewards (
SnapshotDAO.scala): The switch from single-rowexecuteCmdto batchexecuteManyusinginsertDelegatedStakingRewardsManyfollows the existing pattern in the codebase (e.g.,insertAddressMany,insertMetagraphTransactionsMany). The chunking at 5000 rows viaexecuteManyand theON CONFLICT DO NOTHINGclause are both appropriate. -
Set[RewardTransaction]->Seq[RewardTransaction]inSnapshot.scalaandCurrencySnapshot.scala: This preserves reward ordering, which aligns with the "ensure ordering of rewards" commit. TheAddressExtractorcorrectly adds.toSetback where deduplication is needed for address extraction. -
SQL consolidation: The
delegated_staking.sqldefinitions have been properly merged intosnapshot.sql. The trigger functions (update_updated_at_column,insert_into_parent_abstract_blocks, etc.) are preserved in the new file. The formatting change to single-lineCREATE TABLEstatements reduces readability but is a style choice.
ryle-ai
left a comment
There was a problem hiding this comment.
ON CONFLICT clause doesn't cover the new primary key on delegate_stake_rewards
The new snapshot.sql schema adds a primary key (global_snapshot_hash, stake_create_hash) to delegate_stake_rewards, but the INSERT statement in insertDelegatedStakingRewardsMany uses ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING — targeting the unique constraint, not the PK.
During normal first-time processing this is fine because each (snapshot, stake_create_hash) pair produces exactly one reward entry. But during reindexing (which this codebase supports via SnapshotProcessorS3), if a snapshot is reprocessed and the same (global_snapshot_hash, stake_create_hash) pair is inserted again, PostgreSQL will raise a PK violation error instead of silently skipping via DO NOTHING — because the ON CONFLICT clause only handles the unique constraint, not the PK.
Consider changing the ON CONFLICT to target the primary key:
ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING| stake_create_hash | ||
| ) VALUES ($varchar, $varchar, $varchar, $int8, $varchar) | ||
| ) VALUES $enc | ||
| ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING; |
There was a problem hiding this comment.
The ON CONFLICT (global_snapshot_hash, address, node_id, rewards) clause targets the unique constraint, but the new schema in snapshot.sql defines a primary key (global_snapshot_hash, stake_create_hash) on this table. If a snapshot is reprocessed during reindexing with the same stake_create_hash, PostgreSQL will throw a PK violation rather than silently doing nothing — the ON CONFLICT target doesn't cover the PK.
Consider ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING to match the PK instead.
ryle-ai
left a comment
There was a problem hiding this comment.
reindexer config in mainnet-streaming-cfg/application.conf will crash the app at startup
The new application.conf defines:
reindexer = { dbChunks: 10, checkpointEvery: 2000 }
But the Reindexer case class in Configuration.scala:72 expects:
final case class Reindexer(s3Parallelism: Int, s3Prefetch: Int, snapshotContextPrefetch: Int, dbParallelism: Int)Because Typesafe Config uses full object replacement (not merge) when application.conf assigns reindexer = { ... }, the reference.conf defaults for s3Parallelism, s3Prefetch, etc. are completely overridden. The merged config will only contain dbChunks and checkpointEvery -- neither of which exists in the case class. PureConfig will fail with missing key errors for all four required fields.
Since reindexer is Option[Reindexer] in SnapshotStreamingConfig, PureConfig won't silently fall back to None -- the key exists in config, so it attempts to parse it as Reindexer and fails.
Either update the Reindexer case class to match these new field names, update the config to use the existing field names, or remove the reindexer line from application.conf entirely (so reference.conf defaults apply).
Note: checkpointEvery is already a separate top-level field in SnapshotStreamingConfig (not inside Reindexer), so nesting it inside reindexer {} puts it at the wrong config path anyway.
| lastIncrementalSnapshotPath = "mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz" | ||
| collateral = 0 #25000000000000 | ||
| environment = mainnet | ||
| reindexer = { dbChunks: 10, checkpointEvery: 2000 } |
There was a problem hiding this comment.
This reindexer config uses fields (dbChunks, checkpointEvery) that don't exist in the Reindexer case class, which expects s3Parallelism, s3Prefetch, snapshotContextPrefetch, dbParallelism. Because HOCON = replaces the entire object (rather than merging with reference.conf), the four required fields will be missing and PureConfig will throw at startup.
Also, checkpointEvery is a top-level field on SnapshotStreamingConfig, not part of Reindexer -- nesting it here puts it at the wrong path.
No description provided.