
Merge downstream changes #113

Open
gclaramunt wants to merge 24 commits into develop from merge-downstream-changes

Conversation

@gclaramunt
Contributor

No description provided.

marcus-girolneto and others added 24 commits June 25, 2025 11:01
…tegrationnet

```
# Conflicts:
#	project/Dependencies.scala
#	src/main/scala/org/constellation/snapshotstreaming/Configuration.scala
```
@ryle-ai left a comment

Review: Merge downstream changes

The code changes in this PR (batch insert for delegated staking rewards, Set -> Seq for reward ordering, tessellation version bump to 3.4.0, SQL consolidation) all look correct and well-motivated. A few concerns about the non-code files being committed:

Large data files committed directly to git

mainnet-streaming-cfg/lastSnapshot.json is 3.5MB / 67,589 lines, and mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz is 5.3MB. These are runtime snapshot data files being committed directly to the repository without Git LFS. Once merged, they permanently inflate the repo clone size for every contributor. These files will also likely need to be updated periodically as the network advances, compounding the bloat.

Consider one of the following:

  • Using Git LFS for these binary/large data files
  • Adding them to .gitignore and documenting how to obtain them (e.g., from S3 or a bootstrap script)
  • Storing them in a separate artifact repository
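If the Git LFS route is taken, tracking the snapshot artifacts would produce a `.gitattributes` along these lines (the patterns here are illustrative, not from the repo; running `git lfs track "mainnet-streaming-cfg/*.json.gz"` generates such entries):

```
mainnet-streaming-cfg/*.json filter=lfs diff=lfs merge=lfs -text
mainnet-streaming-cfg/*.json.gz filter=lfs diff=lfs merge=lfs -text
```

The `.gitattributes` file itself stays in regular git and must be committed alongside the tracked files.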

reference.conf default paths changed

In src/main/resources/reference.conf, the defaults changed from:

```
lastSnapshotPath = "lastSnapshotTest.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshotTest1.json.gz"
```

to:

```
lastSnapshotPath = "lastSnapshot.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshot.json.gz"
```

reference.conf provides the fallback defaults when no application.conf override is provided. This is fine if all deployment environments use an explicit application.conf (the new mainnet-streaming-cfg/application.conf does override these). But if anyone runs the app without a config override (e.g., local dev), they'll now need files named lastSnapshot.json and lastIncrementalSnapshot.json.gz instead of the old test-suffixed names. Minor, but worth noting for developer experience.
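For local development without a full deployment config, one low-friction option is a small application.conf override that restores the old test file names (the file below is illustrative, not from the repo):

```
# application.conf (hypothetical local-dev override)
lastSnapshotPath = "lastSnapshotTest.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshotTest1.json.gz"
```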

Code changes look good

  • Batch insert for delegated staking rewards (SnapshotDAO.scala): The switch from single-row executeCmd to batch executeMany using insertDelegatedStakingRewardsMany follows the existing pattern in the codebase (e.g., insertAddressMany, insertMetagraphTransactionsMany). The chunking at 5000 rows via executeMany and the ON CONFLICT DO NOTHING clause are both appropriate.

  • Set[RewardTransaction] -> Seq[RewardTransaction] in Snapshot.scala and CurrencySnapshot.scala: This preserves reward ordering, which aligns with the "ensure ordering of rewards" commit. The AddressExtractor correctly adds .toSet back where deduplication is needed for address extraction.

  • SQL consolidation: The delegated_staking.sql definitions have been properly merged into snapshot.sql. The trigger functions (update_updated_at_column, insert_into_parent_abstract_blocks, etc.) are preserved in the new file. The formatting change to single-line CREATE TABLE statements reduces readability but is a style choice.
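As a rough sketch of the batch-insert shape described in the first bullet, assuming a skunk-style API (the codec, column list, and function name below are illustrative; the repo's actual plumbing lives behind insertDelegatedStakingRewardsMany and executeMany and may differ):

```scala
import skunk._
import skunk.implicits._
import skunk.codec.all._

// One row: global_snapshot_hash, address, node_id, rewards, stake_create_hash
val rewardRow: Codec[String ~ String ~ String ~ Long ~ String] =
  varchar ~ varchar ~ varchar ~ int8 ~ varchar

// Build a single INSERT covering `n` rows; callers chunk their input
// (e.g. at 5000 rows) and prepare/execute one command per chunk.
def insertRewardsMany(n: Int): Command[List[String ~ String ~ String ~ Long ~ String]] =
  sql"""
    INSERT INTO delegate_stake_rewards
      (global_snapshot_hash, address, node_id, rewards, stake_create_hash)
    VALUES ${rewardRow.values.list(n)}
    ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING
  """.command
```

The `values.list(n)` encoder expands to `n` parenthesized parameter groups, so one prepared statement covers the whole chunk instead of a round trip per row.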

@ryle-ai left a comment

ON CONFLICT clause doesn't cover the new primary key on delegate_stake_rewards

The new snapshot.sql schema adds a primary key (global_snapshot_hash, stake_create_hash) to delegate_stake_rewards, but the INSERT statement in insertDelegatedStakingRewardsMany uses ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING — targeting the unique constraint, not the PK.

During normal first-time processing this is fine, because each (snapshot, stake_create_hash) pair produces exactly one reward entry. But during reindexing (which this codebase supports via SnapshotProcessorS3), reinserting the same (global_snapshot_hash, stake_create_hash) pair will raise a primary-key violation instead of being silently skipped, since the ON CONFLICT clause only covers the unique constraint, not the PK.

Consider changing the ON CONFLICT to target the primary key:

```
ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING
```

```
stake_create_hash
) VALUES ($varchar, $varchar, $varchar, $int8, $varchar)
) VALUES $enc
ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING;
```
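Putting the suggested fix together with the statement in the diff, the corrected tail of the INSERT would read roughly as follows (a sketch; the column list is inferred from the five placeholders in the excerpt, and the $-placeholders follow the interpolator syntax shown there):

```
INSERT INTO delegate_stake_rewards (
  global_snapshot_hash, address, node_id, rewards, stake_create_hash
) VALUES $enc
ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING;
```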

The ON CONFLICT (global_snapshot_hash, address, node_id, rewards) clause targets the unique constraint, but the new schema in snapshot.sql defines a primary key (global_snapshot_hash, stake_create_hash) on this table. If a snapshot is reprocessed during reindexing with the same stake_create_hash, PostgreSQL will throw a PK violation rather than silently doing nothing — the ON CONFLICT target doesn't cover the PK.

Consider ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING to match the PK instead.

@ryle-ai left a comment

reindexer config in mainnet-streaming-cfg/application.conf will crash the app at startup

The new application.conf defines:

```
reindexer = { dbChunks: 10, checkpointEvery: 2000 }
```

But the Reindexer case class in Configuration.scala:72 expects:

```scala
final case class Reindexer(s3Parallelism: Int, s3Prefetch: Int, snapshotContextPrefetch: Int, dbParallelism: Int)
```

Because Typesafe Config uses full object replacement (not merge) when application.conf assigns reindexer = { ... }, the reference.conf defaults for s3Parallelism, s3Prefetch, etc. are completely overridden. The merged config will only contain dbChunks and checkpointEvery -- neither of which exists in the case class. PureConfig will fail with missing key errors for all four required fields.

Since reindexer is Option[Reindexer] in SnapshotStreamingConfig, PureConfig won't silently fall back to None -- the key exists in config, so it attempts to parse it as Reindexer and fails.

Either update the Reindexer case class to match these new field names, update the config to use the existing field names, or remove the reindexer line from application.conf entirely (so reference.conf defaults apply).

Note: checkpointEvery is already a separate top-level field in SnapshotStreamingConfig (not inside Reindexer), so nesting it inside reindexer {} puts it at the wrong config path anyway.
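Assuming the field names in Configuration.scala are kept as-is, a configuration that would actually parse could look like this (the values are illustrative, not recommendations):

```
# reindexer uses only the fields the Reindexer case class declares
reindexer = {
  s3Parallelism = 4
  s3Prefetch = 8
  snapshotContextPrefetch = 2
  dbParallelism = 10
}
# checkpointEvery is a top-level key on SnapshotStreamingConfig, not part of reindexer
checkpointEvery = 2000
```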

```
lastIncrementalSnapshotPath = "mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz"
collateral = 0 #25000000000000
environment = mainnet
reindexer = { dbChunks: 10, checkpointEvery: 2000 }
```

This reindexer config uses fields (dbChunks, checkpointEvery) that don't exist in the Reindexer case class, which expects s3Parallelism, s3Prefetch, snapshotContextPrefetch, dbParallelism. Because HOCON = replaces the entire object (rather than merging with reference.conf), the four required fields will be missing and PureConfig will throw at startup.

Also, checkpointEvery is a top-level field on SnapshotStreamingConfig, not part of Reindexer -- nesting it here puts it at the wrong path.
