
Merge downstream changes #113

Open
gclaramunt wants to merge 24 commits into develop from merge-downstream-changes

Conversation

@gclaramunt
Contributor

No description provided.

marcus-girolneto and others added 24 commits June 25, 2025 11:01
…tegrationnet

```
# Conflicts:
#	project/Dependencies.scala
#	src/main/scala/org/constellation/snapshotstreaming/Configuration.scala
```
@ryle-ai left a comment

Review: Merge downstream changes

The code changes in this PR (batch insert for delegated staking rewards, Set -> Seq for reward ordering, tessellation version bump to 3.4.0, SQL consolidation) all look correct and well-motivated. A few concerns about the non-code files being committed:

Large data files committed directly to git

mainnet-streaming-cfg/lastSnapshot.json is 3.5MB / 67,589 lines, and mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz is 5.3MB. These are runtime snapshot data files being committed directly to the repository without Git LFS. Once merged, they permanently inflate the repo clone size for every contributor. These files will also likely need to be updated periodically as the network advances, compounding the bloat.

Consider one of the following:

  • Using Git LFS for these binary/large data files
  • Adding them to .gitignore and documenting how to obtain them (e.g., from S3 or a bootstrap script)
  • Storing them in a separate artifact repository
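If the Git LFS route is taken, tracking the snapshot artifacts would produce a `.gitattributes` along these lines (the patterns here are illustrative, not from the repo; running `git lfs track "mainnet-streaming-cfg/*.json.gz"` generates such entries):

```
mainnet-streaming-cfg/*.json filter=lfs diff=lfs merge=lfs -text
mainnet-streaming-cfg/*.json.gz filter=lfs diff=lfs merge=lfs -text
```

The `.gitattributes` file itself stays in regular git and must be committed alongside the tracked files.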

reference.conf default paths changed

In src/main/resources/reference.conf, the defaults changed from:

```
lastSnapshotPath = "lastSnapshotTest.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshotTest1.json.gz"
```

to:

```
lastSnapshotPath = "lastSnapshot.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshot.json.gz"
```

reference.conf provides the fallback defaults when no application.conf override is provided. This is fine if all deployment environments use an explicit application.conf (the new mainnet-streaming-cfg/application.conf does override these). But if anyone runs the app without a config override (e.g., local dev), they'll now need files named lastSnapshot.json and lastIncrementalSnapshot.json.gz instead of the old test-suffixed names. Minor, but worth noting for developer experience.
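For local development without a full deployment config, one low-friction option is a small application.conf override that restores the old test file names (the file below is illustrative, not from the repo):

```
# application.conf (hypothetical local-dev override)
lastSnapshotPath = "lastSnapshotTest.json"
lastIncrementalSnapshotPath = "lastIncrementalSnapshotTest1.json.gz"
```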

Code changes look good

  • Batch insert for delegated staking rewards (SnapshotDAO.scala): The switch from single-row executeCmd to batch executeMany using insertDelegatedStakingRewardsMany follows the existing pattern in the codebase (e.g., insertAddressMany, insertMetagraphTransactionsMany). The chunking at 5000 rows via executeMany and the ON CONFLICT DO NOTHING clause are both appropriate.

  • Set[RewardTransaction] -> Seq[RewardTransaction] in Snapshot.scala and CurrencySnapshot.scala: This preserves reward ordering, which aligns with the "ensure ordering of rewards" commit. The AddressExtractor correctly adds .toSet back where deduplication is needed for address extraction.

  • SQL consolidation: The delegated_staking.sql definitions have been properly merged into snapshot.sql. The trigger functions (update_updated_at_column, insert_into_parent_abstract_blocks, etc.) are preserved in the new file. The formatting change to single-line CREATE TABLE statements reduces readability but is a style choice.
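As a rough sketch of the batch-insert shape described in the first bullet, assuming a skunk-style API (the codec, column list, and function name below are illustrative; the repo's actual plumbing lives behind insertDelegatedStakingRewardsMany and executeMany and may differ):

```scala
import skunk._
import skunk.implicits._
import skunk.codec.all._

// One row: global_snapshot_hash, address, node_id, rewards, stake_create_hash
val rewardRow: Codec[String ~ String ~ String ~ Long ~ String] =
  varchar ~ varchar ~ varchar ~ int8 ~ varchar

// Build a single INSERT covering `n` rows; callers chunk their input
// (e.g. at 5000 rows) and prepare/execute one command per chunk.
def insertRewardsMany(n: Int): Command[List[String ~ String ~ String ~ Long ~ String]] =
  sql"""
    INSERT INTO delegate_stake_rewards
      (global_snapshot_hash, address, node_id, rewards, stake_create_hash)
    VALUES ${rewardRow.values.list(n)}
    ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING
  """.command
```

The `values.list(n)` encoder expands to `n` parenthesized parameter groups, so one prepared statement covers the whole chunk instead of a round trip per row.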

@ryle-ai left a comment

ON CONFLICT clause doesn't cover the new primary key on delegate_stake_rewards

The new snapshot.sql schema adds a primary key (global_snapshot_hash, stake_create_hash) to delegate_stake_rewards, but the INSERT statement in insertDelegatedStakingRewardsMany uses ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING — targeting the unique constraint, not the PK.

During normal first-time processing this is fine, because each (snapshot, stake_create_hash) pair produces exactly one reward entry. But during reindexing (which this codebase supports via SnapshotProcessorS3), reinserting the same (global_snapshot_hash, stake_create_hash) pair will raise a primary-key violation instead of being silently skipped, since the ON CONFLICT clause only covers the unique constraint, not the PK.

Consider changing the ON CONFLICT to target the primary key:

```
ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING
```

```
stake_create_hash
) VALUES ($varchar, $varchar, $varchar, $int8, $varchar)
) VALUES $enc
ON CONFLICT (global_snapshot_hash, address, node_id, rewards) DO NOTHING;
```
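Putting the suggested fix together with the statement in the diff, the corrected tail of the INSERT would read roughly as follows (a sketch; the column list is inferred from the five placeholders in the excerpt, and the $-placeholders follow the interpolator syntax shown there):

```
INSERT INTO delegate_stake_rewards (
  global_snapshot_hash, address, node_id, rewards, stake_create_hash
) VALUES $enc
ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING;
```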

The ON CONFLICT (global_snapshot_hash, address, node_id, rewards) clause targets the unique constraint, but the new schema in snapshot.sql defines a primary key (global_snapshot_hash, stake_create_hash) on this table. If a snapshot is reprocessed during reindexing with the same stake_create_hash, PostgreSQL will throw a PK violation rather than silently doing nothing — the ON CONFLICT target doesn't cover the PK.

Consider ON CONFLICT (global_snapshot_hash, stake_create_hash) DO NOTHING to match the PK instead.

@ryle-ai left a comment

reindexer config in mainnet-streaming-cfg/application.conf will crash the app at startup

The new application.conf defines:

```
reindexer = { dbChunks: 10, checkpointEvery: 2000 }
```

But the Reindexer case class in Configuration.scala:72 expects:

```scala
final case class Reindexer(s3Parallelism: Int, s3Prefetch: Int, snapshotContextPrefetch: Int, dbParallelism: Int)
```

Because Typesafe Config uses full object replacement (not merge) when application.conf assigns reindexer = { ... }, the reference.conf defaults for s3Parallelism, s3Prefetch, etc. are completely overridden. The merged config will only contain dbChunks and checkpointEvery -- neither of which exists in the case class. PureConfig will fail with missing key errors for all four required fields.

Since reindexer is Option[Reindexer] in SnapshotStreamingConfig, PureConfig won't silently fall back to None -- the key exists in config, so it attempts to parse it as Reindexer and fails.

Either update the Reindexer case class to match these new field names, update the config to use the existing field names, or remove the reindexer line from application.conf entirely (so reference.conf defaults apply).

Note: checkpointEvery is already a separate top-level field in SnapshotStreamingConfig (not inside Reindexer), so nesting it inside reindexer {} puts it at the wrong config path anyway.
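Assuming the field names in Configuration.scala are kept as-is, a configuration that would actually parse could look like this (the values are illustrative, not recommendations):

```
# reindexer uses only the fields the Reindexer case class declares
reindexer = {
  s3Parallelism = 4
  s3Prefetch = 8
  snapshotContextPrefetch = 2
  dbParallelism = 10
}
# checkpointEvery is a top-level key on SnapshotStreamingConfig, not part of reindexer
checkpointEvery = 2000
```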

```
lastIncrementalSnapshotPath = "mainnet-streaming-cfg/lastIncrementalSnapshot.json.gz"
collateral = 0 #25000000000000
environment = mainnet
reindexer = { dbChunks: 10, checkpointEvery: 2000 }
```

This reindexer config uses fields (dbChunks, checkpointEvery) that don't exist in the Reindexer case class, which expects s3Parallelism, s3Prefetch, snapshotContextPrefetch, dbParallelism. Because HOCON = replaces the entire object (rather than merging with reference.conf), the four required fields will be missing and PureConfig will throw at startup.

Also, checkpointEvery is a top-level field on SnapshotStreamingConfig, not part of Reindexer -- nesting it here puts it at the wrong path.
