Skip to content

feat(pglite/sync): Add commitGranularity parameter in the syncShapeToTable#456

Merged
samwillis merged 12 commits intomainfrom
samwillis/sync-commit-granularity
Dec 9, 2024
Merged

feat(pglite/sync): Add commitGranularity parameter in the syncShapeToTable#456
samwillis merged 12 commits intomainfrom
samwillis/sync-commit-granularity

Conversation

@samwillis
Copy link
Collaborator

@samwillis samwillis commented Dec 8, 2024

Adds three additional options to syncShapeToTable:

  • commitGranularity: CommitGranularity

    The granularity of the commit operation, defaults to "up-to-date". Note that a commit will always be performed immediately on the up-to-date message.
    Options:

    • "up-to-date": Commit all messages when the up-to-date message is received.
    • "transaction": Commit all messages within transactions as they were applied to the source Postgres.
      Note: Remove transaction commit granularity until Electric has stabilised on LSN metadata
    • "operation": Commit each message in its own transaction.
    • number: Commit every N messages.
  • commitThrottle: number

    The number of milliseconds to wait between commits, defaults to 0.

  • onInitialSync: () => void

    A callback that is called when the initial sync is complete.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 8, 2024

@samwillis samwillis force-pushed the samwillis/sync-commit-granularity branch from 4059bee to 7e94f13 Compare December 8, 2024 18:44
@samwillis samwillis force-pushed the samwillis/sync-commit-granularity branch from aa1124b to 8b2568f Compare December 8, 2024 19:33
@samwillis samwillis requested a review from msfstef December 9, 2024 06:42
Copy link
Collaborator

@msfstef msfstef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I like the API! Left some comments on implementation

)
if (
options.commitThrottle &&
now - lastCommitAt < options.commitThrottle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that this throttler might not run the last received commit - perhaps add a utility with a more full fledged throttler or use a utility like lodash's? Ideally the last commit call should always execute after the throttle period and not have to wait for one more if it was not run due to throttling.

edit: I realise that the behaviour I describe would introduce complexity in awaiting things, handling errors, etc - perhaps something like having a unified throttle timer mechanism that checks if there are messages to commit at the end of every throttle timeout and inserts a "pseudo-batch" of messages to trigger processing might work - let me know if the "last messages not being committed" issue is not really an issue and I'm missing something

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that as we will always end with an up-to-date in the protocol, and that forces an un-throttled commit, that we may not need to have a timeout like a "normal" throttling utility?

}
} else if (isControlMessage(message)) {
switch (message.headers.control) {
case 'must-refetch':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should onInitialSyncCalled be reset here? Or do we consider the rebuilding of data an internal thing and thus don't expose any "initial data" info?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I consider it an internal thing.

The idea is that the initialSync callback is only ever called once, that way it can be used to turn on a write path, install triggers, etc. that may add overhead.

TBH, must-refetch adds a lot of pain for large datasets. Deleting everything and reinserting, will be slow, particularly if using Postgres triggers for conflict resolution (as I have in Linearlite). This needs more exploration.

@samwillis samwillis force-pushed the samwillis/sync-commit-granularity branch from a9143ea to 7417c21 Compare December 9, 2024 09:24
@samwillis samwillis merged commit 46c102c into main Dec 9, 2024
7 checks passed
@samwillis samwillis deleted the samwillis/sync-commit-granularity branch December 9, 2024 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants