
@Guosmilesmile (Contributor)

This PR is a continuation of #15042.

In this approach, the lock is maintained inside TriggerManagerOperator, and TableMaintenanceCoordinator acts as a bridge: it holds a reference to each TriggerManagerOperator in a static map, receives events from LockRemoverOperator, and uses the stored reference to release the lock held by the corresponding TriggerManagerOperator.
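
A minimal sketch of the bridge idea (the class shape, `REGISTRY`, and all method names here are illustrative assumptions, not the PR's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the static-map bridge described above. Only the overall shape
// (static map from table name to operator reference) follows the PR text.
public class TableMaintenanceCoordinatorSketch {

  // Stand-in for the real TriggerManagerOperator
  public interface LockHolder {
    void releaseLock(long timestamp);
  }

  // Static map shared within the JVM, keyed by table name
  private static final Map<String, LockHolder> REGISTRY = new ConcurrentHashMap<>();

  public static void register(String tableName, LockHolder operator) {
    REGISTRY.put(tableName, operator);
  }

  // Invoked when a LockReleasedEvent arrives from LockRemoverOperator
  void onLockReleased(String tableName, long timestamp) {
    LockHolder operator = REGISTRY.get(tableName);
    if (operator != null) {
      operator.releaseLock(timestamp); // tell the operator to drop its lock
    }
  }
}
```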

Normal flow

  1. TriggerManagerOperator decides whether to fire a trigger and, if so, acquires the lock internally and sends the trigger downstream (see the sketch after this list).
  2. After the task is finished, the trigger or the watermark reaches LockRemoverOperator.
  3. LockRemoverOperator sends an event to TableMaintenanceCoordinator. TableMaintenanceCoordinator uses the reference stored in its static map to find the corresponding TriggerManagerOperator and sends a “release lock” event to it.
  4. TriggerManagerOperator releases the lock.
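
A sketch of the operator-side gating this flow implies (field and method names are assumptions):

```java
// Hypothetical sketch of steps 1 and 4: fire a trigger only when no lock is
// held, and clear the lock when the coordinator forwards the release event.
class TriggerGateSketch {
  private boolean lockHeld = false;

  // Step 1: acquire the lock internally before sending the trigger downstream
  boolean tryFireTrigger() {
    if (lockHeld) {
      return false; // a maintenance task is still running
    }
    lockHeld = true;
    // ... emit the trigger downstream ...
    return true;
  }

  // Step 4: the coordinator forwarded a "release lock" event
  void handleLockReleaseResult() {
    lockHeld = false;
  }
}
```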

Recovery flow

  1. During initializeState, TriggerManagerOperator marks itself as “in recovery” and then sends a recovery trigger downstream. While the “in recovery” flag is true, other tasks are not allowed to run (see the sketch after this list).
  2. When the recovery trigger or the watermark reaches LockRemoverOperator, it sends an event to TableMaintenanceCoordinator to release the lock.
  3. TriggerManagerOperator releases the lock.
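
A sketch of the recovery gating described above, assuming a single boolean guard (names illustrative):

```java
// Hypothetical sketch of the recovery flag: block regular triggers while the
// recovery trigger is still in flight.
class RecoveryGateSketch {
  private boolean restoreTasks = false;

  void initializeState() {
    restoreTasks = true; // mark "in recovery"
    // ... send a recovery trigger downstream ...
  }

  boolean mayFireRegularTrigger() {
    return !restoreTasks; // other tasks may not run while recovering
  }

  // The recovery trigger (or watermark) reached LockRemoverOperator and the
  // coordinator forwarded the release event
  void handleRecoveryLockReleased() {
    restoreTasks = false;
  }
}
```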

@github-actions bot added the flink label on Jan 27, 2026
```java
 * Event sent from TriggerManager operator to TableMaintenanceCoordinator to confirm that the lock
 * has been acquired and the maintenance task is starting.
 */
public class LockAcquireResultEvent implements OperatorEvent {
```
Contributor: We don't need this anymore.

```java
        lockReleasedEvent.lockId(),
        lockReleasedEvent.timestamp());
  } else {
    LOG.info(
```
Contributor: Do we need to handle concurrency on restart? We might get a release event even before the register event, and we would like to process them when the register event arrives.
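
One possible shape for this (purely illustrative; the buffer, the `LockHolder` interface, and all method names are assumptions) is to park early release events until the matching register event arrives:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: buffer LockReleasedEvents that arrive before the
// operator has (re)registered, and drain them on registration.
class PendingReleaseBuffer {
  interface LockHolder {
    void releaseLock(long timestamp);
  }

  private final Map<String, Deque<Long>> pendingReleases = new HashMap<>();
  private final Map<String, LockHolder> registered = new HashMap<>();

  synchronized void onLockReleased(String lockId, long timestamp) {
    LockHolder holder = registered.get(lockId);
    if (holder != null) {
      holder.releaseLock(timestamp);
    } else {
      // Operator not registered yet (e.g. during restart): remember the event
      pendingReleases.computeIfAbsent(lockId, k -> new ArrayDeque<>()).add(timestamp);
    }
  }

  synchronized void onRegister(String lockId, LockHolder holder) {
    registered.put(lockId, holder);
    Deque<Long> buffered = pendingReleases.remove(lockId);
    if (buffered != null) {
      buffered.forEach(holder::releaseLock); // replay the buffered releases
    }
  }
}
```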


```java
@VisibleForTesting
void handleLockReleaseResult(LockReleasedEvent event) {
  if (event.lockId().equals(tableName)) {
```
Contributor: Do we need this check, or could the filtering be handled in the coordinator?
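
If the filtering moved to the coordinator, it could look roughly like this (hypothetical names; keying the registry by lock id means a non-matching event is simply never forwarded, so the operator-side `equals(tableName)` check becomes redundant):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of coordinator-side filtering.
class CoordinatorFilterSketch {
  interface LockHolder {
    void releaseLock(long timestamp);
  }

  private final Map<String, LockHolder> registry = new ConcurrentHashMap<>();

  void onLockReleased(String lockId, long timestamp) {
    LockHolder holder = registry.get(lockId); // no match -> event dropped here
    if (holder != null) {
      holder.releaseLock(timestamp);
    }
  }
}
```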

```java
void handleLockReleaseResult(LockReleasedEvent event) {
  if (event.lockId().equals(tableName)) {
    this.lockHeld = false;
    this.restoreTasks = false;
```
Contributor: This might not be correct on recovery. If we have an ongoing task and the recovery lock, then this might be only the "task" lock release. Let's imagine this sequence:

  • Restore with ongoing T1. We send a Trigger for restore. restoreTasks is set to true.
  • The T1 lock release arrives. restoreTasks is set to false.
  • We trigger T2 and set lockHeld to true.
  • The restore Trigger arrives at the LockRemover. We receive a LockReleasedEvent and set lockHeld to false.
  • We trigger T3, which is wrong.

We need to differentiate between the restore lock release and the lockHeld release.
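
One way to differentiate them (illustrative only; the enum and fields are assumptions, not the PR's API) is to carry the release kind in the event and clear only the matching flag:

```java
// Hypothetical sketch: tag each release with its kind so a restore-trigger
// release cannot clear the regular task lock, and vice versa.
class TriggerStateSketch {
  enum ReleaseKind { RECOVERY, TASK }

  private boolean restoreTasks = false; // recovery trigger in flight
  private boolean lockHeld = false;     // regular task in flight

  void handleLockReleaseResult(ReleaseKind kind) {
    if (kind == ReleaseKind.RECOVERY) {
      restoreTasks = false; // only the restore trigger completed
    } else {
      lockHeld = false;     // only the regular task's lock is released
    }
  }

  boolean mayFireTrigger() {
    return !restoreTasks && !lockHeld;
  }
}
```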

Comment on lines +100 to +101
```java
operatorEventGateway.sendEventToCoordinator(
    new LockReleasedEvent(tableName, streamRecord.getTimestamp()));
```
Contributor: One of the LockReleasedEvent calls is unnecessary.


```java
@Override
public void processWatermark(Watermark mark) throws Exception {
  operatorEventGateway.sendEventToCoordinator(
```
Contributor: One of the LockReleasedEvent calls is unnecessary.
