Flink: TableMaintenance Support Coordinator Lock #15151
base: main
```java
 * Event sent from TriggerManager operator to TableMaintenanceCoordinator to confirm that the lock
 * has been acquired and the maintenance task is starting.
 */
public class LockAcquireResultEvent implements OperatorEvent {
```
We don't need this event anymore.
```java
        lockReleasedEvent.lockId(),
        lockReleasedEvent.timestamp());
  } else {
    LOG.info(
```
Do we need to handle concurrency on restart?
We might get a release event even before the register event, and we would like to process it when the register event arrives.
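A minimal sketch of one way to do this, in plain Java without Flink types and assuming the coordinator sees both a register event and release events keyed by table name: releases for a table whose operator has not registered yet are buffered and replayed, in arrival order, once registration arrives. `LockHolder`, `PendingReleaseCoordinator` and their methods are hypothetical stand-ins, not classes from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback for the operator that owns the lock (stand-in for TriggerManagerOperator). */
interface LockHolder {
  void releaseLock(long timestamp);
}

/**
 * Hypothetical coordinator sketch: release events that arrive before the owning operator has
 * registered are buffered per table and replayed in arrival order once it registers.
 */
class PendingReleaseCoordinator {
  private final Map<String, LockHolder> registered = new HashMap<>();
  private final Map<String, List<Long>> pendingReleases = new HashMap<>();

  /** Register event: remember the operator, then replay any releases that arrived early. */
  synchronized void register(String tableName, LockHolder holder) {
    registered.put(tableName, holder);
    List<Long> pending = pendingReleases.remove(tableName);
    if (pending != null) {
      for (long timestamp : pending) {
        holder.releaseLock(timestamp);
      }
    }
  }

  /** Release event: forward it if the operator is known, otherwise keep it for later. */
  synchronized void onLockReleased(String tableName, long timestamp) {
    LockHolder holder = registered.get(tableName);
    if (holder != null) {
      holder.releaseLock(timestamp);
    } else {
      pendingReleases.computeIfAbsent(tableName, unused -> new ArrayList<>()).add(timestamp);
    }
  }
}
```

Because the sketch routes by table name in the coordinator, the registered operator only ever receives releases for its own table.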
```java
@VisibleForTesting
void handleLockReleaseResult(LockReleasedEvent event) {
  if (event.lockId().equals(tableName)) {
```
Do we need this check, or could the filtering be handled in the coordinator?
```java
void handleLockReleaseResult(LockReleasedEvent event) {
  if (event.lockId().equals(tableName)) {
    this.lockHeld = false;
    this.restoreTasks = false;
```
This might not be correct on recovery.
If we have both an ongoing task and the recovery lock, this might be only the "task" lock release.
Let's imagine this sequence:
- Restore with ongoing T1. We send a Trigger for restore. `restoreTasks` is set to `true`.
- T1 lock release arrives. `restoreTasks` is set to `false`.
- We trigger T2, and set `lockHeld` to `true`.
- The restore Trigger arrives at the LockRemover. We receive a `LockReleasedEvent` and set `lockHeld` to `false`.
- We trigger T3, which is wrong because T2 is still running.

We need to differentiate between the restoreLock release and the `lockHeld` release.
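One possible shape for that distinction, sketched in plain Java under the assumption that each release event can carry which lock it refers to: the trigger side tracks the restore lock and the task lock as separate flags, and a release clears only the flag it names. `LockKind`, `TypedLockState` and their methods are hypothetical, not names from the PR.

```java
/** Hypothetical tag carried by a release event: which lock is being released. */
enum LockKind {
  TASK,
  RESTORE
}

/**
 * Hypothetical trigger-side lock state that tracks the restore lock and the task lock
 * separately, so a release of one kind never clears the other.
 */
class TypedLockState {
  private boolean restoreLockHeld;
  private boolean taskLockHeld;

  /** On restore: hold the restore lock until the restore trigger's release comes back. */
  void onRestore() {
    restoreLockHeld = true;
  }

  /** Starts a maintenance task only if neither lock is held; returns true if it started. */
  boolean tryTrigger() {
    if (restoreLockHeld || taskLockHeld) {
      return false;
    }
    taskLockHeld = true;
    return true;
  }

  /** Clears only the lock named by the release event. */
  void onLockReleased(LockKind kind) {
    if (kind == LockKind.RESTORE) {
      restoreLockHeld = false;
    } else {
      taskLockHeld = false;
    }
  }
}
```

Replaying the sequence above against this sketch: the stale T1 release only touches the task lock, so T2 cannot start until the restore release arrives, and T3 cannot start until T2's own release arrives.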
```java
operatorEventGateway.sendEventToCoordinator(
    new LockReleasedEvent(tableName, streamRecord.getTimestamp()));
```
One of the `LockReleasedEvent`s is unnecessary.
```java
@Override
public void processWatermark(Watermark mark) throws Exception {
  operatorEventGateway.sendEventToCoordinator(
```
One of the `LockReleasedEvent`s is unnecessary.
This PR is a continuation of #15042.
In this approach, the lock is maintained inside `TriggerManagerOperator`, and `TableMaintenanceCoordinator` acts as a bridge. The coordinator holds a reference to `TriggerManagerOperator` in a static map; it receives events from `LockRemoverOperator` and uses that reference to release the lock held by `TriggerManagerOperator`.

Normal flow
1. `TriggerManagerOperator` decides whether to fire a trigger and, if so, acquires the lock internally and sends the trigger downstream.
2. When the maintenance task finishes, its result reaches `LockRemoverOperator`.
3. `LockRemoverOperator` sends an event to `TableMaintenanceCoordinator`.
4. `TableMaintenanceCoordinator` uses the reference stored in its static map to find the corresponding `TriggerManagerOperator` and sends it a “release lock” event.
5. `TriggerManagerOperator` releases the lock.

Recovery flow
1. In `initializeState`, `TriggerManagerOperator` marks itself as “in recovery” and then sends a recovery trigger downstream. While the “in recovery” flag is true, no other maintenance task is allowed to run.
2. When the recovery trigger reaches `LockRemoverOperator`, it sends an event to `TableMaintenanceCoordinator` to release the lock.
3. `TriggerManagerOperator` releases the lock.
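The two flows above can be sketched in plain Java as a single-JVM simulation. `CoordinatorSketch`, `TriggerManagerSketch`, `LockRemoverSketch` and all their methods are hypothetical stand-ins for `TableMaintenanceCoordinator`, `TriggerManagerOperator` and `LockRemoverOperator`; the sketch models only the lock bookkeeping and the static-map routing, not Flink's operator lifecycle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for TableMaintenanceCoordinator: bridges release events to the lock owner. */
final class CoordinatorSketch {
  // Static map from table name to the operator that owns that table's lock.
  private static final Map<String, TriggerManagerSketch> OPERATORS = new ConcurrentHashMap<>();

  private CoordinatorSketch() {}

  static void register(String tableName, TriggerManagerSketch operator) {
    OPERATORS.put(tableName, operator);
  }

  /** Event from the lock remover: look up the owner and tell it to release its lock. */
  static void onLockReleased(String tableName) {
    TriggerManagerSketch operator = OPERATORS.get(tableName);
    if (operator != null) {
      operator.releaseLock();
    }
  }
}

/** Hypothetical stand-in for TriggerManagerOperator: the lock itself lives here. */
class TriggerManagerSketch {
  private final String tableName;
  private boolean lockHeld;
  private boolean inRecovery;

  TriggerManagerSketch(String tableName) {
    this.tableName = tableName;
  }

  /** Startup: register with the coordinator so releases can be routed back here. */
  void open() {
    CoordinatorSketch.register(tableName, this);
  }

  /** Recovery flow, step 1: enter recovery; the recovery trigger would be emitted here. */
  void initializeState() {
    inRecovery = true;
  }

  /** Normal flow, step 1: fire only if not recovering and the lock is free. */
  boolean maybeTrigger() {
    if (inRecovery || lockHeld) {
      return false;
    }
    lockHeld = true; // the trigger would be sent downstream here
    return true;
  }

  /** Last step of both flows: release the lock and leave recovery. */
  void releaseLock() {
    lockHeld = false;
    inRecovery = false;
  }

  boolean isLockHeld() {
    return lockHeld;
  }
}

/** Hypothetical stand-in for LockRemoverOperator. */
class LockRemoverSketch {
  private final String tableName;

  LockRemoverSketch(String tableName) {
    this.tableName = tableName;
  }

  /** A task result or the recovery trigger arrived: notify the coordinator. */
  void processElement() {
    CoordinatorSketch.onLockReleased(tableName);
  }
}
```

The sketch keeps all three objects in one JVM, which is what a static map of operator references relies on; it does not model `OperatorEventGateway` or any threading. `releaseLock` clears both flags, mirroring the last step of the recovery flow, which is the behavior the review comment on `restoreTasks` questions.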