-
Notifications
You must be signed in to change notification settings - Fork 0
Optimize IndexCount to terminate slaves early when limit exceeded #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Modified CIndexCountActivityMaster to detect IndexExists case (choosenLimit == 1) - Added aggregateToLimitIndexExists() that broadcasts stop signal when count > 0 - Added stopTag for master-slave communication - Modified CIndexCountSlaveActivity to check for stop signal during counting - Slaves can now terminate early when another slave finds a match Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
- Changed serializeSlaveData to always append stopTag - Changed slave init to always read stopTag - This ensures consistent serialization/deserialization Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
jakesmith
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot - see comments.
| { | ||
| // Non-blocking check for stop signal | ||
| CMessageBuffer msg; | ||
| if (container.queryJobChannel().queryJobComm().recv(msg, 0, stopTag, nullptr, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot - this approach won't work. Even if this was non-blocking (which it isn't), it is too inefficient for the caller of checkStopSignal() to do this each call.
There probably needs to be an asynchronous thread that is listening for stop signals, that flags 'stopped' so the count routines can bail out when it sees the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to use a dedicated async CStopHandler thread instead of polling. The thread blocks on recv() waiting for stop signals and sets an atomic flag that the counting loop checks. This eliminates the inefficient per-call recv() overhead. See commit for details.
|
|
||
| rowcount_t total; | ||
| rowcount_t choosenLimit = helper->getChooseNLimit(); | ||
| if (choosenLimit == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot - In fact, this should not be the basis of stopping early.
In all cases, if the limit is exceeded, the master should signal to the workers that they can stop to save them wasting their time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to send stop signals whenever ANY choosenLimit is exceeded (not just choosenLimit==1). The master now checks total > choosenLimit in the overridden aggregateToLimit() method, making this optimization broadly applicable to CHOOSEN, IndexExists, and any other operation with a limit. See commit for details.
- Replaced polling approach with dedicated CStopHandler thread - Thread listens asynchronously for stop signals from master - Master now sends stop signals when ANY choosenLimit is exceeded (not just choosenLimit==1) - Uses atomic flag for thread-safe communication between handler and counting loop - Properly manages thread lifecycle in start/stop/abort methods Addresses feedback on PR review comments Co-authored-by: jakesmith <902700+jakesmith@users.noreply.github.com>
Problem
When executing IndexCount operations with limits (including IndexExists which translates to IndexCount with
choosenLimit=1), multiple slaves count in parallel across index partitions. If one slave reaches the limit quickly, the other slaves continue searching unnecessarily. This is particularly wasteful when:For example, with 10 slaves searching a large filtered index where slave 2 reaches the limit in 1 second, the other 9 slaves may continue spinning for many seconds checking their partitions before the operation completes.
Solution
Implemented an early termination mechanism using master-slave signaling that applies to any IndexCount operation with a choosenLimit:
Implementation Details
Master Side (
CIndexCountActivityMaster)stopTag(mptag_t) for stop signal communicationaggregateToLimit()to detect when limit exceeded:total > choosenLimit, broadcasts stop signal to all slavesSlave Side (
CIndexCountSlaveActivity)stoppedflag when stop signal receivedKey Design Decisions
CStopHandlerthread eliminates polling overheadstart(), stopped instop()/abort()Benefits
Example Scenarios
IndexExists (choosenLimit=1)
CHOOSEN with limit
Testing
Existing regression tests exercise this code path:
aggidx1.ecl: ContainsEXISTS(sq.SimplePersonBookIndex)indexlimit3.ecl: ContainsEXISTS(fi)with filtered index and CHOOSEN operationsFull integration testing requires HPCC cluster environment to observe the early termination behavior across multiple slaves.
Original prompt
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.