-
Notifications
You must be signed in to change notification settings - Fork 665
[node-core-library] Add allowOversubscription option #5355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[node-core-library] Add allowOversubscription option #5355
Conversation
|
@microsoft-github-policy-service agree company="DoorDash" |
D4N14L
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, though would like @dmichon-msft to take a look here.
common/changes/@rushstack/node-core-library/eb-concurrency-bug-fix_2025-09-11-15-24.json
Outdated
Show resolved
Hide resolved
|
For clarity, this isn't a bugfix, it's a behavior change. Exceeding the concurrency was the original intended design for how to handle large tasks. Waiting for sufficient capacity to take the entire job results in the CPUs spending more time idling and in general is expected to slow down overall completion. |
|
I think the safest way to address the competing priorities would be to add an extra option |
|
@dmichon-msft I'd like to get clarity on the intended behavior for the concurrency parameter to check that I'm understanding this library's expected behavior correctly. The JSDoc for concurrency states it should "limit the maximum number of concurrent promises to the specified number." My change enforces this as a strict limit, but I wanted to understand if the previous behavior that allowed tasks to exceed this limit was intentional (and the docs need updating) or if it was a bug. Here's the different options we have:
If we want to go ahead with #3, I'd suggest using a boolean
I think a boolean keeps the behavior deterministic and simpler for developers to reason about, I think setting the overage amount is difficult to reason about. I agree that changing this behavior could affect build times of existing repos, let me know the best way to land this in a safe way while exposing this isolation logic to project maintainers. Please let me know which approach we'd like to go forward with and I can update this PR. |
|
I can work with a boolean control, seems simple enough. Oversubscription was supported in the original design to deal with the scenario of "what happens if you specify a max concurrency of less than the largest operation weight", but arguably that gets handled by clipping the weights to the max concurrency. The other consideration is that if you have 16 cores, are running a long operation that takes 1, and have a queued operation that can use 16 cores (but in practice uses up to that, whatever it can get), then having to wait for that long running operation is wasteful. Arguably the best way to handle heavy jobs is probably to tune your configured operation weights to take better advantage of the hardware (or better yet, to shard that expensive operation so that it can be scheduled more easily). |
36c1370 to
18960cd
Compare
|
I apologize for the miscommunication; I think |
|
@dmichon-msft Chiming in here a little late (I've been sick the past few days), hopefully adding a little more context on how we ended up here. We've been investigating a whole slew of unit test flakes recently that are pretty easy to track down to "this test phase ran with our other expensive unit test phase" or "this test phase ran with our expensive NextJS app build phase". In those cases the test phase is weight 1 and the expensive phases are already weight 8. (the unit tests we're running have already been sharded and the NextJS app build can't be :( ) To address the flakes, we have a few options:
|
|
Thanks for the feedback, I've added the I'd appreciate another review on this, whenever this is in a state close to approval it would be great to have a preview release so I can test on my teams repo. |
common/changes/@microsoft/rush/eb-concurrency-bug-fix_2025-09-16-19-06.json
Outdated
Show resolved
Hide resolved
common/changes/@microsoft/rush/eb-concurrency-bug-fix_2025-09-16-19-06.json
Outdated
Show resolved
Hide resolved
common/changes/@rushstack/node-core-library/eb-concurrency-bug-fix_2025-09-11-15-24.json
Outdated
Show resolved
Hide resolved
|
@iclanton is this PR good to merge? I can test via preview in our repo if you'd like more testing before this goes in. |
|
@dmichon-msft @iclanton @D4N14L Hey Team, this PR seems to be a solid fix for the test flakes and performance issues we're experiencing when several expensive projects run on the same build agent. Could you provide a timeline for a merge or a dev preview so we can test it out? Thanks for your patience with my pings, we're just really excited to get this resolved. 😃 |
| * If true (default), will start operations even when they would exceed the limit. | ||
| * If false, waits until sufficient capacity is available. | ||
| */ | ||
| allowOversubscription?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/**
* Controls whether operations can start even if doing so would exceed the total concurrency limit.
* If true (default), will start operations even when they would exceed the limit.
* If false, waits until sufficient capacity is available.
*/
allowOversubscription?: boolean;
Rush Stack's convention is that optional booleans always default to false.
The allowOversubscription=false behavior seems like a more natural/intuitive operation, so maybe we should make false the default?
Although that's technically a "breaking" change, a bit less parallelism in an edge case is unlikely to break anyone's existing code. In fact, it's arguably a bugfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comments and changed the default to false for Async but not for Rush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanburrelldd Do you think we should change the default for Rush as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the default for Async to false makes sense, but I'm worried about changing the default for Rush. It might slow things down for repo maintainers who update their version. Since the previous behavior allowed for oversubscription, keeping true as the default for Rush seems like the right move to avoid breaking things for other maintainers.
…lse; improve docs
|
🚀 @ethanburrelldd Let us know how if it solves your problem. |
|
@ethanburrelldd - Did that release work for you? |
|
Thanks for generating the pre-release! We're seeing decreased parallelism alongside tasks that have Here's a snapshot comparing build plans before / after the preview release. |
|
Great, thanks for following up! |
Summary
Adds
allowOversubscriptionproperty to control whether weighted operations can exceed concurrency limits.Details
Previously, weighted operations could be started in a case where the sum of all running operations would exceed the concurrency limit.
This PR adds an
allowOversubscriptionproperty that allows for more control over currency behavior while maintaining backward compatible with the current behavior. Backward compatibility is maintained since the current implementation lets the CPU sit idle for less cycles.weightconcurrency is availableIn my company's repo, we're running into some resource issues where many small projects get queued with a large project that should be running alone. Now, for this expensive project we will use
allowOversubscription: false, weight: 8, where 8 is the parallelism set when runningrush build -p 8, this should allow the expensive project to run with a higher level of isolation. In these cases the expensive operation will eat up CPU resources causing the smaller operations to timeout.Example
With
maxConcurrency = 8andconcurrentUnitsInProgress = 4after the last task exitedconcurrentUnitsInProgress <= 3Changes:
allowOversubscriptionoption to command-line.schema.json (defaults to true)CommandLineJson.tsclass to support this optionallowOversubscriptionthrough the operation lifecycle_forEachWeightedAsyncto handle the cases when this is setHow it was tested
Can I please get a pre-release so that I can test this version on our repo?
rush-redis-cobuild-plugin-integration-testImpacted documentation