build GlobalDelegateThreadNettyScheduler #27
|
I have to use some of the Shipilev loom builds to run the CI. |
04ebef3 to
837fc84
Compare
|
Sorry, I seem to have accidentally pushed a draft earlier. To be frank, I don’t really like the design of GlobalDelegateThreadNettyScheduler, because some parts of it only exist to make Thread::startVirtualThread inherit the parent scheduler, and the ConcurrentHashMap there is bound to become a bottleneck. |
|
I will take a look in the next two days (thanks for the PR!), but I already love the idea: it is much in line with what I wanted to implement to reintroduce inheritance while keeping the single-scheduler-per-event-loop abstraction in place, without too many lookups to emulate the missing attachment API. I have work stealing on my to-do list, which will clash a bit with this model, but I have no evidence yet that it is worth doing. Regarding this PR: I have a custom scheduler branch, but at this point I would make master the branch that tracks the very latest loom changes. WDYT? |
|
Answers in line
I know and agree with you: tbh, having both inheritance and attachment back would have made all of this more "natural", without external data structures.
Agree on this as well: exposing the type of vThread in an enum when the submit happens would help, as would knowing whether it is pinned, or any other hint. |
core/src/main/java/io/netty/loom/GlobalDelegateThreadNettyScheduler.java
…he default behavior of inheriting the scheduler from the parent thread.
…-VirtualThread-Scheduler into loom_new_scheduler
|
Added a benchmark comparing GlobalDelegateThreadNettyScheduler with the default behavior of inheriting the scheduler from the parent thread. My hardware: Linux dreamlike-MS-7E01 6.14.0-33-generic #33-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 17 23:22:02 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux loom build from openjdk/loom@50be4eb |
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 2, jvmArgs = {"--add-opens=java.base/java.lang=ALL-UNNAMED", "-XX:+UnlockExperimentalVMOptions"})
public class GetScheduler {
Can you add a few lines to describe the purpose of this benchmark? 🙏
I’d like to compare the performance difference between the new Thread.VirtualThreadScheduler.current() API and directly accessing the scheduler.
The new API appears to be slightly more complex, so I’d like to see whether it introduces any performance overhead.
vtFactory.newThread(
        () -> {
            for (int i = 0; i < tasks; i++) {
                Thread.startVirtualThread(countDown::countDown);
Thread.startVirtualThread should:
- use determineScheduler and VirtualThreadNettyScheduler::current (querying the scoped value)
- update the CHM in the global scheduler
- allocate a lambda to handle removing the vthread once completed
- submit the lambda to the right VirtualThreadNettyScheduler
So it seems to me that the main relevant differences from what we have are:
- allocation rate (can be measured with -prof gc)
- number of atomic operations (CHM::get but, most importantly, CHM::put)
This makes me think that maybe we should have a special MPSC queue in VirtualThreadNettyScheduler, drained round-robin with the existing one, which always checks the thread state automatically and removes the vthread from the CHM if it has completed.
This would save allocating the tiny wrapper lambda around the vthread continuation, and the likely additional type (the lambda has a different type from the Continuation).
Clearly it means that VirtualThreadNettyScheduler should have:
- a new executeFromGlobal (it needs a better name, I'm bad with names!) to submit to this new MPSC queue
- a reference to the global executor instance, to remove the vthread continuation once completed
This can be done later too, eh: just brainstorming about how the existing behaviour could differ.
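A minimal sketch of the two-queue idea, using only standard JDK classes, might look like the following. Everything here is an assumption for illustration: the class RoundRobinDrainSketch, the Tracked holder, and the String values in the map are hypothetical, ConcurrentLinkedQueue stands in for a real MPSC queue, and plain Thread objects stand in for vthreads.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of the idea above; none of these names come from the PR.
final class RoundRobinDrainSketch {

    // A continuation submitted via the global scheduler, plus the thread that owns it.
    static final class Tracked {
        final Thread owner;
        final Runnable continuation;

        Tracked(Thread owner, Runnable continuation) {
            this.owner = owner;
            this.continuation = continuation;
        }
    }

    private final Queue<Runnable> localQueue = new ConcurrentLinkedQueue<>();
    // Stand-in for the proposed MPSC queue (ConcurrentLinkedQueue is actually MPMC).
    private final Queue<Tracked> fromGlobalQueue = new ConcurrentLinkedQueue<>();
    // Stand-in for the CHM owned by the global scheduler.
    private final Map<Thread, String> globalTracking;

    RoundRobinDrainSketch(Map<Thread, String> globalTracking) {
        this.globalTracking = globalTracking;
    }

    void execute(Runnable task) {
        localQueue.add(task);
    }

    void executeFromGlobal(Thread owner, Runnable continuation) {
        fromGlobalQueue.add(new Tracked(owner, continuation));
    }

    // Alternates between the two queues until both are empty; returns how many tasks ran.
    int drain() {
        int ran = 0;
        boolean progressed;
        do {
            progressed = false;
            Runnable local = localQueue.poll();
            if (local != null) {
                local.run();
                ran++;
                progressed = true;
            }
            Tracked tracked = fromGlobalQueue.poll();
            if (tracked != null) {
                tracked.continuation.run();
                // No wrapper lambda: the drain loop itself checks the thread state.
                if (tracked.owner.getState() == Thread.State.TERMINATED) {
                    globalTracking.remove(tracked.owner);
                }
                ran++;
                progressed = true;
            }
        } while (progressed);
        return ran;
    }

    public static void main(String[] args) {
        Map<Thread, String> chm = new ConcurrentHashMap<>();
        RoundRobinDrainSketch scheduler = new RoundRobinDrainSketch(chm);
        Thread finished = new Thread(() -> {});
        finished.start();
        while (finished.getState() != Thread.State.TERMINATED) {
            Thread.yield();
        }
        chm.put(finished, "event-loop-0");
        scheduler.executeFromGlobal(finished, () -> {});
        scheduler.drain();
        System.out.println("still tracked: " + chm.containsKey(finished));
    }
}
```

Note that this sketch still allocates a small Tracked holder per submission, purely for readability. To actually get the allocation savings discussed above, a real implementation would have to enqueue the continuation without such a holder.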
@Fork(value = 2, jvmArgs = {"--add-opens=java.base/java.lang=ALL-UNNAMED", "-XX:+UnlockExperimentalVMOptions",
        "-XX:-DoJVMTIVirtualThreadTransitions", "-Djdk.trackAllThreads=false", "-Djdk.virtualThreadScheduler.implClass=io.netty.loom.GlobalDelegateThreadNettyScheduler"})
@State(Scope.Thread)
public class GlobalDelegateThreadNettySchedulerBenchmark {
Another interesting benchmark would be to measure the difference in cost for virtual threads which are NOT managed by Netty, with and without GlobalDelegateThreadNettyScheduler in use.
|
I'll give this a shot on my machine as well @dreamlike-ocean 🙏 |
|
@dreamlike-ocean I will soon send a PR to your branch with some changes (if you agree) so we can proceed on this: I'm currently adding finer-grained tests to shut down/verify pollers based on poller mode, etc. |
You can submit code directly to my branch — there should be no protection on it. I’d be very happy to collaborate with you to improve this work. |
|
@dreamlike-ocean I just added a few comments and split the tests, really 🙏 As a follow-up I'll add:
|
|
@dreamlike-ocean
```java
@Override
public void execute(Thread vthread, Runnable task) {
    // we can have 3 types of executions:
    // 1. vthreads which belong to some Netty scheduler
    // 2. vthreads which belong to the JDK built-in scheduler
    // 3. vthreads which belong to some unknown scheduler
    // We want to preserve the inheritance of the scheduler just for vthreads spawned from Netty vthreads.
    // For all other vthreads, we simply use the JDK built-in scheduler.
    // It means we are not forced to track ALL vthreads, including the ones running on the JDK built-in scheduler
    // or from unknown schedulers.
    // We can just fail to recognize them and make an informed decision at scheduling time.
    VirtualThreadNettyScheduler mappedScheduler = inheritedNettyVthreads.get(vthread);
    if (mappedScheduler != null) {
        mappedScheduler.execute(vthread, () -> {
            try {
                task.run();
            } finally {
                if (vthread.getState() == Thread.State.TERMINATED) {
                    inheritedNettyVthreads.remove(vthread);
                }
            }
        });
        return;
    }
    Thread.VirtualThreadScheduler scheduler = determineScheduler();
    if (scheduler == jdkBuildinScheduler) {
        // we don't track vthreads running on the JDK built-in scheduler
        scheduler.execute(vthread, task);
    } else if (scheduler instanceof VirtualThreadNettyScheduler nettyScheduler) {
        inheritedNettyVthreads.put(vthread, nettyScheduler);
        nettyScheduler.execute(vthread, () -> {
            try {
                task.run();
            } finally {
                if (vthread.getState() == Thread.State.TERMINATED) {
                    inheritedNettyVthreads.remove(vthread);
                }
            }
        });
    } else {
        scheduler.execute(vthread, task);
    }
}
```
If this works, it would let me find a slightly different way to perform the mapping for the inherited-only vthreads (maybe using just primitive values). As usual, though, maybe I'm making some terrible logic mistake, so I'm taking some time to think about whether it is just wrong. I see some issues, which can be fixed by:
```java
private Thread.VirtualThreadScheduler determineScheduler() {
    Thread callerThread = Thread.currentThread();
    // platform thread
    if (!callerThread.isVirtual()) {
        return jdkBuildinScheduler;
    }
    VirtualThreadNettyScheduler current = VirtualThreadNettyScheduler.current();
    // The current thread was spawned from a specific VirtualThreadNettyScheduler,
    // so we continue using that scheduler.
    if (current != null) {
        return current;
    }
    Thread.VirtualThreadScheduler parentScheduler = inheritedNettyVthreads.get(callerThread);
    if (parentScheduler != null) {
        return parentScheduler;
    }
    // The current thread was spawned from an unknown scheduler that is not managed by GlobalDelegateThreadNettyScheduler,
    // so we directly use the parent’s scheduler instead to avoid potential stack overflow.
    var currentScheduler = Thread.VirtualThreadScheduler.current();
    if (currentScheduler == this) {
        // if the caller thread is not known we assume it won't benefit from inheriting, so its children
        // will use the JDK built-in scheduler
        return jdkBuildinScheduler;
    }
    return currentScheduler;
}
```
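The order of checks in determineScheduler can be summarised as a pure decision function. This is only a sketch to make the precedence explicit: the class DetermineSchedulerSketch, the Choice enum, and the boolean parameters are hypothetical stand-ins for the real lookups (Thread::isVirtual, VirtualThreadNettyScheduler.current(), the inheritedNettyVthreads map, and the comparison with the global scheduler itself).

```java
// Hypothetical model of the precedence used by determineScheduler; not part of the PR.
final class DetermineSchedulerSketch {

    enum Choice { JDK_BUILT_IN, BOUND_NETTY, INHERITED_NETTY, CALLER_SCHEDULER }

    static Choice decide(boolean callerIsVirtual,
                         boolean callerBoundToNetty,
                         boolean callerInInheritedMap,
                         boolean callerSchedulerIsGlobal) {
        // Platform threads never inherit a Netty scheduler.
        if (!callerIsVirtual) {
            return Choice.JDK_BUILT_IN;
        }
        // The caller runs directly on a Netty scheduler.
        if (callerBoundToNetty) {
            return Choice.BOUND_NETTY;
        }
        // The caller was itself spawned by a Netty vthread and is tracked in the map.
        if (callerInInheritedMap) {
            return Choice.INHERITED_NETTY;
        }
        // Unknown caller whose scheduler is the global delegate: fall back to the built-in one.
        if (callerSchedulerIsGlobal) {
            return Choice.JDK_BUILT_IN;
        }
        // Otherwise keep whatever scheduler the caller is already using.
        return Choice.CALLER_SCHEDULER;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, true, true));
    }
}
```

Written this way, it is easier to see that the map lookup only happens for virtual callers that are not directly bound to a Netty scheduler.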
|
To be precise about determineScheduler at #27 (comment) |
|
I am still not convinced by my proposal at #27 (comment) and I would like to write a test which:
This test should stress the case of submitting a vThread to the global scheduler from a virtual thread which is assigned to the Netty scheduler. |
|
The concerns re my proposed changes at #27 (comment) are valid: I'm adding this test and merging this. This last point makes me think that if the existing execute had the signature public void execute(Thread vThread, SchedulingReason reason, Runnable task), the approach I've suggested could work because:
Thanks to this, we could reduce the surface of tracked vthreads in the global scheduler, which is already performed by the default tracking. Furthermore, the global scheduling code could be much lighter too, e.g.
```java
@Override
public void execute(Thread vthread, SchedulingReason reason, Runnable task) {
    if (reason == STARTING) {
        // we could stop early here and run on the built-in scheduler if the caller thread is not virtual
        VirtualThreadNettyScheduler nettyScheduler = VirtualThreadNettyScheduler.current();
        // is the thread starting this vthread bound to a specific Netty scheduler?
        if (nettyScheduler == null) {
            // has the thread starting this vthread inherited a specific Netty scheduler?
            nettyScheduler = inheritedNettyVthreads.get(Thread.currentThread());
        }
        if (nettyScheduler != null) {
            // remember this vthread for when it is rescheduled
            inheritedNettyVthreads.put(vthread, nettyScheduler);
            nettyScheduler.execute(vthread, reason, () -> {
                try {
                    task.run();
                } finally {
                    if (vthread.getState() == Thread.State.TERMINATED) {
                        inheritedNettyVthreads.remove(vthread);
                    }
                }
            });
            return;
        }
        jdkBuildinScheduler.execute(vthread, reason, task);
        return;
    }
    // either it is not the first time we have seen it, OR we're not interested in it
    VirtualThreadNettyScheduler mappedScheduler = inheritedNettyVthreads.get(vthread);
    if (mappedScheduler != null) {
        mappedScheduler.execute(vthread, reason, () -> {
            try {
                task.run();
            } finally {
                if (vthread.getState() == Thread.State.TERMINATED) {
                    inheritedNettyVthreads.remove(vthread);
                }
            }
        });
        return;
    }
    jdkBuildinScheduler.execute(vthread, reason, task);
}
```
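The try/finally wrapper appears more than once in the snippet above; one way to keep it in a single place is a small helper. The sketch below is an assumption-laden model, not the PR's code: CleanupWrapperSketch, track, isTracked, and untrackOnTermination are hypothetical names, the map values are plain strings instead of schedulers, and platform threads stand in for vthreads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper that factors out the repeated cleanup wrapper; not part of the PR.
final class CleanupWrapperSketch {

    // Stand-in for inheritedNettyVthreads; values are scheduler names for simplicity.
    private final Map<Thread, String> inherited = new ConcurrentHashMap<>();

    void track(Thread thread, String schedulerName) {
        inherited.put(thread, schedulerName);
    }

    boolean isTracked(Thread thread) {
        return inherited.containsKey(thread);
    }

    // Wraps a task so that the thread is forgotten once it has terminated.
    Runnable untrackOnTermination(Thread thread, Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                // Only the last run of a thread observes TERMINATED.
                if (thread.getState() == Thread.State.TERMINATED) {
                    inherited.remove(thread);
                }
            }
        };
    }

    public static void main(String[] args) {
        CleanupWrapperSketch sketch = new CleanupWrapperSketch();
        Thread finished = new Thread(() -> {});
        finished.start();
        while (finished.getState() != Thread.State.TERMINATED) {
            Thread.yield();
        }
        sketch.track(finished, "event-loop-0");
        sketch.untrackOnTermination(finished, () -> {}).run();
        System.out.println("still tracked: " + sketch.isTracked(finished));
    }
}
```

This keeps the per-run lambda allocation discussed earlier in the review; it only removes the duplication.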
|
|
@dreamlike-ocean With 2d1a9fb I've added a few tests, including some for the per-carrier (sub)poller, which seem to fail if I run the tests with It looks like there's a problem terminating the per-carrier read sub-poller in which means that the per-carrier inherited sub-poller bound to the event loop carrier is not able to complete/run I can merge this if you agree on the changes, or we can try fixing the failure(s) before merging. Let me know what you prefer @dreamlike-ocean! |
|
Please merge it! Thx |
|
Yep @dreamlike-ocean, the leak is not due to what HotSpot does; rather, the existing termination logic of the virtual thread Netty scheduler is not correct. |
|
Thanks @dreamlike-ocean 🙏🙏 |

No description provided.