
Conversation

@dreamlike-ocean
Contributor

No description provided.

@franz1981
Owner

I have to use some of the Shipilev loom builds to run the CI.
Building the JDK on it would likely kill it.

@dreamlike-ocean dreamlike-ocean marked this pull request as draft October 12, 2025 16:24
@dreamlike-ocean
Contributor Author

Sorry, I seem to have accidentally pushed a draft earlier.
It should be more complete now — I’ve replaced all the internal Chinese comments.

To be frank, I don’t really like the design of GlobalDelegateThreadNettyScheduler, because some parts of it only exist to make Thread::startVirtualThread inherit the parent scheduler, and the ConcurrentHashMap there is bound to become a bottleneck.
Also, it’s quite hard for me to distinguish whether a virtual thread belongs to a poller.
long vtCount = globalDelegateThreadNettyScheduler.internalSchedulerMappings
.keySet().stream().map(Thread::getName)
.filter(s -> !s.contains("Poller"))
.count();

@franz1981
Owner

I will take a look in the next two days (thanks for the PR!), but I already love the idea: it is much in line with what I wanted to implement to reintroduce inheritance while keeping the single-scheduler-per-event-loop abstraction in place, without too many lookups to emulate the missing attachment API.

I have work stealing on my to-do list, which will clash a bit with this model, but I have no proof yet that it is worth it.

Regarding this PR instead: I have a custom scheduler branch, but at this point I would make master the branch that tracks the very latest loom changes.
I see little point in a JDK 21 integration without the nice poller-per-carrier mode implemented by @AlanBateman.

WDYT?

@franz1981
Owner

franz1981 commented Oct 12, 2025

Answers inline

To be frank, I don’t really like the design of GlobalDelegateThreadNettyScheduler

I know and agree with you: tbh, having both inheritance and attachment back would have made all of this more "natural", without external data structures.
But there is a nice JMH module, and if we can build a benchmark which proves this to be a bottleneck, we have something to give as feedback on these choices IMO.

Also, it’s quite hard for me to distinguish whether a virtual thread belongs to a poller.

Agree on this as well: exposing the type of vThread in an enum when the submit happens would help, or being aware whether it is pinned, or whatever other hint..

@dreamlike-ocean
Contributor Author

Added a benchmark comparing GlobalDelegateThreadNettyScheduler with the default behavior of inheriting the scheduler from the parent thread.

My hardware:
13th Gen Intel(R) Core(TM) i5-13600KF

Linux dreamlike-MS-7E01 6.14.0-33-generic #33-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 17 23:22:02 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

loom build from openjdk/loom@50be4eb

Benchmark                                                      (tasks)   Mode  Cnt  Score   Error   Units
GlobalDelegateThreadNettySchedulerBenchmark.global                1000  thrpt   10  4.833 ± 0.088  ops/ms
GlobalDelegateThreadNettySchedulerBenchmark.global              100000  thrpt   10  0.049 ± 0.003  ops/ms
GlobalDelegateThreadNettySchedulerBenchmark.inheritFromParent     1000  thrpt   10  5.854 ± 0.133  ops/ms
GlobalDelegateThreadNettySchedulerBenchmark.inheritFromParent   100000  thrpt   10  0.061 ± 0.002  ops/ms

@dreamlike-ocean dreamlike-ocean marked this pull request as ready for review October 13, 2025 17:29
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 2, jvmArgs = {"--add-opens=java.base/java.lang=ALL-UNNAMED", "-XX:+UnlockExperimentalVMOptions"})
public class GetScheduler {
Owner


Can you add a few lines to describe the purpose of this benchmark? 🙏

Contributor Author


I’d like to compare the performance difference between the new Thread.VirtualThreadScheduler.current() API and directly accessing the scheduler.
The new API appears to be slightly more complex, so I’d like to see whether it introduces any performance overhead.

vtFactory.newThread(
      () -> {
         for (int i = 0; i < tasks; i++) {
            Thread.startVirtualThread(countDown::countDown);
Owner


Thread.startVirtualThread should:

  1. use determineScheduler and VirtualThreadNettyScheduler::current (querying the scoped value)
  2. update the CHM in the global scheduler
  3. allocate a lambda to handle removing the vthread once completed
  4. submit the lambda to the right VirtualThreadNettyScheduler

To me it seems that the main relevant differences from what we have are:

  • allocation rate (can be measured with -prof gc)
  • number of atomic operations (the CHM::get but most importantly CHM::put)

Which makes me think that maybe we should have a special MPSC queue in VirtualThreadNettyScheduler which is drained round-robin with the existing one and always automatically checks the thread state, removing the vthread from the CHM if completed.
This would save allocating the tiny wrapper lambda around the vthread continuation and likely an additional type (the lambda has a different type from the Continuation).
Clearly it means that VirtualThreadNettyScheduler should:

  • have a new executeFromGlobal (better named: I'm bad w names!) to submit via this new MPSC queue
  • a reference to the global executor instance to remove the vthread continuation once completed

This can be done later too, eh: just brainstorming on how this could differ from the existing behaviour.
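The round-robin drain idea could be sketched roughly like this. A minimal, single-threaded sketch only: `executeFromGlobal` is the hypothetical name from the comment above, `ConcurrentLinkedQueue` stands in for a proper MPSC queue (e.g. from JCTools), and the TERMINATED check plus CHM removal is only indicated by a comment since it needs the real scheduler's state.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class RoundRobinDrainSketch {
   // stand-in for the scheduler's existing task queue
   private final Queue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();
   // the proposed extra queue fed by the global scheduler with raw continuations
   private final Queue<Runnable> fromGlobalQueue = new ConcurrentLinkedQueue<>();

   public void execute(Runnable task) {
      taskQueue.offer(task);
   }

   // hypothetical name, as suggested in the comment above
   public void executeFromGlobal(Runnable continuation) {
      fromGlobalQueue.offer(continuation);
   }

   // drains both queues round-robin, one element from each per iteration
   public int drain() {
      int drained = 0;
      boolean progress = true;
      while (progress) {
         progress = false;
         Runnable task = taskQueue.poll();
         if (task != null) {
            task.run();
            drained++;
            progress = true;
         }
         Runnable continuation = fromGlobalQueue.poll();
         if (continuation != null) {
            continuation.run();
            // here the real scheduler would check the vthread state and,
            // if TERMINATED, remove it from the global CHM
            drained++;
            progress = true;
         }
      }
      return drained;
   }

   public static void main(String[] args) {
      RoundRobinDrainSketch s = new RoundRobinDrainSketch();
      StringBuilder order = new StringBuilder();
      s.execute(() -> order.append("a"));
      s.execute(() -> order.append("b"));
      s.executeFromGlobal(() -> order.append("X"));
      System.out.println(s.drain() + ":" + order); // prints 3:aXb
   }
}
```

The point of the interleaving is that continuations handed over by the global scheduler cannot starve the event loop's own tasks, and vice versa.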

@Fork(value = 2, jvmArgs = {"--add-opens=java.base/java.lang=ALL-UNNAMED", "-XX:+UnlockExperimentalVMOptions",
"-XX:-DoJVMTIVirtualThreadTransitions", "-Djdk.trackAllThreads=false", "-Djdk.virtualThreadScheduler.implClass=io.netty.loom.GlobalDelegateThreadNettyScheduler"})
@State(Scope.Thread)
public class GlobalDelegateThreadNettySchedulerBenchmark {
Owner


Another interesting benchmark could be to measure the difference in cost for virtual threads which are NOT managed by Netty, when the GlobalDelegateThreadNettyScheduler is used vs. not used.

@franz1981 franz1981 added the enhancement New feature or request label Oct 14, 2025
@franz1981
Owner

I'll give this a shot on my machine as well @dreamlike-ocean 🙏
It will take a few days, but this is already a very good job, and I plan to merge it.

@franz1981
Owner

franz1981 commented Oct 15, 2025

@dreamlike-ocean I will soon send a PR to your branch with some changes (if you agree) so we can proceed on this: I'm currently adding more fine-grained tests to shut down/verify pollers based on the poller mode, etc.

@dreamlike-ocean
Contributor Author

@dreamlike-ocean I will soon send a PR to your branch with some changes (if you agree) so we can proceed on this: I'm currently adding more fine-grained tests to shut down/verify pollers based on the poller mode, etc.

You can submit code directly to my branch — there should be no protection on it. I’d be very happy to collaborate with you to improve this work.

@franz1981
Owner

@dreamlike-ocean I added very few comments and split the tests, really 🙏
Check if they look ok; I'll still bang my head on this a bit more, but regardless we're ready to go.

I'll add as a follow-up:

  • some additional benchmarks
  • eventually optimizations (if I can find any relevant ones)
  • test coverage for coexistence with "alien" custom schedulers, to make sure they behave well (they should, looking at the code)

@franz1981
Owner

franz1981 commented Oct 15, 2025

@dreamlike-ocean
I'm currently playing with this idea...

   @Override
   public void execute(Thread vthread, Runnable task) {
      // we can have 3 types of executions:
      // 1. vthreads which belong to some Netty scheduler
      // 2. vthreads which belong to the JDK built-in scheduler
      // 3. vthreads which belong to some unknown scheduler
      // We want to preserve the inheritance of the scheduler just for vthreads spawned from Netty vthreads.
      // For all other vthreads, we simply use the JDK built-in scheduler
      // It means we are not forced to track ALL vthreads, including the ones running on the JDK built-in scheduler
      // or from unknown schedulers.
      // We can just fail to recognize them and perform an informed decision at scheduling time.
      VirtualThreadNettyScheduler mappedScheduler = inheritedNettyVthreads.get(vthread);
      if (mappedScheduler != null) {
         mappedScheduler.execute(vthread, () -> {
            try {
               task.run();
            } finally {
               if (vthread.getState() == Thread.State.TERMINATED) {
                  inheritedNettyVthreads.remove(vthread);
               }
            }
         });
         return;
      }
      Thread.VirtualThreadScheduler scheduler = determineScheduler();
      if (scheduler == jdkBuildinScheduler) {
         // we don't track vthreads running on the JDK built-in scheduler
         scheduler.execute(vthread, task);
      } else if (scheduler instanceof VirtualThreadNettyScheduler nettyScheduler) {
         inheritedNettyVthreads.put(vthread, nettyScheduler);
         nettyScheduler.execute(vthread, () -> {
            try {
               task.run();
            } finally {
               if (vthread.getState() == Thread.State.TERMINATED) {
                  inheritedNettyVthreads.remove(vthread);
               }
            }
         });
      } else {
         scheduler.execute(vthread, task);
      }
   }

Which, if it works, will enable me to find a slightly different way to perform the mapping for the inherited-only vthreads (maybe using just primitive values), but as usual, maybe I'm making some terrible logic mistake, so I'm taking some time to think about whether it is just wrong.

I see some io.netty.loom.MultithreadVirtualEventExecutorGroupTest#testVirtualThreadSpawnsVirtualThreads() stack overflow errors on this (meaning you already tried?)

and can be fixed by

    private Thread.VirtualThreadScheduler determineScheduler() {
        Thread callerThread = Thread.currentThread();
        // platform thread
        if (!callerThread.isVirtual()) {
            return jdkBuildinScheduler;
        }
        VirtualThreadNettyScheduler current = VirtualThreadNettyScheduler.current();
        // The current thread was spawned from a specific VirtualThreadNettyScheduler,
        // so we continue using that scheduler.
        if (current != null) {
            return current;
        }
        Thread.VirtualThreadScheduler parentScheduler = inheritedNettyVthreads.get(callerThread);
        if (parentScheduler != null) {
            return parentScheduler;
        }

        // The current thread was spawned from an unknown scheduler that is not managed by GlobalDelegateThreadNettyScheduler,
        // so we directly use the parent’s scheduler instead to avoid potential stack overflow.
        var currentScheduler = Thread.VirtualThreadScheduler.current();
        if (currentScheduler == this) {
            // if the caller thread is not known we assume it won't benefit from inheriting, so its children
            // will use the JDK built-in scheduler
            return jdkBuildinScheduler;
        }
        return currentScheduler;
    }

@franz1981
Owner

To be precise on determineScheduler at #27 (comment):
we don't need to query Thread.VirtualThreadScheduler.current(), but can just return the built-in one: if a virtual thread is neither tracked nor belongs to anything managed by the global scheduler, we could just enforce the default policy, i.e. not make it inherit the parent scheduler. WDYT?
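The simplification suggested here would shrink determineScheduler to something like the sketch below. A sketch only, not the PR's actual code: VirtualThreadNettyScheduler, inheritedNettyVthreads, and jdkBuildinScheduler are the PR's own types/fields, and the only change is that the untracked case falls straight through to the built-in scheduler instead of querying Thread.VirtualThreadScheduler.current().

```java
private Thread.VirtualThreadScheduler determineScheduler() {
    Thread callerThread = Thread.currentThread();
    // platform threads never inherit a Netty scheduler
    if (!callerThread.isVirtual()) {
        return jdkBuildinScheduler;
    }
    VirtualThreadNettyScheduler current = VirtualThreadNettyScheduler.current();
    if (current != null) {
        return current;
    }
    Thread.VirtualThreadScheduler inherited = inheritedNettyVthreads.get(callerThread);
    if (inherited != null) {
        return inherited;
    }
    // untracked and not Netty-managed: enforce the default policy,
    // i.e. do NOT inherit the parent scheduler
    return jdkBuildinScheduler;
}
```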

@franz1981
Owner

I am still not convinced by my proposal at #27 (comment) and I would like to write a test which:

  • creates a virtual thread from the test harness thread (which is a normal platform thread) waiting on a countdown latch (it should not be assigned to run on the Netty scheduler)
  • from the Netty virtual thread (or an inherited child of it), counts down the latch, unblocking the awaiting one
  • verifies that the unblocked vthread doesn't run on the Netty scheduler

This test should stress the case of submitting a vThread to the global scheduler from a virtual thread which is assigned to the Netty scheduler.
According to the code I have modified, this test should fail. I will check tomorrow.
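The latch choreography in that outline can be sketched with plain JDK virtual threads. This is a sketch only: it shows just the park/unpark choreography, while the real test would create the unparking thread via the PR's Netty scheduler and assert on the carrier, which needs the PR's classes and is omitted here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class UnparkChoreographySketch {
   public static void main(String[] args) throws InterruptedException {
      CountDownLatch latch = new CountDownLatch(1);
      AtomicBoolean unblocked = new AtomicBoolean();
      // 1. vthread started from the (platform) test harness thread, parked on the latch:
      //    it should NOT be assigned to the Netty scheduler
      Thread outside = Thread.startVirtualThread(() -> {
         try {
            latch.await(); // parks here; the countDown below reschedules (unparks) it
            // in the real test: assert the carrier is NOT a Netty event loop thread
            unblocked.set(true);
         } catch (InterruptedException ignored) {
         }
      });
      // 2. stand-in for the Netty-scheduled vthread doing the unpark (in the real test
      //    it would be spawned through the Netty scheduler's thread factory)
      Thread nettyLike = Thread.startVirtualThread(latch::countDown);
      outside.join();
      nettyLike.join();
      // 3. the unblocked vthread completed; the real assertion on its scheduler goes here
      System.out.println("unblocked=" + unblocked.get()); // prints unblocked=true
   }
}
```

The interesting part is step 2: the unpark happens from a thread bound to another scheduler, which is exactly the caller-sensitivity the proposal above may get wrong.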

@franz1981
Owner

franz1981 commented Oct 16, 2025

The concerns re my proposed changes at #27 (comment) are valid: I'm adding this test and merging this.
Any improvement (if possible) will be made as a follow-up.
The "issue" behind the approach I've proposed is that if the scheduling is caller-sensitive and a v thread is unparked from a "known" Netty scheduler, will assume it has to enforce "scheduler inheritance" whilst not required: the sole valid state where we need to consider scheduling inheritance is when a vThread is starting, but AFAIK this information is not available.

This last point makes me think that if the existing VirtualThreadScheduler API were

public void execute(Thread vThread, SchedulingReason reason, Runnable task)

with SchedulingReason

STARTING, UNPARK, TIMED_UNPARK, ... // which mimic the expected previous Thread::state?

the approach I've suggested could work because:

  1. we perform the decision to inherit scheduling only if reason == STARTING, which HAS to happen for ALL vthreads
  2. if reason != STARTING: if it is a known Netty-scheduled vthread we will run it there, and if not we will just run it on the built-in scheduler (because, as said in build GlobalDelegateThreadNettyScheduler #27 (comment), we won't allow alien non-Netty custom schedulers to benefit from scheduler inheritance)

Thanks to this, we could reduce the surface of tracked vthreads in the global scheduler (tracking already performed by the default ThreadContainer within Hotspot) and optimize the queries for previous scheduling decisions, because we can modify them only on STARTING.

And furthermore, the global scheduling code can be much lighter too e.g.

   @Override
   public void execute(Thread vthread, SchedulingReason reason, Runnable task) {
      if (reason == STARTING) {
         // we could early stop here and run on the built-in scheduler if the caller thread is not virtual
         VirtualThreadNettyScheduler nettyScheduler = VirtualThreadNettyScheduler.current();
         // is the starter of vthread bound to run on a specific Netty scheduler?
         if (nettyScheduler == null) {
            // has the starter of vthread inherited a specific Netty scheduler?
            nettyScheduler = inheritedNettyVthreads.get(Thread.currentThread());
         }
         if (nettyScheduler != null) {
            // remember this vthread once it's rescheduled
            inheritedNettyVthreads.put(vthread, nettyScheduler);
            nettyScheduler.execute(vthread, reason, () -> {
               try {
                  task.run();
               } finally {
                  if (vthread.getState() == Thread.State.TERMINATED) {
                     inheritedNettyVthreads.remove(vthread);
                  }
               }
            });
            return;
         }
         jdkBuildinScheduler.execute(vthread, reason, task);
         return;
      }
      // it is not the first time we saw it OR we're not interested in it
      VirtualThreadNettyScheduler mappedScheduler = inheritedNettyVthreads.get(vthread);
      if (mappedScheduler != null) {
         mappedScheduler.execute(vthread, reason, () -> {
            try {
               task.run();
            } finally {
               if (vthread.getState() == Thread.State.TERMINATED) {
                  inheritedNettyVthreads.remove(vthread);
               }
            }
         });
         return;
      }
      jdkBuildinScheduler.execute(vthread, reason, task);
   }

NOTE (for myself of the future):
The approach suggested at #27 (comment) cannot be fixed without the mentioned VirtualThreadScheduler API change, even if we check STATE != TERMINATED while executing on the built-in scheduler, because this case can happen:

  1. on Thread.OfVirtual::start the vThread is scheduled on the built-in scheduler
  2. once the built-in carrier executes it and the scheduling completes with Thread::getState() != TERMINATED, we are ready to add the vThread to a CHM of builtInRunnableVThreads, e.g. if the vThread is parked/descheduled
  3. the same vThread is unparked by a Netty vThread before builtInRunnableVThreads is modified
  4. the vThread is scheduled wrong :"(

@franz1981
Owner

franz1981 commented Oct 16, 2025

@dreamlike-ocean with 2d1a9fb I've added a few tests, including some for the per-carrier (sub)poller,

which seem to fail if I run the tests with

-Djdk.virtualThreadScheduler.implClass=io.netty.loom.GlobalDelegateThreadNettyScheduler 
-Djdk.traceVirtualThreadLocals=false 
--add-opens=java.base/java.lang=ALL-UNNAMED 
-Djdk.pollerMode=3

It looks like there's a problem on terminating the per-carrier read sub-poller in testPerCarrierSubpollersInheritingNettyScheduler:

java.util.concurrent.RejectedExecutionException: event loop is shutting down
	at io.netty.loom.VirtualThreadNettyScheduler.execute(VirtualThreadNettyScheduler.java:190)
	at io.netty.loom.GlobalDelegateThreadNettyScheduler.execute(GlobalDelegateThreadNettyScheduler.java:40)
	at java.base/java.lang.VirtualThread.submitRunContinuation(VirtualThread.java:373)
	at java.base/java.lang.VirtualThread.submitRunContinuation(VirtualThread.java:397)
	at java.base/java.lang.VirtualThread.unpark(VirtualThread.java:894)
	at java.base/java.lang.System$1.unparkVirtualThread(System.java:2304)
	at java.base/java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:181)
	at java.base/sun.nio.ch.Poller.polled(Poller.java:218)
	at java.base/sun.nio.ch.EPollPoller.poll(EPollPoller.java:148)
	at java.base/sun.nio.ch.Poller.pollerLoop(Poller.java:306)
	at java.base/java.lang.Thread.run(Thread.java:1584)
	at java.base/jdk.internal.misc.InnocuousThread.run(InnocuousThread.java:148)

org.opentest4j.AssertionFailedError: 
Expected :true
Actual   :false
<Click to see difference>


	at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
	at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
	at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
	at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
	at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
	at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
	at io.netty.loom.MultithreadVirtualEventExecutorGroupTest.assertContainsJustBuiltinPollers(MultithreadVirtualEventExecutorGroupTest.java:580)
	at io.netty.loom.MultithreadVirtualEventExecutorGroupTest.testPerCarrierSubpollersOnNettyScheduler(MultithreadVirtualEventExecutorGroupTest.java:521)
	at io.netty.loom.MultithreadVirtualEventExecutorGroupTest.testPerCarrierSubpollersInheritingNettyScheduler(MultithreadVirtualEventExecutorGroupTest.java:458)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)

which means that the per-carrier inherited subpoller bound to the event loop carrier is not able to complete/run and be removed by the vthread managed/mapped by the global scheduler.
Even removing the check to prevent rejecting it won't fix it: there's something related to the termination of VirtualThreadNettyScheduler that needs to be fixed.
This seems to be another form of the problem reported by #16.
The same problem won't happen with pollerMode != 3.

I can merge this if you agree on the changes, or we can try fixing the failure(s) before merging; let me know what you prefer @dreamlike-ocean!

@dreamlike-ocean
Contributor Author

Please merge it! Thanks!

@dreamlike-ocean
Contributor Author

I checked the related code, and I believe that when pollerMode = 3, although this exception occurs, it will not cause a resource leak: when our eventLoop shuts down, the thread will also exit.

@franz1981
Owner

franz1981 commented Oct 16, 2025

Yep @dreamlike-ocean, the leak is not due to what Hotspot does; rather, the existing termination logic of the virtual thread Netty scheduler is not correct.
As said, this is not a regression introduced by your changes but was likely present before.

@franz1981 franz1981 merged commit 248119f into franz1981:master Oct 16, 2025
@franz1981
Owner

Thanks @dreamlike-ocean 🙏🙏
