Casync clean by adespawn · Pull Request #414 · scylladb/nodejs-rs-driver

adespawn · 2026-03-13T17:04:04Z

This commit adds a custom async bridge between Rust and JavaScript using N-API.
It allows async tasks to be scheduled without tokio::spawn, which the napi-rs
macros use when creating async functions.

The main motivation for this change is to improve the performance of the driver.
With the existing approach we spend a lot of CPU time on synchronization
between the main thread and the Tokio threads. By reducing CPU time,
I also aim to reduce the driver's overall run time.

This approach polls all futures on the Node.js main thread,
replacing napi-rs's built-in async task system, which polls on
Tokio worker threads.

Architecture:

- Single weak ThreadsafeFunction (TSFN) shared across all futures,
  with manual ref/unref to control Node.js event loop lifetime
- FutureRegistry (thread-local on main thread) stores in-flight futures
  paired with their napi_deferred handles
- Per-future Waker backed by Arc<WakerInner> implementing the Wake trait,
  which pushes the future id into a shared woken_ids vec and signals
  the TSFN
- Coalesced signaling via AtomicBool prevents flooding the event loop
  when multiple wakers fire simultaneously
- Single-threaded Tokio runtime drives the I/O reactor only; futures
  are polled on the main thread inside the TSFN callback with the
  Tokio runtime context entered
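
The waker and coalescing items above can be illustrated with a minimal sketch. It is not the PR's code: `WakerBridge`, `WakerInner` and `woken_ids` borrow names from the description, but their fields and methods here are assumptions, and the `signals_sent` counter is a hypothetical stand-in for the actual call to the TSFN.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Wake, Waker};

/// State shared by all wakers. `signals_sent` stands in for calls to the TSFN
/// (assumption: a real bridge would invoke the thread-safe function instead).
#[derive(Default)]
pub struct WakerBridge {
    woken_ids: Mutex<Vec<u64>>,
    signal_pending: AtomicBool,
    pub signals_sent: AtomicUsize,
}

impl WakerBridge {
    /// Record that future `id` wants to be polled, and signal the main thread
    /// unless a signal is already in flight.
    fn wake_id(&self, id: u64) {
        self.woken_ids.lock().unwrap().push(id);
        // `swap` returns the previous value, so only the first waker after a
        // drain sends a signal; later wakers are coalesced into it.
        if !self.signal_pending.swap(true, Ordering::AcqRel) {
            self.signals_sent.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Runs on the main thread: clear the flag first, then take all woken ids.
    /// Clearing first means a wake that arrives later always sends a new signal.
    pub fn drain(&self) -> Vec<u64> {
        self.signal_pending.store(false, Ordering::Release);
        std::mem::take(&mut *self.woken_ids.lock().unwrap())
    }
}

/// Per-future waker state: the future's id plus a handle to the shared bridge.
pub struct WakerInner {
    id: u64,
    bridge: Arc<WakerBridge>,
}

impl Wake for WakerInner {
    fn wake(self: Arc<Self>) {
        self.bridge.wake_id(self.id);
    }
}

/// Build a standard `Waker` for the future with the given id.
pub fn make_waker(id: u64, bridge: Arc<WakerBridge>) -> Waker {
    Waker::from(Arc::new(WakerInner { id, bridge }))
}
```

With this shape, three wakes fired before the main thread drains produce one signal and three queued ids. A wake racing with `drain` can at worst cause one extra callback that finds an empty list.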

Key design decisions:

- Polling on main thread ensures napi_env is always valid during
  ToNapiValue conversion, avoiding cross-thread napi safety issues
- Type-erased BoxFuture and SettleCallback allow heterogeneous futures
  in a single HashMap without leaking generic parameters
- Promise created via raw napi_create_promise/napi_resolve_deferred
  to bypass napi-rs's async machinery entirely
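
The type-erasure decision can be sketched without N-API. In this simplified model a `Vec<String>` (`Sink`) stands in for the JS side. `FutureRegistry`, `BoxFuture` and `SettleCallback` reuse the names from the description, but their definitions here, along with `poll_one` and `NoopWake`, are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for the JS side (assumption: the PR's callback takes `Env` and a deferred).
pub type Sink = Vec<String>;
/// Type-erased "settle" step produced when a future completes.
pub type SettleCallback = Box<dyn FnOnce(&mut Sink)>;
/// Every future is erased to the same output type, so one map can hold them all.
pub type BoxFuture = Pin<Box<dyn Future<Output = SettleCallback>>>;

#[derive(Default)]
pub struct FutureRegistry {
    next_id: u64,
    futures: HashMap<u64, BoxFuture>,
}

impl FutureRegistry {
    /// Wrap a typed future. `T` is captured inside the closure and does not
    /// appear in the type of the map.
    pub fn insert<F, T>(&mut self, fut: F) -> u64
    where
        F: Future<Output = Result<T, String>> + 'static,
        T: std::fmt::Debug + 'static,
    {
        let boxed: BoxFuture = Box::pin(async move {
            let result = fut.await;
            Box::new(move |sink: &mut Sink| match result {
                Ok(v) => sink.push(format!("resolve {v:?}")),
                Err(e) => sink.push(format!("reject {e}")),
            }) as SettleCallback
        });
        let id = self.next_id;
        self.next_id += 1;
        self.futures.insert(id, boxed);
        id
    }

    /// Poll one future; on completion remove it and run its settle callback.
    pub fn poll_one(&mut self, id: u64, sink: &mut Sink) {
        let Some(fut) = self.futures.get_mut(&id) else { return };
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let polled = fut.as_mut().poll(&mut cx);
        if let Poll::Ready(settle) = polled {
            self.futures.remove(&id);
            settle(sink);
        }
    }
}

/// Waker that does nothing; enough for futures that complete on the first poll.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

Futures yielding `u32` and `String` can sit in the same registry because the map only ever sees `BoxFuture`.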

[This PR was created with heavy use of LLM tools. The code has since been significantly refactored to match the repository's existing style and to improve error handling.]

This PR aims to significantly improve the performance of the driver.

Refs: #75. With this optimisation, performance should no longer be a problem for the GA release.

Some early results:

Copilot

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

adespawn · 2026-03-16T14:35:28Z

The code should be ready for some early review. Before this PR is fully ready, I want to add more benchmarks.

wprzytula · 2026-03-16T14:50:18Z

Please ask your LLM to generate an ASCII chart of the implemented solution.

wprzytula · 2026-03-16T14:51:18Z

benchmark/logic/parametrized_select.js

Typo in commit message:
There was a but -> There was a bug

lead -> led

wprzytula · 2026-03-16T14:52:53Z

lib/client.js

+// Initialize the direct-poll bridge once per process.
+// This sets up the Tokio reactor thread and the wake mechanism used by all
+// bridged async Rust functions (session queries, paging, etc.).
+rust.initPollBridge();


❓ When is this executed? Is this guaranteed to be executed at most once? Is this idempotent?

> When is this executed?

When the file is first imported.

> Is this guaranteed to be executed at most once?

Almost always, yes: https://nodejs.org/docs/latest/api/modules.html#caching

> Is this idempotent?

No. Any subsequent call will lead to a panic.

> Is this idempotent?

Do you think it's worth making it idempotent, i.e. setting an atomic flag at the beginning of its execution that prevents double initialization?
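
A minimal sketch of the atomic-flag idea from this thread. `INITIALIZED` and `init_poll_bridge` are named after identifiers visible in the PR, but the body, the boolean return value, and the `SETUP_RUNS` counter are assumptions made for illustration only.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

static INITIALIZED: AtomicBool = AtomicBool::new(false);
/// Counts how often the real setup ran (hypothetical, for demonstration only).
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Returns `true` if this call performed the initialization and `false` if the
/// bridge was already initialized, in which case the call is a no-op.
pub fn init_poll_bridge() -> bool {
    // compare_exchange flips false -> true atomically, so only one caller wins.
    if INITIALIZED
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_err()
    {
        return false; // already initialized: do nothing instead of panicking
    }
    // Placeholder for the real setup (runtime, TSFN, cleanup hook).
    SETUP_RUNS.fetch_add(1, Ordering::Relaxed);
    true
}

/// Number of times the setup body actually executed.
pub fn setup_runs() -> usize {
    SETUP_RUNS.load(Ordering::Relaxed)
}
```

Two caveats on this design: the flag is set before the setup finishes, so it only guards against repeated calls from the same thread, and a failed setup would leave the flag set. `std::sync::Once` is an alternative if callers on other threads must block until setup completes.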

wprzytula · 2026-03-16T14:54:06Z

src/lib.rs

 extern crate napi_derive;

 // Link other files
+pub mod casync;


🤔 I dislike the name. Can we use something more intuitive?
future_bridge?
task_bridge?
async_bridge? -> I like this the most.

wprzytula · 2026-03-16T14:54:43Z

src/errors.rs

+impl std::fmt::Display for ConvertedError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{}: {}", self.name, self.msg)
+    }
+}
+


Let's have this as a separate commit.

src/tests/casync_tests.rs

src/casync.rs

wprzytula · 2026-03-16T15:24:29Z

src/casync.rs

+    // Create the TSFN from a no-op C callback.
+    // `build_callback` replaces the JS call — the noop is never invoked.
+    let noop_fn = env.create_function::<(), ()>("pollBridgeNoop", noop_callback_c_callback)?;
+
+    let tsfn = noop_fn
+        .build_threadsafe_function::<()>()
+        .weak::<true>()
+        .build_callback(|ctx| {
+            let raw_env = ctx.env;
+            REGISTRY.with(|r| {
+                r.borrow_mut().poll_woken(raw_env);
+            });
+            Ok(())
+        })?;


💭 This trickery makes me wonder if noop_callback is really needed.

❓ What does it mean that a TSFN is weak?

> What does it mean that a TSFN is weak?

It means its existence will not prevent Node from exiting gracefully.

https://napi.rs/docs/concepts/threadsafe-function#weak-threadsafefunction

If so, then why do we need to ref and unref it?

We manually ref / unref it while promises are pending, so that we keep Node.js from exiting only while queries are running.
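
The ref/unref policy described in this reply can be sketched as a counter that touches the event loop only on the 0 -> 1 and 1 -> 0 transitions. `LoopHandle`, `PendingTracker` and `RecordingHandle` are hypothetical names. In the PR the ref and unref calls go through the TSFN, which this sketch replaces with a recording stub.

```rust
/// Minimal stand-in for the TSFN's ref/unref operations (hypothetical trait).
pub trait LoopHandle {
    fn ref_loop(&mut self);
    fn unref_loop(&mut self);
}

/// Keeps the event loop alive only while at least one promise is pending.
pub struct PendingTracker<H: LoopHandle> {
    handle: H,
    pending: usize,
}

impl<H: LoopHandle> PendingTracker<H> {
    pub fn new(handle: H) -> Self {
        Self { handle, pending: 0 }
    }

    /// Call when a future is submitted: the first pending promise refs the loop.
    pub fn on_submit(&mut self) {
        if self.pending == 0 {
            self.handle.ref_loop();
        }
        self.pending += 1;
    }

    /// Call when a promise settles: the last one unrefs the loop again.
    pub fn on_settle(&mut self) {
        self.pending = self
            .pending
            .checked_sub(1)
            .expect("on_settle called without a matching on_submit");
        if self.pending == 0 {
            self.handle.unref_loop();
        }
    }

    pub fn handle(&self) -> &H {
        &self.handle
    }
}

/// Test double that records how often the loop was ref'ed and unref'ed.
#[derive(Default)]
pub struct RecordingHandle {
    pub refs: u32,
    pub unrefs: u32,
}

impl LoopHandle for RecordingHandle {
    fn ref_loop(&mut self) {
        self.refs += 1;
    }
    fn unref_loop(&mut self) {
        self.unrefs += 1;
    }
}
```

Two overlapping queries therefore cause a single ref and, once both have settled, a single unref.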

wprzytula · 2026-03-16T15:25:46Z

src/casync.rs

+    // Cleanup hook — shut down the runtime when Node exits.
+    env.add_env_cleanup_hook((), |_| {
+        REGISTRY.with(|r| {
+            r.borrow_mut().shutdown();
+        });
+    })?;


This explains why Mutex<Option<Tsfn>> instead of OnceLock<Tsfn>.
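
One plausible reading of this remark, sketched with a hypothetical `Handle` type in place of `Tsfn`: a value stored in a static `Mutex<Option<_>>` can be moved out and dropped at shutdown, while `OnceLock::take` needs `&mut self` and is therefore unavailable on a shared static. The statics and helper functions below are assumptions, not the PR's code.

```rust
use std::sync::{Mutex, OnceLock};

/// Stand-in for the TSFN type (hypothetical).
pub struct Handle(pub u32);

static TAKEABLE: Mutex<Option<Handle>> = Mutex::new(None);
static SET_ONCE: OnceLock<Handle> = OnceLock::new();

/// Store a handle in both containers.
pub fn init() {
    *TAKEABLE.lock().unwrap() = Some(Handle(1));
    // `set` fails if a value is already present; ignored here for brevity.
    let _ = SET_ONCE.set(Handle(1));
}

/// Shutdown can move the handle out of the Mutex<Option<_>> and drop it.
pub fn shutdown() -> Option<Handle> {
    TAKEABLE.lock().unwrap().take()
}

pub fn takeable_is_set() -> bool {
    TAKEABLE.lock().unwrap().is_some()
}

/// The OnceLock value stays in place: through a shared static there is
/// no way to remove it again.
pub fn once_is_set() -> bool {
    SET_ONCE.get().is_some()
}
```

After `shutdown()` the mutex-guarded slot is empty while the `OnceLock` still holds its value, which matters if the handle has to be released in the cleanup hook.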

wprzytula · 2026-03-16T15:44:33Z

src/casync.rs

+/// Submit a typed Rust future to be polled directly by the Node event loop.
+///
+/// Future can return a typed value `T` on success
+/// or an error `E` on failure. Both `T` and `E` are converted to JS values via
+/// `ToNapiValue` on the main thread when the future settles.
+///
+/// The error type `E` should produce a JS Error object from `to_napi_value` so
+/// that the rejection value is a proper error (e.g. `ConvertedError`).
+pub fn submit_future<F, T>(env: &Env, fut: F) -> ConvertedResult<JsPromise<T>>
+where
+    F: Future<Output = std::result::Result<T, ConvertedError>> + Send + 'static,
+    T: napi::bindgen_prelude::ToNapiValue + Send + 'static,
+{
+    // This is a driver error, so panic is warranted here. There is no reasonable way to recover.
+    assert!(
+        INITIALIZED.load(Ordering::Relaxed),
+        "init_poll_bridge must be called before submit_future. This is a bug in the driver."
+    );
+
+    let (deferred, promise) = create_promise(env)?;
+
+    let boxed: BoxFuture = Box::pin(async move {
+        let result = fut.await;
+        Box::new(move |env: Env, deferred| unsafe {
+            // SAFETY: This closure is only ever invoked from `poll_woken`, which runs
+            // on the Node main thread inside the TSFN callback - the only place where
+            // `env` is a valid napi_env. `deferred` is consumed exactly once here,
+            // satisfying the napi contract that each deferred is resolved or rejected
+            // exactly once. `to_napi_value` receives the same valid `env`.
+            let (js_val, resolve) = match result {
+                Ok(val) => (T::to_napi_value(env.raw(), val), true),
+                Err(err) => (ConvertedError::to_napi_value(env.raw(), err), false),
+            };
+            let status = js_val
+                // First we try to accept / reject with converted value / error.
+                .and_then(|v| {
+                    if resolve {
+                        check_status!(sys::napi_resolve_deferred(env.raw(), deferred, v))
+                    } else {
+                        check_status!(sys::napi_reject_deferred(env.raw(), deferred, v))
+                    }
+                })
+                // If this fails, or we failed to convert the value / error into a JS value,
+                // we reject with a fallback reason.
+                .or_else(|e| reject_with_reason(env, deferred, &e.reason));
+
+            if let Err(e) = status {
+                // If both fail, we assume something terrible has happened. We cannot
+                // inform JS side about the error by regular error handling, so we panic to
+                // avoid silent failures and orphaned promises.
+                panic!(
+                    "Failed to settle promise in TSFN callback. This may indicate either a bug in the driver or a severe runtime error.\nRoot cause:\n {}",
+                    e.reason
+                );
+            }
+        }) as SettleCallback
+    });
+
+    REGISTRY.with(|r| r.borrow_mut().insert(env, boxed, deferred))?;
+    Ok(JsPromise(promise, PhantomData))
+}


💭 I'm wondering if it makes sense to perform the first poll() straightaway. This could reduce latency. When executing prepared statements (the main point of our interest), the logic is as follows:

- serialize statement's bound values,
- calculate token,
- configure the execution,
- ask LBP for routing decision,
- create a request frame,
- send the frame via a channel to a tokio task managing the connection (router),
- wait until the response arrives.

If I'm not mistaken, all points but the last can happen during a single poll! The rest is on the router, which is driven by the tokio runtime worker thread(s).

> wait until the response arrives.

I agree that all points except this one can happen in a single poll.

I insist on this, because the latency gains can be significant.
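
The eager first poll proposed in this thread can be sketched as follows. This is a standalone model, not a patch for `submit_future`: `submit_eager`, `Submitted`, `poll_once`, `noop_waker` and `WaitForResponse` are all hypothetical names, and the waker is a no-op instead of the bridge's real per-future waker.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

/// Result of submitting with an eager first poll.
pub enum Submitted<T> {
    /// The future finished during the first poll; it can be settled right away.
    Done(T),
    /// The future is waiting; it would be stored in the registry for later polls.
    Parked(BoxFut<T>),
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Waker that does nothing (stand-in for the bridge's per-future waker).
pub fn noop_waker() -> Waker {
    Waker::from(Arc::new(NoopWake))
}

/// Poll a boxed future exactly once.
pub fn poll_once<T>(fut: &mut BoxFut<T>, waker: &Waker) -> Poll<T> {
    let mut cx = Context::from_waker(waker);
    fut.as_mut().poll(&mut cx)
}

/// Poll immediately at submission time instead of waiting for the first wake-up.
pub fn submit_eager<T>(fut: impl Future<Output = T> + 'static, waker: &Waker) -> Submitted<T>
where
    T: 'static,
{
    let mut fut: BoxFut<T> = Box::pin(fut);
    match poll_once(&mut fut, waker) {
        Poll::Ready(v) => Submitted::Done(v),
        Poll::Pending => Submitted::Parked(fut),
    }
}

/// Pending on the first poll, ready on the second: models "wait until the
/// response arrives".
#[derive(Default)]
pub struct WaitForResponse {
    polled: bool,
}

impl Future for WaitForResponse {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```

Every synchronous step placed before the first `.await` that returns `Pending` runs inside `submit_eager` itself, so in this model the request would already be on its way to the router before the promise is returned to JavaScript.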

wprzytula · 2026-03-16T15:45:02Z

In general, looks promising!

This commit adds a custom async bridge between Rust and JavaScript using N-API. It allows async tasks to be scheduled without tokio::spawn, which the napi-rs macros use when creating async functions.

The main motivation for this change is to improve the performance of the driver. With the existing approach we spend a lot of CPU time on synchronization between the main thread and the Tokio threads. By reducing CPU time, I also aim to reduce the driver's overall run time.

This approach polls all futures on the Node.js main thread, replacing napi-rs's built-in async task system, which polls on Tokio worker threads.

Architecture:
- Single weak ThreadsafeFunction (TSFN) shared across all futures, with manual ref/unref to control Node.js event loop lifetime
- FutureRegistry (thread-local on main thread) stores in-flight futures paired with their napi_deferred handles
- Per-future Waker backed by Arc<WakerInner> implementing the Wake trait, which pushes the future id into a shared woken_ids vec and signals the TSFN
- Coalesced signaling via AtomicBool prevents flooding the event loop when multiple wakers fire simultaneously
- Single-threaded Tokio runtime drives the I/O reactor only; futures are polled on the main thread inside the TSFN callback with the Tokio runtime context entered

Key design decisions:
- Polling on main thread ensures napi_env is always valid during ToNapiValue conversion, avoiding cross-thread napi safety issues
- Type-erased BoxFuture and SettleCallback allow heterogeneous futures in a single HashMap without leaking generic parameters
- Promise created via raw napi_create_promise/napi_resolve_deferred to bypass napi-rs's async machinery entirely

[This commit, including this commit message, was created with heavy use of LLM tools. The code has since been slightly refactored to partially match the repository's existing style.]

There was a bug that led to an incorrect assertion in the benchmark

adespawn · 2026-03-18T08:57:20Z

Rebased on main

adespawn · 2026-03-18T12:40:19Z

Addressed some comments and added a new wrapper for safety (this one fully written by hand). The changes are not yet properly split into components.

wprzytula · 2026-03-18T14:39:19Z

src/casync.rs


 impl<T> ToNapiValue for JsPromise<T> {
+    /// # Safety
+    /// No constrains on safety. The unsafe is required by the trait.


constraints

wprzytula · 2026-03-18T14:40:37Z

src/casync.rs

+type Tsfn = napi::threadsafe_function::ThreadsafeFunction<(), (), (), Status, false, true>;

+/// Single Thread safe function, coalesced wake signals
 struct WakerBridge {


single-thread-safety is not a thing...

wprzytula · 2026-03-18T14:57:08Z

src/casync.rs


I insist on this, because the latency gains can be significant.

adespawn force-pushed the casync-clean branch from 98b4967 to 11b3b84 Compare March 16, 2026 14:26

adespawn requested a review from Copilot March 16, 2026 14:26

Copilot AI reviewed Mar 16, 2026

Copilot started reviewing on behalf of adespawn March 16, 2026 14:38

wprzytula requested changes Mar 16, 2026

adespawn added 2 commits March 18, 2026 09:56

Fix parametrized select

1b2b1fa

There was a bug that led to an incorrect assertion in the benchmark

adespawn force-pushed the casync-clean branch from 11b3b84 to 1b2b1fa Compare March 18, 2026 08:57

adespawn added 2 commits March 18, 2026 13:40

Some refactor

8c410ab

Wrapper over DeferredPtr

9bf0cdd

adespawn force-pushed the casync-clean branch from 8034511 to 9bf0cdd Compare March 18, 2026 12:40

wprzytula reviewed Mar 18, 2026

wprzytula requested changes Mar 18, 2026
