Rewrite in N-API for context-awareness #3

savetheclocktower · 2025-09-01T18:20:02Z

This is a monumental change, but I’ll try to summarize it.

The first tagged version of pathwatcher is twelve years old. The Node ecosystem was vastly different back then. The native bindings didn’t even originally use nan; they consumed libuv directly. It was not yet possible to run more than one Node instance in the same process, so context-awareness wasn’t even a glint in anyone’s eye yet.

When we had to go through all the old native modules and ensure they were context-aware for the PulsarNext endeavor, pathwatcher was by far the biggest challenge. Lots of the native code still consumed libuv directly despite depending on nan.

I’m not a C++ expert, to say the least. I was able to understand tutorials aimed at migrating nan code to N-API, but I lack the ability to take lower-level code and re-imagine it in a context-aware environment. Still, I made the effort!

First attempt: modest rewrite

In lieu of understanding exactly what would achieve context-awareness, I wrote some tests that used threads in order to introduce multiple contexts within the same process and treated them as a proving ground for whatever I came up with.
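A minimal sketch of that kind of proving ground, assuming Node's built-in `worker_threads` module. Each worker runs in its own context, so requiring a native module in several workers at once exercises multi-context loading. The helper name `loadInWorkers` is hypothetical, and the specs in this PR may be structured differently.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical helper: require `moduleName` in `count` workers at once and
// resolve with each worker's exit code. A native module that is not
// context-aware may fail to load (or crash) in this situation.
function loadInWorkers(moduleName, count) {
  const source = `require(${JSON.stringify(moduleName)});`;
  const runs = Array.from({ length: count }, () => {
    return new Promise((resolve, reject) => {
      // `eval: true` treats the first argument as script source, not a path.
      const worker = new Worker(source, { eval: true });
      worker.once('error', reject);
      worker.once('exit', resolve);
    });
  });
  return Promise.all(runs);
}
```

Calling `loadInWorkers('pathwatcher', 4)` would then reject if any worker throws while loading the binding. A hard crash such as a segfault would take down the whole process instead, which is itself a clear test failure.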

I then tried to perform the minimum amount of upgrading that could possibly work — first by adapting the existing nan bindings to add context-awareness, then by attempting a minimal N-API rewrite. The goal was to keep things the same as much as possible, but with the added requirement that whatever I came up with must still demonstrate the ability to work in multiple contexts at once without segfaulting.

This was nearly a year ago and I don’t remember the exact details… but the n-api branch shows the results of that experiment. In short, the existing approach was too low-level for the amount of abstraction that context-awareness demanded.

I started looking around for other solutions.

Second attempt: bring in outside help

Rewriting the library entirely was no mean feat — it would have to perform similarly to the existing pathwatcher without any API changes. That’s a particular challenge because pathwatcher is the only file-watching library I can find that attempts to work synchronously.

const fs = require('fs');
const PathWatcher = require('pathwatcher');

PathWatcher.watch('foo.js', () => console.log('changed!'));
fs.writeFileSync('foo.js', 'new content');
// logs 'changed!'

All modern file-watchers expect that it will be some amount of time between your request to monitor a file/directory for changes and its fulfillment. (This made it more or less impossible to modernize pathwatcher by rewriting it to use some other file-watcher under the hood — and, yes, I definitely attempted that as well.)

Eventually I found efsw. It uses the right underlying APIs for each platform (old pathwatcher never used FSEvents on macOS, even after it became widely available) and it allowed for synchronousness. The process was slow-going because of my C++ inexperience, and it involved lots of long talks with Claude, the code robot which I find least distasteful of all the options out there. I took all the code Claude gave me with several grains of salt, checking its approaches against non-AI sources for sanity’s sake.

Complications on macOS

Eventually, I realized that efsw’s macOS approach was flawed. It used one FSEvents watcher per requested file/path, but there’s a hard system-wide limit of (I think) 1024 watchers, so that’s just not practical. (I didn't know about this limit until I had evidence of file-watching calls silently failing, at which point I dug into Console.app and learned why.)

I spent about a week reacting to this constraint in a fatally misguided way. pathwatcher doesn’t support recursive directory watching, but efsw allows you to add either recursive or non-recursive watchers. So I tried to optimize for watcher reuse on macOS by opting into recursive file-watching in efsw, then introducing a scheme similar to the NativeWatcherRegistry that watchPath uses. This got the specs to pass, but it unquestionably made the code worse — more complex and harder to reason about — just to solve an issue that was happening on only one of the three platforms we needed to support. (I understand why this watcher-reuse code was written for Atom back in the day, but I absolutely hate it and would love to find a way to make it unnecessary in Pulsar. So it was dumb of me to turn to this code I hated, even if I thought it would solve my problem.)

I refocused. This was a macOS-specific issue, so it needed a macOS-specific solution. Luckily, I discovered that each FSEvents watcher can watch arbitrarily many files, so no single consumer should need to declare more than two. Even more luckily, I was able to write a macOS-only adapter that is used instead of efsw’s approach in a way that is API-compatible. (We declare two watchers because, when you change the exact list of watched paths, you must restart the stream; so we alternate the streams between “primary” and “backup” and stagger their starting/stopping when paths change to ensure we don’t miss any events.)
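The primary/backup alternation can be modeled in a few lines of JavaScript. This is only a simulation under stated assumptions: the real adapter is C++ wrapping FSEvents, and `FakeStream`, `DualStreamWatcher`, `setPaths`, and `isCovered` are hypothetical names that do not come from the PR.

```javascript
// Stand-in for a native event stream. It only records its own state.
class FakeStream {
  constructor() {
    this.paths = [];
    this.active = false;
  }
  start(paths) {
    this.paths = [...paths];
    this.active = true;
  }
  stop() {
    this.active = false;
  }
}

// Hypothetical coordinator that alternates between two streams.
class DualStreamWatcher {
  constructor() {
    this.primary = new FakeStream();
    this.backup = new FakeStream();
  }

  setPaths(paths) {
    // Start the idle stream with the new list before stopping the old one,
    // so at least one stream is running throughout the change.
    this.backup.start(paths);
    this.primary.stop();
    // Swap roles: the freshly started stream becomes the primary.
    [this.primary, this.backup] = [this.backup, this.primary];
  }

  // True if an active stream currently includes `path`.
  isCovered(path) {
    return this.primary.active && this.primary.paths.includes(path);
  }
}
```

The point of the ordering in `setPaths` is that the two streams overlap briefly. A real implementation would also have to discard events that both streams report during that overlap, which this model does not attempt.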

(The author of efsw was unaware of the watcher limit on macOS, and is interested in integrating these changes… but my implementation cares only about non-recursive watching, so it’s not something I can contribute upstream in its current form.)

Stability

The resulting pathwatcher library has been running on the updated-latest-electron branch for many months now. I have not suffered from any pathwatcher-related crashes for months, and I have not observed any real-world regressions in pathwatcher behavior. I am not 100% certain we’ll be able to deprecate pathwatcher in the near future, as much as I’d like to; but I’m convinced this rewrite is a solid enough base on which to proceed.

Testing

I would be a liar if I said that this rewrite were seamless and required no changes to the existing test suite.

One thing that the rewrite appears to struggle with is frequent watching and unwatching of the same file or path. This is something that is highly unlikely to happen outside of the unit tests — but unit tests are how I prove that this rewrite is up to snuff, so you can see the dilemma.

I mitigated this by introducing 50ms of waiting after each test; this is long enough to get the tests to pass 100% of the time on my machine.
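For illustration, a post-test pause like this can be written as a Jasmine `afterEach` hook. This is a sketch rather than the exact code in the spec files, and the helper name `wait` is an assumption.

```javascript
// Resolve after `ms` milliseconds.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Jasmine defines `afterEach` globally; the guard lets this file load
// outside a test runner as well.
if (typeof afterEach === 'function') {
  afterEach(async () => {
    // Give the native watcher time to settle before the next spec starts.
    await wait(50);
  });
}
```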

Still, I would love to make this a non-issue in the future. (I can't prove that it won't be a problem in the real world.) I have done some experimentation with an approach such as this:

Someone subscribes to ./foo. Add it to the native watcher.
Later, they unsubscribe from ./foo. Stop reporting changes to ./foo immediately, but do not remove it from the native watcher yet. Instead, schedule it to happen at some point in the future (like 1 second from now).
If the user re-subscribes to ./foo within that span of time: cancel the scheduled removal, stop ignoring changes to ./foo, and leave the native watcher unchanged.
But if the user hasn’t re-subscribed to ./foo after that time: remove it from the native watcher, then remove ./foo from the ignore list (because we're not watching it anymore, so there's no need to keep it on the list).
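The steps above can be sketched in plain JavaScript. This models only the bookkeeping, under several assumptions: one subscriber per path, a `native` object exposing hypothetical `add(path)` and `remove(path)` methods, and `setTimeout` standing in for whatever scheduling mechanism the native side would use.

```javascript
// Hypothetical model of deferred unwatching with a grace period.
class DeferredUnwatcher {
  constructor(native, delayMs = 1000) {
    this.native = native;
    this.delayMs = delayMs;
    this.watched = new Set(); // paths present in the native watcher
    this.pending = new Map(); // path -> timer for a scheduled removal
  }

  subscribe(path) {
    const timer = this.pending.get(path);
    if (timer !== undefined) {
      // Re-subscribed within the grace period: cancel the removal and
      // leave the native watcher unchanged.
      clearTimeout(timer);
      this.pending.delete(path);
      return;
    }
    if (!this.watched.has(path)) {
      this.watched.add(path);
      this.native.add(path);
    }
  }

  unsubscribe(path) {
    if (!this.watched.has(path) || this.pending.has(path)) return;
    // Events for `path` are ignored from now on (see shouldReport), but the
    // native removal is postponed.
    const timer = setTimeout(() => {
      this.pending.delete(path);
      this.watched.delete(path);
      this.native.remove(path);
    }, this.delayMs);
    this.pending.set(path, timer);
  }

  // True if an event for `path` should reach subscribers.
  shouldReport(path) {
    return this.watched.has(path) && !this.pending.has(path);
  }
}
```

Here the `pending` map doubles as the ignore list: a path is ignored exactly while its removal is scheduled, so there is no separate list to clean up afterwards.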

I have tried this approach and gotten it mostly working, but it's not bulletproof. For one thing, it complicates the code. To perform a task after a delay, I can (a) schedule it to happen in another thread (at which point I need to signal the main thread so it can finish the unwatching operation); or (b) schedule it with libuv. Option A broke my brain. Option B is conceptually much simpler, but leaves me less confident that we’re operating with context awareness. Indeed, using this approach, I can produce sporadic segfaults under a multi-thread stress test, and I’m not yet conversant enough in the debugging tools to figure out exactly what’s going wrong.

But if we ever recruited someone to the project with experience in these areas, this is one of the first things I’d be eager for them to take a look at.

Review

How is anyone supposed to review this? I have no idea. I would suggest that you look it over, make sure nothing appears straightforwardly insane, and then rely on whatever experience you have with using PulsarNext in the last ~10 months as proof that this migration is fundamentally sound.

You may also find this issue useful. Early on it's mainly about nsfw (a different file-watching library that watchPath uses) but then it becomes about preventing crashes that happen because of pathwatcher. It gives more context for my evolving approaches and might have some details that I'm not remembering at this moment.

The future

I don’t want to maintain this library indefinitely. Our other file-watching solutions (like whatever we wrap in watchPath) deal only with directories, but in places where I have need to watch a single file, I’ve started to prefer the builtin fs.watch rather than introduce new code that depends on pathwatcher.
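As a rough illustration of the `fs.watch` alternative for a single file (the wrapper name `watchFile` is hypothetical, not something in Pulsar):

```javascript
const fs = require('node:fs');

// Watch one file with the built-in watcher. Unlike pathwatcher, events
// arrive asynchronously, some time after the change is made.
function watchFile(filePath, onChange) {
  const watcher = fs.watch(filePath, (eventType) => {
    // eventType is either 'change' or 'rename'.
    onChange(eventType);
  });
  // Return a disposer so callers can stop watching.
  return () => watcher.close();
}
```

The Node documentation notes that `fs.watch` behavior is not fully consistent across platforms, so code written this way should not assume a particular number or ordering of events per change.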

Still, is there a future where we don’t use it anymore? Only if we’re willing to weaken its API contract. Packages don’t use it directly, but we expose the File and Directory classes to packages, and the change listeners that get registered on those instances use pathwatcher under the hood. That’s not something that is easy to convert to an async API. Still, I bet we could do it; it’s just a matter of figuring out the approach that changes behavior as little as possible.

Next steps

Pulsar has never consumed its own fork of pathwatcher. The last commit to the master branch of this repo is from the last Atom maintainer, and the stable release of Pulsar consumes the NPM module pathwatcher rather than a repository reference.

This means that, once we land this to master, we can bump the major version and publish it to NPM as @pulsar-edit/pathwatcher. Then PulsarNext can declare a dependency on that version instead of a random branch of my random fork.

This is an ongoing nightmare! It took me a while to realize that the original binding was a relic of how early it had been written in the life of Node. It used a lot of `libuv` functions directly for things that have had easier-to-use abstractions for a while now, both in `nan` and in N-API.

It was also a good opportunity to decaffeinate the JavaScript files. Furthermore, we don't need the `HandleMap` stuff, nor the scarily-named `UnsafePersistent`, since we have proper `Map`s in JavaScript now. (I'm not sure we _ever_ needed `HandleMap`, but I digress.)

This isn't over! It works on macOS, but it needs platform-specific adjustments for Windows and Linux. Both of those will be painful. I'm sure I can also improve upon the macOS file-watching somehow.

This comes with the painful revelation that one spec has been spuriously passing for years, and the functionality it claims to support doesn't seem to work (at least on macOS). That spec is skipped until that bug can be fixed.

This required a more modern version of Jasmine that could grok `async`/`await` syntax. The magical `jasmine-tagged` stuff is currently being done manually instead.

savetheclocktower · 2025-09-10T23:13:14Z

I totally get how intimidating this PR is. But since it's the basis of what's been shipped in PulsarNext for months, I think I might give this another ~10 days of openness at the absolute max, and then merge it in the absence of objections?

DeeDeeG · 2025-09-19T02:16:39Z

However we land this, I feel it'd be helpful for this PR conversation to permalink to a branch where all this commit history is browsable -- if there are some things that we don't fully understand, it can be helpful "archeologically/forensically" digging through the blame view and seeing "okay this was just tried as a guess, we don't have to be wedded to it if we need to change it down the line for whatever reason."

As much as this commit history is not straightforward, it does lay out the motivation a little more transparently than a squash, so a copy like this should be retained somewhere I'd argue.

Thanks for doing all this, still combing through and looking for insights, but I start with the impression that this was a lot of work!

UPDATE: Here's a link to such a branch: https://github.com/pulsar-edit/node-pathwatcher/commits/n-api-with-efsw-bundled/

(The content-addressable (immutable per se) tip of the branch as of writing, for posterity, is commit d2281e6.)

DeeDeeG

Well, I haven't made a ton of progress trying to review this. Partly due to it being very big, and partly down to the fact that I'm not advanced in JS nor C/C++.

[EDIT to add more of my train of thought and reasoning/justification: I have glanced at the diffstat for all the folders, the LICENSE file for the vendored EFSW appears to be sufficient that I'm confident we are allowed to vendor it in such a way, CI is passing (on all three major OSes we support), including several new specs. PR body strikes me as well-reasoned, and this is obviously a ton of work that has been in use for months now. IIRC we've had positive feedback and users asking when this will be in the primary version of Pulsar, so this is tested and working for multiple people if I vaguely recall correctly.]

I am going to give this my nod of approval mostly on the strength of it having attestedly been used in Pulsar Next for some time now, and the lack of complaints as far as I know (on top of it solving important problems -- context-awareness, and IIRC, watcher performance for some users??).

I wish I had more to add, but yeah; this is mostly a rubber stamp, but looks good to me (to the extent I can form any informed opinion)!

Thank you much for doing all this!

DeeDeeG · 2025-09-21T02:35:24Z

I also want to commend the coordination efforts with an upstream project (efsw) to share notes and potentially generalize the benefit of this research to help the broader ecosystem -- ourselves included potentially, but that is certainly a part of how that is meant to work also. Good open-source practices, IMHO!

savetheclocktower added 30 commits October 6, 2024 18:20

Update binding.gyp

847e2fe

Fix import

5ab1832

Rewrite Linux approach to allow for graceful stopping

aad8e4d

Fix failing test

92ad3c7

(man, that was silly)

Redefine the test script in package.json

751f987

Fix import

b88a91f

Attempt to modernize GitHub actions workflow

48dfc4a

Fix README type (testing CI)

5c6a79f

Testing a possible Windows fix

53b2ef8

Skip Node 14

2945228

(oops)

b66d0c3

Trying a different fix

07ea897

Attempt to load debug binding first

0df653c

Don't try to stop a worker we're about to replace

e72daa3

Maybe I don’t actually need this?

0756a65

Attempt to rewrite the Nan stuff in the Windows implementation

670f97a

(oops)

9093d7a

(Claude thinks this might help)

912a658

Fix signature

3ed47b6

Fix typos

1ca28b7

Typo

ad61888

Fix old API usage

ed7e13a

Fix typo on Windows

cb8df27

Fix ordering of conditional test

9fdabd8

Fix expectation of number

062001a

Whoops

7faae35

Logging

6f92a52

Ugh

8460f45

savetheclocktower added 23 commits October 27, 2024 20:03

Reduce duplication in FSEventsFileWatcher…

069300a

…and prevent an instance from being destroyed while processing events.

Remove NativeWatcherRegistry (unneeded at this point)

f32158f

Return to previous pattern…

e3f7f53

…for `detectResurrectionAfterDelay`. (The Pulsar specs rely on being able to stub it to circumvent the async-ness.)

Fix issue where we inadvertently ignore events on a directory itself

c14f82e

Fix failing specs

8d71fd1

Too edgy and avant-garde for Windows/Linux

155cc44

Attempt to get new spec passing on non-macOS platforms

84ba91f

Untangle mutexes

c6f17e7

(mutices?)

Fix the directory-deletion edge case on all platforms

3f0de40

Change the specs to agree with observed reality

15c112b

Deliver more consistent behavior around directory deletion

b710480

I wasted a few hours trying to handle a case that `pathwatcher` never claimed to support. The upside is that I did manage to simplify the custom FSEvents watcher implementation and eliminate some potentially costly looping!

Change file spec to be meaningful

724ac75

Logging cleanup, further native optimization on macOS

9eb8beb

Remove unnecessary devDependencies and files

b885578

Add some comments

b66d621

Update README

2652da2

Remove getNativeWatcherCount (redundant and not used in the tests)

2c40f68

Report the original path on child-rename events

68b378e

Change the rename-a-file fix to be more targeted

585b76a

Move lock guard inside local block

6c3d8fe

Fix exception encountered in text-buffer tests

5ea87ef

Return silently on an attempt to unwatch…

649232b

…when we're not watching in the first place.

Use BigInts for JavaScript handles instead of numbers…

d2281e6

…because EFSW was returning handles larger than `Number.MAX_SAFE_INTEGER`.
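To illustrate why the last commit matters: a JavaScript `Number` cannot represent every integer above `Number.MAX_SAFE_INTEGER`, so distinct 64-bit handles can collapse into the same value. The sketch below uses an arbitrary example handle and a hypothetical `watchers` map. How the binding actually stores its handles is not shown in this thread.

```javascript
// An example handle value just above Number.MAX_SAFE_INTEGER (2^53 - 1).
const handle = 2n ** 53n + 1n; // 9007199254740993n

// Converting to a Number rounds to the nearest representable double,
// so two distinct handles can end up equal.
const asNumber = Number(handle);
const collides = asNumber === Number(handle - 1n); // true

// BigInt keys compare by value, so they work as Map keys without loss.
const watchers = new Map();
watchers.set(handle, './foo');
const found = watchers.get(2n ** 53n + 1n); // './foo'
```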

savetheclocktower mentioned this pull request Sep 10, 2025

Rewrite in N-API; add Wayland support pulsar-edit/keyboard-layout#4

Merged

DeeDeeG force-pushed the n-api-with-efsw-bundled branch from 10ed935 to d2281e6 Compare September 18, 2025 00:12

DeeDeeG approved these changes Sep 21, 2025

View reviewed changes

savetheclocktower merged commit e448e08 into pulsar-edit:master Sep 21, 2025
6 checks passed
