-
-
Notifications
You must be signed in to change notification settings - Fork 1
Rewrite in N-API for context-awareness #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite in N-API for context-awareness #3
Conversation
This is an ongoing nightmare! It took me a while to realize that the original binding was a relic of how early it had been written in the life of Node. It used a lot of `libuv` functions directly for things that have had easier-to-use abstractions for a while now — both in `nan` and in N-API. It was also a good opportunity to decaffeinate the JavaScript files. Furthermore, we don't need the `HandleMap` stuff, nor the scarily-named `UnsafePersistent`, since we have proper `Map`s in JavaScript now. (I'm not sure we _ever_ needed `HandleMap`, but I digress.) This isn't over! It works on macOS, but it needs platform-specific adjustments for Windows and Linux. Both of those will be painful. I'm sure I can also improve upon the macOS file-watching somehow.
This comes with the painful revelation that one spec has been spuriously passing for years, and the functionality it claims to support doesn't seem to work (at least on macOS). That spec is skipped until that bug can be fixed. This required a more modern version of Jasmine that could grok `async`/`await` syntax. The magical `jasmine-tagged` stuff is currently being done manually instead.
(man, that was silly)
…and prevent an instance from being destroyed while processing events.
…for `detectResurrectionAfterDelay`. (The Pulsar specs rely on being able to stub it to circumvent the async-ness.)
(mutices?)
I wasted a few hours trying to handle a case that `pathwatcher` never claimed to support. The upside is that I did manage to simplify the custom FSEvents watcher implementation and eliminate some potentially costly looping!
…when we're not watching in the first place.
…because EFSW was returning handles larger than `Number.MAX_SAFE_INTEGER`.
|
I totally get how intimidating this PR is. But since it's the basis of what's been shipped in PulsarNext for months, I think I might give this another ~10 days of openness at the absolute max, and then merge it in the absence of objections? |
10ed935 to
d2281e6
Compare
|
However we land this, I feel it'd be helpful for this PR conversation to permalink to a branch where all this commit history is browsable -- if there are some things that we don't fully understand, it can be helpful "archeologically/forensically" digging through the blame view and seeing "okay this was just tried as a guess, we don't have to be wedded to it if we need to change it down the line for whatever reason." As much as this commit history is not straightforward, it does lay out the motivation a little more transparently than a squash, so a copy like this should be retained somewhere I'd argue. Thanks for doing all this, still combing through and looking for insights, but I start with the impression that this was a lot of work! UPDATE: Here's a link to such a branch: https://github.com/pulsar-edit/node-pathwatcher/commits/n-api-with-efsw-bundled/ (The content-addressable (immutable per se) tip of the branch as of writing, for posterity, is commit d2281e6.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I haven't made a ton of progress trying to review this. Partly due to it being very big, and partly down to the fact that I'm not advanced in JS nor C/C++.
[EDIT to add more of my train of through and reasoning/justification: I have glanced at the diffstat for all the folders, the LICENSE file for the vendored EFSW appears to be sufficient that I'm confident we are allowed to vendor it in such a way, CI is passing (on all three major OSes we support), including several new specs. PR body strikes me as well-reasoned, and this is obviously a ton of work that has been in-use for months now. IIRC we've had positive feedback and users asking when this will be in the primary version of Pulsar, so this is tested and working for multiple people if I vaguely recall correctly.]
I am going to give this my nod of approval mostly on the strength of it having attestedly been used in Pulsar Next for some time now, and the lack of complaints as far as I know (on top of it solving important problems -- context-awareness, and IIRC, watcher performance for some users??).
I wish I had more to add, but yeah; this is mostly a rubber stamp, but looks good to me (to the extent I can form any informed opinion)!
Thank you much for doing all this!
|
I also want to commend the coordination efforts with an upstream project (efsw) to share notes and potentially generalize the benefit of this research to help the broader ecosystem -- ourselves included potentially, but that is certainly a part of how that is meant to work also. Good open-source practices, IMHO! |
This is a monumental change, but I’ll try to summarize it.
The first tagged version of
pathwatcheris twelve years old. The Node ecosystem was vastly different back then. The native bindings didn’t even originally usenan; they consumedlibuvdirectly. It was not yet possible to run more than one Node instance in the same process, so context-awareness wasn’t even a glint in anyone’s eye yet.When we had to go through all the old native modules and ensure they were context-aware for the PulsarNext endeavor,
pathwatcherwas by far the biggest challenge. Lots of the native code still consumedlibuvdirectly despite depending onnan.I’m not a C++ expert, to say the least. I was able to understand tutorials aimed at migrating
nancode to N-API, but I lack the ability to take lower-level code and re-imagine it in a context-aware environment. Still, I made the effort!First attempt: modest rewrite
In lieu of understanding exactly what would achieve context-awareness, I wrote some tests that used threads in order to introduce multiple contexts within the same process and treated them as a proving ground for whatever I came up with.
I then tried to perform the minimum amount of upgrading that could possibly work — first by adapting the existing
nanbindings to add context-awareness, then by attempting a minimal N-API rewrite. The goal was to keep things the same as much as possible, but with the added requirement that whatever I came up with must still demonstrate the ability to work in multiple contexts at once without segfaulting.This was nearly a year ago and I don’t remember the exact details… but the
n-apibranch shows the results of that experiment. In short, the existing approach was too low-level for the amount of abstraction that context-awareness demanded.I started looking around for other solutions.
Second attempt: bring in outside help
Rewriting the library entirely was no mean feat — it would have to perform similarly to the existing
pathwatcherwithout any API changes. That’s a particular challenge becausepathwatcheris the only file-watching library I can find that attempts to work synchronously.All modern file-watchers expect that it will be some amount of time between your request to monitor a file/directory for changes and its fulfillment. (This made it more or less impossible to modernize
pathwatcherby rewriting it to use some other file-watcher under the hood — and, yes, I definitely attempted that as well.)Eventually I found
efsw. It uses the right underlying APIs for each platform (oldpathwatchernever usedFSEventson macOS, even after it became widely available) and it allowed for synchronousness. The process was slow-going because of my C++ inexperience, and it involved lots of long talks with Claude, the code robot which I find least distasteful of all the options out there. I took all the code Claude gave me with several grains of salt, checking its approaches against non-AI sources for sanity’s sake.Complications on macOS
Eventually, I realized that
efsw’s macOS approach was flawed. It used oneFSEventswatcher per requested file/path, but there’s a hard system-wide limit of (I think) 1024 watchers, so that’s just not practical. (I didn't know about this limit until I had evidence of file-watching calls silently failing, at which point I dug into Console.app and learned why.)I spent about a week reacting to this constraint in a fatally misguided way.
pathwatcherdoesn’t support recursive directory watching, butefswallows you to add either recursive or non-recursive watchers. So I tried to optimize for watcher reuse on macOS by opting into recursive file-watching inefsw, then introducing a scheme similar to theNativeWatcherRegistrythatwatchPathuses. This got the specs to pass, but it unquestionably made the code worse — more complex and harder to reason about — just to solve an issue that was happening on only one of the three platforms we needed to support. (I understand why this watcher-reuse code was written for Atom back in the day, but I absolutely hate it and would love to find a way to make it unnecessary in Pulsar. So it was dumb of me to turn to this code I hated, even if I thought it would solve my problem.)I refocused. This was a macOS-specific issue, so it needed a macOS-specific solution. Luckily, I discovered that each
FSEventswatcher can watch arbitrarily many files, so no single consumer should need to declare more than two. Even more luckily, I was able to write a macOS-only adapter that is used instead ofefsw’s approach in a way that is API-compatible. (We declare two watchers because, when you change the exact list of watched paths, you must restart the stream; so we alternate the streams between “primary” and “backup” and stagger their starting/stopping when paths change to ensure we don’t miss any events.)(The author of
efswwas unaware of the watcher limit on macOS, and is interested in integrating these changes… but my implementation cares only about non-recursive watching, so it’s not something I can contribute upstream in its current form.)Stability
The resulting
pathwatcherlibrary has been running on theupdated-latest-electronbranch for many months now. I have not suffered from anypathwatcher-related crashes for months, and I have not observed any real-world regressions inpathwatcherbehavior. I am not 100% certain we’ll be able to deprecatepathwatcherin the near future, as much as I’d like to; but I’m convinced this rewrite is a solid enough base on which to proceed.Testing
I would be a liar if I said that this rewrite were seamless and required no changes to the existing test suite.
One thing that the rewrite appears to struggle with is frequent watching and unwatching of the same file or path. This is something that is highly unlikely to happen outside of the unit tests — but unit tests are how I prove that this rewrite is up to snuff, so you can see the dilemma.
I mitigated this by introducing 50ms of waiting after each test; this is long enough to get the tests to pass 100% of the time on my machine.
Still, I would love to make this a non-issue in the future. (I can't prove that it won't be a problem in the real world.) I have done some experimentation with an approach such as this:
./foo. Add it to the native watcher../foo. Stop reporting changes to./fooimmediately, but do not remove it from the native watcher yet. Instead, schedule it to happen at some point in the future (like 1 second from now)../foowithin that span of time: cancel the scheduled removal, stop ignoring changes to./foo, and leave the native watcher unchanged../fooafter that time: remove it from the native watcher, then remove./foofrom the ignore list (because we're not watching it anymore, so there's no need to keep it on the list).I have tried this approach and gotten it mostly working, but it's not bulletproof. For one thing, it complicates the code. To perform a task after a delay, I can (a) schedule it to happen in another thread (at which point I need to signal the main thread so it can finish the unwatching operation); or (b) schedule it with
libuv. Option A broke my brain. Option B is conceptually much simpler, but leaves me less confident that we’re operating with context awareness. Indeed, using this approach, I can produce sporadic segfaults under a multi-thread stress test, and I’m not yet conversant enough in the debugging tools to figure out exactly what’s going wrong.But if we ever recruited someone to the project with experience in these areas, this is one of the first things I’d be eager for them to take a look at.
Review
How is anyone supposed to review this? I have no idea. I would suggest that you look it over, make sure nothing appears straightforwardly insane, and then rely on whatever experience you have with using PulsarNext in the last ~10 months as proof that this migration is fundamentally sound.
You may also find this issue useful. Early on it's mainly about
nsfw(a different file-watching library thatwatchPathuses) but then it becomes about preventing crashes that happen because ofpathwatcher. It gives more context for my evolving approaches and might have some details that I'm not remembering at this moment.The future
I don’t want to maintain this library indefinitely. Our other file-watching solutions (like whatever we wrap in
watchPath) deal only with directories, but in places where I have need to watch a single file, I’ve started to prefer the builtinfs.watchrather than introduce new code that depends onpathwatcher.Still, is there a future where we don’t use it anymore? Only if we’re willing to weaken its API contract. Packages don’t use it directly, but we expose the
FileandDirectoryclasses to packages, and the change listeners that get registered on those instances usepathwatcherunder the hood. That’s not something that is easy to convert to an async API. Still, I bet we could do it; it’s just a matter of figuring out the approach that changes behavior as little as possible.Next steps
Pulsar has never consumed its own fork of
pathwatcher. The last commit to themasterbranch of this repo is from the last Atom maintainer, and the stable release of Pulsar consumes the NPM modulepathwatcherrather than a repository reference.This means that, once we land this to
master, we can bump the major version and publish it to NPM as@pulsar-edit/pathwatcher. Then PulsarNext can declare a dependency on that version instead of a random branch of my random fork.