Nativejq - many simple jq commands failed#84
Nativejq - many simple jq commands failed#84robertyates wants to merge 4 commits intovercel-labs:mainfrom
Conversation
|
@robertlyates is attempting to deploy a commit to the Vercel Labs Team on Vercel. A member of the Team first needs to authorize it. |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
| * For performance, only check the first 8KB of large strings. | ||
| */ | ||
| function getFileEncoding(content: string): "binary" | "utf8" { | ||
| for (let i = 0; i < content.length; i++) { | ||
| const SAMPLE_SIZE = 8192; // 8KB | ||
|
|
||
| // For large strings, only check the first 8KB | ||
| // This is sufficient since UTF-8 files typically have Unicode chars early | ||
| const checkLength = Math.min(content.length, SAMPLE_SIZE); | ||
|
|
||
| for (let i = 0; i < checkLength; i++) { |
|
What are the simple commands? |
|
have a look at this test file, sorry, now that i look I didn't put it in the right spot I also updated a bunch of existing tests that did not match the expectations with the real jq, i changed them based on the real jq outputs, this revealed itself pretty quickly when i dropped in jq-was let me know if you need me to move things around in the PR, could jq-was be behind a switch ? also, this piggybacks on my other PR that is about handling larger files, which are not that large i.e. >0.5 MB. If you could merge those first it would be great, that should be an easy merge i hope |
|
I'm extremely conscious of increasing the attack surface of just-bash and introducing a wasm dependency must be deeply analyzed in terms of how it interacts with the virtual filesystem and other resources. It would probably be better to import a good test suite and fix the implementation |
|
i don't know exactly the attack vector that you're worried about, but it seems very well isolated 1.) it's in a separate worker, it doesn't have access to the file system, system in is piped in and system out is piped out of the worker and then the worker exits i have, in the past, looked to build a jq parser just to try and explain what it was doing, i don't know who invented jq, but it is an incredibly difficult language to deal with, I think a lot of the parsing was hand built, trying to build a similar one is none trivial, jq-wasm uses the same code as jq, why not leverage it ? would you accept a PR with it as a configuration option ? also, please can you merge this other PR quickly - #69 your code base is moving incredibly fast (I love what you're doing here) and I am worried about that PR getting stale. I also really need the fix (as I suspect most people will). |
|
I left a comment on the other PR. That definitely should go in quickly! Basically, I am concerned about all the current WASM dependencies. Python is optional (had to be, it is too complex to secure). sqlite3 seems fine and this may be fine. It's important to be extremely clear about reasoning through the fact what is exposed to the WASM code and emscripten makes it hard because it comes with filesystem stuff by default |
|
from the other PR (which I think was meant for this PR)
https://github.com/robertyates/just-bash-1/blob/50fbda450bb2519dc4492d33cb5a39cd4b5969d5/src/jq-budget-test.test.ts contains the failing commands, but it's also worth looking at the diff to your existing test cases that I made as i found several tests that do not match the expected jq behaviour and yes, I could roll my own jq command, would that take precedence ? or I assume I can turn off the current jq command, was thinking this would help others |
|
The important first step is to import a high quality test suite. I thought I did, but maybe that needs more work |
seeing jq fail with fairly simple jq commands, have replaced with jq-was and a worker model so we can timeout, needed to update a few tests to match expected behaviour for jq, everything was checked against the command line