-
Notifications
You must be signed in to change notification settings - Fork 10
fix: handle combined quotation shell arguments like Bash #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Rewrote the parser to use a character-by-character state machine instead of complex regex. This properly handles adjacent quoted and unquoted segments as a single argument, matching Bash behavior. Examples of now-correct parsing: - `--foo="bar"'baz'` → `--foo=barbaz` - `a" b"` → `a b` - `run:silent["echo 1"]["echo 2"]` → `run:silent[echo 1][echo 2]` Also fixed existing test cases that had incorrect expectations and added new test cases for combined quotation segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR replaces the complex regex-based argument parser with a character-by-character state machine implementation to correctly handle adjacent quoted and unquoted segments as single arguments, matching Bash behavior.
Key Changes:
- Rewrote the core parsing logic from regex to a state machine that tracks quote state and builds tokens character-by-character
- Fixed test expectations to match the new parsing behavior where quotes are stripped from key-value pairs
- Added new test cases for combined quotation segments
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| index.ts | Replaced regex-based parser with state machine that processes input character-by-character, properly handling combined quoted/unquoted segments |
| test/Index.spec.js | Updated test expectations to match new quote-stripping behavior, added trailing commas for style consistency, and added two new test cases for combined quotation segments |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (hasToken) { | ||
| myArray.push(current); | ||
| } | ||
|
|
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser doesn't handle unclosed quotes, which could lead to unexpected behavior. When a quote is opened but never closed, the entire rest of the input string will be consumed as part of the current token, including any subsequent arguments. Consider adding validation to detect unclosed quotes and either throw an error or handle them gracefully according to the desired behavior.
| if (inQuote) { | |
| throw new Error( | |
| `Unclosed quote (${inQuote}) found in input string` | |
| ); | |
| } |
| "value[echo][grep]+Peter's Friends", | ||
| ]); | ||
| done(); | ||
| }); | ||
|
|
||
| it("combined quotation segments", function (done) { | ||
| parseAndValidate("--foo=\"bar\"'baz'", ["--foo=barbaz"]); | ||
| done(); | ||
| }); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the new test cases cover combined quotation segments, there's no test coverage for important edge cases that could cause issues with the new parser implementation: unclosed quotes (e.g., --foo="bar), escaped quotes within strings (if supported), consecutive whitespace handling, and empty input strings. Adding tests for these scenarios would help ensure the parser handles them correctly.
| let current = ""; | ||
| let inQuote: string | null = null; | ||
| let hasToken = false; // Track if we've started a token (for empty quotes) | ||
| let i = 0; | ||
|
|
||
| // Accepts any number of arguments, and returns the first one that is a string | ||
| // (even an empty string) | ||
| function firstString(...args: Array<any>): string | undefined { | ||
| for (let i = 0; i < args.length; i++) { | ||
| const arg = args[i]; | ||
| if (typeof arg === "string") { | ||
| return arg; | ||
| while (i < value.length) { | ||
| const char = value[i]; | ||
|
|
||
| if (inQuote) { | ||
| // Inside quotes - look for the closing quote | ||
| if (char === inQuote) { | ||
| // End of quoted section | ||
| inQuote = null; | ||
| } else { | ||
| // Add character to current token (without the quotes) | ||
| current += char; | ||
| } | ||
| } else { | ||
| // Outside quotes | ||
| if (char === '"' || char === "'") { | ||
| // Start of quoted section - this means we have a token even if empty | ||
| inQuote = char; | ||
| hasToken = true; | ||
| } else if (/\s/.test(char)) { | ||
| // Whitespace - end current token if we have one | ||
| if (hasToken) { | ||
| myArray.push(current); | ||
| current = ""; | ||
| hasToken = false; | ||
| } | ||
| } else { | ||
| // Regular character - add to current token | ||
| current += char; | ||
| hasToken = true; |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using string concatenation with the += operator in a loop can be inefficient for large input strings because strings are immutable in JavaScript, causing potential O(n²) time complexity. Consider using an array to collect characters and join them at the end, or use a more efficient string building approach for better performance with large inputs.
| // Start of quoted section - this means we have a token even if empty | ||
| inQuote = char; | ||
| hasToken = true; | ||
| } else if (/\s/.test(char)) { |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating and executing a regex on every character iteration is inefficient. Consider using a simple character comparison instead (e.g., char === ' ' || char === '\t' || char === '\n' || char === '\r') or move the regex outside the loop to avoid repeated regex compilation.
| } else if (/\s/.test(char)) { | |
| } else if (char === ' ' || char === '\t' || char === '\n' || char === '\r') { |
Rewrote the parser to use a character-by-character state machine instead of complex regex. This properly handles adjacent quoted and unquoted segments as a single argument, matching Bash behavior.
Examples of now-correct parsing:
--foo="bar"'baz'→--foo=barbaza" b"→a brun:silent["echo 1"]["echo 2"]→run:silent[echo 1][echo 2]Also fixed existing test cases that had incorrect expectations and added new test cases for combined quotation segments.
Fixes #20.
Fixes #23.