Skip to content

Conversation

@TooTallNate
Copy link

Rewrote the parser to use a character-by-character state machine instead of complex regex. This properly handles adjacent quoted and unquoted segments as a single argument, matching Bash behavior.

Examples of now-correct parsing:

  • --foo="bar"'baz'--foo=barbaz
  • a" b"a b
  • run:silent["echo 1"]["echo 2"]run:silent[echo 1][echo 2]

Also fixed existing test cases that had incorrect expectations and added new test cases for combined quotation segments.

Fixes #20.
Fixes #23.

Rewrote the parser to use a character-by-character state machine
instead of complex regex. This properly handles adjacent quoted
and unquoted segments as a single argument, matching Bash behavior.

Examples of now-correct parsing:
- `--foo="bar"'baz'` → `--foo=barbaz`
- `a" b"` → `a b`
- `run:silent["echo 1"]["echo 2"]` → `run:silent[echo 1][echo 2]`

Also fixed existing test cases that had incorrect expectations
and added new test cases for combined quotation segments.
Copilot AI review requested due to automatic review settings December 16, 2025 09:01
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the complex regex-based argument parser with a character-by-character state machine implementation to correctly handle adjacent quoted and unquoted segments as single arguments, matching Bash behavior.

Key Changes:

  • Rewrote the core parsing logic from regex to a state machine that tracks quote state and builds tokens character-by-character
  • Fixed test expectations to match the new parsing behavior where quotes are stripped from key-value pairs
  • Added new test cases for combined quotation segments

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
index.ts Replaced regex-based parser with state machine that processes input character-by-character, properly handling combined quoted/unquoted segments
test/Index.spec.js Updated test expectations to match new quote-stripping behavior, added trailing commas for style consistency, and added two new test cases for combined quotation segments

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if (hasToken) {
myArray.push(current);
}

Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser doesn't handle unclosed quotes, which could lead to unexpected behavior. When a quote is opened but never closed, the entire rest of the input string will be consumed as part of the current token, including any subsequent arguments. Consider adding validation to detect unclosed quotes and either throw an error or handle them gracefully according to the desired behavior.

Suggested change
if (inQuote) {
throw new Error(
`Unclosed quote (${inQuote}) found in input string`
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +143 to +151
"value[echo][grep]+Peter's Friends",
]);
done();
});

it("combined quotation segments", function (done) {
parseAndValidate("--foo=\"bar\"'baz'", ["--foo=barbaz"]);
done();
});
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the new test cases cover combined quotation segments, there's no test coverage for important edge cases that could cause issues with the new parser implementation: unclosed quotes (e.g., --foo="bar), escaped quotes within strings (if supported), consecutive whitespace handling, and empty input strings. Adding tests for these scenarios would help ensure the parser handles them correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +48
let current = "";
let inQuote: string | null = null;
let hasToken = false; // Track if we've started a token (for empty quotes)
let i = 0;

// Accepts any number of arguments, and returns the first one that is a string
// (even an empty string)
function firstString(...args: Array<any>): string | undefined {
for (let i = 0; i < args.length; i++) {
const arg = args[i];
if (typeof arg === "string") {
return arg;
while (i < value.length) {
const char = value[i];

if (inQuote) {
// Inside quotes - look for the closing quote
if (char === inQuote) {
// End of quoted section
inQuote = null;
} else {
// Add character to current token (without the quotes)
current += char;
}
} else {
// Outside quotes
if (char === '"' || char === "'") {
// Start of quoted section - this means we have a token even if empty
inQuote = char;
hasToken = true;
} else if (/\s/.test(char)) {
// Whitespace - end current token if we have one
if (hasToken) {
myArray.push(current);
current = "";
hasToken = false;
}
} else {
// Regular character - add to current token
current += char;
hasToken = true;
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using string concatenation with the += operator in a loop can be inefficient for large input strings because strings are immutable in JavaScript, causing potential O(n²) time complexity. Consider using an array to collect characters and join them at the end, or use a more efficient string building approach for better performance with large inputs.

Copilot uses AI. Check for mistakes.
// Start of quoted section - this means we have a token even if empty
inQuote = char;
hasToken = true;
} else if (/\s/.test(char)) {
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating and executing a regex on every character iteration is inefficient. Consider using a simple character comparison instead (e.g., char === ' ' || char === '\t' || char === '\n' || char === '\r') or move the regex outside the loop to avoid repeated regex compilation.

Suggested change
} else if (/\s/.test(char)) {
} else if (char === ' ' || char === '\t' || char === '\n' || char === '\r') {

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

separates unquoted tail of argument starting with quotes doesn't strip word-internal quotes

1 participant