fix(awk): check word boundary before emitting keyword tokens #859

Merged
chaliy merged 1 commit into main from fix/issue-852-awk-lexer-keyword-split
Mar 27, 2026

@chaliy commented on Mar 27, 2026

Summary

  • Awk lexer now checks word boundaries before emitting keyword tokens
  • Identifiers such as print_sp, printf_count, and delete_flag are now parsed as single identifiers
  • Keywords are emitted only when the next character cannot continue an identifier
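
To illustrate the intended token-level behavior, here is a minimal sketch in Python. It is not the project's lexer: it uses a whole-word approach (read the full identifier, then look it up in a keyword set) rather than the boundary check this PR adds, and the names `KEYWORDS`, `IDENT_RE`, and `classify_words` are hypothetical. It also ignores strings, numbers, and regex literals.

```python
import re

# Assumed subset of awk keywords, for illustration only.
KEYWORDS = {"print", "printf", "delete", "if", "else", "while", "for"}

# An awk identifier: a letter or underscore, then letters, digits, or underscores.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_words(src):
    """Return (kind, text) for each identifier-shaped word in src."""
    tokens = []
    for match in IDENT_RE.finditer(src):
        word = match.group()
        # A word is a keyword only if the *entire* word is in the keyword set.
        kind = "KEYWORD" if word in KEYWORDS else "IDENT"
        tokens.append((kind, word))
    return tokens
```

Under these assumptions, `classify_words("print print_sp")` classifies `print` as a keyword and `print_sp` as an identifier, which is the outcome the summary describes.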

Test plan

  • New awk spec tests for keyword-prefixed identifiers
  • Full spec + awk test suite green

Closes #852

The awk lexer's matches_keyword() and is_keyword_at_pos() checked only
alphanumeric chars after a keyword candidate but not underscore. This
caused identifiers like print_sp, printf_count, delete_flag to be split
into a keyword + variable. Now both functions also reject '_' as a
continuation character, treating the whole token as an identifier.
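
The difference described above can be sketched in Python as follows. The function name `matches_keyword` comes from the commit message, but the signature and body are assumptions made for illustration, as are the helpers `is_ident_char` and `matches_keyword_old`. The real implementation may differ.

```python
# Hypothetical sketch: the signature and bodies below are assumptions,
# not the project's actual lexer code.

def is_ident_char(ch):
    """True for characters that can continue an awk identifier: [A-Za-z0-9_]."""
    return ch == "_" or (ch.isascii() and ch.isalnum())


def matches_keyword_old(src, pos, keyword):
    """Pre-fix behavior as described: only alphanumerics block a match."""
    if not src.startswith(keyword, pos):
        return False
    end = pos + len(keyword)
    # Bug: '_' is not alphanumeric, so "print_sp" still matches "print".
    return end >= len(src) or not src[end].isalnum()


def matches_keyword(src, pos, keyword):
    """Post-fix behavior: '_' also counts as a continuation character."""
    if not src.startswith(keyword, pos):
        return False
    end = pos + len(keyword)
    # The keyword matches only at end of input or before a non-identifier char.
    return end >= len(src) or not is_ident_char(src[end])
```

With this sketch, `matches_keyword_old("print_sp = 1", 0, "print")` returns `True` (the reported split), while `matches_keyword("print_sp = 1", 0, "print")` returns `False`, and `matches_keyword("print $1", 0, "print")` still returns `True`.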

Closes #852
@chaliy merged commit ef3148a into main on Mar 27, 2026
23 checks passed
@chaliy deleted the fix/issue-852-awk-lexer-keyword-split branch on March 27, 2026 00:35

Development

Successfully merging this pull request may close these issues.

bug: awk lexer splits identifiers starting with keywords (print_sp → print + _sp)
