@BegoniaHe
Member

This pull request establishes the initial foundation for the CZC compiler project, introducing its core build system, command-line interface (CLI) architecture, and lexer (lexical analysis) functionality. It sets up modern C++23 infrastructure, organizes the codebase for extensibility, and provides the first working CLI commands (lex and version). The project is structured for future pipeline expansion and includes comprehensive documentation and configuration.

The most important changes are:

Project Initialization and Configuration

  • Added a CMakeLists.txt build system supporting C++23, code coverage, multiple platforms, and integration of third-party dependencies (CLI11, glaze, tomlplusplus, GoogleTest, ICU). It also defines test targets for the lexer.
  • Introduced a .gitmodules file that adds test/testcases as a submodule holding the test cases.
  • Added VSCode settings and project-specific configuration files for branch/tag conventions and change tracking. [1] [2]

Core CLI and Command Architecture

  • Implemented the CLI facade (Cli class in cli.hpp) using the facade pattern to manage command registration, global options, and command execution, with support for extensible commands and pipeline phases.
  • Defined a generic Command interface for all CLI subcommands, enforcing single-responsibility and extensibility.
  • Established a CompilerPhase interface to support future pipeline composition of compiler stages.
  • Introduced a layered CLI options structure, supporting global, phase-specific, and output options, with enums for output format and log level.
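As a rough illustration of the architecture described above, the following sketch shows a command interface behind a small facade that registers commands and dispatches by name. It is not the PR's actual code: the method names (`register_command`, `run`, `execute`), the enum values, the option structs, and the exit code for an unknown command are all assumptions made for the example.

```cpp
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical option enums; the PR only states that such enums exist.
enum class OutputFormat { Text, Json };
enum class LogLevel { Quiet, Normal, Verbose };

// Layered options (assumed shape): global settings plus output settings.
struct GlobalOptions { LogLevel log_level = LogLevel::Normal; };
struct OutputOptions { OutputFormat format = OutputFormat::Text; };

// Every subcommand implements one responsibility behind a common interface.
class Command {
public:
    virtual ~Command() = default;
    virtual std::string_view name() const = 0;
    virtual int execute(const GlobalOptions& global, const OutputOptions& output) = 0;
};

// Facade: owns the registered commands and the shared options,
// and dispatches to a command by name.
class Cli {
public:
    void register_command(std::unique_ptr<Command> cmd) {
        std::string key{cmd->name()};
        commands_[key] = std::move(cmd);
    }

    // Returns the command's exit code, or 2 when the name is not registered
    // (the value 2 is an arbitrary choice for this sketch).
    int run(const std::string& name) {
        auto it = commands_.find(name);
        if (it == commands_.end()) return 2;
        return it->second->execute(global_, output_);
    }

private:
    std::map<std::string, std::unique_ptr<Command>> commands_;
    GlobalOptions global_;
    OutputOptions output_;
};

// Stand-in command used only to exercise the facade.
class VersionCommand final : public Command {
public:
    std::string_view name() const override { return "version"; }
    int execute(const GlobalOptions&, const OutputOptions&) override { return 0; }
};
```

With this shape, adding a subcommand means writing one new `Command` subclass and registering it; the facade itself does not change.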

Implemented CLI Commands

  • Added the lex command (LexCommand), which performs lexical analysis on source files, supports trivia mode, multiple output formats (text/JSON), and integrates with the pipeline interface.
  • Added the version command (VersionCommand), which displays compiler version and build information.
  • Created the CLI entry point in apps/czc/main.cpp, delegating to the Cli facade.
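To make the text/JSON output distinction concrete, here is a minimal sketch of formatting a token list in either format. The `Token` fields, the `format_tokens` function, and the exact output layout are assumptions for illustration; the PR lists glaze as a dependency, whereas this sketch builds the JSON by hand only to stay dependency-free.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class OutputFormat { Text, Json };

// Hypothetical, simplified token record.
struct Token {
    std::string kind;
    std::string lexeme;
    unsigned line;
    unsigned column;
};

// Escapes quotes, backslashes, newlines, and tabs for use inside a JSON
// string. This is not a complete JSON escaper (other control characters
// are passed through unchanged).
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}

// Renders tokens as one line per token (Text) or as a JSON array (Json).
std::string format_tokens(const std::vector<Token>& tokens, OutputFormat fmt) {
    std::string out;
    if (fmt == OutputFormat::Text) {
        for (const Token& t : tokens) {
            out += std::to_string(t.line) + ":" + std::to_string(t.column) +
                   " " + t.kind + " '" + t.lexeme + "'\n";
        }
        return out;
    }
    out += "[";
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const Token& t = tokens[i];
        if (i != 0) out += ",";
        out += "{\"kind\":\"" + json_escape(t.kind) +
               "\",\"lexeme\":\"" + json_escape(t.lexeme) +
               "\",\"line\":" + std::to_string(t.line) +
               ",\"column\":" + std::to_string(t.column) + "}";
    }
    out += "]";
    return out;
}
```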

Change Tracking and Documentation

  • Included .changes markdown files to document major features, initial commit, and Makefile fixes for project tracking. [1] [2] [3]

(References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14])

- Implemented comprehensive unit tests for the StringScanner class, covering various string types including regular, raw, and TeX strings, as well as escape sequences and error handling.
- Added unit tests for Token-related functionalities, including SourceLocation, Trivia, TokenSpan, and token management.
- Developed unit tests for UTF-8 utility functions, validating character decoding, encoding, and string validity checks.
- Updated test cases to ensure robust coverage of edge cases and error scenarios.
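
For context on what the UTF-8 tests above would exercise, the following is a minimal sketch of decoding and validating UTF-8. The names `decode_utf8`, `is_valid_utf8`, and `Decoded` are hypothetical and are not taken from the PR; the checks shown (truncation, overlong forms, surrogates, range) are the standard UTF-8 well-formedness rules.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

struct Decoded {
    char32_t code_point;
    std::size_t length;  // number of bytes consumed
};

// Decodes the first code point of `s`. Returns nullopt for empty input,
// an invalid lead byte, a truncated or malformed sequence, an overlong
// encoding, a surrogate, or a value above U+10FFFF.
std::optional<Decoded> decode_utf8(std::string_view s) {
    if (s.empty()) return std::nullopt;
    const auto b0 = static_cast<unsigned char>(s[0]);
    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min = 0;  // smallest code point allowed for this length
    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte

    if (s.size() < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min) return std::nullopt;                      // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
    if (cp > 0x10FFFF) return std::nullopt;                 // out of range
    return Decoded{cp, len};
}

// A string is valid when it decodes cleanly from start to end.
bool is_valid_utf8(std::string_view s) {
    while (!s.empty()) {
        const auto d = decode_utf8(s);
        if (!d) return false;
        s.remove_prefix(d->length);
    }
    return true;
}
```
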
Copilot AI review requested due to automatic review settings November 30, 2025 14:31
Copilot AI left a comment

Pull request overview

This PR establishes the foundational infrastructure for the CZC compiler, introducing a complete lexical analysis system with modern C++23 features, a CLI framework, and comprehensive test coverage. The implementation provides essential compiler components including UTF-8 support, source code management, token generation, and multiple output formats.

Key Changes

  • Implemented a complete lexer with support for identifiers, keywords, operators, numbers, strings (normal/raw/TeX), and comments
  • Established CLI infrastructure with lex and version commands supporting text and JSON output formats
  • Created comprehensive source code management system with buffer tracking and UTF-8 handling

Reviewed changes

Copilot reviewed 62 out of 63 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `CMakeLists.txt` | Build system configuration with C++23, dependencies (CLI11, glaze, ICU), and test targets |
| `include/czc/lexer/*.hpp` | Lexer header files defining token types, scanners, UTF-8 utilities, and source management |
| `src/lexer/*.cpp` | Lexer implementation files for token scanning, UTF-8 handling, and source reading |
| `src/cli/*.cpp` | CLI implementation including command framework, formatters, and option handling |
| `test/lexer/*_test.cpp` | Comprehensive unit tests for all lexer components |
| `apps/czc/main.cpp` | Main entry point delegating to CLI facade |

Comments suppressed due to low confidence (2)

include/czc/lexer/token.hpp:1

  • Corrected spelling of '预留未来扩展' ("reserved for future extension") to '预留未来扩展' in comment. The Chinese text appears correct but the spacing between words should be consistent with the English comments.

src/lexer/string_scanner.cpp:1

  • The parameter 'count' is described as 'maximum number of digits to skip' but the function name and usage suggest it skips exactly 'count' hex digits, not a maximum. The documentation should clarify that it attempts to skip up to 'count' digits but may skip fewer if non-hex characters are encountered.
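
The "up to `count`" behavior that this comment asks the documentation to spell out could look like the sketch below. The function name, signature, and return value are assumptions; the actual helper in `string_scanner.cpp` is not shown in this conversation.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Advances `pos` over at most `max_count` hexadecimal digits in `text`.
// Stops early at the first non-hex character or at the end of the input,
// so fewer than `max_count` digits may be skipped.
// Returns the number of digits actually skipped.
std::size_t skip_hex_digits(std::string_view text, std::size_t& pos,
                            std::size_t max_count) {
    std::size_t skipped = 0;
    while (skipped < max_count && pos < text.size() &&
           std::isxdigit(static_cast<unsigned char>(text[pos]))) {
        ++pos;
        ++skipped;
    }
    return skipped;
}
```

Returning the count lets a caller tell "exactly `max_count` digits" apart from "stopped early", which is the distinction the review comment is about.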

BegoniaHe and others added 7 commits November 30, 2025 17:23
…anagement

- Added CompilerContext to encapsulate global options, output options, and diagnostics.
- Introduced Driver class to manage the compilation process, including the execution of the lexer phase.
- Enhanced diagnostics system to report errors and warnings during compilation.
- Implemented LexerPhase to handle lexical analysis with options for preserving trivia and error reporting.
- Updated tests to cover all token types and ensure correct naming in diagnostics.
- Refactored existing code for better organization and maintainability.
- Implement comprehensive unit tests for the token-related functionalities in `token_test.cpp`, covering source locations, trivia, token spans, and various token types.
- Introduce unit tests for UTF-8 utility functions in `utf8_test.cpp`, validating character decoding, encoding, validity checks, and character counting.
- Ensure tests cover edge cases, including invalid UTF-8 sequences and mixed content strings.
- Added diagnostic types and level-to-string conversion in `diagnostic.cpp`.
- Implemented ANSI color rendering in `ansi_renderer.cpp` for various diagnostic levels.
- Created JSON emitter in `json_emitter.cpp` to output diagnostics in JSON format.
- Developed text emitter in `text_emitter.cpp` for plain text output of diagnostics.
- Introduced error code registration and lookup in `error_code.cpp`.
- Implemented internationalization support in `i18n.cpp` for localized error messages.
- Added message handling with Markdown parsing in `message.cpp`.
- Created source span abstraction in `span.cpp` for tracking source code locations.
- Registered lexer error codes in `lexer_error_codes.cpp` for better error reporting.
- Implemented lexer source locator in `lexer_source_locator.cpp` to map errors to source locations.
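
As an illustration of the level-to-string conversion and ANSI color rendering mentioned in these commits, here is a minimal sketch. The `Level` values, the `Diagnostic` fields, the `error[CODE]: message` layout, the color choices, and the sample code `L0001` are all assumptions, not the contents of `diagnostic.cpp` or `ansi_renderer.cpp`.

```cpp
#include <string>
#include <string_view>

// Hypothetical severity levels.
enum class Level { Note, Warning, Error };

constexpr std::string_view level_to_string(Level level) {
    switch (level) {
        case Level::Note:    return "note";
        case Level::Warning: return "warning";
        case Level::Error:   return "error";
    }
    return "unknown";
}

// ANSI SGR foreground colors: 36 = cyan, 33 = yellow, 31 = red.
constexpr std::string_view ansi_color(Level level) {
    switch (level) {
        case Level::Note:    return "\x1b[36m";
        case Level::Warning: return "\x1b[33m";
        case Level::Error:   return "\x1b[31m";
    }
    return "";
}

struct Diagnostic {
    Level level;
    std::string code;     // hypothetical error code, e.g. "L0001"
    std::string message;
};

// Renders "level[code]: message"; when `use_color` is set, the label is
// wrapped in the level's color and followed by the SGR reset sequence.
std::string render(const Diagnostic& d, bool use_color) {
    std::string label{level_to_string(d.level)};
    label += "[" + d.code + "]";
    if (use_color) {
        label = std::string{ansi_color(d.level)} + label + "\x1b[0m";
    }
    return label + ": " + d.message;
}
```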