init commit #1

BegoniaHe · 2025-11-30T14:31:44Z

This pull request establishes the initial foundation for the CZC compiler project, introducing its core build system, command-line interface (CLI) architecture, and lexer (lexical analysis) functionality. It sets up modern C++23 infrastructure, organizes the codebase for extensibility, and provides the first working CLI commands (lex and version). The project is structured for future pipeline expansion and includes comprehensive documentation and configuration.

The most important changes are:

Project Initialization and Configuration

Added a CMakeLists.txt build system supporting C++23, code coverage, multiple platforms, and integration of third-party dependencies (CLI11, glaze, tomlplusplus, GoogleTest, ICU). It also defines test targets for the lexer.
Introduced a .gitmodules file to add the test/testcases submodule for test cases.
Added VSCode settings and project-specific configuration files for branch/tag conventions and change tracking. [1] [2]

Core CLI and Command Architecture

Implemented the Cli class (cli.hpp) as a facade that manages command registration, global options, and command execution, with support for extensible commands and pipeline phases.
Defined a generic Command interface for all CLI subcommands, enforcing single-responsibility and extensibility.
Established a CompilerPhase interface to support future pipeline composition of compiler stages.
Introduced a layered CLI options structure, supporting global, phase-specific, and output options, with enums for output format and log level.
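A minimal sketch of how the pieces above could fit together. It assumes a single virtual `Command` interface, a templated `CompilerPhase`, and layered option structs; all names, fields, and the exit code `2` for an unknown subcommand are illustrative assumptions, not the actual contents of `cli.hpp`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Illustrative enums for the output-format and log-level options (hypothetical values).
enum class OutputFormat { Text, Json };
enum class LogLevel { Quiet, Normal, Verbose };

// Layered options: global settings plus output settings.
struct GlobalOptions {
    LogLevel log_level = LogLevel::Normal;
};
struct OutputOptions {
    OutputFormat format = OutputFormat::Text;
    std::string file;  // empty means stdout
};
struct CliOptions {
    GlobalOptions global;
    OutputOptions output;
};

// A pipeline stage that turns one representation into the next.
template <typename Input, typename Output>
class CompilerPhase {
public:
    virtual ~CompilerPhase() = default;
    virtual Output run(const Input& input) = 0;
};

// Every subcommand implements one narrow interface.
class Command {
public:
    virtual ~Command() = default;
    virtual std::string_view name() const = 0;
    virtual int execute(const CliOptions& opts, const std::vector<std::string>& args) = 0;
};

// Facade: callers only see register_command() and run().
class Cli {
public:
    void register_command(std::unique_ptr<Command> cmd) {
        std::string key(cmd->name());
        commands_[key] = std::move(cmd);
    }

    // args[0] names the subcommand; the remaining entries are passed through.
    // Returns 2 (an assumed convention) for a missing or unknown subcommand.
    int run(const std::vector<std::string>& args) {
        if (args.empty()) return 2;
        auto it = commands_.find(args[0]);
        if (it == commands_.end()) return 2;
        std::vector<std::string> rest(args.begin() + 1, args.end());
        return it->second->execute(options_, rest);
    }

    CliOptions& options() { return options_; }

private:
    CliOptions options_;
    std::map<std::string, std::unique_ptr<Command>> commands_;
};

// Stand-in for the real VersionCommand: prints nothing, always succeeds.
class VersionCommand : public Command {
public:
    std::string_view name() const override { return "version"; }
    int execute(const CliOptions&, const std::vector<std::string>&) override { return 0; }
};
```

Keeping dispatch inside `Cli::run` means `main.cpp` only needs to register commands and forward `argv`, which matches the "delegating to the Cli facade" description.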

Implemented CLI Commands

Added the lex command (LexCommand), which performs lexical analysis on source files, supports trivia mode, multiple output formats (text/JSON), and integrates with the pipeline interface.
Added the version command (VersionCommand), which displays compiler version and build information.
Created the CLI entry point in apps/czc/main.cpp, delegating to the Cli facade.
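The lex command's two output formats can be pictured with a small formatter. This sketch assumes a simplified `TokenInfo` record (kind, lexeme, line, column) rather than the real `Token` type, and hand-rolls JSON to stay dependency-free; the project lists glaze as a dependency, which the actual formatter may use instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

enum class OutputFormat { Text, Json };

// Hypothetical token record; the real Token type in token.hpp is richer.
struct TokenInfo {
    std::string kind;    // e.g. "Identifier"
    std::string lexeme;  // source text of the token
    int line;
    int column;
};

// Escapes a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"': out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {  // other control characters become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Text: one "line:col Kind 'lexeme'" row per token. JSON: a compact array of objects.
std::string format_tokens(const std::vector<TokenInfo>& tokens, OutputFormat fmt) {
    std::string out;
    if (fmt == OutputFormat::Text) {
        for (const auto& t : tokens) {
            out += std::to_string(t.line) + ":" + std::to_string(t.column) + " " + t.kind +
                   " '" + t.lexeme + "'\n";
        }
        return out;
    }
    out += "[";
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const auto& t = tokens[i];
        if (i > 0) out += ",";
        out += "{\"kind\":\"" + json_escape(t.kind) + "\",\"lexeme\":\"" + json_escape(t.lexeme) +
               "\",\"line\":" + std::to_string(t.line) + ",\"column\":" +
               std::to_string(t.column) + "}";
    }
    out += "]";
    return out;
}
```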

Change Tracking and Documentation

Included .changes markdown files to document major features, initial commit, and Makefile fixes for project tracking. [1] [2] [3]

(References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14])

- Implemented comprehensive unit tests for the StringScanner class, covering regular, raw, and TeX strings, as well as escape sequences and error handling.
- Added unit tests for token-related functionality, including SourceLocation, Trivia, TokenSpan, and token management.
- Developed unit tests for UTF-8 utility functions, validating character decoding, encoding, and string validity checks.
- Updated test cases to ensure robust coverage of edge cases and error scenarios.
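To make the escape-sequence and raw-string cases concrete, here is a small decoder of the kind such tests would exercise. The escape set (`\n \t \r \0 \\ \"`) and the function names are assumptions for illustration; CZC's real StringScanner may accept a different set, and TeX strings are not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

enum class StringKind { Regular, Raw };

// Decodes backslash escapes in the body of a regular string literal.
// Hypothetical escape set: \n \t \r \0 \\ \" ; anything else is an error.
std::optional<std::string> decode_escapes(std::string_view body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out += body[i];
            continue;
        }
        if (++i == body.size()) return std::nullopt;  // dangling backslash
        switch (body[i]) {
            case 'n': out += '\n'; break;
            case 't': out += '\t'; break;
            case 'r': out += '\r'; break;
            case '0': out += '\0'; break;
            case '\\': out += '\\'; break;
            case '"': out += '"'; break;
            default: return std::nullopt;  // unknown escape sequence
        }
    }
    return out;
}

// Raw strings keep their body verbatim; regular strings go through decoding.
std::optional<std::string> decode_string(std::string_view body, StringKind kind) {
    if (kind == StringKind::Raw) return std::string(body);
    return decode_escapes(body);
}
```

Returning `std::nullopt` on a malformed escape is one simple way to surface the error-handling cases the tests cover; a real scanner would more likely record a diagnostic with a source location.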

Copilot

Pull request overview

This PR establishes the foundational infrastructure for the CZC compiler, introducing a complete lexical analysis system with modern C++23 features, a CLI framework, and comprehensive test coverage. The implementation provides essential compiler components including UTF-8 support, source code management, token generation, and multiple output formats.

Key Changes

Implemented a complete lexer with support for identifiers, keywords, operators, numbers, strings (normal/raw/TeX), and comments
Established CLI infrastructure with lex and version commands supporting text and JSON output formats
Created a comprehensive source code management system with buffer tracking and UTF-8 handling
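As a rough picture of what "identifiers, keywords, operators, numbers, and comments" means at the scanning level, here is an ASCII-only lexer sketch. The keyword set, the operator list, and the choice to emit `//` comments as tokens (rather than attaching them as trivia, as the real lexer's trivia mode may do) are all assumptions; strings and non-decimal numbers are omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

enum class Kind { Identifier, Keyword, Integer, Operator, Comment, Unknown };

struct Tok {
    Kind kind;
    std::string text;
};

// Hypothetical keyword set; CZC's actual keywords are not listed in this PR.
const std::unordered_set<std::string_view> kKeywords = {"let", "fn", "return", "if", "else"};

std::vector<Tok> lex(std::string_view src) {
    std::vector<Tok> out;
    std::size_t i = 0;
    auto is_ident_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_ident_cont = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    while (i < src.size()) {
        unsigned char c = src[i];
        std::size_t start = i;
        if (std::isspace(c)) { ++i; continue; }  // whitespace is dropped in this sketch
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;  // line comment
            out.push_back({Kind::Comment, std::string(src.substr(start, i - start))});
        } else if (is_ident_start(c)) {
            while (i < src.size() && is_ident_cont(src[i])) ++i;
            std::string_view word = src.substr(start, i - start);
            out.push_back({kKeywords.count(word) ? Kind::Keyword : Kind::Identifier, std::string(word)});
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({Kind::Integer, std::string(src.substr(start, i - start))});
        } else {
            // Try two-character operators first (maximal munch), then one character.
            static constexpr std::string_view two[] = {"==", "!=", "<=", ">=", "->", "&&", "||"};
            std::string_view op = src.substr(i, 1);
            for (auto t : two) {
                if (src.substr(i, 2) == t) { op = t; break; }
            }
            i += op.size();
            bool known = std::string_view("+-*/%=<>!&|(){}[];,.:").find(op[0]) != std::string_view::npos;
            out.push_back({known ? Kind::Operator : Kind::Unknown, std::string(op)});
        }
    }
    return out;
}
```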

Reviewed changes

Copilot reviewed 62 out of 63 changed files in this pull request and generated no comments.


| File | Description |
|---|---|
| `CMakeLists.txt` | Build system configuration with C++23, dependencies (CLI11, glaze, ICU), and test targets |
| `include/czc/lexer/*.hpp` | Lexer header files defining token types, scanners, UTF-8 utilities, and source management |
| `src/lexer/*.cpp` | Lexer implementation files for token scanning, UTF-8 handling, and source reading |
| `src/cli/*.cpp` | CLI implementation including command framework, formatters, and option handling |
| `test/lexer/*_test.cpp` | Comprehensive unit tests for all lexer components |
| `apps/czc/main.cpp` | Main entry point delegating to CLI facade |

Comments suppressed due to low confidence (2)

include/czc/lexer/token.hpp:1

Corrected spelling of '预留未来扩展' ("reserved for future extension") to '预留未来扩展' in comment. The Chinese text appears correct, but the spacing between words should be consistent with the English comments.
src/lexer/string_scanner.cpp:1
The parameter 'count' is described as 'maximum number of digits to skip' but the function name and usage suggest it skips exactly 'count' hex digits, not a maximum. The documentation should clarify that it attempts to skip up to 'count' digits but may skip fewer if non-hex characters are encountered.
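The "up to `count`" semantics the review asks to document can be stated precisely in code. The function names below are hypothetical, not the ones in `string_scanner.cpp`; the point is that the helper stops early at a non-hex character or end of input and returns how many digits it actually consumed, so the caller can compare that against the required width (for example, exactly 2 for a `\xHH` escape).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the value of a hex digit, or -1 if `c` is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Skips *up to* `count` hex digits starting at `pos` and returns how many were
// consumed; stops early at the first non-hex character or at end of input.
std::size_t skip_hex_digits(std::string_view src, std::size_t pos, std::size_t count) {
    std::size_t n = 0;
    while (n < count && pos + n < src.size() && hex_value(src[pos + n]) >= 0) ++n;
    return n;
}
```

A caller that needs an exact width would treat `skip_hex_digits(...) != count` as a malformed escape.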


…anagement

- Added CompilerContext to encapsulate global options, output options, and diagnostics.
- Introduced a Driver class to manage the compilation process, including execution of the lexer phase.
- Enhanced the diagnostics system to report errors and warnings during compilation.
- Implemented LexerPhase to handle lexical analysis, with options for preserving trivia and reporting errors.
- Updated tests to cover all token types and ensure correct naming in diagnostics.
- Refactored existing code for better organization and maintainability.
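The relationship between CompilerContext, Driver, and LexerPhase described above can be sketched as follows. Everything here is a simplified stand-in: the `preserve_trivia` flag replaces the real option structs, and the toy `LexerPhase` only counts whitespace-separated words and reports `$` as an unexpected character so that the diagnostics path has something to exercise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Level { Warning, Error };

struct Diagnostic {
    Level level;
    std::string message;
};

// Hypothetical diagnostics sink shared by all phases.
class DiagnosticsEngine {
public:
    void report(Level level, std::string msg) { diags_.push_back({level, std::move(msg)}); }
    bool has_errors() const {
        for (const auto& d : diags_)
            if (d.level == Level::Error) return true;
        return false;
    }
    const std::vector<Diagnostic>& all() const { return diags_; }

private:
    std::vector<Diagnostic> diags_;
};

// Bundles options and diagnostics so each phase receives one context object.
struct CompilerContext {
    bool preserve_trivia = false;  // stand-in for the real option structs
    DiagnosticsEngine diagnostics;
};

// Toy phase: counts whitespace-separated words and reports '$' as an error.
class LexerPhase {
public:
    explicit LexerPhase(CompilerContext& ctx) : ctx_(ctx) {}
    std::size_t run(const std::string& source) {
        std::size_t words = 0;
        bool in_word = false;
        for (char c : source) {
            if (c == '$') ctx_.diagnostics.report(Level::Error, "unexpected character '$'");
            bool space = (c == ' ' || c == '\n' || c == '\t');
            if (!space && !in_word) ++words;
            in_word = !space;
        }
        return words;
    }

private:
    CompilerContext& ctx_;
};

// The driver sequences phases and turns accumulated diagnostics into an exit code.
class Driver {
public:
    explicit Driver(CompilerContext& ctx) : ctx_(ctx) {}
    int compile(const std::string& source) {
        LexerPhase lexer(ctx_);
        lexer.run(source);
        return ctx_.diagnostics.has_errors() ? 1 : 0;
    }

private:
    CompilerContext& ctx_;
};
```

Passing the context by reference lets later phases append to the same diagnostics list, so the driver can decide success or failure in one place.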

- Implemented comprehensive unit tests for token-related functionality in `token_test.cpp`, covering source locations, trivia, token spans, and various token types.
- Introduced unit tests for UTF-8 utility functions in `utf8_test.cpp`, validating character decoding, encoding, validity checks, and character counting.
- Ensured the tests cover edge cases, including invalid UTF-8 sequences and mixed-content strings.
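The UTF-8 behaviour those tests validate (decoding, encoding, validity checks, counting) can be illustrated with a compact hand-written implementation. Names are hypothetical and this is not the project's `utf8` module (which may rely on ICU); it follows the standard UTF-8 rules, rejecting overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct Decoded {
    char32_t cp;      // decoded code point
    std::size_t len;  // bytes consumed
};

// Decodes one UTF-8 sequence at the front of `s`.
std::optional<Decoded> decode_utf8(std::string_view s) {
    if (s.empty()) return std::nullopt;
    auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return Decoded{b0, 1};
    std::size_t len;
    char32_t cp;
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return std::nullopt;                 // stray continuation or invalid lead byte
    if (s.size() < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    static constexpr char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return std::nullopt;
    return Decoded{cp, len};
}

// Encodes a code point; assumes `cp` is a valid scalar value.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Counts code points, or returns nullopt if `s` is not valid UTF-8.
std::optional<std::size_t> count_utf8_chars(std::string_view s) {
    std::size_t n = 0;
    while (!s.empty()) {
        auto d = decode_utf8(s);
        if (!d) return std::nullopt;
        s.remove_prefix(d->len);
        ++n;
    }
    return n;
}

bool is_valid_utf8(std::string_view s) { return count_utf8_chars(s).has_value(); }
```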

- Added diagnostic types and level-to-string conversion in `diagnostic.cpp`.
- Implemented ANSI color rendering in `ansi_renderer.cpp` for various diagnostic levels.
- Created a JSON emitter in `json_emitter.cpp` to output diagnostics in JSON format.
- Developed a text emitter in `text_emitter.cpp` for plain-text output of diagnostics.
- Introduced error code registration and lookup in `error_code.cpp`.
- Implemented internationalization support in `i18n.cpp` for localized error messages.
- Added message handling with Markdown parsing in `message.cpp`.
- Created a source span abstraction in `span.cpp` for tracking source code locations.
- Registered lexer error codes in `lexer_error_codes.cpp` for better error reporting.
- Implemented a lexer source locator in `lexer_source_locator.cpp` to map errors to source locations.
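Several of the pieces listed above (level-to-string conversion, ANSI coloring, error-code lookup, and a text emitter) combine naturally into one rendering path. In this sketch the error code `L0001`, the output layout `file:line:col: level[code]: message`, and all function names are assumptions; only the SGR escape sequences (`ESC[1;31m` for bold red, `ESC[0m` to reset, and so on) are standard.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

enum class DiagLevel { Note, Warning, Error };

std::string_view level_to_string(DiagLevel level) {
    switch (level) {
        case DiagLevel::Note: return "note";
        case DiagLevel::Warning: return "warning";
        case DiagLevel::Error: return "error";
    }
    return "unknown";
}

// Standard ANSI SGR sequences: 1 = bold, 36 = cyan, 33 = yellow, 31 = red.
std::string_view level_color(DiagLevel level) {
    switch (level) {
        case DiagLevel::Note: return "\x1b[1;36m";
        case DiagLevel::Warning: return "\x1b[1;33m";
        case DiagLevel::Error: return "\x1b[1;31m";
    }
    return "";
}

// Hypothetical registry mapping error codes to message templates.
class ErrorCodeRegistry {
public:
    void add(std::string code, std::string message) { codes_[std::move(code)] = std::move(message); }
    std::optional<std::string> lookup(const std::string& code) const {
        auto it = codes_.find(code);
        if (it == codes_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, std::string> codes_;
};

// Renders "file:line:col: level[code]: message", optionally colorizing the level tag.
std::string render_text(DiagLevel level, std::string_view code, std::string_view message,
                        std::string_view file, int line, int col, bool color) {
    std::string out;
    out += file;
    out += ':' + std::to_string(line) + ':' + std::to_string(col) + ": ";
    if (color) out += level_color(level);
    out += level_to_string(level);
    out += '[';
    out += code;
    out += ']';
    if (color) out += "\x1b[0m";  // reset attributes
    out += ": ";
    out += message;
    return out;
}
```

Keeping color as a flag on the text renderer (rather than baking escapes into messages) lets the same diagnostic feed the plain-text, ANSI, and JSON emitters.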


BegoniaHe added 4 commits November 29, 2025 10:10

feat: init commit

664e0ea

fix: update project version to 0.0.1 and improve vcpkg.json generation

3041076

feat: add submodule and lexer

3b1d619

Copilot AI review requested due to automatic review settings November 30, 2025 14:31

Copilot AI reviewed Nov 30, 2025


BegoniaHe and others added 7 commits November 30, 2025 17:23

chore: update submodule path for lexer test cases

a3400dc

chore: update submodule configuration for test cases

bee4000

chore: remove obsolete testcases submodule

d2a4ac6

feat(i18n): Add i18n support and unit tests for DiagContext and Translator

34c8103
