init commit #1
base: main
- Implemented comprehensive unit tests for the `StringScanner` class, covering various string types including regular, raw, and TeX strings, as well as escape sequences and error handling.
- Added unit tests for Token-related functionalities, including `SourceLocation`, `Trivia`, `TokenSpan`, and token management.
- Developed unit tests for UTF-8 utility functions, validating character decoding, encoding, and string validity checks.
- Updated test cases to ensure robust coverage of edge cases and error scenarios.
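To make the "escape sequences and error handling" cases concrete, here is a minimal sketch of the kind of behaviour such tests pin down. `unescape` is a hypothetical free function, not the `StringScanner` API, and the escape set (`\n`, `\t`, `\r`, `\0`, `\\`, `\"`) is an assumption; CZC's real escape rules, and how raw and TeX strings bypass them, are not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the CZC API): decodes backslash escapes in
// the body of a regular string literal. Raw and TeX strings would skip this.
// Returns std::nullopt for an unknown escape or a trailing lone backslash.
std::optional<std::string> unescape(std::string_view body) {
    std::string out;
    out.reserve(body.size());
    for (std::size_t i = 0; i < body.size(); ++i) {
        const char c = body[i];
        if (c != '\\') {
            out.push_back(c);
            continue;
        }
        // A backslash at the very end is a truncated escape.
        if (++i == body.size()) {
            return std::nullopt;
        }
        switch (body[i]) {
            case 'n':  out.push_back('\n'); break;
            case 't':  out.push_back('\t'); break;
            case 'r':  out.push_back('\r'); break;
            case '0':  out.push_back('\0'); break;
            case '\\': out.push_back('\\'); break;
            case '"':  out.push_back('"');  break;
            default:   return std::nullopt;  // unknown escape sequence
        }
    }
    return out;
}
```

Returning `std::nullopt` keeps the sketch short; a real scanner would more likely record a diagnostic with the offending position and keep scanning.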
Pull request overview
This PR establishes the foundational infrastructure for the CZC compiler, introducing a complete lexical analysis system with modern C++23 features, a CLI framework, and comprehensive test coverage. The implementation provides essential compiler components including UTF-8 support, source code management, token generation, and multiple output formats.
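As a rough illustration of how a lexer can tell identifiers, keywords, and numbers apart, the sketch below scans the longest run of identifier characters (maximal munch) and only then checks it against a keyword table, so `letter` is not split into `let` + `ter`. `TokenKind`, `Lexeme`, `lex_word`, and the keyword set are hypothetical; CZC's real token kinds and keywords live in its lexer headers. The sketch also assumes ASCII-only identifiers, whereas the PR's lexer is UTF-8 aware.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <unordered_set>

// Hypothetical token kinds; CZC's real kinds are defined in its lexer headers.
enum class TokenKind { Identifier, Keyword, Integer, Unknown };

struct Lexeme {
    TokenKind kind;
    std::string_view text;  // view into the source buffer, no copy
};

// Hypothetical keyword set for illustration only.
const std::unordered_set<std::string_view> kKeywords{"let", "fn", "if", "else", "return"};

bool is_ident_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

bool is_ident_continue(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Scans one lexeme at `pos` (precondition: pos < src.size() and src[pos] is
// not whitespace) using maximal munch, then advances `pos` past it.
Lexeme lex_word(std::string_view src, std::size_t& pos) {
    const std::size_t start = pos;
    if (is_ident_start(src[pos])) {
        while (pos < src.size() && is_ident_continue(src[pos])) ++pos;
        const std::string_view text = src.substr(start, pos - start);
        // The whole run is looked up, so "letter" stays an identifier.
        return {kKeywords.count(text) != 0 ? TokenKind::Keyword : TokenKind::Identifier, text};
    }
    if (is_digit(src[pos])) {
        while (pos < src.size() && is_digit(src[pos])) ++pos;
        return {TokenKind::Integer, src.substr(start, pos - start)};
    }
    ++pos;  // any other single byte is reported as Unknown
    return {TokenKind::Unknown, src.substr(start, 1)};
}
```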
Key Changes
- Implemented a complete lexer with support for identifiers, keywords, operators, numbers, strings (normal/raw/TeX), and comments
- Established CLI infrastructure with `lex` and `version` commands supporting text and JSON output formats
- Created a comprehensive source code management system with buffer tracking and UTF-8 handling
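A minimal sketch of the facade-plus-command structure behind the CLI: a `Cli` object owns a registry of commands keyed by name and dispatches to one of them. The `Command` shape (`name()`/`run()`), the exit codes, and the placeholder `VersionCommand` output are assumptions for illustration; the PR's actual declarations (which build on CLI11) are not reproduced here. A `LexCommand` would be registered the same way.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subcommand interface (one responsibility per command).
class Command {
public:
    virtual ~Command() = default;
    virtual std::string name() const = 0;
    // `args` excludes the subcommand name; the return value is an exit code.
    virtual int run(const std::vector<std::string>& args, std::string& out) = 0;
};

class VersionCommand final : public Command {
public:
    std::string name() const override { return "version"; }
    int run(const std::vector<std::string>& /*args*/, std::string& out) override {
        out = "czc version <placeholder>\n";  // real build info not reproduced
        return 0;
    }
};

// Facade: callers see one entry point; registration and dispatch stay inside.
class Cli {
public:
    void register_command(std::unique_ptr<Command> cmd) {
        std::string key = cmd->name();
        commands_[std::move(key)] = std::move(cmd);
    }

    int run(const std::vector<std::string>& argv, std::string& out) {
        if (argv.empty()) {
            out = "usage: czc <command> [args...]\n";
            return 2;
        }
        auto it = commands_.find(argv.front());
        if (it == commands_.end()) {
            out = "unknown command: " + argv.front() + "\n";
            return 2;
        }
        const std::vector<std::string> rest(argv.begin() + 1, argv.end());
        return it->second->run(rest, out);
    }

private:
    std::map<std::string, std::unique_ptr<Command>> commands_;
};
```

Adding a subcommand then touches only a new `Command` subclass and one `register_command` call, which is the extensibility the description points to.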
Reviewed changes
Copilot reviewed 62 out of 63 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| CMakeLists.txt | Build system configuration with C++23, dependencies (CLI11, glaze, ICU), and test targets |
| include/czc/lexer/*.hpp | Lexer header files defining token types, scanners, UTF-8 utilities, and source management |
| src/lexer/*.cpp | Lexer implementation files for token scanning, UTF-8 handling, and source reading |
| src/cli/*.cpp | CLI implementation including command framework, formatters, and option handling |
| test/lexer/*_test.cpp | Comprehensive unit tests for all lexer components |
| apps/czc/main.cpp | Main entry point delegating to CLI facade |
Comments suppressed due to low confidence (2)
include/czc/lexer/token.hpp:1 - Corrected spelling of '预留未来扩展' ("reserved for future extension") to '预留未来扩展' in comment. The Chinese text appears correct, but the spacing between words should be consistent with the English comments.
src/lexer/string_scanner.cpp:1 - The parameter 'count' is described as 'maximum number of digits to skip' but the function name and usage suggest it skips exactly 'count' hex digits, not a maximum. The documentation should clarify that it attempts to skip up to 'count' digits but may skip fewer if non-hex characters are encountered.
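The "up to `count`" semantics this comment asks to document can be pinned down with a small sketch. `skip_hex_digits` here is a hypothetical free function over a `std::string_view`, not the actual `StringScanner` member: it consumes at most `count` hexadecimal digits, stops early at the first non-hex character or at end of input, and returns how many it consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Skips *up to* `count` hex digits starting at `pos`, advancing `pos`.
// Returns the number actually skipped, which is less than `count` if a
// non-hex character or the end of input is reached first.
std::size_t skip_hex_digits(std::string_view src, std::size_t& pos, std::size_t count) {
    std::size_t skipped = 0;
    while (skipped < count && pos < src.size() && is_hex_digit(src[pos])) {
        ++pos;
        ++skipped;
    }
    return skipped;
}
```

A caller that requires exactly N digits compares the return value with N and reports a short escape otherwise, which keeps the "maximum" wording accurate.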
…anagement
- Added `CompilerContext` to encapsulate global options, output options, and diagnostics.
- Introduced `Driver` class to manage the compilation process, including the execution of the lexer phase.
- Enhanced diagnostics system to report errors and warnings during compilation.
- Implemented `LexerPhase` to handle lexical analysis with options for preserving trivia and error reporting.
- Updated tests to cover all token types and ensure correct naming in diagnostics.
- Refactored existing code for better organization and maintainability.
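The division of labour in this commit (a context carrying options and collected diagnostics, a driver that runs phases, and a lexer phase that honours a trivia option and reports errors) can be sketched as below. Every member name (`preserve_trivia`, `Diagnostic`, `compile`, `tokens`) and the toy space-splitting "lexer" are hypothetical stand-ins, not CZC's real classes; the point is only that phases report into a shared context and the driver decides whether to continue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the types named in the commit message.
enum class Level { Warning, Error };

struct Diagnostic {
    Level level;
    std::string message;
};

// Shared state passed to every phase: options in, diagnostics out.
struct CompilerContext {
    bool preserve_trivia = false;
    std::vector<Diagnostic> diagnostics;

    bool has_errors() const {
        for (const Diagnostic& d : diagnostics) {
            if (d.level == Level::Error) return true;
        }
        return false;
    }
};

// Toy lexer phase: splits on spaces, optionally keeps them as trivia tokens,
// and reports '@' as an unexpected character instead of aborting.
struct LexerPhase {
    std::vector<std::string> run(const std::string& src, CompilerContext& ctx) const {
        std::vector<std::string> tokens;
        std::string current;
        for (char c : src) {
            if (c == '@') {
                ctx.diagnostics.push_back({Level::Error, "unexpected character '@'"});
            } else if (c == ' ') {
                if (!current.empty()) tokens.push_back(current);
                if (ctx.preserve_trivia) tokens.push_back(" ");
                current.clear();
            } else {
                current.push_back(c);
            }
        }
        if (!current.empty()) tokens.push_back(current);
        return tokens;
    }
};

// Driver: runs phases in order against one context; later phases would be
// skipped once an earlier phase has recorded an error.
class Driver {
public:
    explicit Driver(CompilerContext& ctx) : ctx_(ctx) {}

    int compile(const std::string& src) {
        tokens_ = LexerPhase{}.run(src, ctx_);
        if (ctx_.has_errors()) return 1;
        // Further phases (parser, ...) would be invoked here.
        return 0;
    }

    const std::vector<std::string>& tokens() const { return tokens_; }

private:
    CompilerContext& ctx_;
    std::vector<std::string> tokens_;
};
```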
- Implement comprehensive unit tests for the token-related functionalities in `token_test.cpp`, covering source locations, trivia, token spans, and various token types.
- Introduce unit tests for UTF-8 utility functions in `utf8_test.cpp`, validating character decoding, encoding, validity checks, and character counting.
- Ensure tests cover edge cases, including invalid UTF-8 sequences and mixed content strings.
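The UTF-8 behaviours these tests target (decoding, encoding, validity checks, code-point counting, and rejection of invalid sequences) can be illustrated with a self-contained sketch. The function names and the `DecodeResult` struct are hypothetical, not the signatures in CZC's UTF-8 utilities. Following RFC 3629, the decoder rejects overlong encodings, UTF-16 surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

struct DecodeResult {
    char32_t code_point;
    std::size_t length;  // number of bytes consumed
};

// Decodes the code point at the start of `s`; nullopt for any invalid input.
std::optional<DecodeResult> decode_utf8(std::string_view s) {
    if (s.empty()) return std::nullopt;
    const auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return DecodeResult{b0, 1};  // ASCII fast path

    std::size_t len = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte

    if (s.size() < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // expected 10xxxxxx
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
    constexpr char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMinForLength[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        return std::nullopt;
    }
    return DecodeResult{cp, len};
}

// Appends the UTF-8 encoding of `cp` to `out`; false for non-scalar values.
bool encode_utf8(char32_t cp, std::string& out) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return true;
}

// Counts code points; nullopt if `s` is not valid UTF-8.
std::optional<std::size_t> count_code_points(std::string_view s) {
    std::size_t n = 0;
    while (!s.empty()) {
        const auto r = decode_utf8(s);
        if (!r) return std::nullopt;
        s.remove_prefix(r->length);
        ++n;
    }
    return n;
}

bool is_valid_utf8(std::string_view s) { return count_code_points(s).has_value(); }
```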
- Added diagnostic types and level-to-string conversion in `diagnostic.cpp`.
- Implemented ANSI color rendering in `ansi_renderer.cpp` for various diagnostic levels.
- Created JSON emitter in `json_emitter.cpp` to output diagnostics in JSON format.
- Developed text emitter in `text_emitter.cpp` for plain text output of diagnostics.
- Introduced error code registration and lookup in `error_code.cpp`.
- Implemented internationalization support in `i18n.cpp` for localized error messages.
- Added message handling with Markdown parsing in `message.cpp`.
- Created source span abstraction in `span.cpp` for tracking source code locations.
- Registered lexer error codes in `lexer_error_codes.cpp` for better error reporting.
- Implemented lexer source locator in `lexer_source_locator.cpp` to map errors to source locations.
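As a rough illustration of how level-to-string conversion, error code registration and lookup, and a plain-text emitter can fit together, the sketch below keeps a registry from code strings to default messages and formats a diagnostic as `file:line:col: level[code]: message`. The registry API, the `Span` fields, the code `L0001`, and the output layout are all hypothetical; they are not CZC's actual lexer error codes or emitter format.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

enum class Level { Note, Warning, Error };

// Level-to-string conversion used by the emitter.
constexpr std::string_view level_to_string(Level level) {
    switch (level) {
        case Level::Note:    return "note";
        case Level::Warning: return "warning";
        case Level::Error:   return "error";
    }
    return "unknown";
}

// Hypothetical registry mapping an error code to its default message.
class ErrorCodeRegistry {
public:
    // Returns false if `code` was already registered (first registration wins).
    bool add(std::string code, std::string message) {
        return codes_.emplace(std::move(code), std::move(message)).second;
    }

    std::optional<std::string> lookup(const std::string& code) const {
        const auto it = codes_.find(code);
        if (it == codes_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> codes_;
};

// Minimal source span: file name plus 1-based line and column.
struct Span {
    std::string file;
    std::uint32_t line;
    std::uint32_t column;
};

// Plain-text emitter producing `file:line:col: level[code]: message`.
std::string emit_text(const Span& span, Level level, const std::string& code,
                      const ErrorCodeRegistry& registry) {
    const std::string message = registry.lookup(code).value_or("unregistered diagnostic");
    return span.file + ":" + std::to_string(span.line) + ":" + std::to_string(span.column) +
           ": " + std::string(level_to_string(level)) + "[" + code + "]: " + message;
}
```

Keeping messages in a registry rather than at each call site is also what makes a localized lookup (the `i18n.cpp` item above) possible without touching the lexer.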
This pull request establishes the initial foundation for the CZC compiler project, introducing its core build system, command-line interface (CLI) architecture, and lexer (lexical analysis) functionality. It sets up modern C++23 infrastructure, organizes the codebase for extensibility, and provides the first working CLI commands (`lex` and `version`). The project is structured for future pipeline expansion and includes comprehensive documentation and configuration. The most important changes are:
Project Initialization and Configuration
- `CMakeLists.txt` build system supporting C++23, code coverage, multiple platforms, and integration of third-party dependencies (CLI11, glaze, tomlplusplus, GoogleTest, ICU). It also includes test targets for the lexer.
- `.gitmodules` file adding the `test/testcases` submodule for test cases.
Core CLI and Command Architecture
- `Cli` class (in `cli.hpp`) using the facade pattern to manage command registration, global options, and command execution, with support for extensible commands and pipeline phases.
- `Command` interface for all CLI subcommands, enforcing single responsibility and extensibility.
- `CompilerPhase` interface to support future pipeline composition of compiler stages.
Implemented CLI Commands
- `lex` command (`LexCommand`), which performs lexical analysis on source files, supports trivia mode and multiple output formats (text/JSON), and integrates with the pipeline interface.
- `version` command (`VersionCommand`), which displays compiler version and build information.
- Main entry point in `apps/czc/main.cpp`, delegating to the `Cli` facade.
Change Tracking and Documentation
- `.changes` markdown files documenting major features, the initial commit, and Makefile fixes for project tracking. [1] [2] [3]
(References: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14])