This project aims to build an interpreter, initially for a subset of Scheme, iteratively adding features and eventually evolving into a distinct language with type safety and a Cranelift-based backend.
The goal is to learn about building a language along with all of the supporting tooling like formatter, language server, etc.
This list outlines the planned development steps, subject to change as the project evolves.
- Project Setup: Basic Rust binary project structure (
Cargo.toml,src/main.rs).- Module structure (
lexer.rs,types.rs,parser.rs,evaluator.rs,environment.rs). - Setup
lib.rsfor library structure.
- Module structure (
- Core Data Types (
types.rs): DefineSexprenum (Symbol, Number, Boolean, List, Nil).- Implement
Displayfor pretty-printingSexpr. - Add
Pair/Conscell representation (alternative/complement toVec<Sexpr>for lists). - Add
Stringtype. - Add
Vectortype (#( ... )). - Add
Charactertype (#\a). - Add
HashMaptype - Add
HashSettype
- Implement
- Lexer (
lexer.rs): Convert input string toTokenstream.- Handle basic tokens:
(,),', symbols, numbers (float), booleans (#t,#f). - Handle string literals (
"). - Handle whitespace and comments (
;). - Implement robust error handling (
LexerError). - Add comprehensive unit tests.
- Setup benchmarking with Criterion (
cargo bench). - Turn the lexer into a FSM
- Handle basic tokens:
- Parser (
parser.rs): ConvertTokenstream toSexpr(AST).- Parse atoms (symbols, numbers, booleans, strings).
- Parse lists (
(...)). - Handle quote sugar (
'expr->(quote expr)). - Handle quasiquote/unquote (
,,,@) later. - Implement robust error handling (
ParseError). - Add comprehensive unit tests.
- Add helpful explanations of errors, like Elm.
- Basic Environment (
environment.rs): Store variable bindings.- Implement nested environments (for lexical scoping).
- Define/get variables.
- Handle
set!for mutation.
- Evaluator (
evaluator.rs): ExecuteSexprAST.- Evaluate self-evaluating atoms (numbers, booleans, strings).
- Evaluate symbols (variable lookup in environment).
- Implement
quotespecial form. - Implement
ifspecial form. - Implement
beginspecial form. - Implement
definespecial form (global/local). - Implement
set!special form. - Implement
lambdaspecial form. - Implement
letspecial form. - Implement
let*special form. - Implement
letrecspecial form. - Implement basic procedure calls (primitives first).
- Implement robust error handling (
EvalError). - Implement
andandorspecial forms.
- Primitive Procedures: Implement core built-in functions.
- Basic arithmetic (
+,-,*,/). - Comparison (
=,<,>,<=,>=). (Note:=is numeric equal in Scheme) - Type predicates (
number?,symbol?,boolean?,list?,pair?,null?,procedure?). - List operations (
cons,car,cdr,list). - Lambda application (
apply) - Equality (
eq?,eqv?,equal?). - Add list mutators (
set-car!,set-cdr!).
- Basic arithmetic (
- REPL (
main.rsor separate module): Basic Read-Eval-Print Loop.- Read input line by line.
- Integrate Lexer, Parser, Evaluator.
- Print results or errors.
- Handle multiline input.
- Add history (using rustyline?).
- Add tab completion.
- Print clear errors showing where the error occured.
- Error Handling: Unified and user-friendly error reporting.
- Include source location/span information in errors.
- Consistent error types/display.
- Language Server Protocol (LSP) Implementation (
schreme-lsp/?): Provide IDE features.- Setup: Create a separate LSP binary crate.
- Communication: Implement JSON-RPC communication layer (e.g., using
lsp-server,tower-lsp). - Basic Diagnostics: Report lexer/parser errors (
textDocument/publishDiagnostics). Requires parsing on change. - Syntax Highlighting: Implement basic semantic token highlighting (
textDocument/semanticTokens). - Completions: Offer basic completions for keywords and primitives (
textDocument/completion). - Hover: Show basic type info for primitives (
textDocument/hover). - Advanced Features (Later):
- Go To Definition (requires environment analysis).
- Find References.
- Renaming.
- Diagnostics based on basic analysis (e.g., unbound variables).
- Basic Formatter (
schreme-fmt/?): Auto-format code.- Define formatting rules.
- Integrate with parser/AST.
- Command-line tool.
- Tree-sitter Parser (Exploration/Alternative):
- Evaluate existing
tree-sitter-scheme. - Consider creating
tree-sitter-schremeif syntax diverges significantly. - Explore using Tree-sitter within the LSP for faster, more resilient parsing for IDE features.
- Evaluate existing
- Debugger Implementation: Provide step-through debugging capabilities.
- Foundation & Control:
- Define Breakpoint structure (file/line/column or span).
- Implement setting/clearing breakpoints.
- Modify Evaluator loop/structure to check for breakpoints and pause execution.
- Implement execution control commands (Step In, Step Over, Step Out, Continue).
- State Inspection:
- Inspect current
Environmentbindings (variable names/values). - Implement call stack tracking during evaluation.
- Display the current call stack.
- Pretty-print
Sexpr/Nodevalues during inspection.
- Inspect current
- Interface/Protocol:
- Option A: Debug Adapter Protocol (DAP):
- Setup separate DAP server binary.
- Implement DAP communication (JSON-RPC).
- Handle DAP requests (setBreakpoints, configurationDone, threads, stackTrace, scopes, variables, continue, stepIn, etc.).
- Send DAP events (stopped, terminated, output, etc.).
- Option B: REPL Integration:
- Add debug-specific commands to the REPL (
break,step,continue,print,stack, etc.).
- Add debug-specific commands to the REPL (
- Option A: Debug Adapter Protocol (DAP):
- Cranelift Backend Debugging (Later):
- Generate Debug Information (DWARF?) mapping compiled code to source Spans during Cranelift compilation.
- Investigate integration with native debuggers (GDB/LLDB) via DWARF.
- Explore JIT debugging hooks if using
cranelift-jit.
- Foundation & Control:
- Profiler Implementation: Analyze performance characteristics.
- Data Collection Strategy:
- Option A: Instrumentation:
- Instrument procedure calls (entry/exit time/count) in the evaluator.
- Instrument primitive calls.
- Option B: Sampling:
- Periodically sample the interpreter's call stack.
- Choose and implement one or both strategies.
- Option A: Instrumentation:
- Metrics:
- Track time spent per function/primitive.
- Track call counts per function/primitive.
- (Optional) Track allocations per function.
- Interpreter Integration:
- Add hooks/wrappers around evaluation steps and function calls.
- Manage profiler state (enabled/disabled, data storage).
- Data Reporting:
- Implement basic text-based report (e.g., table sorted by time).
- Output data compatible with external tools (e.g.,
perfformat, flamegraph collapsed stack format). - Implement flame graph generation (e.g., using
inferno).
- Cranelift Backend Profiling (Later):
- Integrate with Cranelift/LLVM profiling features if applicable.
- Correlate compiled code performance metrics back to Schreme source functions.
- Data Collection Strategy:
- Testing Framework:
- Develop infrastructure for running Scheme code tests against the interpreter.
- Integrate with
cargo test.
- Package/Module System (
schreme-pkg/?): Manage dependencies and code organization.- Design: Define module syntax (
define-library, custom?). - Manifest: Define a package manifest file format (
Schreme.toml?). - Resolution: Implement dependency resolution logic.
- Fetching: Mechanism to fetch dependencies (Git, registry?).
- Build Integration: Integrate package management into the execution flow.
- Design: Define module syntax (
- Lambda & Closures: Implement user-defined procedures.
-
lambdaspecial form. - Proper lexical scoping (capture environment).
- Procedure call evaluation for user functions.
-
- Tail Call Optimization (TCO): Essential for idiomatic Scheme recursion.
- Implement TCO within the tree-walking evaluator (Trampoline, or direct jump).
- Macros: Implement hygienic macros (
syntax-rules).-
define-syntax,syntax-rules. - Expansion mechanism.
- Consider
syntax-caselater for more power.
-
- Continuations: Implement
call/cc(call-with-current-continuation). (Challenging but powerful). - More Data Types & Primitives:
- Vectors (
vector,vector-ref,vector-set!, etc.). - Characters (
char?,char=?, etc.). - Ports (Input/Output:
open-input-file,read,write,display). - Hash Tables.
- Vectors (
- File Execution: Allow running
.ss/.scmfiles directly. - Garbage Collection (GC): Crucial for managing memory, especially with closures and continuations.
- Choose a GC strategy (Reference Counting, Mark-Sweep, Generational).
- Integrate GC with Rust data structures (using libraries like
gc-arena,bacon-rajan-cc, or custom).
- Intermediate Representation (IR): Design an IR suitable for compilation.
- Possibly a typed IR distinct from
Sexpr. - SSA (Static Single Assignment) form?
- Possibly a typed IR distinct from
- Lowering: Translate
SexprAST (or a high-level IR) to the chosen IR. - Cranelift Integration:
- Basic setup: Link Cranelift, create
Context,Module. - Function compilation: Translate IR functions to Cranelift IR (
cranelift-frontend). - JIT Execution: Compile and execute functions on the fly (
cranelift-jit). - AOT Compilation: Option to compile to object files (
cranelift-object).
- Basic setup: Link Cranelift, create
- Runtime System: Support code generated by Cranelift.
- Memory management integration (calling GC).
- Primitive function implementation callable from compiled code.
- Calling conventions between interpreted and compiled code.
- Exception/Continuation handling integration.
- Optimization: Apply optimization passes within Cranelift.
- Static Typing: Design and implement a type system.
- Type syntax.
- Type checking algorithm.
- Type inference?
- Integrate type checking into the evaluator or compilation pipeline.
- Update LSP for type information and errors.
- Syntax Changes: Evolve the language syntax away from pure S-expressions if desired.
- This might require significant parser changes (potentially favouring Tree-sitter).
- Feature Modification/Removal: Change or remove Scheme features that don't fit the vision for Schreme.
- Foreign Function Interface (FFI): Define how to call Rust/C code from Schreme and vice-versa.
- Concurrency/Parallelism: Design and implement features for concurrent execution (actors, futures?).
- Documentation: Comprehensive user guides, language reference, API docs.
- Examples: Provide clear examples of how to use Schreme.
- Refinement: Code cleanup, address TODOs/FIXMEs.
- Further Benchmarking and Optimization.
- Release Strategy: Versioning, packaging.
- Maintain comprehensive unit and integration tests.
- Keep dependencies updated.
- Refactor code for clarity and maintainability.
- Write documentation alongside features.
- Continuously benchmark performance bottlenecks.