Add codemode: typechecked TypeScript execution with tools #16
| if (newPersistedInputs.length > 0) { | ||
| const allEvents = [...existingEvents, ...newPersistedInputs] | ||
| const allEvents = [...eventsWithPrompt, ...newPersistedInputs] |
Bug: System prompt permanently overwritten when codemode enabled
The ensureCodemodePrompt function replaces the user's original system prompt with CODEMODE_SYSTEM_PROMPT, and this modified event list gets persisted to disk via repo.save(). This permanently overwrites the user's custom system prompt in the context file, even though the codemode prompt replacement is likely intended only for the current LLM request, not for permanent storage. Subsequent loads of the context will return the codemode prompt instead of the user's original.
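One way to avoid the overwrite is to build the request-time event list separately from the list that gets saved. The sketch below is only an illustration: the event shapes, the placeholder prompt text, and the helper name `withCodemodePrompt` are all hypothetical and do not come from the PR.

```typescript
// Hypothetical, simplified event shapes; the real schemas live in the PR.
type SystemPromptEvent = { _tag: "SystemPrompt"; content: string }
type UserMessageEvent = { _tag: "UserMessage"; content: string }
type PersistedEvent = SystemPromptEvent | UserMessageEvent

// Placeholder text; the real CODEMODE_SYSTEM_PROMPT is defined in the PR.
const CODEMODE_SYSTEM_PROMPT = "You are in codemode."

// Returns a new list for the LLM request only; the input list is not mutated.
function withCodemodePrompt(events: ReadonlyArray<PersistedEvent>): PersistedEvent[] {
  const prompt: SystemPromptEvent = { _tag: "SystemPrompt", content: CODEMODE_SYSTEM_PROMPT }
  const rest = events.filter((e) => e._tag !== "SystemPrompt")
  return [prompt, ...rest]
}

const persisted: PersistedEvent[] = [
  { _tag: "SystemPrompt", content: "Answer in French." },
  { _tag: "UserMessage", content: "Bonjour" }
]

// Send `forRequest` to the LLM; keep saving `persisted`, whose prompt is intact.
const forRequest = withCodemodePrompt(persisted)
```

With this split, the user's original prompt is what lands on disk, and the codemode prompt only exists for the duration of one request.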
| }) | ||
| yield* persistEvent(result) | ||
| return result as ContextOrCodemodeEvent | ||
| }) |
Bug: Multiple codeblocks lose earlier output on later failure
When processing multiple codeblocks in a single assistant response, the tracking variables (stdout, stderr, exitCode, typecheckFailed) accumulate across all blocks. If an early block executes successfully but a later block fails typechecking, the typecheckFailed path at lines 175-185 discards all accumulated stdout by setting stdout: "", so output from the successful earlier blocks is lost. Similarly, exitCode only captures the last ExecutionComplete event, so if an earlier block fails (exit code 1) but a later block succeeds (exit code 0), the failure is masked.
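A possible fix is to collect one result per block and combine them at the end, rather than sharing mutable trackers. This is a minimal sketch under assumed shapes: `BlockResult` and `combineBlockResults` are hypothetical names, not code from the PR.

```typescript
// Hypothetical per-block result; field names mirror the tracking variables above.
type BlockResult = {
  stdout: string
  stderr: string
  exitCode: number
  typecheckFailed: boolean
}

// Combines results so that no earlier output or failure is lost.
function combineBlockResults(results: ReadonlyArray<BlockResult>): BlockResult {
  return {
    // Keep output from every block, in order.
    stdout: results.map((r) => r.stdout).join(""),
    stderr: results.map((r) => r.stderr).join(""),
    // Report the first non-zero exit code instead of the last one seen.
    exitCode: results.find((r) => r.exitCode !== 0)?.exitCode ?? 0,
    typecheckFailed: results.some((r) => r.typecheckFailed)
  }
}

const combined = combineBlockResults([
  { stdout: "first block ok\n", stderr: "", exitCode: 0, typecheckFailed: false },
  { stdout: "", stderr: "type error\n", exitCode: 1, typecheckFailed: true }
])
```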
e3a539c to e560d87
| }) | ||
| ) | ||
| ) | ||
| ) |
Bug: No execution timeout allows infinite loops to hang agent
The subprocess execution in CodeExecutor has no timeout mechanism. The stream waits indefinitely for process.exitCode to resolve. If LLM-generated code contains an infinite loop (e.g., while(true){}), a blocking operation that never completes, or code that hangs waiting for input, the agent will hang forever with no way to recover except killing the process. This could make the agent unresponsive until manually terminated.
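A timeout could be layered over the wait for the exit code. The sketch below uses plain promises for illustration only; the PR's Effect-based code would presumably use that library's own facilities instead. `withTimeout`, `Outcome`, and the `onTimeout` hook are hypothetical names.

```typescript
// Tagged outcome so callers can tell a timeout apart from a normal completion.
type Outcome<A> =
  | { _tag: "completed"; value: A }
  | { _tag: "timedOut"; afterMs: number }

// Resolves with "timedOut" if `work` has not settled within `ms` milliseconds.
// `onTimeout` is where a caller would kill the subprocess (assumption).
function withTimeout<A>(
  work: Promise<A>,
  ms: number,
  onTimeout: () => void = () => {}
): Promise<Outcome<A>> {
  return new Promise<Outcome<A>>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout()
      resolve({ _tag: "timedOut", afterMs: ms })
    }, ms)
    work.then(
      (value) => {
        clearTimeout(timer)
        resolve({ _tag: "completed", value })
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      }
    )
  })
}

// Stand-in for an exit code that never arrives, e.g. while(true){} in the child.
const neverExits = new Promise<number>(() => {})
let killed = false
const outcome = withTimeout(neverExits, 20, () => {
  killed = true
})
```

Returning a tagged outcome rather than rejecting lets the caller turn a timeout into an ordinary result event that the LLM can see and react to.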
| // Stream SSE events directly - provide services to remove context requirements | ||
| // Filter to InputEvent only (exclude SystemPromptEvent which isn't an InputEvent) | ||
| const isUserMessage = (e: ScriptInputEvent): e is UserMessageEvent => Schema.is(UserMessageEvent)(e) | ||
| const events: Array<InputEvent> = parsedEvents.filter(isUserMessage) |
Bug: HTTP handler drops valid FileAttachmentEvent inputs
The HTTP event filter only keeps UserMessageEvent using isUserMessage, but InputEvent includes FileAttachmentEvent and CodemodeResultEvent as valid input types. File attachment events would be dropped even if they were properly parsed. Additionally, ScriptInputEvent in server.service.ts only unions UserMessageEvent and SystemPromptEvent, preventing FileAttachmentEvent from being accepted via HTTP at all, making the HTTP endpoint unable to handle file attachments that the CLI supports.
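A filter that excludes only the system prompt, instead of admitting only user messages, would keep the other input types. The shapes below are simplified stand-ins (the PR uses Schema classes), and `ParsedEvent` and `isInputEvent` are hypothetical names.

```typescript
// Hypothetical, simplified event shapes for illustration.
type UserMessageEvent = { _tag: "UserMessage"; content: string }
type FileAttachmentEvent = { _tag: "FileAttachment"; path: string }
type CodemodeResultEvent = { _tag: "CodemodeResult"; stdout: string }
type SystemPromptEvent = { _tag: "SystemPrompt"; content: string }

type InputEvent = UserMessageEvent | FileAttachmentEvent | CodemodeResultEvent
type ParsedEvent = InputEvent | SystemPromptEvent

// Type guard: everything except the system prompt counts as an input event.
const isInputEvent = (e: ParsedEvent): e is InputEvent => e._tag !== "SystemPrompt"

const parsedEvents: ParsedEvent[] = [
  { _tag: "SystemPrompt", content: "Be brief." },
  { _tag: "UserMessage", content: "Summarise this file." },
  { _tag: "FileAttachment", path: "notes.txt" }
]

// The attachment survives the filter; only the system prompt is dropped.
const events: InputEvent[] = parsedEvents.filter(isInputEvent)
```

This only addresses the filter; the request schema would also need to accept attachment events for them to reach this point.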
df65165 to 63e8675
| if (result._tag === "completed") { | ||
| if (needsContinuation) { | ||
| return yield* runAgentContinuation(contextName, contextService, chat, mailbox) |
Bug: Missing iteration limit allows unbounded agent loop recursion
The runAgentContinuation function recursively calls itself at line 252 without any iteration limit, unlike the CLI's runEventStream which uses MAX_AGENT_LOOP_ITERATIONS = 15. This unbounded recursion occurs whenever needsContinuation is true, which happens when CodemodeResultEvent or CodemodeValidationErrorEvent has triggerAgentTurn === "after-current-turn". If the LLM consistently produces typechecking errors, outputs to stdout via console.log(), or fails to include <codemode> tags, the agent loop will recurse indefinitely, potentially causing stack overflow or resource exhaustion in the interactive chat UI.
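The recursion could be replaced by a bounded loop. The sketch below is synchronous and heavily simplified so the control flow is easy to see; the real function is effectful. `runAgentLoop` and `TurnResult` are hypothetical names, and the cap reuses the value the comment cites for the CLI.

```typescript
// Same cap the CLI is described as using.
const MAX_AGENT_LOOP_ITERATIONS = 15

// Hypothetical result of one agent turn.
type TurnResult = { needsContinuation: boolean }

// Runs turns until one does not ask for a continuation, or the cap is reached.
function runAgentLoop(
  runTurn: (iteration: number) => TurnResult,
  maxIterations: number = MAX_AGENT_LOOP_ITERATIONS
): { iterations: number; hitLimit: boolean } {
  let iterations = 0
  let needsContinuation = true
  while (needsContinuation && iterations < maxIterations) {
    needsContinuation = runTurn(iterations).needsContinuation
    iterations += 1
  }
  // If a continuation is still wanted, the loop stopped because of the cap.
  return { iterations, hitLimit: needsContinuation }
}

// A model that never stops asking for another turn is cut off at the cap.
const runaway = runAgentLoop(() => ({ needsContinuation: true }))
// A model that finishes on its third turn stops early.
const wellBehaved = runAgentLoop((i) => ({ needsContinuation: i < 2 }))
```

An iterative loop also avoids growing the call stack, which the recursive form cannot guarantee.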
Implements the core services needed for codemode functionality:
- codemode.model.ts: Event schemas and <codemode> block parsing
- codemode.repository.ts: Stores generated code in timestamped directories
- typechecker.service.ts: TypeScript compiler API wrapper for validation
- code-executor.service.ts: Bun subprocess execution with streaming output
- codemode.service.ts: Orchestrates parse/store/typecheck/execute workflow
Also adds error types, wires layers into main.ts, and updates vitest config to include colocated tests in src/.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

- Add --codemode / -x flag to enable code block processing
- Add handleCodemodeEvent for colored terminal output
- Refactor codemode.service to use Stream.unwrap for cleaner control flow
- Fix typecheck failure handling to properly stop execution
- Add E2E tests proving the full pipeline works

- Add CodemodeResultEvent with toLLMMessage() returning user role
- Update PersistedEvent union to include CodemodeResultEvent
- Update eventsToPrompt to handle CodemodeResult as user message
- Integrate codemode execution into ContextService.addEvents
- CodemodeResult is now persisted and included in next LLM request
- Refactor CLI to use unified event handling
The codemode workflow now:
1. Assistant responds with <codemode> blocks
2. Code is parsed, typechecked, and executed
3. stdout/stderr captured as CodemodeResultEvent
4. Event persisted to context
5. Next LLM request includes it as user message

- Add getSecret tool to codemode tools interface for retrieving secrets hidden from LLM (implementation in code-executor.service.ts)
- Implement CodemodeResult return type with endTurn and data fields
- Add agent loop that re-calls LLM when endTurn=false
- Create CODEMODE_SYSTEM_PROMPT explaining tools and agent loop
- Context service swaps in codemode prompt when -x flag is used
- Add e2e tests for getSecret and CodemodeResult parsing

- Remove codemodeOption from CLI
- Always pass codemode: true to ContextService.addEvents
- Update test to expect output contains "i" (via tools.log) rather than exact match

The generated codemode files use `Promise<CodemodeResult>` in their function signature but the import only included `Tools`. Added `CodemodeResult` to the import statement.

- Made system prompt more emphatic about requiring explicit type annotations (noImplicitAny is enabled)
- When typecheck fails, create CodemodeResultEvent with endTurn=false so the LLM can see the errors and retry with fixed code

The system prompt now explicitly states that Tools and CodemodeResult are automatically available (imports are auto-prepended), preventing LLMs from adding duplicate imports that cause TypeScript errors.

- Add triggerAgentTurn enum property ("after-current-turn" | "never") to all persisted events
- LLM triggered by any event with triggerAgentTurn="after-current-turn", not just UserMessage
- Remove endTurn and data fields from CodemodeResultEvent
- Add t.sendMessage() tool for user-facing output (stderr, no agent turn)
- Add t.fetch() tool for web requests
- console.log() triggers agent turn; sendMessage() doesn't
- Simplify system prompt: most tasks are single-turn
- All 75 tests pass
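
The split between console.log() and sendMessage() in the commit above suggests that the trigger can be derived from whether the code wrote anything to stdout. The following is a rough sketch under that assumption; `triggerForOutput` is a hypothetical helper, not a function from the PR.

```typescript
type TriggerAgentTurn = "after-current-turn" | "never"

// Assumption: stdout carries console.log() output meant for the LLM, while
// sendMessage() writes to stderr and therefore never reaches this check.
function triggerForOutput(stdout: string): TriggerAgentTurn {
  return stdout.trim().length > 0 ? "after-current-turn" : "never"
}

const afterLog = triggerForOutput("42\n") // "after-current-turn"
const afterSendMessageOnly = triggerForOutput("") // "never"
```
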
Internalize Scope in CodeExecutor by using Stream.unwrapScoped to manage subprocess lifecycle. This removes Scope from all public interfaces making the stream easier to consume.

- Add `mini-agent codemode run <path>` CLI command for executing codemode blocks
- Refactor CodeExecutor to call CLI command instead of inline bun -e
- Move agent loop from ContextService to CLI layer for cleaner separation
- Increase max agent loop iterations from 3 to 15
- Add utility tools from kathmandu: calculate, now, sleep
- Add __CODEMODE_RESULT__ marker for cleaner stdout parsing

Dead code - appendLog was never called. Codeblock directories now only contain:
- index.ts (generated code)
- types.ts (tool type definitions)
- tsconfig.json (TypeScript config)

These files were accidentally committed during development and should not be tracked in the repository.

- chat-ui.ts now uses contextService.addEvents with codemode: true instead of calling streamLLMResponse directly
- Add CodemodeValidationErrorEvent to detect when LLM doesn't output codemode tags, triggering retry with chastising error message
- Add feed item renderers for CodemodeResult and CodemodeValidationError in opentui-chat.tsx
- Update llm.ts to include CodemodeValidationErrorEvent in prompt conversion
- Add agent continuation loop in chat-ui for codemode follow-ups

63e8675 to 8f32ec5
| while ( | ||
| lastCodemodeResult && | ||
| lastCodemodeResult.triggerAgentTurn === "after-current-turn" && | ||
| iteration < MAX_AGENT_LOOP_ITERATIONS |
Bug: CLI agent loop ignores validation error retry trigger
The agent loop in runEventStream only tracks lastCodemodeResult (CodemodeResultEvent) and ignores CodemodeValidationErrorEvent. When the LLM responds without codemode tags, a CodemodeValidationErrorEvent is emitted and persisted with triggerAgentTurn: "after-current-turn", but because lastCodemodeResult remains undefined, the while loop condition fails and the agent never retries. chat-ui.ts handles this correctly: its triggersContinuation check covers both event types.
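The CLI loop could track any event that asks for another turn rather than only the codemode result. The predicate below is a hypothetical stand-in written for illustration; it is not the chat-ui.ts implementation, and the event shapes are simplified.

```typescript
type TriggerAgentTurn = "after-current-turn" | "never"

// Hypothetical, simplified event shapes.
type CodemodeResultEvent = { _tag: "CodemodeResult"; triggerAgentTurn: TriggerAgentTurn }
type CodemodeValidationErrorEvent = {
  _tag: "CodemodeValidationError"
  triggerAgentTurn: TriggerAgentTurn
}
type AssistantMessageEvent = { _tag: "AssistantMessage"; content: string }
type StreamEvent = CodemodeResultEvent | CodemodeValidationErrorEvent | AssistantMessageEvent

// True for either codemode event type when it requests a follow-up turn.
function requestsAnotherTurn(event: StreamEvent): boolean {
  return (
    (event._tag === "CodemodeResult" || event._tag === "CodemodeValidationError") &&
    event.triggerAgentTurn === "after-current-turn"
  )
}

const missingTags: StreamEvent = {
  _tag: "CodemodeValidationError",
  triggerAgentTurn: "after-current-turn"
}
const retryNeeded = requestsAnotherTurn(missingTags)
```

Using one shared predicate in both the CLI and the chat UI would keep the two loops from drifting apart again.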
Summary
Adds codemode execution to the agent: the LLM can write TypeScript code blocks that are typechecked and executed, with access to tools for file I/O, shell commands, web fetching, and secrets.
Features
Code Execution
- <codemode>...</codemode> blocks are typechecked with strict TypeScript
Available Tools
- t.sendMessage(msg): show message to USER (no agent turn)
- t.readFile(path) / t.writeFile(path, content): file I/O
- t.exec(command): run shell commands
- t.fetch(url): HTTP requests
- t.getSecret(name): access secrets (hidden from LLM)
Agent Loop
- console.log() output triggers another agent turn
- t.sendMessage() output goes to user, no turn
Event Model
- triggerAgentTurn property to all events
- triggerAgentTurn="after-current-turn"
- CodemodeResultEvent persisted with stdout/stderr for conversation history
Files Added
- src/codemode.model.ts: code block parsing, event types
- src/codemode.service.ts: orchestrates parse → typecheck → execute
- src/codemode.repository.ts: stores generated code files
- src/code-executor.service.ts: runs code in Bun subprocess
- src/typechecker.service.ts: TypeScript type checking
- test/codemode.e2e.test.ts: end-to-end tests
All 75 tests pass.
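For readers unfamiliar with the parse step, extracting <codemode> blocks from an assistant response could look roughly like the sketch below. This is an assumption about the approach, not the code in src/codemode.model.ts; `parseCodemodeBlocks` is a hypothetical name.

```typescript
// Returns the trimmed body of every <codemode>...</codemode> block, in order.
function parseCodemodeBlocks(text: string): string[] {
  // Non-greedy match so that two blocks in one response stay separate.
  const pattern = /<codemode>([\s\S]*?)<\/codemode>/g
  const blocks: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(text)) !== null) {
    blocks.push((match[1] ?? "").trim())
  }
  return blocks
}

const response = [
  "Here is the first step.",
  "<codemode>",
  "const total: number = 1 + 2",
  "</codemode>",
  "And the second.",
  "<codemode>console.log('done')</codemode>"
].join("\n")

const blocks = parseCodemodeBlocks(response)
```

An empty result is the case the PR reports as a validation error, since the response then contains no codemode tags.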
Note
Adds codemode enabling LLM-emitted TypeScript to be typechecked and executed via Bun with tools, persisting results to drive an agent loop across CLI/UI/HTTP and a new per-context storage layout.
- codemode pipeline: parse <codemode> blocks, typecheck (TS), execute via Bun subprocess, stream events.
- CodemodeService, TypecheckService, CodeExecutor, CodemodeRepository.
- CodemodeResultEvent, CodemodeValidationErrorEvent and codemode streaming events; add triggerAgentTurn to events.
- ContextService.addEvents supports codemode and emits codemode events; swaps in codemode system prompt when enabled.
- triggerAgentTurn with persisted CodemodeResult.
- codemode run subcommand; integrate codemode into chat flow with an agent loop (iteration cap) and colored streaming output.
- ContextOrCodemodeEvent.
- events.yaml; repository APIs updated (getContextDir, listing by directories).
- src/**/*.test.ts.
- .mini-agent/**.
Written by Cursor Bugbot for commit 8f32ec5. This will update automatically on new commits.