@jonastemplestein jonastemplestein commented Dec 5, 2025

Summary

Adds codemode execution to the agent: the LLM can write TypeScript code blocks that are typechecked and executed, with access to tools for file I/O, shell commands, web fetching, and secrets.

Features

Code Execution

  • <codemode>...</codemode> blocks are typechecked with strict TypeScript
  • Executed in a Bun subprocess with streamed stdout/stderr
  • Type errors trigger agent retry (agent sees errors and can fix)
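
As a rough illustration of the first step in this pipeline, the sketch below pulls the bodies out of `<codemode>` blocks in an assistant message. The name `parseCodemodeBlocks` and the regex approach are assumptions made for this example; the actual parser in src/codemode.model.ts may work differently.

```typescript
// Hypothetical helper, not the PR's actual parser.
// Returns the trimmed body of every <codemode>...</codemode> block, in order.
function parseCodemodeBlocks(text: string): string[] {
  // [\s\S]*? matches across newlines and stops at the first closing tag.
  const pattern = /<codemode>([\s\S]*?)<\/codemode>/g
  const blocks: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(text)) !== null) {
    blocks.push((match[1] ?? "").trim())
  }
  return blocks
}
```

A response with no tags yields an empty array, which is the case the later CodemodeValidationErrorEvent commit appears to target.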

Available Tools

  • t.sendMessage(msg) — show message to USER (no agent turn)
  • t.readFile(path) / t.writeFile(path, content) — file I/O
  • t.exec(command) — run shell commands
  • t.fetch(url) — HTTP requests
  • t.getSecret(name) — access secrets (hidden from LLM)

Agent Loop

  • console.log() output triggers another agent turn
  • t.sendMessage() output goes to user, no turn
  • Most tasks complete in one turn
  • Max 15 iterations before forced stop (raised from the initial 3)
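
The stdout-versus-sendMessage rule above could be expressed roughly as follows. This is a sketch under the assumption that the decision depends only on whether the executed code wrote anything to stdout; `triggerForOutput` is a hypothetical name and whitespace-only output is treated as empty here, which the PR does not specify.

```typescript
type TriggerAgentTurn = "after-current-turn" | "never"

// Assumption: console.log() output lands on stdout and t.sendMessage() output
// goes elsewhere, so only non-empty stdout should request another agent turn.
function triggerForOutput(stdout: string): TriggerAgentTurn {
  return stdout.trim().length > 0 ? "after-current-turn" : "never"
}
```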

Event Model

  • Added triggerAgentTurn property to all events
  • LLM triggered by any event with triggerAgentTurn="after-current-turn"
  • CodemodeResultEvent persisted with stdout/stderr for conversation history

Files Added

  • src/codemode.model.ts — code block parsing, event types
  • src/codemode.service.ts — orchestrates parse → typecheck → execute
  • src/codemode.repository.ts — stores generated code files
  • src/code-executor.service.ts — runs code in Bun subprocess
  • src/typechecker.service.ts — TypeScript type checking
  • test/codemode.e2e.test.ts — end-to-end tests

All 75 tests pass.


Note

Adds codemode, which lets LLM-emitted TypeScript be typechecked and executed via Bun with access to tools. Results are persisted to drive an agent loop across CLI/UI/HTTP, and contexts move to a new per-context storage layout.

  • Codemode (core):
    • Add codemode pipeline: parse <codemode> blocks, typecheck (TS), execute via Bun subprocess, stream events.
    • New services: CodemodeService, TypecheckService, CodeExecutor, CodemodeRepository.
    • New events: CodemodeResultEvent, CodemodeValidationErrorEvent and codemode streaming events; add triggerAgentTurn to events.
    • LLM prompt building updated to include codemode results/validation.
  • Context/Agent:
    • ContextService.addEvents supports codemode and emits codemode events; swaps in codemode system prompt when enabled.
    • Agent continuation driven by triggerAgentTurn with persisted CodemodeResult.
  • CLI/UI:
    • Add codemode run subcommand; integrate codemode into chat flow with an agent loop (iteration cap) and colored streaming output.
    • TUI updated to render codemode results/validation and ignore ephemeral codemode stream events.
  • HTTP/Adapters:
    • SSE streaming and LayerCode adapter generalized to ContextOrCodemodeEvent.
  • Persistence:
    • Contexts now stored per-directory with events.yaml; repository APIs updated (getContextDir, listing by directories).
  • Tests/Config:
    • Add E2E and unit tests for codemode; vitest config includes src/**/*.test.ts.
    • ESLint ignores .mini-agent/**.

Written by Cursor Bugbot for commit 8f32ec5. This will update automatically on new commits.


  if (newPersistedInputs.length > 0) {
-   const allEvents = [...existingEvents, ...newPersistedInputs]
+   const allEvents = [...eventsWithPrompt, ...newPersistedInputs]

Bug: System prompt permanently overwritten when codemode enabled

The ensureCodemodePrompt function replaces the user's original system prompt with CODEMODE_SYSTEM_PROMPT, and this modified event list gets persisted to disk via repo.save(). This permanently overwrites the user's custom system prompt in the context file, even though the codemode prompt replacement is likely intended only for the current LLM request, not for permanent storage. Subsequent loads of the context will return the codemode prompt instead of the user's original.
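
One possible shape for a fix, sketched with plain arrays rather than the PR's actual services: build the codemode prompt view only for the LLM request and persist the untouched events. The `ContextEvent` shape, the "SystemPrompt" tag, the placeholder prompt text, and both function names are assumptions for illustration.

```typescript
interface ContextEvent {
  readonly _tag: string
  readonly content: string
}

// Placeholder text; the real CODEMODE_SYSTEM_PROMPT lives in the PR.
const CODEMODE_SYSTEM_PROMPT = "placeholder codemode prompt"

// Returns a new array for the request only; the input is never mutated.
function withCodemodePrompt(events: ReadonlyArray<ContextEvent>): ContextEvent[] {
  const hasPrompt = events.some((e) => e._tag === "SystemPrompt")
  const swapped = events.map((e) =>
    e._tag === "SystemPrompt" ? { ...e, content: CODEMODE_SYSTEM_PROMPT } : e
  )
  return hasPrompt
    ? swapped
    : [{ _tag: "SystemPrompt", content: CODEMODE_SYSTEM_PROMPT }, ...swapped]
}

// Keeps the two views separate: what goes to disk versus what goes to the LLM.
function prepareEvents(
  existing: ReadonlyArray<ContextEvent>,
  newInputs: ReadonlyArray<ContextEvent>
): { toPersist: ContextEvent[]; forRequest: ContextEvent[] } {
  const toPersist = [...existing, ...newInputs]
  return { toPersist, forRequest: withCodemodePrompt(toPersist) }
}
```

With this split, only `toPersist` would be handed to repo.save(), so the user's custom prompt survives a reload.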


})
yield* persistEvent(result)
return result as ContextOrCodemodeEvent
})

Bug: Multiple codeblocks lose earlier output on later failure

When processing multiple codeblocks in a single assistant response, the tracking variables (stdout, stderr, exitCode, typecheckFailed) accumulate across all blocks. If an early block executes successfully but a later block fails typechecking, the typecheckFailed path at lines 175-185 sets stdout: "" and so discards the output accumulated from the successful earlier blocks. Similarly, exitCode only captures the last ExecutionComplete event, so if an earlier block fails (exit code 1) but a later block succeeds (exit code 0), the failure is masked.
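
A minimal sketch of one way to avoid both problems: collect a result per block and combine them at the end, keeping all output and the first non-zero exit code. The `BlockResult` shape and `combineBlockResults` are hypothetical and not taken from the PR.

```typescript
interface BlockResult {
  stdout: string
  stderr: string
  exitCode: number
  typecheckFailed: boolean
}

// Combines per-block results without dropping earlier output.
function combineBlockResults(results: ReadonlyArray<BlockResult>): BlockResult {
  return {
    stdout: results.map((r) => r.stdout).join(""),
    stderr: results.map((r) => r.stderr).join(""),
    // The first failure wins, so a later success cannot mask it.
    exitCode: results.find((r) => r.exitCode !== 0)?.exitCode ?? 0,
    typecheckFailed: results.some((r) => r.typecheckFailed)
  }
}
```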


@jonastemplestein changed the title from "Replace endTurn with triggerAgentTurn" to "Add codemode: typechecked TypeScript execution with tools" on Dec 5, 2025
})
)
)
)

Bug: No execution timeout allows infinite loops to hang agent

The subprocess execution in CodeExecutor has no timeout mechanism. The stream waits indefinitely for process.exitCode to resolve. If LLM-generated code contains an infinite loop (e.g., while(true){}), a blocking operation that never completes, or code that hangs waiting for input, the agent will hang forever with no way to recover except killing the process. This could make the agent unresponsive until manually terminated.
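
For illustration, a timeout guard could look something like the sketch below. It uses plain Promises rather than the Effect streams the PR is built on, and `withTimeout` and its `onTimeout` callback are hypothetical names; in a real executor the callback would be the place to kill the subprocess.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// `onTimeout` runs first so the caller can clean up (for example, kill a child process).
function withTimeout<T>(work: Promise<T>, ms: number, onTimeout: () => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      onTimeout()
      reject(new Error(`Execution timed out after ${ms}ms`))
    }, ms)
    work.then(
      (value) => {
        clearTimeout(timer)
        resolve(value)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      }
    )
  })
}
```

The timeout error could then be surfaced to the LLM the same way a non-zero exit code is, so the agent can retry instead of hanging.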


// Stream SSE events directly - provide services to remove context requirements
// Filter to InputEvent only (exclude SystemPromptEvent which isn't an InputEvent)
const isUserMessage = (e: ScriptInputEvent): e is UserMessageEvent => Schema.is(UserMessageEvent)(e)
const events: Array<InputEvent> = parsedEvents.filter(isUserMessage)

Bug: HTTP handler drops valid FileAttachmentEvent inputs

The HTTP event filter only keeps UserMessageEvent using isUserMessage, but InputEvent includes FileAttachmentEvent and CodemodeResultEvent as valid input types. File attachment events would be dropped even if they were properly parsed. Additionally, ScriptInputEvent in server.service.ts only unions UserMessageEvent and SystemPromptEvent, preventing FileAttachmentEvent from being accepted via HTTP at all, making the HTTP endpoint unable to handle file attachments that the CLI supports.
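
A sketch of a broader filter, written with plain discriminated unions instead of the Schema-based types the PR uses. The tag strings and fields are assumptions, and CodemodeResultEvent is left out of this example even though the comment lists it as an InputEvent member.

```typescript
// Simplified stand-ins for the PR's event types (shapes are assumed).
type UserMessageEvent = { _tag: "UserMessage"; content: string }
type FileAttachmentEvent = { _tag: "FileAttachment"; path: string }
type SystemPromptEvent = { _tag: "SystemPrompt"; content: string }

type InputEvent = UserMessageEvent | FileAttachmentEvent
// Widened so file attachments can be parsed from HTTP bodies at all.
type ScriptInputEvent = InputEvent | SystemPromptEvent

// Keeps every input event kind and excludes only the system prompt.
const isInputEvent = (e: ScriptInputEvent): e is InputEvent =>
  e._tag === "UserMessage" || e._tag === "FileAttachment"

function toInputEvents(parsed: ReadonlyArray<ScriptInputEvent>): InputEvent[] {
  return parsed.filter(isInputEvent)
}
```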



if (result._tag === "completed") {
if (needsContinuation) {
return yield* runAgentContinuation(contextName, contextService, chat, mailbox)

Bug: Missing iteration limit allows unbounded agent loop recursion

The runAgentContinuation function recursively calls itself at line 252 without any iteration limit, unlike the CLI's runEventStream which uses MAX_AGENT_LOOP_ITERATIONS = 15. This unbounded recursion occurs whenever needsContinuation is true, which happens when CodemodeResultEvent or CodemodeValidationErrorEvent has triggerAgentTurn === "after-current-turn". If the LLM consistently produces typechecking errors, outputs to stdout via console.log(), or fails to include <codemode> tags, the agent loop will recurse indefinitely, potentially causing stack overflow or resource exhaustion in the interactive chat UI.
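
To make the suggestion concrete, here is a sketch of a bounded loop replacing the recursion. It uses async/await instead of the Effect generators in the PR, and `runWithIterationCap`, `TurnOutcome`, and the `runTurn` callback are hypothetical; only the constant's name and value come from the comment above.

```typescript
const MAX_AGENT_LOOP_ITERATIONS = 15

type TurnOutcome = { needsContinuation: boolean }

// Runs turns until one reports that no continuation is needed, or the cap is reached.
async function runWithIterationCap(
  runTurn: (iteration: number) => Promise<TurnOutcome>,
  maxIterations: number = MAX_AGENT_LOOP_ITERATIONS
): Promise<{ iterations: number; hitCap: boolean }> {
  let iterations = 0
  while (iterations < maxIterations) {
    const outcome = await runTurn(iterations)
    iterations += 1
    if (!outcome.needsContinuation) {
      return { iterations, hitCap: false }
    }
  }
  // The caller can tell the user that the loop was stopped by force.
  return { iterations, hitCap: true }
}
```

A loop also keeps stack depth constant, which addresses the stack overflow concern independently of the cap.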


jonastemplestein and others added 14 commits December 6, 2025 21:17
Implements the core services needed for codemode functionality:

- codemode.model.ts: Event schemas and <codemode> block parsing
- codemode.repository.ts: Stores generated code in timestamped directories
- typechecker.service.ts: TypeScript compiler API wrapper for validation
- code-executor.service.ts: Bun subprocess execution with streaming output
- codemode.service.ts: Orchestrates parse/store/typecheck/execute workflow

Also adds error types, wires layers into main.ts, and updates vitest config
to include colocated tests in src/.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add --codemode / -x flag to enable code block processing
- Add handleCodemodeEvent for colored terminal output
- Refactor codemode.service to use Stream.unwrap for cleaner control flow
- Fix typecheck failure handling to properly stop execution
- Add E2E tests proving the full pipeline works

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add CodemodeResultEvent with toLLMMessage() returning user role
- Update PersistedEvent union to include CodemodeResultEvent
- Update eventsToPrompt to handle CodemodeResult as user message
- Integrate codemode execution into ContextService.addEvents
- CodemodeResult is now persisted and included in next LLM request
- Refactor CLI to use unified event handling

The codemode workflow now:
1. Assistant responds with <codemode> blocks
2. Code is parsed, typechecked, and executed
3. stdout/stderr captured as CodemodeResultEvent
4. Event persisted to context
5. Next LLM request includes it as user message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add getSecret tool to codemode tools interface for retrieving secrets
  hidden from LLM (implementation in code-executor.service.ts)
- Implement CodemodeResult return type with endTurn and data fields
- Add agent loop that re-calls LLM when endTurn=false
- Create CODEMODE_SYSTEM_PROMPT explaining tools and agent loop
- Context service swaps in codemode prompt when -x flag is used
- Add e2e tests for getSecret and CodemodeResult parsing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove codemodeOption from CLI
- Always pass codemode: true to ContextService.addEvents
- Update test to expect the output to contain "i" (via tools.log) rather than an exact match

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The generated codemode files use `Promise<CodemodeResult>` in their
function signature but the import only included `Tools`. Added
`CodemodeResult` to the import statement.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Made system prompt more emphatic about requiring explicit type
  annotations (noImplicitAny is enabled)
- When typecheck fails, create CodemodeResultEvent with endTurn=false
  so the LLM can see the errors and retry with fixed code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The system prompt now explicitly states that Tools and CodemodeResult
are automatically available (imports are auto-prepended), preventing
LLMs from adding duplicate imports that cause TypeScript errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add triggerAgentTurn enum property ("after-current-turn" | "never") to all persisted events
- LLM triggered by any event with triggerAgentTurn="after-current-turn", not just UserMessage
- Remove endTurn and data fields from CodemodeResultEvent
- Add t.sendMessage() tool for user-facing output (stderr, no agent turn)
- Add t.fetch() tool for web requests
- console.log() triggers agent turn; sendMessage() doesn't
- Simplify system prompt: most tasks are single-turn
- All 75 tests pass

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Internalize Scope in CodeExecutor by using Stream.unwrapScoped
to manage subprocess lifecycle. This removes Scope from all
public interfaces making the stream easier to consume.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add `mini-agent codemode run <path>` CLI command for executing codemode blocks
- Refactor CodeExecutor to call CLI command instead of inline bun -e
- Move agent loop from ContextService to CLI layer for cleaner separation
- Increase max agent loop iterations from 3 to 15
- Add utility tools from kathmandu: calculate, now, sleep
- Add __CODEMODE_RESULT__ marker for cleaner stdout parsing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove dead code: appendLog was never called. Codeblock directories now only contain:
- index.ts (generated code)
- types.ts (tool type definitions)
- tsconfig.json (TypeScript config)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
These files were accidentally committed during development and
should not be tracked in the repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- chat-ui.ts now uses contextService.addEvents with codemode: true
  instead of calling streamLLMResponse directly
- Add CodemodeValidationErrorEvent to detect when LLM doesn't output
  codemode tags, triggering retry with chastising error message
- Add feed item renderers for CodemodeResult and CodemodeValidationError
  in opentui-chat.tsx
- Update llm.ts to include CodemodeValidationErrorEvent in prompt
  conversion
- Add agent continuation loop in chat-ui for codemode follow-ups

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
while (
lastCodemodeResult &&
lastCodemodeResult.triggerAgentTurn === "after-current-turn" &&
iteration < MAX_AGENT_LOOP_ITERATIONS

Bug: CLI agent loop ignores validation error retry trigger

The agent loop in runEventStream only tracks lastCodemodeResult (CodemodeResultEvent) and ignores CodemodeValidationErrorEvent. When the LLM responds without codemode tags, a CodemodeValidationErrorEvent is emitted and persisted with triggerAgentTurn: "after-current-turn", but because lastCodemodeResult remains undefined, the while loop condition fails and the agent never retries. chat-ui.ts handles this correctly: its triggersContinuation check covers both event types.
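
The CLI loop could track the last event of either kind, roughly as sketched here. The comment mentions a triggersContinuation check in chat-ui.ts, but this body is a guess at its logic rather than a copy; the tag strings and the `lastContinuationEvent` helper are assumptions.

```typescript
type TriggerAgentTurn = "after-current-turn" | "never"
type LoopEvent = { _tag: string; triggerAgentTurn?: TriggerAgentTurn }

// True for either codemode event kind when it asks for another turn.
function triggersContinuation(e: LoopEvent): boolean {
  const isCodemodeEvent =
    e._tag === "CodemodeResult" || e._tag === "CodemodeValidationError"
  return isCodemodeEvent && e.triggerAgentTurn === "after-current-turn"
}

// Hypothetical replacement for tracking only lastCodemodeResult:
// returns the most recent event that should drive another turn, if any.
function lastContinuationEvent(events: ReadonlyArray<LoopEvent>): LoopEvent | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    if (triggersContinuation(events[i])) return events[i]
  }
  return undefined
}
```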

