lwyBZss8924d/ReCodeAgent

ReCodeAgent - Universal Recursive Code Generation Agent

Core Paper: ReCode: Unify Plan and Action for Universal Granularity Control (arXiv:2510.23564v2)

Project Status: Production-ready Rust implementation with Harbor integration for Terminal-Bench 2.0 evaluation
Updated Date: 2025-11-22


Project Objective: Productionize the ReCode research paradigm as a recursive code generation system that integrates a high-performance Rust core with the Codex CLI. It now features the Harbor Container-Unified Architecture for Terminal-Bench 2.0 benchmark evaluation.

📖 Project Overview

ReCodeAgent is a production implementation of the academic paper ReCode: Unify Plan and Action for Universal Granularity Control. It uses a hybrid architecture, a Rust orchestrator paired with a Codex CLI executor, to provide dynamic granularity control, moving beyond fixed-granularity decision-making toward a universal programming agent.

Core Value Propositions

  • 🔄 Recursive Code Generation: Placeholder functions auto-expand into executable code
  • ⚡ High-Performance Rust Core: DFS tree traversal, AST parsing, checkpoint mechanism
  • 🔌 Codex CLI Integration: LLM calls via codex exec --json, authenticated with ~/.codex/auth.json
  • 🐳 Harbor Container-Unified Architecture: Seamless Terminal-Bench 2.0 evaluation
  • 🛠️ Tool Ecosystem Integration: File editing, command execution, environment interaction
  • 🔒 Production-Grade Reliability: Type safety, memory efficiency, JSONL event streaming

🚀 Quick Start

Running with Harbor (Recommended)

# Navigate to Harbor workspace
cd ~/harbor-workspace

# Run a single task
harbor run -d terminal-bench@2.0 -t regex-log -a recode-agent

# Specify prompt template
harbor run -d terminal-bench@2.0 -t regex-log -a recode-agent \
  --agent-kwarg template=recode_tb2_prompt.jinja2

# Limit max steps + debug mode
harbor run -d terminal-bench@2.0 -t password-recovery -a recode-agent \
  --agent-kwarg max_steps=50 --debug

# Batch run all tasks (4 concurrent)
harbor run -d terminal-bench@2.0 -a recode-agent -n 4

Local Development Testing

# Quick smoke test (10 steps)
cargo run --example terminal_bench_smoke --release

# CLI subcommand test
cargo run --release --manifest-path recode-core/Cargo.toml -- \
  execute --task-name test --instruction "test task" --working-dir /tmp --max-steps 5

View Results

# Latest task results
ls -lt ~/harbor-workspace/jobs/ | head -3

# Execution logs
tail -n 50 ~/harbor-workspace/jobs/<job-id>/<task-id>/agent/command-2/stdout.txt

# Verification result (0.0 = fail, 1.0 = success)
cat ~/harbor-workspace/jobs/<job-id>/<task-id>/verifier/reward.txt
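
The reward file can also be checked programmatically. The following is a minimal Rust sketch, assuming reward.txt contains a single floating-point number as the comment above suggests. The names `Verdict` and `parse_reward` are hypothetical and are not part of ReCodeAgent or Harbor.

```rust
/// Outcome of a verified task (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Parse the contents of a `reward.txt` file.
/// Assumption: the file holds one float, where 1.0 means success and
/// 0.0 means failure. Returns `None` when the text is not a number.
pub fn parse_reward(contents: &str) -> Option<Verdict> {
    let reward: f64 = contents.trim().parse().ok()?;
    if reward >= 1.0 {
        Some(Verdict::Pass)
    } else {
        Some(Verdict::Fail)
    }
}
```

With the file on disk, the contents could be supplied by `std::fs::read_to_string` on the reward.txt path.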

🏗️ Architecture

Harbor Container-Unified Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   macOS Development Environment (Host)          │
├─────────────────────────────────────────────────────────────────┤
│  ReCodeAgent Repository                                         │
│  ~/dev-space/ReCodeAgent/                                       │
│  ├── recode-core/src/           # Rust source code              │
│  └── recode-core/templates/     # Jinja2 Prompt templates       │
├─────────────────────────────────────────────────────────────────┤
│  Harbor Installation                                            │
│  ~/.local/share/uv/tools/harbor/.../agents/installed/           │
│  ├── recode_agent.py            # Harbor Agent definition       │
│  └── recode-assets/             # Deployment assets             │
│      ├── recode-agent           # Linux x86_64 binary           │
│      ├── templates/             # Jinja2 templates              │
│      └── scripts/               # Python bridge scripts         │
└─────────────────────────────────────────────────────────────────┘
                              │ Docker Volume
                              │ ${HOME}/.codex:/tmp/host-codex:ro
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│             Docker Container (Terminal-Bench 2.0)               │
├─────────────────────────────────────────────────────────────────┤
│  /app/                        # Working directory               │
│  ├── recode-agent             # ReCodeAgent binary              │
│  ├── AGENTS.md                # Rendered system prompt          │
│  │                            # (auto-loaded by Codex)          │
│  ├── instruction.md           # Task instruction                │
│  └── .codex/                  # Codex CLI config                │
│      ├── auth.json            # Auth (from host)                │
│      └── config.toml          # Model configuration             │
└─────────────────────────────────────────────────────────────────┘

Execution Flow

harbor run -d terminal-bench@2.0 -t <task> -a recode-agent
    │
    ├──▶ Step 1: Setup Codex auth (copy auth.json, config.toml)
    ├──▶ Step 2: Render AGENTS.md (recode-agent render-template)
    ├──▶ Step 3: Execute task (recode-agent execute + codex exec)
    │             └── DFS tree traversal + checkpoint self-verification
    └──▶ Step 4: Cleanup

CLI Subcommand Structure

enum Command {
    /// Legacy: Run with explicit bridge configuration
    Run { env_kind, python, bridge, bridge_args, instruction },

    /// Render AGENTS.md template (Harbor Step 2)
    RenderTemplate { template, output, task_name, instruction_path },

    /// Execute task (Harbor Step 3) - Core execution engine
    Execute { task_name, instruction, working_dir, max_steps, codex_home },
}
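
The enum above lists field names only. As a rough illustration of how the `execute` subcommand's arguments might be typed and parsed, here is a std-only sketch. The field types, the `ExecuteArgs` struct, and the `parse_execute` function are assumptions made for this example; the flag names are taken from the local development command shown earlier, and `codex_home` is left out because its flag spelling is not shown in this README. The project itself lists clap for argument parsing, so the real code will look different.

```rust
use std::path::PathBuf;

/// Arguments of the `execute` subcommand, with assumed field types.
#[derive(Debug, PartialEq)]
pub struct ExecuteArgs {
    pub task_name: String,
    pub instruction: String,
    pub working_dir: PathBuf,
    pub max_steps: usize,
}

/// Parse `--flag value` pairs into `ExecuteArgs` (hypothetical helper).
/// Every flag is treated as required in this sketch.
pub fn parse_execute(args: &[&str]) -> Result<ExecuteArgs, String> {
    let mut task_name = None;
    let mut instruction = None;
    let mut working_dir = None;
    let mut max_steps = None;

    let mut it = args.iter();
    while let Some(flag) = it.next() {
        // Each flag must be followed by exactly one value.
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match *flag {
            "--task-name" => task_name = Some((*value).to_string()),
            "--instruction" => instruction = Some((*value).to_string()),
            "--working-dir" => working_dir = Some(PathBuf::from(*value)),
            "--max-steps" => {
                let n = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid --max-steps: {e}"))?;
                max_steps = Some(n);
            }
            other => return Err(format!("unknown flag {other}")),
        }
    }

    Ok(ExecuteArgs {
        task_name: task_name.ok_or("missing --task-name")?,
        instruction: instruction.ok_or("missing --instruction")?,
        working_dir: working_dir.ok_or("missing --working-dir")?,
        max_steps: max_steps.ok_or("missing --max-steps")?,
    })
}
```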

📁 Project Structure

recode-core/
├── Cargo.toml
├── Dockerfile                 # Container image config
│
├── examples/
│   └── terminal_bench_smoke.rs  # Quick local test (10 steps)
│
├── harbor-assets/             # Harbor deployment assets
│   ├── recode-agent           # Linux x86_64 binary
│   └── templates/             # Deployment templates
│
├── templates/                 # Jinja2 Prompt templates (source)
│   ├── recode_tb2_agents_md.jinja2      # Default AGENTS.md template
│   ├── recode_tb2_prompt.jinja2         # TB2 task prompt
│   ├── recode_microtexecute_tb2_prompt.jinja2  # Codex expansion
│   └── recode_tb2_checkpoint_minimal.jinja2    # Checkpoint
│
├── src/
│   ├── main.rs                # CLI entry (run, render-template, execute)
│   ├── codex/
│   │   └── thread_manager.rs  # Codex CLI integration
│   ├── execution/
│   │   └── python_executor.rs # Python code execution
│   ├── orchestrator/
│   │   ├── engine.rs          # Codex prompt assembly
│   │   └── runtime.rs         # DFS tree + checkpoint mechanism
│   └── tree/
│       ├── context.rs
│       └── node.rs
│
└── tests/
    └── fixtures/

🏗️ ReCode Core Methodology

Decision Process Formulation

We model LLM-based agent interaction with the environment as a simplified decision process:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, \mathcal{O}, T, R \rangle$$

Where:

  • $\mathcal{S}$: State space
  • $\mathcal{A}$: Primitive action space (executable operations like run('crack egg'))
  • $\mathcal{O}$: Observation space
  • $T: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}$: Transition function
  • $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$: Reward function

Beyond primitive actions, we introduce plan space $\mathcal{P}$ containing high-level intentions requiring decomposition (e.g., prepare_breakfast()).

Decision space: $\mathcal{D} = \mathcal{A} \cup \mathcal{P}$

Unified Plan & Action Representation

Key Insight: Plans and actions, though seemingly different, can be unified into a single executable code representation.

  • Actions (primitive): Executable operations like run('click the submit button')
  • Plans (abstract): Unimplemented placeholder functions like prepare_breakfast(), get_ingredients()

This unified representation enables seamless transitions between planning and execution.
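
One way to picture the unified representation is a single type that covers both kinds of decision. The sketch below is illustrative only: `Decision` and `classify` are hypothetical names, and treating a call as primitive when its function name appears in a known list is an assumption of this example, not a description of the project's AST-based logic.

```rust
/// A decision in D = A ∪ P: both variants carry the same code text.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Primitive action that can be executed directly.
    Action(String),
    /// Placeholder function that still needs to be expanded.
    Plan(String),
}

/// Classify a call expression by its function name (hypothetical helper).
/// Assumption: `primitives` lists the names of directly executable functions.
pub fn classify(call: &str, primitives: &[&str]) -> Decision {
    // Everything before the first '(' is taken as the function name.
    let name = call.split('(').next().unwrap_or("").trim();
    if primitives.contains(&name) {
        Decision::Action(call.to_string())
    } else {
        Decision::Plan(call.to_string())
    }
}
```

Because both variants wrap the same kind of code string, the orchestrator can hold plans and actions in one tree and decide per node whether to execute or expand.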

Recursive Code Generation Algorithm

Algorithm 1: The ReCode Algorithm

Procedure ReCode(T, π, E, c):
    if c is None:                      // Initialize
        o_0 ← Reset(E)                 // Reset environment
        c ← Text2Code(T, o_0)          // Convert task to root placeholder
    end if

    code_block ← π(c)                  // LLM generates child code

    for each child u in code_block:
        if IsPrimitive(u):             // Primitive action
            Execute(u, E)
        else:                          // Placeholder function
            ReCode(T, π, E, u)         // Recursive expansion
        end if
    end for
end procedure
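
The recursion in Algorithm 1 can be sketched in Rust as follows. This is a toy model under stated assumptions, not the project's runtime: `Node`, `Policy`, `Env`, `TablePolicy`, and `recode` are hypothetical names, the policy is a fixed lookup table standing in for the LLM, the initialization step (Reset and Text2Code) is omitted, and the `run('open fridge')` action is made up for the example.

```rust
/// A child produced by the policy: a primitive action or a placeholder.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Primitive(String),
    Placeholder(String),
}

/// Stand-in for the LLM policy π: expands a placeholder into children.
pub trait Policy {
    fn expand(&self, placeholder: &str) -> Vec<Node>;
}

/// Stand-in for the environment E: it only records executed actions.
#[derive(Default)]
pub struct Env {
    pub log: Vec<String>,
}

impl Env {
    pub fn execute(&mut self, action: &str) {
        self.log.push(action.to_string());
    }
}

/// Depth-first expansion: execute primitives, recurse into placeholders.
pub fn recode(policy: &dyn Policy, env: &mut Env, current: &str) {
    for child in policy.expand(current) {
        match child {
            Node::Primitive(action) => env.execute(&action),
            Node::Placeholder(plan) => recode(policy, env, &plan),
        }
    }
}

/// Toy policy backed by a fixed table; a real agent would query an LLM.
pub struct TablePolicy;

impl Policy for TablePolicy {
    fn expand(&self, placeholder: &str) -> Vec<Node> {
        match placeholder {
            "prepare_breakfast()" => vec![
                Node::Placeholder("get_ingredients()".to_string()),
                Node::Primitive("run('crack egg')".to_string()),
            ],
            "get_ingredients()" => {
                vec![Node::Primitive("run('open fridge')".to_string())]
            }
            // Unknown placeholders expand to nothing in this toy table.
            _ => Vec::new(),
        }
    }
}
```

Calling `recode(&TablePolicy, &mut env, "prepare_breakfast()")` on a fresh `Env` visits `get_ingredients()` first, so the nested action is logged before `run('crack egg')`, which matches the depth-first order of the pseudocode.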

Implementation Details

  • Task Initialization: Task instruction → root placeholder function solve(instruction, observation)
  • Context Management: Unified variable namespace, persisted across recursion levels
  • Error Handling: Self-correction loop (max_rewrite=5)
  • Recursion Control: Maximum recursion depth 10
  • Checkpoint Mechanism: Completing the DFS tree does not by itself mean the task is solved, so a checkpoint is injected for agent self-verification
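
The error-handling and recursion limits above can be pictured with a small sketch. It assumes that max_rewrite=5 caps the total number of attempts and that depth is counted from 0 at the root; neither detail is stated in this README. `MAX_REWRITE`, `MAX_DEPTH`, `run_with_rewrites`, and `check_depth` are hypothetical names.

```rust
/// Assumed cap on attempts per code block (from max_rewrite=5).
pub const MAX_REWRITE: usize = 5;
/// Maximum recursion depth stated in the list above.
pub const MAX_DEPTH: usize = 10;

/// Self-correction loop: call `attempt` until it succeeds or the cap is hit.
/// Each call receives the attempt index and the previous error, if any,
/// so the caller can feed that error back into the next rewrite.
pub fn run_with_rewrites<F>(mut attempt: F) -> Result<String, String>
where
    F: FnMut(usize, Option<&str>) -> Result<String, String>,
{
    let mut last_error: Option<String> = None;
    for i in 0..MAX_REWRITE {
        let result = attempt(i, last_error.as_deref());
        match result {
            Ok(output) => return Ok(output),
            Err(e) => last_error = Some(e),
        }
    }
    Err(last_error.unwrap_or_else(|| "no attempts made".to_string()))
}

/// Reject expansion once the recursion depth exceeds the limit.
pub fn check_depth(depth: usize) -> Result<(), String> {
    if depth > MAX_DEPTH {
        Err(format!("recursion depth {depth} exceeds {MAX_DEPTH}"))
    } else {
        Ok(())
    }
}
```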

🛠️ Development

Prerequisites

  • Rust 1.83+ (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
  • Docker (for cross-compilation and Harbor)
  • Codex CLI (~/.codex/auth.json configured)
  • Harbor Framework (uv tool install harbor)

Build Commands

# Local macOS build (development)
cargo build --release --manifest-path recode-core/Cargo.toml

# Linux x86_64 build (Harbor container)
docker build --platform linux/amd64 -f Dockerfile.build-x86 -t recode-builder .
docker create --name tmp recode-builder
docker cp tmp:/build/recode-core/target/release/recode-core ./recode-agent-linux-x86_64
docker rm tmp

# Verify binary
file ./recode-agent-linux-x86_64
# Should output: ELF 64-bit LSB pie executable, x86-64...

Sync to Harbor

HARBOR_ASSETS=~/.local/share/uv/tools/harbor/lib/python3.13/site-packages/harbor/agents/installed/recode-assets

# Sync binary
cp ./recode-agent-linux-x86_64 "$HARBOR_ASSETS/recode-agent"
chmod +x "$HARBOR_ASSETS/recode-agent"

# Sync templates
cp -r recode-core/templates/*.jinja2 "$HARBOR_ASSETS/templates/"

# Sync scripts
cp scripts/terminal_bench_bridge.py "$HARBOR_ASSETS/scripts/"

Testing

cargo test                              # All tests
cargo test --test codex_turn_tests     # Codex integration tests
cargo clippy                            # Code linting

⚙️ Configuration

Prompt Templates

| Template | Purpose | Default |
| --- | --- | --- |
| recode_tb2_agents_md.jinja2 | AGENTS.md system prompt | ✓ |
| recode_tb2_prompt.jinja2 | TB2 task prompt | |
| recode_microtexecute_tb2_prompt.jinja2 | Codex expansion | |
| recode_tb2_checkpoint_minimal.jinja2 | Checkpoint verification | |

Harbor Agent Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| template | string | recode_tb2_agents_md.jinja2 | Jinja2 template filename |
| max_steps | int | 99999 | DFS tree maximum steps |

Usage: --agent-kwarg template=xxx --agent-kwarg max_steps=100
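
To make the defaults-and-override behaviour concrete, here is a small Rust sketch that applies key=value pairs on top of the documented defaults. It is an illustration only: Harbor handles --agent-kwarg on its own side, and `AgentParams` and `parse_agent_kwargs` are hypothetical names that do not appear in the project.

```rust
/// Agent parameters with the defaults from the table above.
#[derive(Debug, PartialEq)]
pub struct AgentParams {
    pub template: String,
    pub max_steps: u64,
}

impl Default for AgentParams {
    fn default() -> Self {
        Self {
            template: "recode_tb2_agents_md.jinja2".to_string(),
            max_steps: 99_999,
        }
    }
}

/// Apply `key=value` overrides to the defaults (hypothetical helper).
pub fn parse_agent_kwargs(kwargs: &[&str]) -> Result<AgentParams, String> {
    let mut params = AgentParams::default();
    for kw in kwargs {
        let (key, value) = kw
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {kw}"))?;
        match key {
            "template" => {
                params.template = value.to_string();
            }
            "max_steps" => {
                params.max_steps = value
                    .parse::<u64>()
                    .map_err(|e| format!("invalid max_steps: {e}"))?;
            }
            other => return Err(format!("unknown parameter {other}")),
        }
    }
    Ok(params)
}
```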

Codex CLI Integration

  • Authentication: ~/.codex/auth.json (copied to container /app/.codex/)
  • Model config: ~/.codex/config.toml (default: gpt-5.1-codex-max)
  • AGENTS.md: Auto-discovered and loaded by Codex (95%+ token savings)
  • Command: codex exec --json for JSONL event streaming

📚 Documentation

| Document | Description |
| --- | --- |
| WARP.md | Quick reference guide for WARP/Claude Code |
| CLAUDE.md | Claude Code instructions |
| Harbor ENV Guide | Detailed Harbor operation manual |
| Architecture | Technical architecture specification |
| Roadmap | Implementation roadmap |

📈 Benchmark Results

ReCodeAgent is evaluated on Terminal-Bench 2.0, a benchmark for terminal-based task automation.

# Run evaluation
harbor run -d terminal-bench@2.0 -a recode-agent -n 4

# View results
jq '.stats' ~/harbor-workspace/jobs/<job-id>/result.json

🔬 Technology Stack

Core Dependencies (Rust)

  • tokio - Async runtime
  • clap - CLI argument parsing
  • serde / serde_json - Serialization / JSONL parsing
  • minijinja - Jinja2 template rendering
  • tree-sitter - High-performance AST parsing
  • tracing - Structured logging

External Dependencies

  • Codex CLI - LLM calls and tool execution
  • Harbor Framework - Benchmark evaluation platform
  • Docker - Container runtime

📝 License

Apache License 2.0 - See LICENSE file for details.


🙏 Acknowledgments


Links


Last Updated: 2025-11-22
Version: v0.2.0
Status: ✅ Production-ready with Harbor Terminal-Bench 2.0 integration
