Purpose Convert a
CodeMessage(C++ 17 source) into aProfileReportthat contains precise runtime and memory telemetry across logarithmically‑spaced input scales, executed inside an isolated Docker sandbox with strict resource limits.Reader Senior Python/C++ developer; after reading, you can implement the agent end‑to‑end with zero follow‑up questions.
-
Prepare input cases based on plan‑derived
input_bounds(log‑scale + corner cases). -
Compile the C++ code with deterministic flags.
-
Execute the binary under
/usr/bin/time -vin a Docker container with cgroup &resource.setrlimitcaps. -
Parse the time & memory output into numeric lists aligned with
input_sizes. -
Collect (optional) hotspot information via
gprofwhendebug=true. -
Emit a strict
ProfileReportPydantic model.
class Profiler(Agent):
"""Empirical runtime/memory measurement for a single CodeMessage."""
def run(self, code: CodeMessage, *, debug: bool = False) -> ProfileReport:
"""Compile & execute, returning ProfileReport.
Raises:
SandboxError: on compilation or runtime error that persists after 1 retry.
"""
| Name | Default | Description |
| sandbox_timeout_sec | 2 | Hard wall clock per run (seconds) |
| sandbox_mem_mb | 512 | Resident‑set limit |
| input_scales | [1e3,5e3,1e4,5e4,1e5] | Logarithmic sizes |
| extra_corner_cases | [0,1] | Added at beginning of list |
Profiler.run()
├─ _prepare_inputs() # deterministic input strings
├─ _compile_cpp() # g++ -O2 -std=c++17
│ └─ returns path to binary
├─ for each n in input_sizes:
│ └─ _execute_binary() # wraps docker + /usr/bin/time -v
│ ├─ stdout
│ └─ _parse_time_output()
├─ if debug:
│ └─ _collect_gprof()
└─ Build ProfileReport → return
| Module/Func | Purpose | I/O & Exceptions |
| **_prepare_inputs** | Generate deterministic worst‑case inputs. | → List[str] same length as input_sizes |
| **_compile_cpp** | Compile source to main.out; retry once on failure. | → Path to binary; raises CompilationError |
| **_execute_binary** | Run binary inside isolated container with /usr/bin/time -v. | → (stdout:str, time_out:str); raises RuntimeError |
| **_parse_time_output** | Regex‑extract Elapsed \(wall clock\), Maximum resident set size. | → (runtime_ms:float, peak_mem_mb:float) |
| **_collect_gprof** | Run gprof -l binary → map line numbers to cumulative time %. | → Dict[str,str] hotspots |
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y build-essential gprof time
RUN useradd -m sandbox
USER sandbox
WORKDIR /sandbox
Image tag: ghcr.io/swiftsolve/sandbox:20.04-gcc10
#!/usr/bin/env bash
ulimit -v $(( ${SANDBOX_MEM_MB} * 1024 )) # RLIMIT_AS
ulimit -s 262144 # 256 MB stack
exec /usr/bin/time -v "$@"
Mount path /code for source + binary.
-
Container memory limit =
${SANDBOX_MEM_MB}m. -
--pids-limit 64to avoid fork bombs. -
No network (
--network none).
For an integer bound n_max extracted from PlanMessage.input_bounds:
sizes_log = [int(x) for x in [1e3, 5e3, 1e4, 5e4, 1e5] if x <= n_max]
input_sizes = [0,1] + sizes_log
If task type unknown, feed {n}\n where task likely reads n and exits. TODO: once datasets/task_format.py matured, switch to task‑specific input constructors provided by dataset metadata.
Regex patterns:
_RX_WALL = re.compile(r"Elapsed \(wall clock\) time.*?:\s*(\d+:\d+\.\d+)")
_RX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")
Conversion:
mins,secs = wall.split(':'); runtime_ms = (int(mins)*60+float(secs))*1000
peak_mb = int(rss_kb)//1024
Edge cases: if regex missing → raise ParseError and mark run as failure.
| Stage | Retry Count | On Failure Action |
| Compilation | 1 | Re‑run _compile_cpp; if still fails → raise SandboxError(code="compile") |
| Execution | 0 (deterministic) | Capture stderr; if non‑zero exit or timeout → mark runtime ∞, memory ∞, continue loop; set hotspot {"_crash":"SIGSEGV"} |
| Parsing | 0 | Raise ParseError → controller counts as agent failure |
report = ProfileReport(
task_id=code.task_id,
iteration=code.iteration,
input_sizes=input_sizes,
runtime_ms=runtimes,
peak_memory_mb=memories,
hotspots=hotspots,
)
Validation: len(runtime_ms) == len(input_sizes) == len(peak_memory_mb).
-
Profilerlogger name:Profiler. -
INFO: compile command, each input size runtime/mem.
-
WARNING: compile/runtime failures.
-
DEBUG (if
LOG_LEVEL=DEBUG): full/usr/bin/time -vdump.
| Test ID | Description | Tooling |
| P‑1 | Successful compile & run on "hello world". | Pytest, stub |
| P‑2 | Time/mem regex parsing gives correct floats. | Regex unit |
| P‑3 | Exceed memory limit triggers SandboxError. | Docker mem=64m |
| P‑4 | Crash binary produces ∞ runtimes and warning. | Faulty code |
| P‑5 | debug=True attaches hotspot dict keys. | gprof mock |
Coverage ≥ 90 % for agents/profiler.py.
-
Given a syntactically correct
CodeMessage,Profiler.run()returns a validProfileReportwithin2 × len(input_sizes)seconds total wall time. -
Report numbers differ < 5 % across two consecutive runs on same host (determinism).
-
Memory is within ± 2 MB of
/usr/bin/timemanual run.
End of Profiler Agent Technical Plan