Security Considerations

This document describes the security model of AgentEval.jl and its implications.

Transport Security

AgentEval uses STDIO transport exclusively. This is a deliberate design choice made for security.

Why STDIO is More Secure Than HTTP/TCP

Risk	HTTP/TCP Servers	STDIO (AgentEval)
Network access	Exposed on port	Not applicable
Remote attacks	Possible	Not possible
Port scanning	Discoverable	Not discoverable
Multi-user access	Other users can connect	Process-isolated
Firewall bypass	May be needed	Not needed

How STDIO Works

The MCP client (Claude Code) spawns AgentEval as a subprocess
Communication happens via stdin/stdout pipes
No network sockets are opened
The Julia process inherits the user's permissions
When the Claude session ends, the Julia process terminates

This architecture means:

No network attack surface - There is no port to scan or exploit
No authentication needed - Only the parent process holds the other end of the stdin/stdout pipes
Automatic cleanup - The Julia process exits when the session ends
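
The spawn-and-pipe flow described above can be sketched in a few lines of shell. This is an illustrative stand-in, not AgentEval's implementation: `fake_server` and `send_requests` are hypothetical names, and the request strings are placeholders rather than real MCP messages (MCP's stdio transport carries newline-delimited JSON-RPC). The point is only that the child is reachable through its stdin and stdout and nothing else.

```shell
#!/bin/sh
# Hypothetical stand-in for a STDIO server: reads one request per line from
# stdin and writes one reply per line to stdout. It never opens a socket.
fake_server() {
    while IFS= read -r request; do
        printf 'reply: %s\n' "$request"
    done
}

# The "client" side: each argument is sent as one line down a pipe to the
# server, which runs in a child process of this shell.
send_requests() {
    printf '%s\n' "$@" | fake_server
}

send_requests '1 + 1' 'sqrt(2)'
```

Running the script prints one `reply:` line per request. When the writing side closes the pipe, the server's read loop sees end-of-input and the child exits, which mirrors the automatic cleanup noted above.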

Code Execution

AgentEval evaluates arbitrary Julia code. This is by design - it's the core functionality for AI agent workflows.

What AgentEval Protects Against

TTFX (time-to-first-X) overhead - Avoided by keeping one persistent Julia session
Cross-session interference - Each Claude session gets its own Julia process
Leaked output - Stdout/stderr are captured and returned to the client

What AgentEval Does NOT Protect Against

Risk	Status	Mitigation
Malicious code execution	Not protected	Review AI-generated code
File system access	Full access	Use sandboxed environment
Network access from Julia	Full access	Use network policies
Resource exhaustion	No limits	Monitor resource usage
Environment variable access	Full access	Sanitize environment

The AI Trust Model

AgentEval trusts the MCP client to send reasonable code. In practice:

Claude Code decides what code to run - AgentEval executes it
The user reviews AI suggestions - Claude shows code before running
Permissions flow from user - AgentEval runs with user's permissions

This is similar to running julia -e "..." manually - the code runs with your permissions.
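
The inheritance behind "permissions flow from user" is ordinary process behavior and can be checked from a shell. In this sketch, `child_view` and `DEMO_TOKEN` are made-up names, and `sh -c` stands in for the Julia subprocess: the child reports the same user ID as its parent and can read any exported variable.

```shell
#!/bin/sh
# Hypothetical demo: a child process reports what it can see.
# `sh -c` stands in for the Julia subprocess here.
child_view() {
    sh -c 'printf "uid=%s token=%s\n" "$(id -u)" "${DEMO_TOKEN:-unset}"'
}

# DEMO_TOKEN is a made-up variable standing in for a real secret.
DEMO_TOKEN=example-secret
export DEMO_TOKEN

echo "parent uid=$(id -u)"
child_view   # same uid as the parent, and the token is visible
```

Nothing here is specific to AgentEval; any subprocess started from the user's session behaves the same way.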

Comparison with Alternatives

MCPRepl.jl Security

MCPRepl.jl uses HTTP transport on port 3000:

Risk: Any process on the machine can connect
Risk: Network attacks possible if port is exposed
Consequence: It could not be registered in the Julia General registry due to security concerns

The kahliburke fork adds security features:

API key authentication
IP allowlisting
Security modes (strict/relaxed/lax)

However, it still opens a network port.

AgentEval Security Model

AgentEval avoids these issues by not opening any network port:

MCPRepl.jl:
  [Internet] → [Firewall] → [Port 3000] → [Julia REPL]
                              ↑
                   Attack surface exists

AgentEval:
  [Claude Code] ⟷ [stdin/stdout] ⟷ [Julia]
                        ↑
              No network attack surface

Best Practices

For Development Use

AgentEval is designed for local development workflows:

# This is the intended use case
claude mcp add julia-eval -- julia --project=/path/to/AgentEval.jl ...

For Production/Shared Environments

If you need to run AgentEval in a shared or production environment:

Use containers - Run Julia in a Docker container with limited permissions
Use seccomp/AppArmor - Restrict system calls available to Julia
Monitor execution - Log all code executed via AgentEval
Set resource limits - Use ulimit or cgroups to limit CPU/memory
Sanitize environment - Remove sensitive environment variables
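
Three of the measures above (logging, resource limits, and a sanitized environment) can be combined in a small launcher script. The following is a sketch under assumptions, not a vetted hardening recipe: `run_restricted` is a hypothetical helper, the 600-second CPU cap and the choice to keep only `PATH` and `HOME` are arbitrary, and Julia may need further variables (for example `JULIA_DEPOT_PATH`) in a real setup. It relies only on the standard `tee`, `ulimit -t`, and `env -i`.

```shell
#!/bin/sh
# Hypothetical launcher: run a command with
#   1. every line sent to its stdin appended to a log file,
#   2. a cap on CPU time,
#   3. an environment containing only PATH and HOME.
# Usage: run_restricted LOG_FILE COMMAND [ARGS...]
run_restricted() {
    log_file=$1
    shift
    tee -a "$log_file" | (
        # Cap CPU seconds for the child; warn rather than abort if refused.
        ulimit -t 600 2>/dev/null ||
            printf 'warning: could not set CPU limit\n' >&2
        # env -i starts from an empty environment, so secrets exported in
        # the parent shell are not passed down.
        exec env -i PATH="$PATH" HOME="$HOME" "$@"
    )
}
```

A launcher built this way would wrap the usual server command, for example `run_restricted /path/to/requests.log julia ...`, with the elided arguments being whatever the MCP registration normally passes. Note that `ulimit -t` caps the process's total CPU time, so a long-lived session will eventually hit it. Memory limits are left out of this sketch; cgroups or container limits are one option, as the list above notes.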

Things to Avoid

Don't expose AgentEval to the network - It's designed for local use
Don't run with elevated permissions - Use a regular user account
Don't store secrets in environment variables - Any Julia code that AgentEval evaluates can read them
Don't rely on AgentEval for sandboxing - It executes arbitrary code

Vulnerability Reporting

If you discover a security vulnerability in AgentEval, please:

Do not open a public issue
Contact the maintainers privately
Provide details about the vulnerability
Allow time for a fix before public disclosure

Disclaimer

This software is provided "AS IS" without warranties. It executes arbitrary code with user permissions. Use at your own risk.

See LICENSE for the full Apache 2.0 license terms.
