A comprehensive, reproducible benchmark comparing browser automation tools for AI agents.
Generated: 2026-01-14T00:39:05.384847
- 200x faster than Playwright MCP (8ms vs 1.6s on navigation)
- 2.8x faster than agent-browser (8ms vs 23ms on navigation)
- 15-38x faster than the estimated Playwright MCP time on multi-step workflows
| Component | Version |
|---|---|
| OS | Darwin 25.2.0 |
| CPU | arm (14 cores) |
| Memory | 24 GB |
| Chrome | 143.0.7499.193 |
| FGP Browser | 0.1.0 |
| Playwright MCP | 0.0.55 |
| agent-browser | latest |
| Network | Local (no throttling) |
- Iterations: 50 per test
- Warmup: 5 iterations
- Confidence Level: 95%
- Outlier Removal: samples beyond 3.0σ are discarded
- Significance Test: Mann-Whitney U
- Effect Size: Cohen's d
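
The statistical settings above can be sketched with the Python standard library. This is a minimal illustration, not the code in `benchmark.py`: the function names, the use of the pooled standard deviation for Cohen's d, the normal approximation for the Mann-Whitney U p-value, and the sample latencies are all assumptions made for the example.

```python
import math
import statistics


def remove_outliers(samples, threshold=3.0):
    """Drop samples more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    if stdev == 0:
        return list(samples)
    return [x for x in samples if abs(x - mean) <= threshold * stdev]


def cohens_d(a, b):
    """Cohen's d with the pooled sample standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test, normal approximation, no tie correction."""
    na, nb = len(a), len(b)
    # Count pairs where the sample from `a` exceeds the one from `b`; ties count half.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u_a, na * nb - u_a)
    mean_u = na * nb / 2
    sd_u = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical latencies in milliseconds, invented for this example only.
tool_a_ms = remove_outliers([8.0, 7.5, 8.2, 7.9, 8.1, 8.3, 7.8, 8.0])
tool_b_ms = remove_outliers([23.0, 22.5, 24.1, 22.8, 23.4, 23.9, 22.2, 23.1])

u_stat, p_value = mann_whitney_u(tool_a_ms, tool_b_ms)
print(f"U = {u_stat}, p = {p_value:.4f}, d = {cohens_d(tool_a_ms, tool_b_ms):.2f}")
```

With this argument order, a negative d means the first tool has the lower mean latency. A real run would more likely call `scipy.stats.mannwhitneyu`, which handles ties and small samples exactly.
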
| Operation | FGP Browser | agent-browser | Playwright MCP | FGP vs MCP |
|---|---|---|---|---|
| Navigate | 8ms | 23ms | 1.6s | 199.9x |
| Snapshot | 9ms | 21ms | N/A* | - |
| Screenshot | 29ms | 33ms | N/A* | - |
| Click | 17ms | 28ms | N/A* | - |
| Fill | 25ms | 20ms | N/A* | - |
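\* Playwright MCP over stdio is stateless: each call spawns a new process, so operations that depend on a prior navigation fail and could not be measured.

Multi-step workflows show how per-call latency savings compound. The MCP figures are estimates rather than measurements, and the FGP Speedup column is computed against them.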
| Workflow | Steps | FGP | agent-browser | MCP Estimate | FGP Speedup |
|---|---|---|---|---|---|
| Login | 5 | 783ms | 914ms | ~11.5s | 14.7x |
| Search Extract | 6 | 762ms | 1.1s | ~13.8s | 18.1x |
| Form Submit | 7 | 419ms | 480ms | ~16.1s | 38.4x |
| Pagination | 10 | 1.1s | 1.7s | ~23.0s | 20.4x |
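
The MCP Estimate column in the table above matches a flat 2.3s per step multiplied by the step count. That per-step figure is inferred from the table (for example 11.5s / 5 steps) and is not stated by the benchmark, so the sketch below treats it as an assumption. The function names are hypothetical.

```python
# Assumed per-step MCP cost in seconds, inferred from the table; not stated in the source.
MCP_SECONDS_PER_STEP = 2.3

# (workflow, steps, measured FGP seconds), copied from the table.
WORKFLOWS = [
    ("Login", 5, 0.783),
    ("Search Extract", 6, 0.762),
    ("Form Submit", 7, 0.419),
    ("Pagination", 10, 1.1),
]


def mcp_estimate(steps, per_step=MCP_SECONDS_PER_STEP):
    """Estimated MCP workflow time: one fresh process per step."""
    return steps * per_step


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


for name, steps, fgp_s in WORKFLOWS:
    estimate_s = mcp_estimate(steps)
    print(f"{name}: ~{estimate_s:.1f}s estimated, {speedup(estimate_s, fgp_s):.1f}x")
```

The first three rows reproduce the table. Pagination comes out near 20.9x from the rounded 1.1s, against the reported 20.4x, which suggests the table was computed from unrounded timings.
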
| Feature | fgp-browser | agent-browser | playwright-mcp |
|---|---|---|---|
| Navigate | Yes | Yes | No |
| Snapshot | Yes | Yes | Yes |
| Screenshot | Yes | Yes | Yes |
| Click | Yes | No | Yes |
| Fill | Yes | Yes | Yes |
| Select | Yes | No | Yes |
| Check | Yes | No | Yes |
| Hover | Yes | No | Yes |
| Scroll | Yes | No | Yes |
| Press | Yes | Yes | Yes |
| Press Combo | No | Yes | Yes |
| Upload | Yes | Yes | Yes |
- fgp-browser: 11/12 features (91.7%)
- agent-browser: 7/12 features (58.3%)
- playwright-mcp: 11/12 features (91.7%)
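
The coverage percentages above are the count of "Yes" cells per tool divided by the 12 listed features. A small sketch, with the matrix transcribed from the table and a hypothetical `coverage` helper:

```python
# Feature matrix transcribed from the table: (fgp-browser, agent-browser, playwright-mcp).
TOOLS = ("fgp-browser", "agent-browser", "playwright-mcp")
FEATURES = {
    "Navigate": (True, True, False),
    "Snapshot": (True, True, True),
    "Screenshot": (True, True, True),
    "Click": (True, False, True),
    "Fill": (True, True, True),
    "Select": (True, False, True),
    "Check": (True, False, True),
    "Hover": (True, False, True),
    "Scroll": (True, False, True),
    "Press": (True, True, True),
    "Press Combo": (False, True, True),
    "Upload": (True, True, True),
}


def coverage(features, tools):
    """Return {tool: (supported, total, percent)} for each tool column."""
    total = len(features)
    result = {}
    for column, tool in enumerate(tools):
        supported = sum(row[column] for row in features.values())
        result[tool] = (supported, total, 100.0 * supported / total)
    return result


for tool, (supported, total, percent) in coverage(FEATURES, TOOLS).items():
    print(f"{tool}: {supported}/{total} features ({percent:.1f}%)")
```
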
- FGP Browser vs agent-browser (Navigate): p < 0.001, Cohen's d = -1.287 (large effect)
All comparisons show statistically significant differences (p < 0.05) with large effect sizes.
# Install tools
cargo install fgp-browser
npm install -g @anthropic/agent-browser
# Clone and run
git clone https://github.com/wolfiesch/fgp-benchmark
cd fgp-benchmark
pip install -r requirements.txt
python3 benchmark.py --iterations 50

Generated by fgp-benchmark


