╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║                      gemini-autoresearch                      ║
║                                                               ║
║          Set a goal. Walk away. Wake up to results.           ║
║                                                               ║
║         Your AI works all night so you don't have to.         ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝
What is this? · 5-minute install · How it works · Commands · Examples · Why Gemini? · Contributing
gemini-autoresearch turns Gemini CLI and Antigravity IDE into an autonomous improvement engine — for anything with a measurable outcome.
You describe what you want to improve. Gemini runs hundreds of small experiments overnight. It keeps every improvement and automatically reverses anything that makes things worse. You wake up to a log of exactly what happened.
Not a developer? That's fine. If you use Antigravity IDE to help with your work — writing, marketing, operations, content — this skill works for you too. See non-technical examples →
Think of it like this. Normally when you use an AI assistant:
You describe a problem → AI gives you an answer → Done (one shot)
With gemini-autoresearch:
You set a goal → AI runs 100 experiments overnight → You keep every gain
   ↑                                                                   │
   └────────────────── each run smarter than the last ─────────────────┘
The difference is that it never stops at one answer. It iterates — trying things, measuring results, keeping what works, throwing out what doesn't — until you tell it to stop or it hits your goal.
Every iteration follows the same loop, running forever until you interrupt:
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. READ     Review current state, git history, past lessons    │
│  2. THINK    Form ONE specific hypothesis to test               │
│  3. CHANGE   Make ONE focused change                            │
│  4. SAVE     Git commit before testing (safe rollback point)    │
│  5. MEASURE  Run Verify — did the metric improve?               │
│  6. PROTECT  Run Guard — did anything else break?               │
│  7. DECIDE   Both pass → KEEP / Either fails → REVERT           │
│  8. LOG      Record what happened and why                       │
│  9. LEARN    Every 5 wins → write a lesson for future runs      │
│                                                                 │
│  ↑______________________ REPEAT FOREVER ______________________↑ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
This dual-gate system makes gemini-autoresearch safer than any other autoresearch skill:
|  | Verify | Guard |
|---|---|---|
| Question | "Did I make progress?" | "Did I break anything?" |
| Example | Did test coverage go up? | Do types still compile? |
| If it fails | Revert immediately | Rework (max 2 tries), then revert |
| Required? | Yes | Optional but strongly recommended |
Why this matters: Without Guard, an AI chasing test coverage could silently introduce TypeScript errors across 50 iterations. With Guard, any change that breaks types gets reverted — even if it improved coverage. You wake up to clean working code, not just higher numbers.
# 1. Clone this repo
git clone https://github.com/supratikpm/gemini-autoresearch.git
# 2a. Install globally — works in ALL your projects
cp -r gemini-autoresearch/skills/autoresearch ~/.gemini/skills/autoresearch
# 2b. Or install for this project only
cp -r gemini-autoresearch/skills/autoresearch .gemini/skills/autoresearch
# 3. Enable in Gemini CLI
# Open Gemini CLI → type /settings → enable Agent Skills

# Or, for Antigravity IDE, copy into the project's .agents folder:
cp -r gemini-autoresearch/skills/autoresearch .agents/skills/autoresearch

Antigravity auto-discovers skills in .agents/skills/, so no settings change is needed. Just describe your goal and it picks up the skill automatically.
/skills
You should see autoresearch in the list.
/autoresearch
Goal: <what you want to improve>
Scope: <files or folders the AI may change>
Metric: <what to measure, and whether higher or lower is better>
Verify: <command that outputs the metric>
Guard: <command that must always pass — optional>
Just describe your goal in plain English. Gemini scans your project, detects your tech stack, proposes the full config, runs a dry run, and hands you the ready-to-run command.
/autoresearch:plan make my app faster
/autoresearch:plan improve test coverage
/autoresearch:plan reduce my Docker image size
Runs a full checklist before you ship: tests, types, lint, bundle size, secrets scan, dependency audit. If anything fails, it runs an autoresearch loop to fix it automatically.
/autoresearch:ship
/autoresearch:ship --dry-run ← report only, no fixes
/autoresearch:ship --fast ← skip slow checks
Autonomous root cause analysis. Reproduces the failure, isolates the cause, fixes it, verifies the fix holds.
/autoresearch:debug the auth tests are failing
/autoresearch:debug TypeError: Cannot read property 'id' of undefined
/autoresearch:debug CI is failing on the build step
Full STRIDE and OWASP security audit. Every finding requires code evidence. Optional auto-fix for confirmed critical and high findings.
/autoresearch:security
/autoresearch:security --fix ← auto-fix confirmed findings
/autoresearch:security --fail-on critical ← CI gate mode
gemini \
--prompt "Start autoresearch. Goal: reduce bundle size below 200KB. Scope: src/. Metric: KB (lower is better). Verify: npm run build 2>&1 | grep 'First Load JS'. Do not pause." \
  --yolo

--yolo disables all confirmation prompts; --prompt starts the run immediately. Walk away. You will wake up to autoresearch-results.tsv and autoresearch-lessons.md.
Test coverage
/autoresearch
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.ts, src/**/*.test.ts
Metric: coverage % (higher is better)
Verify: npm test -- --coverage | grep "All files"
Guard: npx tsc --noEmit
Bundle size
/autoresearch
Goal: Reduce production bundle below 200KB
Scope: src/**/*.tsx, src/**/*.ts
Metric: bundle size in KB (lower is better)
Verify: npm run build 2>&1 | grep "First Load JS"
Guard: npm test
TypeScript errors
/autoresearch
Goal: Eliminate all TypeScript errors
Scope: src/**/*.ts
Metric: error count (lower is better)
Verify: npx tsc --noEmit 2>&1 | grep -c "error TS" || true
Lighthouse performance score
/autoresearch
Goal: Lighthouse performance score above 95
Scope: src/components/**/*.tsx, src/app/**/*.tsx
Metric: Lighthouse score (higher is better)
Verify: npx lighthouse http://localhost:3000 --output json --quiet 2>/dev/null | jq '.categories.performance.score * 100'
Guard: npm test
Docker image size
/autoresearch
Goal: Reduce Docker image below 150MB
Scope: Dockerfile, .dockerignore
Metric: image size in MB (lower is better)
Verify: docker build -q -t bench . >/dev/null && docker images bench --format "{{.Size}}"
CI pipeline speed
/autoresearch
Goal: CI pipeline under 5 minutes
Scope: .github/workflows/*.yml
Metric: estimated pipeline duration in seconds (lower is better)
Verify: node scripts/estimate-ci-time.js
SQL query performance
/autoresearch
Goal: Reduce dashboard query total execution time
Scope: queries/dashboard/*.sql
Metric: total execution time in ms (lower is better)
Verify: psql -f scripts/bench-queries.sql | grep "total_ms"
You do not need to write code to use this. If your goal is measurable, Gemini can iterate toward it overnight. Here are examples using Antigravity IDE:
SEO — improve your blog post ranking (uses Google Search grounding — exclusive to Gemini)
/autoresearch
Goal: Maximise SEO score for keyword "project management tips"
Scope: content/blog/project-management.md
Metric: SEO score (higher is better)
Verify: node scripts/seo-score.js content/blog/project-management.md
What makes this Gemini-native: Gemini also searches Google to check what the top-ranking pages have that yours does not — and uses that as the next hypothesis. No other autoresearch skill can do this.
Marketing — optimise email subject lines
/autoresearch
Goal: 40 subject lines, each under 50 chars with a power word and CTA
Scope: content/emails/subject-lines.md
Metric: lines meeting all criteria (higher is better)
Verify: node scripts/validate-subjects.js
HR — improve job description inclusivity
/autoresearch
Goal: Bias-free, inclusive language across all job descriptions
Scope: content/job-descriptions/*.md
Metric: inclusivity score (higher is better)
Verify: node scripts/jd-inclusivity-score.js
Operations — standardise SOPs
/autoresearch
Goal: All SOPs use standard template with under 100 words per step
Scope: docs/sops/*.md
Metric: template compliance % (higher is better)
Verify: node scripts/sop-score.js
Content — landing page quality score
/autoresearch
Goal: Maximise CRO score — clear CTA, social proof, urgency, readability
Scope: content/landing-pages/home.md
Metric: CRO score (higher is better)
Verify: node scripts/cro-score.js content/landing-pages/home.md
After every 5 successful iterations, Gemini writes a structured lesson:
## Lesson 3 — iterations 11–15
Pattern: Deferring non-critical third-party scripts
Why it worked: Removes scripts from the critical render path
Conditions: Analytics, chat widgets, social embeds
Anti-pattern: Lazy-loading scripts called in the first 500ms caused layout shifts
Metric delta: +6.8% Lighthouse performance
The next run reads these lessons before forming any hypothesis:
Night 1 → 100 experiments → lessons written
Night 2 → reads lessons → avoids known failures → smarter from the start
Night 3 → reads updated lessons → compounding gains
Each night builds on the last. No other autoresearch skill compounds learning across runs this way.
Every major coding agent now has a generalised autoresearch skill:
| Agent | Skill |
|---|---|
| Claude Code | uditgoenka/autoresearch |
| Codex CLI | leo-lilinxiao/codex-autoresearch |
| Gemini CLI | this repo |
Gemini has three things the others cannot do:

1. Google Search grounding. No other autoresearch skill can verify against live Google results natively. Gemini can query Google as part of each iteration — checking what is actually ranking, whether an API is deprecated, whether your approach matches current best practices. This makes SEO and content goals possible in a way no other skill matches.

2. Fully unattended runs. gemini --prompt "..." --yolo disables all confirmation prompts and starts immediately. No babysitting. You sleep; Gemini works.

3. A context window that holds everything. The entire codebase, full git history, all reference docs, and the results log fit in a single context. No chunking, no summarisation, no lost context between iterations. Better hypotheses, fewer repeated mistakes.
gemini-autoresearch/
│
├── README.md ← you are here
├── CONTRIBUTING.md ← how to contribute
├── INSTALL.md ← detailed install guide
├── LICENSE ← MIT
│
├── scripts/
│ ├── verify.sh ← template for custom Verify commands
│ └── content-score.js ← Gemini-native LLM-as-judge scorer
│
└── skills/
└── autoresearch/
├── SKILL.md ← main skill loaded by Gemini CLI
└── references/
├── loop-protocol.md ← 9-phase loop spec
├── google-search-patterns.md ← Gemini-native verification patterns
├── lessons-system.md ← cross-run memory spec
├── results-logging.md ← TSV format and interpretation
├── plan-workflow.md ← /autoresearch:plan spec
├── ship-workflow.md ← /autoresearch:ship spec
├── debug-workflow.md ← /autoresearch:debug spec
└── security-workflow.md ← /autoresearch:security spec
| Agent | Install path |
|---|---|
| Gemini CLI (global) | ~/.gemini/skills/autoresearch/ |
| Gemini CLI (project) | .gemini/skills/autoresearch/ |
| Antigravity IDE | .agents/skills/autoresearch/ |
| Claude Code | .claude/skills/autoresearch/ |
| Cursor | .cursor/skills/autoresearch/ |
Google Search grounding works only in Gemini CLI and Antigravity. Other agents skip those steps automatically.
Contributions are welcome and appreciated. See CONTRIBUTING.md for the full guide.
Quick ways to help:
- Add a new domain example or verification script
- Translate the README into another language
- Report a confusing instruction via Issues
- Star the repo if this is useful to you ⭐
- Andrej Karpathy — for autoresearch and the insight that constraint + metric + overnight iteration = compounding gains
- Udit Goenka — for generalising it to Claude Code and proving it works beyond ML
- Leo Lilinxiao — for the Codex version and the dual-gate Verify + Guard pattern
- Google — for Gemini CLI and the skills system
MIT — see LICENSE.
If this is useful, please consider starring the repo ⭐
