╔═══════════════════════════════════════════════════════════════╗
║                                                               ║
║          gemini-autoresearch                                  ║
║                                                               ║
║     Set a goal. Walk away. Wake up to results.                ║
║                                                               ║
║     Your AI works all night so you don't have to.             ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝


What is this? · 5-minute install · How it works · Commands · Examples · Why Gemini? · Contributing


What is this?

gemini-autoresearch turns Gemini CLI and Antigravity IDE into an autonomous improvement engine — for anything with a measurable outcome.

You describe what you want to improve. Gemini runs hundreds of small experiments overnight. It keeps every improvement and automatically reverses anything that makes things worse. You wake up to a log of exactly what happened.

Not a developer? That's fine. If you use Antigravity IDE to help with your work — writing, marketing, operations, content — this skill works for you too. See non-technical examples →


The big picture

Think of it like this. Normally when you use an AI assistant:

You describe a problem → AI gives you an answer → Done (one shot)

With gemini-autoresearch:

You set a goal → AI runs 100 experiments overnight → You keep every gain
       ↑                                                        |
       └────────────────── each run smarter than the last ──────┘

The difference is that it never stops at one answer. It iterates — trying things, measuring results, keeping what works, throwing out what doesn't — until you tell it to stop or it hits your goal.


How it works

Every iteration follows the same loop, running forever until you interrupt:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  1. READ        Review current state, git history, past lessons │
│  2. THINK       Form ONE specific hypothesis to test            │
│  3. CHANGE      Make ONE focused change                         │
│  4. SAVE        Git commit before testing (safe rollback point) │
│  5. MEASURE     Run Verify — did the metric improve?            │
│  6. PROTECT     Run Guard  — did anything else break?           │
│  7. DECIDE      Both pass → KEEP  /  Either fails → REVERT      │
│  8. LOG         Record what happened and why                    │
│  9. LEARN       Every 5 wins → write a lesson for future runs   │
│                                                                 │
│  ↑_________________________REPEAT FOREVER____________________↑  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Verify vs Guard — what is the difference?

This dual-gate system makes gemini-autoresearch safer than any other autoresearch skill:

|             | Verify                   | Guard                             |
|-------------|--------------------------|-----------------------------------|
| Question    | "Did I make progress?"   | "Did I break anything?"           |
| Example     | Did test coverage go up? | Do types still compile?           |
| If it fails | Revert immediately       | Rework (max 2 tries), then revert |
| Required?   | Yes                      | Optional but strongly recommended |

Why this matters: Without Guard, an AI chasing test coverage could silently introduce TypeScript errors across 50 iterations. With Guard, any change that breaks types gets reverted — even if it improved coverage. You wake up to clean working code, not just higher numbers.
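The keep-or-revert gate in step 7 can be sketched as a tiny shell function. This is an illustration, not the skill's actual implementation: it assumes Verify prints a single number where higher is better, and that Guard's exit status is 0 on success.

```shell
# Illustrative KEEP/REVERT decision for a higher-is-better metric.
# before/after: Verify output before and after the change.
# guard_status: exit status of the Guard command (0 = passed).
decide() {
  before=$1; after=$2; guard_status=$3
  if [ "$guard_status" -eq 0 ] && \
     awk -v a="$after" -v b="$before" 'BEGIN { exit !(a + 0 > b + 0) }'; then
    echo "KEEP"      # metric improved AND nothing broke
  else
    echo "REVERT"    # either gate failed -> roll back
  fi
}

decide 72.4 74.1 0   # coverage up, guard passed  -> KEEP
decide 72.4 74.1 1   # coverage up, guard FAILED  -> REVERT
decide 72.4 71.9 0   # coverage down              -> REVERT
```

In the real loop the "revert" branch rolls back to the git commit created in step 4, so a failed experiment leaves no trace in the working tree.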


5-minute install

For Gemini CLI

# 1. Clone this repo
git clone https://github.com/supratikpm/gemini-autoresearch.git

# 2a. Install globally — works in ALL your projects
cp -r gemini-autoresearch/skills/autoresearch ~/.gemini/skills/autoresearch

# 2b. Or install for this project only
cp -r gemini-autoresearch/skills/autoresearch .gemini/skills/autoresearch

# 3. Enable in Gemini CLI
# Open Gemini CLI → type /settings → enable Agent Skills

For Antigravity IDE

cp -r gemini-autoresearch/skills/autoresearch .agents/skills/autoresearch

Antigravity auto-discovers skills in .agents/skills/ — no settings change needed. Just describe your goal and it picks up the skill automatically.

Verify it is working

/skills

You should see autoresearch in the list.


Commands

/autoresearch — the main loop

/autoresearch
Goal:   <what you want to improve>
Scope:  <files or folders the AI may change>
Metric: <what to measure, and whether higher or lower is better>
Verify: <command that outputs the metric>
Guard:  <command that must always pass — optional>

/autoresearch:plan — not sure where to start?

Just describe your goal in plain English. Gemini scans your project, detects your tech stack, proposes the full config, runs a dry run, and hands you the ready-to-run command.

/autoresearch:plan make my app faster
/autoresearch:plan improve test coverage
/autoresearch:plan reduce my Docker image size

/autoresearch:ship — pre-flight before releasing

Runs a full checklist before you ship: tests, types, lint, bundle size, secrets scan, dependency audit. If anything fails, it runs an autoresearch loop to fix it automatically.

/autoresearch:ship
/autoresearch:ship --dry-run      ← report only, no fixes
/autoresearch:ship --fast         ← skip slow checks

/autoresearch:debug — something is broken

Autonomous root cause analysis. Reproduces the failure, isolates the cause, fixes it, verifies the fix holds.

/autoresearch:debug the auth tests are failing
/autoresearch:debug TypeError: Cannot read property 'id' of undefined
/autoresearch:debug CI is failing on the build step

/autoresearch:security — find vulnerabilities

Full STRIDE and OWASP security audit. Every finding requires code evidence. Optional auto-fix for confirmed critical and high findings.

/autoresearch:security
/autoresearch:security --fix                 ← auto-fix confirmed findings
/autoresearch:security --fail-on critical    ← CI gate mode

Overnight headless mode

gemini \
  --prompt "Start autoresearch. Goal: reduce bundle size below 200KB. Scope: src/. Metric: KB (lower is better). Verify: npm run build 2>&1 | grep 'First Load JS'. Do not pause." \
  --yolo

--yolo disables all confirmation prompts. --prompt starts immediately. Walk away. You will wake up to autoresearch-results.tsv and autoresearch-lessons.md.
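If you want this to happen every night without even launching it yourself, the headless invocation can be scheduled with cron. This is a sketch only: the project path and goal are placeholders, and cron jobs run with a minimal PATH, so `gemini` may need an absolute path.

```shell
# Hypothetical crontab entry (edit with `crontab -e`): start a run at 23:00
# nightly from the project root. Path and goal are placeholders.
0 23 * * * cd /path/to/project && gemini --prompt "Start autoresearch. Goal: ... Do not pause." --yolo >> autoresearch.log 2>&1
```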


Examples

For developers

Test coverage
/autoresearch
Goal:   Increase test coverage from 72% to 90%
Scope:  src/**/*.ts, src/**/*.test.ts
Metric: coverage % (higher is better)
Verify: npm test -- --coverage | grep "All files"
Guard:  npx tsc --noEmit
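Whatever Verify prints is what the loop compares between iterations, so it helps when the command yields one clean number. The grep above returns Jest's whole summary row; a hedged sketch of trimming it to the statements percentage (the exact column layout depends on your Jest version):

```shell
# Hypothetical Jest coverage summary row; real output layout may differ.
line='All files      |   72.4 |   68.1 |   75.0 |   72.4 |'
# Pull out the statements column (field 2) as a bare number.
echo "$line" | awk -F'|' '{ gsub(/ /, "", $2); print $2 }'
# -> 72.4
```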
Bundle size
/autoresearch
Goal:   Reduce production bundle below 200KB
Scope:  src/**/*.tsx, src/**/*.ts
Metric: bundle size in KB (lower is better)
Verify: npm run build 2>&1 | grep "First Load JS"
Guard:  npm test
TypeScript errors
/autoresearch
Goal:   Eliminate all TypeScript errors
Scope:  src/**/*.ts
Metric: error count (lower is better)
Verify: npx tsc --noEmit 2>&1 | grep -c "error TS" || true
Lighthouse performance score
/autoresearch
Goal:   Lighthouse performance score above 95
Scope:  src/components/**/*.tsx, src/app/**/*.tsx
Metric: Lighthouse score (higher is better)
Verify: npx lighthouse http://localhost:3000 --output json --quiet 2>/dev/null | jq '.categories.performance.score * 100'
Guard:  npm test
Docker image size
/autoresearch
Goal:   Reduce Docker image below 150MB
Scope:  Dockerfile, .dockerignore
Metric: image size in MB (lower is better)
Verify: docker build -t bench . -q && docker images bench --format "{{.Size}}"
CI pipeline speed
/autoresearch
Goal:   CI pipeline under 5 minutes
Scope:  .github/workflows/*.yml
Metric: estimated pipeline duration in seconds (lower is better)
Verify: node scripts/estimate-ci-time.js
SQL query performance
/autoresearch
Goal:   Reduce dashboard query total execution time
Scope:  queries/dashboard/*.sql
Metric: total execution time in ms (lower is better)
Verify: psql -f scripts/bench-queries.sql | grep "total_ms"
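Any of the Verify commands above can be swapped for a custom script (the repo ships a scripts/verify.sh template for this), as long as it prints a single comparable number. A minimal illustration using a made-up metric, total TODO markers under src/, lower is better:

```shell
# Count TODO markers across src/ and print one number (lower is better).
# grep -rc prints "file:count" per file; awk sums the counts.
# The metric itself is an invented example, not part of the skill.
grep -rc "TODO" src/ 2>/dev/null | awk -F: '{ total += $2 } END { print total + 0 }'
```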

For non-developers

You do not need to write code to use this. If your goal is measurable, Gemini can iterate toward it overnight. Here are examples using Antigravity IDE:

SEO — improve your blog post ranking (uses Google Search grounding — exclusive to Gemini)
/autoresearch
Goal:   Maximise SEO score for keyword "project management tips"
Scope:  content/blog/project-management.md
Metric: SEO score (higher is better)
Verify: node scripts/seo-score.js content/blog/project-management.md

What makes this Gemini-native: Gemini also searches Google to check what the top-ranking pages have that yours does not — and uses that as the next hypothesis. No other autoresearch skill can do this.

Marketing — optimise email subject lines
/autoresearch
Goal:   40 subject lines, each under 50 chars with a power word and CTA
Scope:  content/emails/subject-lines.md
Metric: lines meeting all criteria (higher is better)
Verify: node scripts/validate-subjects.js
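The validate-subjects.js script above is hypothetical; the core of such a validator is just counting lines that satisfy each rule. For the length rule alone, a sketch:

```shell
# Hypothetical input: one subject line per line.
printf '%s\n' \
  "Unlock 40% more replies today" \
  "This subject line is definitely way too long to pass the fifty character limit" \
  > subject-lines.txt
# Count lines of at most 50 characters. A real validator would also
# check for a power word and a call to action.
awk 'length($0) <= 50 { n++ } END { print n + 0 }' subject-lines.txt
# -> 1
```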
HR — improve job description inclusivity
/autoresearch
Goal:   Bias-free, inclusive language across all job descriptions
Scope:  content/job-descriptions/*.md
Metric: inclusivity score (higher is better)
Verify: node scripts/jd-inclusivity-score.js
Operations — standardise SOPs
/autoresearch
Goal:   All SOPs use standard template with under 100 words per step
Scope:  docs/sops/*.md
Metric: template compliance % (higher is better)
Verify: node scripts/sop-score.js
Content — landing page quality score
/autoresearch
Goal:   Maximise CRO score — clear CTA, social proof, urgency, readability
Scope:  content/landing-pages/home.md
Metric: CRO score (higher is better)
Verify: node scripts/cro-score.js content/landing-pages/home.md

The lessons system — gets smarter every night

After every 5 successful iterations, Gemini writes a structured lesson:

## Lesson 3 — iterations 11–15
Pattern:       Deferring non-critical third-party scripts
Why it worked: Removes scripts from the critical render path
Conditions:    Analytics, chat widgets, social embeds
Anti-pattern:  Lazy-loading scripts called in the first 500ms caused layout shifts
Metric delta:  +6.8% Lighthouse performance

The next run reads these lessons before forming any hypothesis:

Night 1 → 100 experiments → lessons written
Night 2 → reads lessons  → avoids known failures → smarter from the start
Night 3 → reads updated  → compound gains

Each night builds on the last. No other autoresearch skill compounds learning across runs this way.


Why Gemini specifically?

Every major coding agent now has a generalised autoresearch skill:

| Agent       | Skill                            |
|-------------|----------------------------------|
| Claude Code | uditgoenka/autoresearch          |
| Codex CLI   | leo-lilinxiao/codex-autoresearch |
| Gemini CLI  | this repo                        |

Gemini has three things the others cannot do:

Google Search grounding as a verification source

No other autoresearch skill can verify against live Google results natively. Gemini can query Google as part of each iteration — checking what is actually ranking, whether an API is deprecated, whether your approach matches current best practices. This makes SEO and content goals possible in a completely unique way.

True headless overnight mode

gemini --prompt "..." --yolo disables all confirmation prompts and starts immediately. No babysitting. You sleep; Gemini works.

1M token context window

The entire codebase, full git history, all reference docs, and the results log fit in a single context. No chunking, no summarisation, no lost context between iterations. Better hypotheses, fewer repeated mistakes.


Repo structure

gemini-autoresearch/
│
├── README.md                              ← you are here
├── CONTRIBUTING.md                        ← how to contribute
├── INSTALL.md                             ← detailed install guide
├── LICENSE                                ← MIT
│
├── scripts/
│   ├── verify.sh                          ← template for custom Verify commands
│   └── content-score.js                   ← Gemini-native LLM-as-judge scorer
│
└── skills/
    └── autoresearch/
        ├── SKILL.md                       ← main skill loaded by Gemini CLI
        └── references/
            ├── loop-protocol.md           ← 9-phase loop spec
            ├── google-search-patterns.md  ← Gemini-native verification patterns
            ├── lessons-system.md          ← cross-run memory spec
            ├── results-logging.md         ← TSV format and interpretation
            ├── plan-workflow.md           ← /autoresearch:plan spec
            ├── ship-workflow.md           ← /autoresearch:ship spec
            ├── debug-workflow.md          ← /autoresearch:debug spec
            └── security-workflow.md       ← /autoresearch:security spec

Cross-agent compatibility

| Agent                | Install path                   |
|----------------------|--------------------------------|
| Gemini CLI (global)  | ~/.gemini/skills/autoresearch/ |
| Gemini CLI (project) | .gemini/skills/autoresearch/   |
| Antigravity IDE      | .agents/skills/autoresearch/   |
| Claude Code          | .claude/skills/autoresearch/   |
| Cursor               | .cursor/skills/autoresearch/   |

Google Search grounding works only in Gemini CLI and Antigravity. Other agents skip those steps automatically.


Contributing

Contributions are welcome and appreciated. See CONTRIBUTING.md for the full guide.

Quick ways to help:

  • Add a new domain example or verification script
  • Translate the README into another language
  • Report a confusing instruction via Issues
  • Star the repo if this is useful to you ⭐


License

MIT — see LICENSE.


If this is useful, please consider starring the repo ⭐


About

Autonomous goal-directed iteration for Gemini CLI. Inspired by Karpathy's autoresearch. Modify → Verify → Keep/Discard → Repeat forever.
