feat(init,rgai): auto-install grepai via init, delegate rtk rgai to grepai with silent fallback #136

Closed
heAdz0r wants to merge 14 commits into rtk-ai:master from heAdz0r:codex/grepai-init-review-fixes

@heAdz0r
Contributor

@heAdz0r heAdz0r commented Feb 15, 2026

Summary

Adds optional grepai installation to rtk init --global and integrates rtk rgai delegation to external grepai with silent fallback to built-in keyword search.

User flow after this PR:

$ rtk init --global
RTK hook installed (global).
  ...
Install grepai for semantic code search? [Y/n] y
  Installing grepai...
  grepai installed: ~/.local/bin/grepai
  Run `grepai init` in any project, then `grepai watch --background`.

Then rtk rgai "query" automatically delegates to grepai when available, or falls back to built-in search silently.

Presets & Defaults

| Setting | Default | Where |
|---|---|---|
| Embedding provider | ollama | grepai init --provider ollama |
| Index backend | gob (file-based) | grepai init --backend gob |
| grepai.enabled | true | ~/.config/rtk/config.toml |
| grepai.auto_init | true | Auto-init project on first rtk rgai |
| grepai.binary_path | None (auto-detect) | Override binary location |
| Install prompt default | Yes ([Y/n]) | Consent during rtk init |
| --builtin flag | false | Force built-in, skip grepai |

Why these defaults: Ollama runs locally (no API keys needed), gob is zero-config (no external DB), auto-init reduces friction. The [Y/n] default (Yes) is intentionally different from the settings.json [y/N] (No) — grepai is low-risk and high-value.
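
For reviewers who want to see the three config defaults from the table in code, here is a minimal sketch. It assumes a plain struct with a hand-written `Default` impl so it stays dependency-free; the PR itself describes serde defaults, and the real field types in src/config.rs may differ.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the grepai settings listed in the table above.
/// Field names come from the PR description; the types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct GrepaiConfig {
    /// Delegate `rtk rgai` to grepai when it is available.
    pub enabled: bool,
    /// Initialize the project index on the first `rtk rgai` call.
    pub auto_init: bool,
    /// Explicit binary location; `None` means auto-detect.
    pub binary_path: Option<PathBuf>,
}

impl Default for GrepaiConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            auto_init: true,
            binary_path: None,
        }
    }
}
```

With this shape, a config file that omits the grepai section entirely would behave the same as `GrepaiConfig::default()`.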

Prerequisites for grepai

rtk rgai works without grepai (built-in keyword engine). For full embedding-based semantic search:

  • Ollama running locally (ollama serve)
  • grepai installed (this PR automates it via rtk init)
  • Project indexed: grepai init && grepai watch --background

If any prerequisite is missing, rtk rgai falls back to built-in search silently (no warnings, no nagging).

Security Compliance (SECURITY.md Workflow)

Layer 1: Automated security-check.yml

  • Critical-file match: src/init.rs — triggers enhanced review
  • Dangerous-pattern scan: no eval, no exec, no unsanitized interpolation
  • No Cargo.toml or CI workflow modifications

Layer 2: Installer security

The install step (grepai::install_grepai) uses safe stdin piping instead of curl | sh:

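// Step 1: Download the installer script into memory
let script = Command::new("curl").args(["-fsSL", URL]).output()?;
// Step 2: Pipe the script to sh via stdin (no shell interpolation)
let mut installer = Command::new("sh")
    .env("INSTALL_DIR", &install_dir)
    .stdin(Stdio::piped())
    .spawn()?;
// Child::stdin is an Option; take() it so the pipe closes once the script is written
if let Some(mut stdin) = installer.stdin.take() {
    stdin.write_all(&script.stdout)?;
}
installer.wait()?;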
  • -f flag: fail on HTTP errors (no HTML error pages executed)
  • No sh -c "curl ... | sh" — avoids shell interpolation of URL
  • INSTALL_DIR passed as env var, not interpolated into command string
  • All Command::new() calls use absolute binary paths to avoid hook circular rewriting
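
The stdin-piping pattern can be tried in isolation with the sketch below. `run_script_via_stdin` is a hypothetical helper, not the PR's `install_grepai`; it assumes a POSIX `sh` on PATH and a script small enough to fit in the pipe buffer. It also checks the exit status, which the excerpt above does not show.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};

/// Feed `script` to `sh` on stdin and expose `install_dir` as $INSTALL_DIR.
/// Hypothetical helper for illustration only.
pub fn run_script_via_stdin(script: &[u8], install_dir: &str) -> io::Result<Output> {
    let mut child = Command::new("sh")
        .env("INSTALL_DIR", install_dir) // passed as env var, never interpolated
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Take ownership of stdin so the pipe is closed when the handle drops;
    // otherwise `sh` would keep waiting for more input.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(script)?;
    }

    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("installer exited with {}", output.status),
        ));
    }
    Ok(output)
}
```

Calling it with a one-line script such as `printf '%s' "$INSTALL_DIR"` shows that the directory reaches the script through the environment only.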

Layer 3: Manual review

  • PR touches init.rs (critical file) — requires maintainer sign-off per SECURITY.md
  • All subprocess invocations are auditable (grep for Command::new in src/grepai.rs)

Implementation Details

New file: src/grepai.rs

  • GrepaiState enum: Ready / NotInitialized / NotInstalled
  • Binary discovery: PATH → ~/.local/bin → /usr/local/bin (testable via DI)
  • Install, init, search — all use explicit binary paths
  • 7 unit tests (state detection, fallback chain, priority)
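
A rough sketch of how state detection with an injected existence check could look. The variant names come from this PR; `find_binary`, `detect_state`, and the rule that a `.grepai` directory marks an initialized project are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq, Eq)]
pub enum GrepaiState {
    Ready,
    NotInitialized,
    NotInstalled,
}

/// Return the first candidate path for which `exists` reports true.
/// The caller supplies candidates in priority order.
pub fn find_binary<F>(candidates: &[PathBuf], exists: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}

/// Classify the environment. `exists` is injected so tests can fake the
/// filesystem. Assumption: an initialized project has a `.grepai` directory.
pub fn detect_state<F>(candidates: &[PathBuf], project_dir: &Path, exists: F) -> GrepaiState
where
    F: Fn(&Path) -> bool,
{
    match find_binary(candidates, &exists) {
        None => GrepaiState::NotInstalled,
        Some(_) if exists(&project_dir.join(".grepai")) => GrepaiState::Ready,
        Some(_) => GrepaiState::NotInitialized,
    }
}
```

In production the closure would be something like `|p| p.exists()`; in tests it can be a lookup against a fixed set of paths.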

Modified: src/config.rs

  • GrepaiConfig { enabled, auto_init, binary_path } with serde defaults
  • 1 unit test

Modified: src/init.rs

  • Step 6 in run_default_mode(): setup_grepai(patch_mode, verbose)
  • prompt_grepai_consent(): [Y/n] default-Yes (non-interactive defaults to Yes)
  • Follows existing PatchMode semantics: Auto installs silently, Skip prints manual URL, Ask prompts
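
The consent rules above can be summarized in a small decision function. This is a sketch only: `parse_default_yes` and `should_install` are hypothetical names, and treating every reply other than "n" or "no" as Yes is an assumption about how the prompt handles unexpected input.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatchMode {
    Ask,
    Auto,
    Skip,
}

/// Interpret a reply to a `[Y/n]` prompt. Empty input means Yes.
/// Assumption: only "n" / "no" (any case) count as a refusal.
pub fn parse_default_yes(reply: &str) -> bool {
    !matches!(reply.trim().to_ascii_lowercase().as_str(), "n" | "no")
}

/// Decide whether to install grepai.
/// `reply` is `None` when stdin is not interactive, which defaults to Yes.
pub fn should_install(mode: PatchMode, reply: Option<&str>) -> bool {
    match mode {
        PatchMode::Auto => true,  // install silently
        PatchMode::Skip => false, // caller prints the manual install URL
        PatchMode::Ask => reply.map_or(true, parse_default_yes),
    }
}
```

Keeping the decision separate from the actual prompt I/O makes the default-Yes behavior easy to unit test.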

Modified: src/rgai_cmd.rs

  • try_grepai_delegation() before built-in engine
  • Respects --path by running grepai in target project directory
  • Auto-init when grepai installed but project not initialized
  • Silent fallback on any error (verbose mode shows diagnostics)
  • 2 unit tests for resolve_grepai_project_dir

Modified: src/main.rs

  • mod grepai; declaration
  • --builtin flag on Commands::Rgai

Fixed: ARCHITECTURE.md

  • Module map updated: 30 → 49 modules (adds all missing modules from Python, Go, analytics, search categories)
  • Fixes validate CI check (module count mismatch)

Fixed: .gitignore

  • Added .grepai/ and .claude/settings.local.json
  • Removed accidentally committed local artifacts from tracking

Delegation Flow

rtk rgai "query"
  │
  ├─ --builtin? ──→ built-in keyword search
  │
  ├─ config.grepai.enabled == false? ──→ built-in
  │
  ├─ detect_grepai(project_path)
  │   ├─ NotInstalled ──→ built-in (silent)
  │   ├─ NotInitialized + auto_init ──→ grepai init → search
  │   └─ Ready ──→ grepai search
  │
  └─ grepai error? ──→ built-in (silent fallback)
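
The flow above maps onto a pure planning step plus a fallback wrapper. The sketch below is an illustration, not the code in src/rgai_cmd.rs: `Plan`, `plan`, and `run` are hypothetical names, and sending NotInitialized with auto_init disabled to the built-in engine is an assumption the diagram does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrepaiState {
    Ready,
    NotInitialized,
    NotInstalled,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    Builtin,
    InitThenSearch,
    Search,
}

/// Decide which engine to try first, following the diagram top to bottom.
pub fn plan(builtin_flag: bool, enabled: bool, auto_init: bool, state: GrepaiState) -> Plan {
    if builtin_flag || !enabled {
        return Plan::Builtin;
    }
    match state {
        GrepaiState::Ready => Plan::Search,
        GrepaiState::NotInitialized if auto_init => Plan::InitThenSearch,
        // NotInstalled, or NotInitialized without auto_init (assumption).
        _ => Plan::Builtin,
    }
}

/// Execute the plan. `grepai` receives `true` when it should init first.
/// Any grepai error is swallowed and the built-in engine runs instead.
pub fn run<G, B>(p: Plan, grepai: G, builtin: B) -> String
where
    G: FnOnce(bool) -> Result<String, String>,
    B: FnOnce() -> String,
{
    let delegated = match p {
        Plan::Builtin => None,
        Plan::InitThenSearch => grepai(true).ok(),
        Plan::Search => grepai(false).ok(),
    };
    delegated.unwrap_or_else(builtin)
}
```

A verbose mode could log the discarded error before falling back; the sketch drops it to mirror the silent default.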

Verification

# 1. All tests pass
cargo fmt --all --check && cargo clippy --all-targets && cargo test
# 369 passed

# 2. rtk init offers grepai install
cargo run -- init --global

# 3. Built-in fallback works
cargo run -- rgai --builtin "token tracking"

# 4. Delegation works (if grepai+ollama available)
cargo run -- rgai "token tracking"

# 5. Graceful fallback when grepai unavailable
# → Falls back to built-in without error

Chain / Dependency Note

Part of the rgai adoption chain. Depends on / aligned with: #124, #125, #127.

heAdz0r and others added 13 commits February 14, 2026 13:52
Record project_path (cwd) in tracking database and add filtered query
methods. `rtk gain -p` shows savings scoped to the current project
directory instead of global aggregates.

- tracking.rs: Add project_path column with auto-migration, index,
  and filtered variants for all query methods (summary, daily, weekly,
  monthly, recent)
- gain.rs: Add resolve_project_scope(), shorten_path(), scope-aware
  header, pass project filter to all queries and exports
- main.rs: Add --project/-p flag to Gain command

Backward-compatible: existing rows get empty project_path, unfiltered
queries delegate to filtered(None) which returns all data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address reviewer feedback on PR rtk-ai#128:

1. Replace SQL LIKE with GLOB in all project-scoped queries to prevent
   `_` and `%` characters in path names from being interpreted as
   wildcards (e.g., `my_project` matching `myXproject`). GLOB uses `*`
   for wildcard matching which is safer for file system paths.

2. Guard the startup `UPDATE commands SET project_path = ''` migration
   with an `EXISTS` check so it only runs when NULL rows actually exist,
   avoiding a no-op UPDATE on every startup after the first migration.

3. Add `DEFAULT ''` to the ALTER TABLE migration so new installs never
   create NULL project_path values.

4. Add 3 new unit tests for project_filter_params GLOB behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
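
To illustrate point 1 of the commit above, here is a toy matcher showing why `_` in a path is a problem under LIKE but not under GLOB. It is a simplified model, not SQLite's implementation: it ignores LIKE's ASCII case-insensitivity and GLOB's `[...]` character classes, and `wild_match`, `like_match`, and `glob_match` are hypothetical names.

```rust
/// Minimal wildcard matcher: `one` matches exactly one character,
/// `many` matches any run of characters (including none).
fn wild_match(pattern: &[char], text: &[char], one: char, many: char) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((&p, rest)) if p == many => {
            // Try every possible length for the run consumed by `many`.
            (0..=text.len()).any(|skip| wild_match(rest, &text[skip..], one, many))
        }
        Some((&p, rest)) => match text.split_first() {
            Some((&t, text_rest)) if p == one || p == t => {
                wild_match(rest, text_rest, one, many)
            }
            _ => false,
        },
    }
}

/// LIKE-style matching: `_` is one character, `%` is any run.
pub fn like_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    wild_match(&p, &t, '_', '%')
}

/// GLOB-style matching: `?` is one character, `*` is any run.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    wild_match(&p, &t, '?', '*')
}
```

Under this model `like_match("/w/my_project%", "/w/myXproject/src")` succeeds because `_` acts as a wildcard, while the equivalent GLOB pattern `/w/my_project*` only matches paths that contain a literal underscore.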
Rust-native semantic search that scores files and lines by term
relevance, symbol definitions, and path matching. No external
dependencies (no grepai/embeddings required).

Features:
- Natural-language multi-word queries: rtk rgai "auth token refresh"
- File scoring with symbol definition boost (+2.5) and comment penalty
- Stop word removal + basic stemming for better recall
- Compact and JSON output modes
- File type filtering (--file-type ts/py/rust/etc.)
- gitignore-aware traversal via `ignore` crate
- Binary and large file skipping
- Backward-compat: trailing path token auto-detection

Includes 8 unit tests (5 in rgai_cmd, 3 for arg normalization).

…ment scoring

- stem_token: remove "es" suffix to fix broken stems for -ce/-ge/-ve words
  (caches→cache, services→service, changes→change instead of cach/servic/chang)
- looks_like_path_token: remove bare contains('/') check that treated
  "client/server" as a path; now requires actual path prefixes (./  ../  /  ~/)
- is_comment_line: make '#' detection extension-aware to avoid penalizing
  Markdown headers and YAML in non-script files; only applies to py/sh/rb/etc.
- Add tests for all three fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
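
Two of the fixes in the commit above reduce to small predicates. The sketch below follows the rules as the commit message states them; the real functions in rgai_cmd.rs may take different arguments, `is_hash_comment` is a hypothetical name, and the extension list is limited to the three the message names (the "etc." is not filled in).

```rust
/// Treat a token as a path only when it starts with an explicit path prefix,
/// so a query term like "client/server" is not mistaken for a directory.
pub fn looks_like_path_token(token: &str) -> bool {
    ["./", "../", "/", "~/"]
        .iter()
        .any(|prefix| token.starts_with(*prefix))
}

/// Only treat a leading '#' as a comment marker for script-like extensions.
/// Assumption: the list here covers just the extensions named in the commit.
pub fn is_hash_comment(line: &str, ext: &str) -> bool {
    let script_like = matches!(ext, "py" | "sh" | "rb");
    script_like && line.trim_start().starts_with('#')
}
```

With these rules a Markdown heading such as `# Install` is scored as regular content, while the same line in a `.py` file still receives the comment penalty.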
Search priority (mandatory): rgai > rg > grep.

Hook changes:
- Add rewrite rules: grepai/rgai search -> rtk rgai (Tier 1)
- Split rg and grep into separate rules (Tier 2/3)
- Source-of-truth comment for hook sync
- Test infrastructure: HOOK env override, script-relative path

Doc updates (README, INSTALL, TROUBLESHOOTING, awareness template):
- Add search priority section
- Update command tables with rtk rgai examples
- Add search ladder (rgai -> grep -> proxy)
- Remove unverifiable benchmark table

Template updates (init.rs):
- RTK_INSTRUCTIONS: add rtk rgai to Files & Search section
- show_config: display search priority hint
- Tests: assert rtk rgai in top-level commands list

Test fixes:
- Fix pre-existing find/tree/wget test expectations (hook already
  rewrites them on master, tests incorrectly expected no rewrite)
- Add 7 new hook tests for rgai/grepai rewrite rules

Add comprehensive benchmark suite comparing grep, rtk grep, rtk rgai,
and head_n (negative control) for code search tasks.

Key methodology improvements:
- Pinned commit verification (exit 2 if HEAD != gold_standards.json commit)
- Dirty tree detection (exit 3 if uncommitted changes in src/)
- Token-based TE using tiktoken (cl100k_base) instead of byte approximation
- No output truncation (full quality samples preserved)
- head_n negative control baseline for comparison
- Auto-generated gold_auto.json from grep output for objective verification

Benchmark categories:
- A: Exact Identifier (6 queries) - rtk_grep recommended
- B: Regex Pattern (6 queries) - grep/rtk_grep recommended
- C: Semantic Intent (10 queries) - rtk_rgai recommended (100% vs 0% grep)
- D: Cross-File Pattern (5 queries) - rtk_grep recommended
- E: Edge Cases (3 queries)

Key findings:
- rtk rgai excels at semantic/intent queries (cosine similarity)
- rtk grep provides best exact-match with token savings (~30%)
- Recommended: rgai for discovery → grep fallback for precision

`rtk grep -n "pattern" path` failed with "unexpected argument '-n'"
because clap didn't recognize -n before the positional arguments.

Users naturally place -n before the pattern (muscle memory from
grep/rg). The flag is a no-op since grep_cmd::run() already passes
-n to ripgrep unconditionally, but clap must accept it.

Adds -n/--line-numbers as an explicit bool field to the Grep command
and ignores it in the match arm. Test added in grep_cmd::tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove .grepai/* and .claude/settings.local.json from tracking
- Add .grepai/ and .claude/settings.local.json to .gitignore
- Update ARCHITECTURE.md module map: 30 → 49 modules
  (adds cargo, curl, go, python, rgai, grepai, analytics modules)
- Fix cargo fmt on gain.rs (long line split)

Fixes validate CI check (module count mismatch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@heAdz0r heAdz0r changed the title from "feat(init,rgai): optional grepai install and delegated semantic search fallback" to "feat(init,rgai): auto-install grepai via init, delegate rtk rgai to grepai with silent fallback" on Feb 15, 2026

Keep our module map (49 modules with curl_cmd, grepai, rgai_cmd,
analytics) and discard duplicate PYTHON/GO sections from upstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@FlorianBruniaux
Collaborator

Thanks for the contribution. The auto-install grepai + silent fallback concept is interesting.

However, 145 changed files and 12K+ additions is not reviewable. This needs to be significantly reduced in scope.

A few concerns:

  1. Scope: The PR description mentions init changes + rgai delegation, but 145 files suggests much more is bundled in
  2. Dependencies on other PRs: This seems to depend on #124 (feat: add rtk rgai command for semantic code search). Is that correct? If so, #124 should be merged first
  3. Size: Please rebase on latest master, remove any unrelated changes, and keep this focused strictly on the init grepai install + delegation logic

Could you trim this down to just the core feature (init changes + delegation fallback)?

@heAdz0r
Contributor Author

heAdz0r commented Feb 16, 2026

Thanks for the review — you're right, the diff is way too large.

Root cause

The branch accumulated commits from 5 other open PRs (#124, #125, #127, #128, #135) that were stacked on top of each other during development. The actual init + rgai delegation changes are only 2 commits / ~8 files / ~680 additions.

Dependency

Yes, this PR depends on #124 (rtk rgai command). The rgai_cmd.rs module and grepai.rs delegation logic require the base rgai command to be present.

Action plan

  1. Please review and merge #124 (feat: add rtk rgai command for semantic code search) first. It is the foundation (3 files, ~1K additions, CI passing)
  2. Also mergeable independently: #135 (fix(grep): accept -n flag for grep/rg compatibility; 2 files, 22 additions) and #128 (feat(gain): add per-project token savings with -p flag; 3 files, 275 additions)
  3. After #124 merges, I'll rebase this PR on fresh master with only the init + delegation commits, bringing it down to ~8 files / ~680 additions
  4. #125 (feat(docs,hooks): enforce rgai-first search policy) and #127 (feat(benchmark): reproducible code-search methodology with rgai/grep strategy) will also be rebased independently after their dependencies are resolved

What this PR actually adds (after cleanup)

| File | Change |
|---|---|
| src/init.rs | grepai auto-install step in rtk init --global |
| src/grepai.rs | grepai binary detection + delegation logic |
| src/rgai_cmd.rs | rgai → grepai safe fallback |
| src/config.rs | grepai config support |
| src/main.rs | command registration |
| .gitignore | exclude .grepai/ artifacts |
| ARCHITECTURE.md | updated module map |

I'll force-push the clean rebase as soon as #124 is merged.

@heAdz0r
Contributor Author

heAdz0r commented Feb 17, 2026

Closing — agreed with maintainers to keep grepai/rgai activity in my fork (heAdz0r/rtk) and not mix it into upstream for now.

@heAdz0r heAdz0r closed this Feb 17, 2026