feat(init,rgai): auto-install grepai via init, delegate rtk rgai to grepai with silent fallback#136
heAdz0r wants to merge 14 commits into rtk-ai:master
Record project_path (cwd) in tracking database and add filtered query methods. `rtk gain -p` shows savings scoped to the current project directory instead of global aggregates.

- tracking.rs: Add project_path column with auto-migration, index, and filtered variants for all query methods (summary, daily, weekly, monthly, recent)
- gain.rs: Add resolve_project_scope(), shorten_path(), scope-aware header, pass project filter to all queries and exports
- main.rs: Add --project/-p flag to Gain command

Backward-compatible: existing rows get empty project_path, unfiltered queries delegate to filtered(None) which returns all data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address reviewer feedback on PR rtk-ai#128:

1. Replace SQL LIKE with GLOB in all project-scoped queries to prevent `_` and `%` characters in path names from being interpreted as wildcards (e.g., `my_project` matching `myXproject`). GLOB uses `*` for wildcard matching which is safer for file system paths.
2. Guard the startup `UPDATE commands SET project_path = ''` migration with an `EXISTS` check so it only runs when NULL rows actually exist, avoiding a no-op UPDATE on every startup after the first migration.
3. Add `DEFAULT ''` to the ALTER TABLE migration so new installs never create NULL project_path values.
4. Add 3 new unit tests for project_filter_params GLOB behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
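As a rough illustration of point 1, the sketch below builds the two bind parameters a project-scoped query might use with GLOB. The helper name `project_glob_params` and the `WHERE` clause in its comment are hypothetical; the real `project_filter_params` signature is not shown in this PR.

```rust
/// Hypothetical helper (not the PR's actual code): build bind parameters
/// for a project-scoped query of the assumed form
///   WHERE project_path = ?1 OR project_path GLOB ?2
/// Returns None when no project filter is requested, meaning "all rows".
fn project_glob_params(project: Option<&str>) -> Option<(String, String)> {
    // Normalize away trailing slashes so "/a/b/" and "/a/b" behave alike.
    let path = project?.trim_end_matches('/');
    if path.is_empty() {
        return None;
    }
    // `*` is GLOB's multi-character wildcard; `_` and `%` stay literal,
    // so "my_project" no longer matches "myXproject".
    let subtree = format!("{}/*", path);
    Some((path.to_string(), subtree))
}

fn main() {
    let params = project_glob_params(Some("/home/me/my_project/"));
    println!("{:?}", params);
}
```

One caveat with this approach: GLOB has its own metacharacters (`*`, `?`, `[`), so a path containing those would still be read as a pattern.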
Rust-native semantic search that scores files and lines by term relevance, symbol definitions, and path matching. No external dependencies (no grepai/embeddings required).

Features:
- Natural-language multi-word queries: rtk rgai "auth token refresh"
- File scoring with symbol definition boost (+2.5) and comment penalty
- Stop word removal + basic stemming for better recall
- Compact and JSON output modes
- File type filtering (--file-type ts/py/rust/etc.)
- gitignore-aware traversal via `ignore` crate
- Binary and large file skipping
- Backward-compat: trailing path token auto-detection

Includes 8 unit tests (5 in rgai_cmd, 3 for arg normalization).
…ment scoring
- stem_token: remove "es" suffix to fix broken stems for -ce/-ge/-ve words
(caches→cache, services→service, changes→change instead of cach/servic/chang)
- looks_like_path_token: remove bare contains('/') check that treated
"client/server" as a path; now requires actual path prefixes (./ ../ / ~/)
- is_comment_line: make '#' detection extension-aware to avoid penalizing
Markdown headers and YAML in non-script files; only applies to py/sh/rb/etc.
- Add tests for all three fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
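The three fixes above can be sketched roughly as follows. This is a simplified illustration, not the code in `rgai_cmd`: the function names come from the commit message, but the bodies, the length threshold, and the list of `#`-comment extensions are assumptions.

```rust
/// Simplified stemmer: strips a plural "s" but has no "es" rule,
/// so "caches" becomes "cache" rather than "cach".
fn stem_token(token: &str) -> String {
    let t = token.to_lowercase();
    // Assumed guard: leave short tokens and "-ss" words ("class") alone.
    if t.len() > 3 && t.ends_with('s') && !t.ends_with("ss") {
        t[..t.len() - 1].to_string()
    } else {
        t
    }
}

/// A token counts as a path only with an explicit path prefix;
/// a bare '/' inside the token ("client/server") is not enough.
fn looks_like_path_token(token: &str) -> bool {
    ["./", "../", "/", "~/"].iter().any(|p| token.starts_with(p))
}

/// '#' is treated as a comment marker only for script-like extensions.
/// The extension list here is an assumed subset.
fn is_comment_line(line: &str, ext: &str) -> bool {
    let t = line.trim_start();
    if t.starts_with("//") || t.starts_with("/*") {
        return true;
    }
    let hash_langs = ["py", "sh", "rb"];
    t.starts_with('#') && hash_langs.contains(&ext)
}

fn main() {
    assert_eq!(stem_token("services"), "service");
    assert!(!looks_like_path_token("client/server"));
    assert!(!is_comment_line("# Heading", "md"));
    println!("ok");
}
```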
# Conflicts: # src/gain.rs
# Conflicts: # src/init.rs
Search priority (mandatory): rgai > rg > grep.

Hook changes:
- Add rewrite rules: grepai/rgai search -> rtk rgai (Tier 1)
- Split rg and grep into separate rules (Tier 2/3)
- Source-of-truth comment for hook sync
- Test infrastructure: HOOK env override, script-relative path

Doc updates (README, INSTALL, TROUBLESHOOTING, awareness template):
- Add search priority section
- Update command tables with rtk rgai examples
- Add search ladder (rgai -> grep -> proxy)
- Remove unverifiable benchmark table

Template updates (init.rs):
- RTK_INSTRUCTIONS: add rtk rgai to Files & Search section
- show_config: display search priority hint
- Tests: assert rtk rgai in top-level commands list

Test fixes:
- Fix pre-existing find/tree/wget test expectations (hook already rewrites them on master, tests incorrectly expected no rewrite)
- Add 7 new hook tests for rgai/grepai rewrite rules
Add comprehensive benchmark suite comparing grep, rtk grep, rtk rgai, and head_n (negative control) for code search tasks.

Key methodology improvements:
- Pinned commit verification (exit 2 if HEAD != gold_standards.json commit)
- Dirty tree detection (exit 3 if uncommitted changes in src/)
- Token-based TE using tiktoken (cl100k_base) instead of byte approximation
- No output truncation (full quality samples preserved)
- head_n negative control baseline for comparison
- Auto-generated gold_auto.json from grep output for objective verification

Benchmark categories:
- A: Exact Identifier (6 queries) - rtk_grep recommended
- B: Regex Pattern (6 queries) - grep/rtk_grep recommended
- C: Semantic Intent (10 queries) - rtk_rgai recommended (100% vs 0% grep)
- D: Cross-File Pattern (5 queries) - rtk_grep recommended
- E: Edge Cases (3 queries)

Key findings:
- rtk rgai excels at semantic/intent queries (cosine similarity)
- rtk grep provides best exact-match with token savings (~30%)
- Recommended: rgai for discovery → grep fallback for precision
`rtk grep -n "pattern" path` failed with "unexpected argument '-n'" because clap didn't recognize -n before the positional arguments. Users naturally place -n before the pattern (muscle memory from grep/rg). The flag is a no-op since grep_cmd::run() already passes -n to ripgrep unconditionally, but clap must accept it. Adds -n/--line-numbers as an explicit bool field to the Grep command and ignores it in the match arm. Test added in grep_cmd::tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove .grepai/* and .claude/settings.local.json from tracking
- Add .grepai/ and .claude/settings.local.json to .gitignore
- Update ARCHITECTURE.md module map: 30 → 49 modules (adds cargo, curl, go, python, rgai, grepai, analytics modules)
- Fix cargo fmt on gain.rs (long line split)

Fixes validate CI check (module count mismatch).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep our module map (49 modules with curl_cmd, grepai, rgai_cmd, analytics) and discard duplicate PYTHON/GO sections from upstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the contribution. The auto-install grepai + silent fallback concept is interesting. However, 145 changed files and 12K+ additions is not reviewable. This needs to be significantly reduced in scope. A few concerns:
Could you trim this down to just the core feature (init changes + delegation fallback)?
Thanks for the review. You're right, the diff is way too large.

Root cause
The branch accumulated commits from 5 other open PRs (#124, #125, #127, #128, #135) that were stacked on top of each other during development. The actual …

Dependency
Yes, this PR depends on #124 (…

Action plan
What this PR actually adds (after cleanup)
I'll force-push the clean rebase as soon as #124 is merged.
Closing: agreed with maintainers to keep grepai/rgai activity in my fork (heAdz0r/rtk) and not mix it into upstream for now.
Summary
Adds optional grepai installation to `rtk init --global` and integrates `rtk rgai` delegation to external grepai with silent fallback to built-in keyword search.

User flow after this PR:

Then `rtk rgai "query"` automatically delegates to grepai when available, or falls back to built-in search silently.

Presets & Defaults
- Provider: `ollama` (`grepai init --provider ollama`)
- Backend: `gob` (file-based) (`grepai init --backend gob`)
- `grepai.enabled`: `true` (`~/.config/rtk/config.toml`)
- `grepai.auto_init`: `true` (… `rtk rgai`)
- `grepai.binary_path`: `None` (auto-detect)
- … `[Y/n]`) `rtk init`
- `--builtin` flag: `false`

Why these defaults: Ollama runs locally (no API keys needed), gob is zero-config (no external DB), auto-init reduces friction. The `[Y/n]` default (Yes) is intentionally different from the settings.json `[y/N]` (No): grepai is low-risk and high-value.

Prerequisites for grepai
`rtk rgai` works without grepai (built-in keyword engine). For full embedding-based semantic search:

- … `ollama serve`)
- … `rtk init`)
- `grepai init && grepai watch --background`

If any prerequisite is missing, `rtk rgai` falls back to built-in search silently (no warnings, no nagging).

Security Compliance (SECURITY.md Workflow)
Layer 1: Automated security-check.yml
- `src/init.rs`: triggers enhanced review
- … `eval`, no `exec`, no unsanitized interpolation
- … `Cargo.toml` or CI workflow modifications

Layer 2: Installer security
The install step (`grepai::install_grepai`) uses safe stdin piping instead of `curl | sh`:

- `-f` flag: fail on HTTP errors (no HTML error pages executed)
- … `sh -c "curl ... | sh"`: avoids shell interpolation of URL
- `INSTALL_DIR` passed as env var, not interpolated into command string
- … `Command::new()` calls use absolute binary paths to avoid hook circular rewriting

Layer 3: Manual review
- `init.rs` (critical file): requires maintainer sign-off per SECURITY.md
- … `Command::new` in `src/grepai.rs`)

Implementation Details
New file: `src/grepai.rs`

- `GrepaiState` enum: `Ready` / `NotInitialized` / `NotInstalled`
- … `~/.local/bin` → `/usr/local/bin` (testable via DI)

Modified: `src/config.rs`

- `GrepaiConfig { enabled, auto_init, binary_path }` with serde defaults

Modified: `src/init.rs`

- `run_default_mode()`: `setup_grepai(patch_mode, verbose)`
- `prompt_grepai_consent()`: `[Y/n]` default-Yes (non-interactive defaults to Yes)
- `PatchMode` semantics: `Auto` installs silently, `Skip` prints manual URL, `Ask` prompts

Modified: `src/rgai_cmd.rs`

- `try_grepai_delegation()` before built-in engine
- … `--path` by running grepai in target project directory
- … `resolve_grepai_project_dir`

Modified: `src/main.rs`

- `mod grepai;` declaration
- `--builtin` flag on `Commands::Rgai`

Fixed: ARCHITECTURE.md

- … `validate` CI check (module count mismatch)

Fixed: `.gitignore`

- … `.grepai/` and `.claude/settings.local.json`

Delegation Flow
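The delegation described in this PR can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `src/grepai.rs` or `src/rgai_cmd.rs`: only `GrepaiState` and its variants come from the description, while `Engine`, `candidate_binaries`, `detect_state`, `choose_engine`, `search`, the binary file name, and the `.grepai/` initialization marker are hypothetical. The `grepai.auto_init` behavior is not modeled.

```rust
use std::path::{Path, PathBuf};

/// The three states named in the description.
#[derive(Debug, PartialEq)]
enum GrepaiState {
    Ready,
    NotInitialized,
    NotInstalled,
}

/// Which engine answers a query (hypothetical type).
#[derive(Debug, PartialEq)]
enum Engine {
    Grepai,
    Builtin,
}

/// Candidate binary locations, in the order the description lists them.
/// The file name `grepai` is an assumption.
fn candidate_binaries(home: &Path) -> Vec<PathBuf> {
    vec![
        home.join(".local/bin/grepai"),
        PathBuf::from("/usr/local/bin/grepai"),
    ]
}

/// `exists` is injected so the check can be tested without touching disk.
fn detect_state(home: &Path, project: &Path, exists: &dyn Fn(&Path) -> bool) -> GrepaiState {
    if !candidate_binaries(home).iter().any(|p| exists(p.as_path())) {
        return GrepaiState::NotInstalled;
    }
    // Assumption: an initialized project contains a `.grepai/` directory.
    if exists(project.join(".grepai").as_path()) {
        GrepaiState::Ready
    } else {
        GrepaiState::NotInitialized
    }
}

/// Delegate only when enabled, not overridden by `--builtin`, and ready.
fn choose_engine(enabled: bool, builtin_flag: bool, state: &GrepaiState) -> Engine {
    if enabled && !builtin_flag && *state == GrepaiState::Ready {
        Engine::Grepai
    } else {
        Engine::Builtin
    }
}

/// Try the delegate first; on `None`, fall through without printing anything.
fn search(
    query: &str,
    engine: Engine,
    delegate: &dyn Fn(&str) -> Option<String>,
    builtin: &dyn Fn(&str) -> String,
) -> String {
    if engine == Engine::Grepai {
        if let Some(output) = delegate(query) {
            return output;
        }
    }
    builtin(query)
}

fn main() {
    let home = Path::new("/home/user");
    let project = Path::new("/work/project");
    // Pretend only the /usr/local/bin binary and the project index exist.
    let exists = |p: &Path| {
        p == Path::new("/usr/local/bin/grepai") || p == Path::new("/work/project/.grepai")
    };
    let state = detect_state(home, project, &exists);
    let engine = choose_engine(true, false, &state);
    // The delegate fails here, so the built-in engine answers silently.
    let out = search("auth token refresh", engine, &|_: &str| None, &|q: &str| {
        format!("builtin results for {q}")
    });
    println!("{state:?}: {out}");
}
```

Passing the existence check and both engines as parameters keeps every branch testable without a real grepai install, which is one plausible reading of "testable via DI".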
Verification
Chain / Dependency Note
Part of the `rgai` adoption chain. Depends on / aligned with: #124, #125, #127.