feat: add rtk rgai command for semantic code search #124
heAdz0r wants to merge 2 commits into rtk-ai:master
Rust-native semantic search that scores files and lines by term relevance, symbol definitions, and path matching. No external dependencies (no grepai/embeddings required).

Features:
- Natural-language multi-word queries: `rtk rgai "auth token refresh"`
- File scoring with symbol definition boost (+2.5) and comment penalty
- Stop word removal + basic stemming for better recall
- Compact and JSON output modes
- File type filtering (`--file-type` ts/py/rust/etc.)
- gitignore-aware traversal via the `ignore` crate
- Binary and large file skipping
- Backward-compat: trailing path token auto-detection

Includes 8 unit tests (5 in rgai_cmd, 3 for arg normalization).
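To make the scoring features above concrete, here is a minimal sketch of how line scoring could combine them. It assumes splitting the query on non-alphanumeric characters, a small hypothetical stop-word list, plain substring matching per term, the +2.5 definition boost from the description, and a hypothetical 0.5 multiplier as the comment penalty (the real penalty value is not stated). Stemming and path matching are left out. `query_terms`, `is_definition`, and `score_line` are illustrative names, not the PR's API.

```rust
/// Hypothetical stop-word list; the PR's actual list is not shown.
const STOP_WORDS: &[&str] = &["a", "an", "and", "does", "for", "how", "in", "is", "of", "the", "to"];

/// Symbol-definition boost, as stated in the PR description.
const DEFINITION_BOOST: f64 = 2.5;

/// Hypothetical comment penalty: comment lines keep half their score.
const COMMENT_FACTOR: f64 = 0.5;

/// Split a natural-language query into lowercase terms, dropping stop words
/// and single-character fragments. (Stemming is omitted in this sketch.)
fn query_terms(query: &str) -> Vec<String> {
    query
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .map(|t| t.to_lowercase())
        .filter(|t| t.len() > 1 && !STOP_WORDS.contains(&t.as_str()))
        .collect()
}

/// Rough, illustrative check for a line that starts a symbol definition.
fn is_definition(trimmed: &str) -> bool {
    let rest = trimmed.strip_prefix("pub ").unwrap_or(trimmed);
    ["fn ", "struct ", "enum ", "trait ", "class ", "def ", "function "]
        .iter()
        .any(|kw| rest.starts_with(*kw))
}

/// Score one line: +1 per query term found in it, +2.5 when a matching
/// line defines a symbol, then scaled down if it is a `//` comment.
fn score_line(line: &str, terms: &[String]) -> f64 {
    let lower = line.to_lowercase();
    let trimmed = lower.trim_start();
    let hits = terms.iter().filter(|t| lower.contains(t.as_str())).count();
    if hits == 0 {
        return 0.0;
    }
    let mut score = hits as f64;
    if is_definition(trimmed) {
        score += DEFINITION_BOOST;
    }
    if trimmed.starts_with("//") {
        score *= COMMENT_FACTOR;
    }
    score
}
```

With the query `"How does auth token refresh work?"`, a line like `pub fn refresh_token() {` matches two terms and gets the definition boost, so it outranks a `//` comment that mentions the same two words.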
```rust
let suffixes = ["ingly", "edly", "ing", "ed", "es", "s"];
for suffix in suffixes {
    if token.len() > suffix.len() + 2 && token.ends_with(suffix) {
```
`stem_token("caches")` → `"cach"`, `stem_token("services")` → `"servic"`, `stem_token("changes")` → `"chang"`. Any word ending in -ce, -ge, or -se plus a plural s loses its final e, and these broken stems won't match the actual occurrences in code.
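The quoted loop can be reproduced as a standalone function to check the examples above. This sketch assumes the elided loop body returns the token with the first matching suffix removed. The suffix list is a parameter, so the original list and one without `"es"` (the change described in the follow-up commit) can be compared side by side. `stem_with` is a hypothetical name.

```rust
/// Suffix list quoted in the review thread.
const ORIGINAL_SUFFIXES: [&str; 6] = ["ingly", "edly", "ing", "ed", "es", "s"];
/// Same list without "es", as described in the follow-up commit.
const FIXED_SUFFIXES: [&str; 5] = ["ingly", "edly", "ing", "ed", "s"];

/// Strip the first matching suffix, keeping a stem of at least three
/// characters. Assumes the elided loop body returns the shortened token.
fn stem_with(token: &str, suffixes: &[&str]) -> String {
    for &suffix in suffixes {
        if token.len() > suffix.len() + 2 && token.ends_with(suffix) {
            return token[..token.len() - suffix.len()].to_string();
        }
    }
    token.to_string()
}
```

Dropping `"es"` repairs the -ces/-ges/-ses cases at the cost of plurals such as `"boxes"`, which now stem to `"boxe"`.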
src/main.rs
```rust
    || token == ".."
    || token.starts_with("./")
    || token.starts_with('/')
    || token.contains('/')
```
`token.contains('/')` is too greedy. A query like `rtk rgai "client/server architecture"` will treat `client/server` as a path and silently drop it from the query.
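One way to narrow the check is sketched below, following the prefixes the follow-up commit lists (`./`, `../`, `/`, `~/`), plus bare `.` and `..` (assumed from the quoted `token == ".."` line). A slash inside an ordinary word no longer marks it as a path. This is an illustration, not the merged code.

```rust
/// Hypothetical narrower heuristic: only explicit path forms count.
fn looks_like_path_token(token: &str) -> bool {
    token == "."
        || token == ".."
        || token.starts_with("./")
        || token.starts_with("../")
        || token.starts_with('/')
        || token.starts_with("~/")
}
```

The trade-off is that a bare relative path such as `src/main.rs` is now read as a query term unless it is written as `./src/main.rs`.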
src/rgai_cmd.rs
```rust
fn is_comment_line(line: &str) -> bool {
    let trimmed = line.trim_start();
    trimmed.starts_with("//")
```
`starts_with('#')` penalizes Markdown headers and YAML keys in addition to Python comments, which can skew scoring on non-code files.
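The follow-up commit makes `#` detection extension-aware. A minimal sketch of that idea is below. The extension list is a hypothetical stand-in for the commit's "py/sh/rb/etc.", and passing the file's `Path` into the check is an assumed signature, not the PR's.

```rust
use std::path::Path;

/// Hypothetical set of extensions whose languages use '#' line comments.
const HASH_COMMENT_EXTS: &[&str] = &["py", "sh", "bash", "zsh", "rb", "pl"];

/// `//` counts as a comment everywhere; `#` only for hash-comment languages.
fn is_comment_line(line: &str, path: &Path) -> bool {
    let trimmed = line.trim_start();
    if trimmed.starts_with("//") {
        return true;
    }
    let hash_comments = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| HASH_COMMENT_EXTS.contains(&ext))
        .unwrap_or(false);
    hash_comments && trimmed.starts_with('#')
}
```

Because `.rs` is not in the list, Rust attributes such as `#[derive(Debug)]` are also no longer mistaken for comments.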
…ment scoring
- stem_token: remove "es" suffix to fix broken stems for -ce/-ge/-ve words
(caches→cache, services→service, changes→change instead of cach/servic/chang)
- looks_like_path_token: remove bare contains('/') check that treated
"client/server" as a path; now requires actual path prefixes (./ ../ / ~/)
- is_comment_line: make '#' detection extension-aware to avoid penalizing
Markdown headers and YAML in non-script files; only applies to py/sh/rb/etc.
- Add tests for all three fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pszymkowiak All three issues have been addressed in commit db322a6:

1. stem_token broken stems: Removed
2.
3.
Ready for re-review. All tests pass, CI checks are green.
I'm having trouble understanding the rationale here. Why would rtk auto-install a third-party binary (grepai) via `rtk init`? This raises supply chain concerns: rtk shouldn't be

As a reminder, rtk's scope is intentionally narrow: it's a lightweight CLI proxy that compresses command output to save LLM tokens. It wraps existing commands; it doesn't implement new

Looking at the bigger picture, PRs #124, #125, #127, and #136 form a chain that progressively introduces grepai into rtk. Can you explain the relationship between you and the grepai
Closing: agreed with maintainers to keep grepai/rgai activity in my fork (heAdz0r/rtk) and not mix it into upstream for now.
Summary
Extracted from #118 per reviewer feedback. This is the actual feature implementation: the command that #118's docs/hooks referenced but didn't include.
`rtk rgai` is a Rust-native semantic search that scores files by term relevance without requiring external embedding services.

Usage
How it works
Token savings
Changes
- `src/rgai_cmd.rs`: 789 lines; full search implementation with scoring, ranking, and output formatting
- `src/main.rs`: `Commands::Rgai` variant, match arm, and `normalize_rgai_args()` for backward-compat path detection

Test plan
- `cargo test rgai`: 8 tests pass (5 unit + 3 arg normalization)
- `cargo test`: 321 total tests pass
- `cargo fmt --all --check`: clean
- `cargo clippy --all-targets`: no new warnings
- `rtk rgai "token tracking" -p .` returns ranked results from this repo
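The backward-compat path detection listed for `src/main.rs` could work roughly like the sketch below. `split_trailing_path` is a hypothetical helper, not the PR's `normalize_rgai_args()`, whose signature isn't shown. It treats the last argument as the search root when it looks like a path and at least one query word remains. The path predicate is passed in, so any heuristic (such as the prefix check discussed in review) can be plugged in.

```rust
/// Hypothetical helper: split `["token", "tracking", "./src"]` into the
/// query words and an optional trailing search path.
fn split_trailing_path(
    args: &[String],
    is_path: impl Fn(&str) -> bool,
) -> (Vec<String>, Option<String>) {
    // Only peel off the last token if a query word would still remain.
    if args.len() > 1 {
        if let Some(last) = args.last() {
            if is_path(last.as_str()) {
                return (args[..args.len() - 1].to_vec(), Some(last.clone()));
            }
        }
    }
    (args.to_vec(), None)
}
```

Requiring at least one remaining word keeps a lone path-like argument from producing an empty query.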