- Rust 1.70+ (Install)
- Git (for repository cloning)
- ~2GB disk space (for aggregated skills cache)
skills-bank aggregates skills (workflows, tasks, specialized agents) from 100+ distributed repositories and provides a unified routing system for AI agents to discover, load, and invoke them efficiently.
- Source-of-Truth Loading: Agents load canonical `SKILL.md` files directly from source repositories, not from catalogs. This eliminates hallucination risks and optimizes token usage.
- Hybrid Classification: A dual-stage pipeline combines fast keyword rules (Step A) with LLM-powered semantic classification (Step B) to route skills into 12 domain hubs and 40+ sub-hubs.
- Smart Deduplication: Skills are deduplicated by name OR description — catching both exact collisions and cross-repo clones with different names but identical content.
- Multi-Tool Support: Skills sync to major AI tools including GitHub Copilot, Claude-code, free-code (claude-code), Hermes, Cursor, Gemini, Antigravity, OpenCode, Codex, and Windsurf.
- Token Efficiency: Load minimal metadata first, then source files on-demand—not batch-loading entire catalogs.
- Interactive TUI: A rich terminal UI (powered by Ratatui) provides real-time dashboard, skill explorer, and pipeline monitoring.
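The two-key deduplication mentioned above (duplicate if the name OR the description has been seen before) can be sketched in Rust. `Skill` and `Deduper` are illustrative names for this sketch, not skills-bank's actual types:

```rust
use std::collections::HashSet;

struct Skill {
    name: String,
    description: String,
}

/// Tracks both keys independently: a collision on EITHER key marks a duplicate,
/// catching exact name collisions and cross-repo clones with renamed skills.
struct Deduper {
    seen_names: HashSet<String>,
    seen_descriptions: HashSet<String>,
}

impl Deduper {
    fn new() -> Self {
        Deduper {
            seen_names: HashSet::new(),
            seen_descriptions: HashSet::new(),
        }
    }

    /// Returns true if the skill is new; false if either key collides.
    fn insert(&mut self, skill: &Skill) -> bool {
        // Normalize lightly so trivial whitespace/case differences still collide.
        let name = skill.name.trim().to_lowercase();
        let desc = skill.description.trim().to_lowercase();
        if self.seen_names.contains(&name) || self.seen_descriptions.contains(&desc) {
            return false;
        }
        self.seen_names.insert(name);
        self.seen_descriptions.insert(desc);
        true
    }
}
```

A renamed clone with an identical description is rejected by the second key even though its name is unseen.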
```sh
cd skills-bank/
cargo build --release

# Interactive setup (first run)
cargo run --release

# Or run all steps in sequence
cargo run --release -- run

# Launch the interactive TUI
cargo run --release -- tui
```

`cargo run --release -- setup` launches an interactive wizard to configure:
- Where skills should be synced (global, workspace, or both)
- Which AI tools to sync to
- Repository URLs to clone and aggregate
- Excluded categories
| Command | Purpose | When to Use |
|---|---|---|
| `aggregate` | Collect, deduplicate, classify, and route skills from configured repositories to skills-aggregated/ | First run or when repositories change |
| `sync` | Distribute aggregated skills to configured AI tool directories | After aggregation completes |
| `run` | Execute the full pipeline (aggregate → sync) in sequence | Daily updates or automated workflows |
| `setup` | Configure sync targets, repositories, and exclusions interactively | Initial setup only |
| `add-repo <URL>` | Add a new skill repository to the configuration | When onboarding new sources |
| `doctor` | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection |
| `release-gate` | Validate aggregation output integrity | Before releases or production sync |
| `cleanup-legacy-duplicates` | Remove legacy repository folders from src/ or repos/ (only if a matching lib/ copy exists) | Migration from older versions |
| `tui` | Launch the interactive terminal dashboard with skill explorer and statistics | Real-time monitoring |
First-time setup:

```sh
cargo run --release -- setup
cargo run --release -- run
```

Daily aggregation with monitoring:

```sh
cargo run --release -- aggregate   # with progress bar
cargo run --release -- tui         # monitor in background
```

Validate before production sync:

```sh
cargo run --release -- doctor
cargo run --release -- release-gate
cargo run --release -- sync
```

- src/ — Rust source code: TUI, fetcher, aggregator, sync engine, classification logic
- Cargo.toml — Rust manifest (dependencies, metadata, build targets)
- .skills-bank-cli-config.json — User configuration file (generated by `setup`; contains sync targets and repository URLs)
- .env-example — Environment variable template
- skills-aggregated/ — Single source of truth containing:
  - routing.csv — Skill-to-hub/sub-hub routing table
  - subhub-index.json — Hub and sub-hub registry
  - hub-manifests.csv — Master index of all skills
  - .skill-lock.json — Aggregation metadata and timestamps
  - Per-hub directories with skills-manifest.json files
- lib/ — Canonical cache for cloned skill repositories (populated by the `aggregate` command)
- tests/ — Integration test suite for pipeline and TUI
- archive/ — Legacy PowerShell scripts (original PoC phase)
- package.json — Node.js manifest for `npx` distribution
- readme.md — This file
Cache Location: lib/ (not src/) — this is the canonical directory for all cloned repositories.

Clone Strategy:
- First clone: shallow clone with `git clone --depth 1 --single-branch --no-tags` (faster, smaller disk footprint)
- Subsequent runs: `git pull` in existing directories (avoids re-cloning)
- Deduplication: normalized remote URLs and repository names prevent duplicate clones
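The URL normalization behind that deduplication might look roughly like the following sketch; `normalize_remote_url` is an illustrative helper, not the project's actual function:

```rust
/// Collapse HTTPS/SSH variants, case differences, trailing slashes, and a
/// trailing ".git" so that equivalent remotes map to one cache key.
fn normalize_remote_url(url: &str) -> String {
    let mut u = url.trim().trim_end_matches('/').to_lowercase();
    // SSH form "git@github.com:owner/repo" -> "github.com/owner/repo"
    let ssh = u.strip_prefix("git@").map(|rest| rest.replacen(':', "/", 1));
    if let Some(s) = ssh {
        u = s;
    }
    // Drop the scheme and the ".git" suffix so all variants share one key.
    u.trim_start_matches("https://")
        .trim_start_matches("http://")
        .trim_end_matches(".git")
        .to_string()
}
```

With this, `https://GitHub.com/Owner/Repo.git` and `git@github.com:owner/repo` resolve to the same key and trigger only one clone.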
Speed Optimization:
- Parallel cloning via configurable `PARALLEL_JOBS`
- Shallow clones reduce disk I/O by ~80% vs. full clones
- Incremental updates via `git pull`
If you have repositories in older locations (src/ or repos/), migrate them:

```sh
# Inspect current state
cargo run --release -- doctor

# Remove legacy folders (safe: only deletes if a matching lib/ copy exists and the Git remote matches)
cargo run --release -- cleanup-legacy-duplicates
```

Run `doctor` first to inspect repository state.
Generated during aggregation into skills-aggregated/:
| File | Purpose |
|---|---|
| `routing.csv` | Skill-to-hub/sub-hub mappings (name, hub, sub-hub, src_path) |
| `subhub-index.json` | Complete hub and sub-hub registry |
| `hub-manifests.csv` | Master index of all skills across all hubs |
| `.skill-lock.json` | Aggregation metadata (timestamps, repo revisions, dedup stats) |
| `[hub]/[sub-hub]/skills-manifest.json` | Per-sub-hub skill metadata and LLM classification triggers |
These files are used by agents and the TUI for discovery and routing.
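As a sketch of how an agent might consume these artifacts, the following assumes `routing.csv` uses plain comma-separated values in the documented column order (name, hub, sub-hub, src_path); rows with embedded commas would need a real CSV parser such as the `csv` crate, and `Route` is an illustrative type:

```rust
#[derive(Debug)]
struct Route {
    name: String,
    hub: String,
    sub_hub: String,
    src_path: String,
}

/// Parse one routing.csv line into a Route; returns None if a column is missing.
fn parse_routing_line(line: &str) -> Option<Route> {
    let mut cols = line.split(',').map(str::trim);
    Some(Route {
        name: cols.next()?.to_string(),
        hub: cols.next()?.to_string(),
        sub_hub: cols.next()?.to_string(),
        src_path: cols.next()?.to_string(),
    })
}
```

The `src_path` column is what lets an agent jump straight to the canonical `SKILL.md` instead of relying on a catalog copy.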
Copy .env-example to .env to override defaults:
```sh
cp .env-example .env
```

Common variables:
- `SKILLS_BANK_CONFIG` — Path to CLI config file (default: `.skills-bank-cli-config.json`)
- `SKILLS_BANK_CACHE` — Cache directory for repositories (default: `lib/`)
- `SKILLS_BANK_OUTPUT` — Output directory for aggregated skills (default: `skills-aggregated/`)
- `LLM_BATCH_SIZE` — Batch size for LLM classification (default: `50`)
- `PARALLEL_JOBS` — Number of parallel aggregation workers (default: auto-detect CPU count)
See .env-example for all available options.
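A minimal sketch of resolving these variables against their documented defaults (the `env_or` helper is illustrative, not the project's API):

```rust
use std::env;

/// Read an environment variable, falling back to the documented default.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

/// Numeric settings additionally fall back when the value fails to parse.
fn env_or_usize(key: &str, default: usize) -> usize {
    env::var(key)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}
```

For example, `env_or("SKILLS_BANK_CACHE", "lib/")` and `env_or_usize("LLM_BATCH_SIZE", 50)` reproduce the defaults in the table above when `.env` sets nothing.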
Sync skills to any of these destinations:
| Tool | Project | Global |
|---|---|---|
| Claude | .claude/skills/ | ~/.claude/skills/ |
| free-code (claude-code) | .free-code-config/skills/ | ~/.free-code-config/skills/ |
| Hermes | .hermes/skills/ | ~/.hermes/skills/ |
| Code (Codex) | .agents/skills/ | ~/.agents/skills/ |
| GitHub Copilot | .github/skills/ | ~/.copilot/skills/ |
| Cursor | .cursor/skills/ | ~/.cursor/skills/ |
| Gemini | .gemini/skills/ | ~/.gemini/skills/ |
| Antigravity | .agent/skills/ | ~/.gemini/antigravity/skills/ |
| OpenCode | .opencode/skills/ | ~/.config/opencode/skills/ |
| Windsurf | .windsurf/skills/ | ~/.codeium/windsurf/skills/ |
The aggregation pipeline processes 8000+ SKILL.md files through a multi-stage classification system:
```
SKILL.md files (8000+)
      │
      ▼
┌──────────────┐
│  YAML Parse  │  Extract name, description, triggers
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Keyword    │  Fast token-based routing to hub/sub-hub
│    Rules     │  (fallback if LLM unavailable)
└──────┬───────┘
       │
       ▼
┌──────────────┐
│    Dedup     │  Name OR Description HashSet
│  (two-key)   │  Catches cross-repo clones
└──────┬───────┘
       │
       ▼
┌──────────────────────────────────┐
│  Hybrid Exclusion + LLM Classify │
│  Step A: Keyword pre-filter      │
│  Step B: LLM semantic classify   │
│  (can return "excluded")         │
└──────┬───────────────────────────┘
       │
       ▼
┌──────────────┐
│   Output     │  routing.csv, per-hub manifests,
│  Artifacts   │  skills-index.json
└──────────────┘
```
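The "YAML Parse" stage can be approximated without a full YAML parser. This sketch assumes a conventional `---`-delimited frontmatter block with simple `key: value` lines; production code would use a YAML crate such as `serde_yaml`:

```rust
/// Extract a single frontmatter field (e.g. "name" or "description")
/// from a SKILL.md file's leading "---" block.
fn parse_frontmatter_field(skill_md: &str, key: &str) -> Option<String> {
    let mut in_frontmatter = false;
    for line in skill_md.lines() {
        if line.trim() == "---" {
            if in_frontmatter {
                break; // closing delimiter: stop scanning
            }
            in_frontmatter = true;
            continue;
        }
        if in_frontmatter {
            // Match "key: value"; strip optional surrounding quotes.
            if let Some(rest) = line.strip_prefix(key) {
                if let Some(value) = rest.trim_start().strip_prefix(':') {
                    return Some(value.trim().trim_matches('"').to_string());
                }
            }
        }
    }
    None
}
```

Note this naive prefix match would also accept keys that merely start with `key` (e.g. `namespace` for `name`), which is why a real parser is preferable at scale.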
The keyword-based classification system includes three critical enhancements to eliminate false negatives and resolve sub-hub conflicts:
Problem: Repository names like mukul975-anthropic-cybersecurity-skills were not being matched because the system used exact token matching (e.g., only matching the token "security", not the full repo name).
Solution: Introduced an infer_hub_from_repo_name() function that:
- Extracts the repository directory name from the path (the segment right after `lib/` or `src/`)
- Uses substring matching to catch domain signals (e.g., `"cybersecurity-skills"` matches `"security"`)
- Runs before other inference logic (highest priority)
- Supports domain keywords:
  - Security: `security`, `cybersecurity`, `pentest`, `vulnerability`, `vibesec`, `bluebook`
  - AI: `prompt`, `agent-skill`, `llm`, `ai-skills`
  - Mobile (iOS): `swiftui`, `ios-`, `-ios`, `swift-patterns`, `apple-hig`, `app-store`
  - Mobile (Android): `android`, `kotlin`
  - Frontend/UI: `ui-ux`, `ui-skills`
  - Testing/QA: `playwright`, `testdino`

Confidence Score: 98% (near-deterministic, reflects author intent)
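A hedged sketch of this inference step. The `code-quality`/`security` pair comes from the worked example later in this document; the other hub/sub-hub names and the exact signature are assumptions for illustration:

```rust
/// Infer (hub, sub_hub, confidence) from the repository directory name.
/// Returns None when no domain substring matches.
fn infer_hub_from_repo_name(path: &str) -> Option<(&'static str, &'static str, u8)> {
    // Take the segment right after "lib" or "src" as the repo name.
    let repo = path
        .split('/')
        .skip_while(|seg| *seg != "lib" && *seg != "src")
        .nth(1)?
        .to_lowercase();

    // Substring matching catches signals inside longer names,
    // e.g. "cybersecurity-skills" contains "security".
    const RULES: &[(&str, (&str, &str))] = &[
        ("security", ("code-quality", "security")),
        ("pentest", ("code-quality", "security")),
        ("android", ("mobile", "android")),   // hub/sub-hub names assumed
        ("swiftui", ("mobile", "ios")),       // hub/sub-hub names assumed
        ("playwright", ("testing-qa", "e2e")), // hub/sub-hub names assumed
    ];
    for &(needle, (hub, sub)) in RULES {
        if repo.contains(needle) {
            return Some((hub, sub, 98)); // near-deterministic confidence
        }
    }
    None
}
```

Because the function runs before other inference logic and returns 98, only an explicit 100-confidence assignment can later override its answer.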
Problem: When a skill matched multiple sub-hubs (e.g., python AND security simultaneously), language hubs often won due to their anchor keywords, defeating domain-specialist classification.
Solution: Introduced a conflict resolution table (`CONFLICT_RESOLUTION`) that:
- Defines precedence rules when multiple sub-hubs match: `(losing_hub, losing_sub_hub, winning_hub, winning_sub_hub)`
- Ensures domain specialists always win over languages:
  - `security` > `python` | `javascript` | `typescript` | `rust` | `golang` | `java`
  - `testing-qa` > `python` | `javascript` | `typescript` | `rust`
  - `code-review` > `python` | `javascript`
- Applied in the `resolve_conflict()` function when multiple candidates score within 5 points of the top score
- Fallback: hub priority ordering if no explicit rule applies
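The precedence logic can be sketched as follows. The `("languages", …)` hub names are placeholders, since the real table lives in `src/classify.rs`:

```rust
/// (losing_hub, losing_sub_hub) -> (winning_hub, winning_sub_hub)
/// Domain specialists beat language sub-hubs when scores are close.
const CONFLICT_RESOLUTION: &[((&str, &str), (&str, &str))] = &[
    (("languages", "python"), ("code-quality", "security")),
    (("languages", "javascript"), ("code-quality", "security")),
    (("languages", "python"), ("testing-qa", "python")),
];

/// Given the top-scoring candidate and its runner-up (hub, sub_hub, score),
/// apply an explicit precedence rule only when the scores are within 5 points.
fn resolve_conflict<'a>(
    top: (&'a str, &'a str, i32),
    runner_up: (&'a str, &'a str, i32),
) -> (&'a str, &'a str) {
    if (top.2 - runner_up.2).abs() <= 5 {
        for &((lose_hub, lose_sub), (win_hub, win_sub)) in CONFLICT_RESOLUTION {
            // If the current winner is listed as losing to the runner-up,
            // the domain specialist takes precedence.
            if top.0 == lose_hub && top.1 == lose_sub
                && runner_up.0 == win_hub && runner_up.1 == win_sub
            {
                return (runner_up.0, runner_up.1);
            }
        }
    }
    (top.0, top.1)
}
```

When no rule matches (or the score gap exceeds 5), the sketch keeps the top scorer, standing in for the hub-priority fallback.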
Problem: Repository name signals (inferred from path) were scored 95%, allowing lower-confidence LLM results (80%) to potentially override them.
Solution: Raised the confidence score for path-based inference from 95 to 98:
- Score 98 is now treated as near-deterministic (the same tier as explicit `canonicalize_assignment` logic at 100)
- Only scores ≥ 100 can override it
- Prevents low-confidence LLM results from contradicting repository metadata
For a skill in lib/mukul975-anthropic-cybersecurity-skills/:
```
1. apply_rules() called
   ↓
2. canonicalize_assignment() → no match (0% confidence)
   ↓
3. infer_from_path() called
   ├─ infer_hub_from_repo_name() extracts "mukul975-anthropic-cybersecurity-skills"
   ├─ Finds substring match: "cybersecurity"
   └─ Returns ("code-quality", "security") with 98% confidence
   ↓
4. ✓ Final assignment: code-quality / security
   ✗ LLM classification skipped (98% > 80% threshold)
```
Check repository state:
```sh
cargo run --release -- doctor
```

This validates all repositories, checks Git remotes, and reports cache status.
Increase parallelism:
```sh
export PARALLEL_JOBS=16
cargo run --release -- aggregate
```

Cause: Existing junctions in sync target directories.
Solution: The sync command automatically skips existing junctions. If conflicts persist:
```sh
# Inspect sync targets
dir ~/.claude/skills   # Windows
ls ~/.claude/skills    # macOS/Linux

# Remove conflicting junctions/symlinks manually
rmdir /s ~/.claude/skills\[hub-name]   # Windows
rm -rf ~/.claude/skills/[hub-name]     # macOS/Linux

# Retry sync
cargo run --release -- sync
```

Check TUI progress:

```sh
cargo run --release -- tui
```

The TUI shows real-time LLM batch progress. If it is stuck for more than 5 minutes:

```sh
# Check if the LLM service (Ollama/Claude) is running
# Restart aggregation with keyword-only fallback
cargo run --release -- aggregate --skip-llm
```

Check output integrity:
```sh
cargo run --release -- release-gate
```

This validates:
- All `SKILL.md` files were processed
- No orphaned or missing references in `routing.csv`
- Deduplication stats match cache state

If failures are reported, re-run aggregation:

```sh
rm -rf skills-aggregated/
cargo run --release -- aggregate
```

| Operation | Time | Dependencies |
|---|---|---|
| First aggregate (100+ repos, 8000+ skills) | 10-20 min | Network speed, CPU count, LLM latency |
| Incremental aggregate (repos already cached) | 2-5 min | LLM classification speed (can skip with --skip-llm) |
| Sync to tools (10 tools, all hubs) | 30-60 sec | Disk I/O, junction creation speed |
| TUI startup | <1 sec | Manifest parsing |
| LLM classification (8000 skills) | 3-8 min | Batch size, LLM throughput |
Optimization Tips:
- Use `PARALLEL_JOBS=auto` for optimal CPU utilization
- Set `LLM_BATCH_SIZE=100` for faster LLM processing (requires more GPU/API quota)
- Run on an SSD for 2-3x faster repository cloning
- Use shallow clones (the default) to reduce disk bandwidth
```sh
# Clone and build
git clone <this-repo>
cd skills-bank
cargo build

# Run tests
cargo test

# Format code
cargo fmt

# Check for issues
cargo clippy
```

When reporting bugs, include:
- Output of `cargo run --release -- doctor`
- Contents of `.skills-bank-cli-config.json` (redact sensitive URLs if needed)
- Error message and stack trace (if any)
- Steps to reproduce
To add new domain keywords or refine sub-hub routing:
- Edit `src/classify.rs` — the `CONFLICT_RESOLUTION` table or keyword rules
- Add test cases in `tests/`
- Run `cargo test` and `cargo run --release -- aggregate`
- Submit a PR with classification examples
MIT — See package.json for details.