
abdulsamed1/AI-skills-bank


skills-bank

High-performance skill aggregation, classification & routing platform for AI agents.



📋 Prerequisites

  • Rust 1.70+ (install via rustup)
  • Git (for repository cloning)
  • ~2GB disk space (for aggregated skills cache)

📖 Overview

skills-bank aggregates skills (workflows, tasks, specialized agents) from 100+ distributed repositories and provides a unified routing system for AI agents to discover, load, and invoke them efficiently.

Core Design Principles

  • Source-of-Truth Loading: Agents load canonical SKILL.md files directly from source repositories, not from catalogs. This eliminates hallucination risks and optimizes token usage.
  • Hybrid Classification: A dual-stage pipeline combines fast keyword rules (Step A) with LLM-powered semantic classification (Step B) to route skills into 12 domain hubs and 40+ sub-hubs.
  • Smart Deduplication: Skills are deduplicated by name OR description — catching both exact collisions and cross-repo clones with different names but identical content.
  • Multi-Tool Support: Skills sync to major AI tools including GitHub Copilot, Claude-code, free-code (claude-code), Hermes, Cursor, Gemini, Antigravity, OpenCode, Codex, and Windsurf.
  • Token Efficiency: Load minimal metadata first, then source files on-demand—not batch-loading entire catalogs.
  • Interactive TUI: A rich terminal UI (powered by Ratatui) provides real-time dashboard, skill explorer, and pipeline monitoring.

🚀 Quick Start

1. Build the CLI

cd skills-bank/
cargo build --release

2. Run the Full Pipeline

# Interactive setup (first run)
cargo run --release

# Or run all steps in sequence
cargo run --release -- run

# Launch the interactive TUI
cargo run --release -- tui

Interactive Setup (First Time)

cargo run --release -- setup

Launches an interactive wizard to configure:

  • Where skills should be synced (global, workspace, or both)
  • Which AI tools to sync to
  • Repository URLs to clone and aggregate
  • Excluded categories

🎮 Commands Reference

Core Pipeline Commands

| Command | Purpose | When to Use |
| --- | --- | --- |
| aggregate | Collect, deduplicate, classify, and route skills from configured repositories to skills-aggregated/ | First run or when repositories change |
| sync | Distribute aggregated skills to configured AI tool directories | After aggregation completes |
| run | Execute the full pipeline (aggregate → sync) in sequence | Daily updates or automated workflows |
| setup | Configure sync targets, repositories, and exclusions interactively | Initial setup only |
| add-repo &lt;URL&gt; | Add a new skill repository to the configuration | When onboarding new sources |
| doctor | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection |
| release-gate | Validate aggregation output integrity | Before releases or production sync |
| cleanup-legacy-duplicates | Remove legacy repository folders from src/ or repos/ (only if a matching lib/ clone exists) | Migration from older versions |
| tui | Launch the interactive terminal dashboard with skill explorer and statistics | Real-time monitoring |

Example Workflows

First-time setup:

cargo run --release -- setup
cargo run --release -- run

Daily aggregation with monitoring:

cargo run --release -- aggregate   # with progress bar
cargo run --release -- tui         # monitor from a second terminal

Validate before production sync:

cargo run --release -- doctor
cargo run --release -- release-gate
cargo run --release -- sync

📁 Project Structure

Source Code & Configuration

  • src/ — Rust source code: TUI, fetcher, aggregator, sync engine, classification logic
  • Cargo.toml — Rust manifest (dependencies, metadata, build targets)
  • .skills-bank-cli-config.json — User configuration file (generated by setup, contains sync targets and repository URLs)
  • .env-example — Environment variable template

Generated Outputs (After Aggregation)

  • skills-aggregated/ — Single source of truth containing:
    • routing.csv — Skill-to-hub/sub-hub routing table
    • subhub-index.json — Hub and sub-hub registry
    • hub-manifests.csv — Master index of all skills
    • .skill-lock.json — Aggregation metadata and timestamps
    • Per-hub directories with skills-manifest.json files

Repository Cache

  • lib/ — Canonical cache for cloned skill repositories (populated by aggregate command)

Testing & Documentation

  • tests/ — Integration test suite for pipeline and TUI
  • archive/ — Legacy PowerShell scripts (original PoC phase)
  • package.json — Node.js manifest for npx distribution
  • readme.md — This file

📁 Repository Management

Cloning & Caching

Cache Location: lib/ (not src/) — This is the canonical directory for all cloned repositories.

Clone Strategy:

  • First clone: Shallow clone with git clone --depth 1 --single-branch --no-tags (faster, smaller disk footprint)
  • Subsequent runs: git pull in existing directories (avoid re-cloning)
  • Deduplication: Normalized remote URLs and repository names prevent duplicate clones
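
The clone strategy above can be sketched as follows. This is an illustrative sketch only: the actual fetcher in src/ may differ in flags, error handling, and parallelism, and the function name git_command is invented for this example.

```rust
use std::path::Path;
use std::process::Command;

/// Build the git invocation for one repository: shallow clone on first
/// run, incremental `git pull` thereafter. (Hypothetical helper; the real
/// fetcher in src/ may differ.)
fn git_command(url: &str, dest: &Path) -> Command {
    let mut cmd = Command::new("git");
    if dest.join(".git").exists() {
        // Subsequent runs: update in place, avoid re-cloning.
        cmd.arg("-C").arg(dest).args(["pull", "--ff-only"]);
    } else {
        // First clone: shallow, single branch, no tags.
        cmd.args(["clone", "--depth", "1", "--single-branch", "--no-tags", url])
            .arg(dest);
    }
    cmd
}

fn main() {
    let cmd = git_command("https://example.com/repo.git", Path::new("lib/repo"));
    println!("{:?}", cmd);
}
```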

Speed Optimization:

  • Parallel cloning via configurable PARALLEL_JOBS
  • Shallow clones reduce disk I/O by ~80% vs. full clones
  • Incremental updates via git pull

Legacy Repository Cleanup

If you have repositories in older locations (src/ or repos/), migrate them:

# Inspect current state
cargo run --release -- doctor

# Remove legacy folders (safe: only deletes if matching lib/ exists and Git remote matches)
cargo run --release -- cleanup-legacy-duplicates

⚠️ Warning: This is destructive. Always run doctor first to inspect repository state.

⚙️ Output Files & Configuration

Generated during aggregation into skills-aggregated/:

| File | Purpose |
| --- | --- |
| routing.csv | Skill-to-hub/sub-hub mappings (name, hub, sub-hub, src_path) |
| subhub-index.json | Complete hub and sub-hub registry |
| hub-manifests.csv | Master index of all skills across all hubs |
| .skill-lock.json | Aggregation metadata (timestamps, repo revisions, dedup stats) |
| [hub]/[sub-hub]/skills-manifest.json | Per-sub-hub skill metadata and LLM classification triggers |

These files are used by agents and the TUI for discovery and routing.
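
A consumer of routing.csv can read the four columns listed above with a few lines of Rust. This is a minimal sketch assuming plain comma-separated fields with no quoting (a real consumer would use a proper CSV parser); the example row is hypothetical.

```rust
/// One row of routing.csv (columns per the table above:
/// name, hub, sub-hub, src_path).
#[derive(Debug, PartialEq)]
struct Route {
    name: String,
    hub: String,
    sub_hub: String,
    src_path: String,
}

/// Parse routing.csv content. Sketch only: assumes no quoted fields
/// or embedded commas.
fn parse_routing(csv: &str) -> Vec<Route> {
    csv.lines()
        .skip(1) // header row
        .filter(|l| !l.trim().is_empty())
        .filter_map(|line| {
            let mut f = line.splitn(4, ',');
            Some(Route {
                name: f.next()?.to_string(),
                hub: f.next()?.to_string(),
                sub_hub: f.next()?.to_string(),
                src_path: f.next()?.to_string(),
            })
        })
        .collect()
}

fn main() {
    // Hypothetical row for illustration.
    let data = "name,hub,sub_hub,src_path\n\
                sql-injection-scan,code-quality,security,lib/example-repo/SKILL.md";
    let routes = parse_routing(data);
    assert_eq!(routes.len(), 1);
    assert_eq!(routes[0].hub, "code-quality");
    println!("{:?}", routes);
}
```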


🌐 Environment Variables

Copy .env-example to .env to override defaults:

cp .env-example .env

Common variables:

  • SKILLS_BANK_CONFIG — Path to CLI config file (default: .skills-bank-cli-config.json)
  • SKILLS_BANK_CACHE — Cache directory for repositories (default: lib/)
  • SKILLS_BANK_OUTPUT — Output directory for aggregated skills (default: skills-aggregated/)
  • LLM_BATCH_SIZE — Batch size for LLM classification (default: 50)
  • PARALLEL_JOBS — Number of parallel aggregation workers (default: auto-detect CPU count)

See .env-example for all available options.
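
Resolving these variables with their documented defaults is a one-liner per setting. A minimal sketch (the variable names follow the list above; the defaults are the documented ones, and env_or is an invented helper, not the project's actual API):

```rust
use std::env;

/// Read a setting from the environment, falling back to the documented
/// default when unset. (Hypothetical helper for illustration.)
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    let cache = env_or("SKILLS_BANK_CACHE", "lib/");
    let output = env_or("SKILLS_BANK_OUTPUT", "skills-aggregated/");
    let batch: usize = env_or("LLM_BATCH_SIZE", "50").parse().unwrap_or(50);
    println!("cache={cache} output={output} batch={batch}");
}
```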


🎯 Tool Integration Targets

Sync skills to any of these destinations:

| Tool | Project | Global |
| --- | --- | --- |
| Claude | .claude/skills/ | ~/.claude/skills/ |
| free-code (claude-code) | .free-code-config/skills/ | ~/.free-code-config/skills/ |
| Hermes | .hermes/skills/ | ~/.hermes/skills/ |
| Code (Codex) | .agents/skills/ | ~/.agents/skills/ |
| GitHub Copilot | .github/skills/ | ~/.copilot/skills/ |
| Cursor | .cursor/skills/ | ~/.cursor/skills/ |
| Gemini | .gemini/skills/ | ~/.gemini/skills/ |
| Antigravity | .agent/skills/ | ~/.gemini/antigravity/skills/ |
| OpenCode | .opencode/skills/ | ~/.config/opencode/skills/ |
| Windsurf | .windsurf/skills/ | ~/.codeium/windsurf/skills/ |

🏗️ Classification Architecture

The aggregation pipeline processes 8000+ SKILL.md files through a multi-stage classification system:

 SKILL.md files (8000+)
        │
        ▼
 ┌──────────────┐
 │  YAML Parse   │  Extract name, description, triggers
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Keyword      │  Fast token-based routing to hub/sub-hub
 │  Rules        │  (fallback if LLM unavailable)
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Dedup        │  Name OR Description HashSet
 │  (two-key)    │  Catches cross-repo clones
 └──────┬───────┘
        │
        ▼
 ┌──────────────────────────────────┐
 │  Hybrid Exclusion + LLM Classify │
 │  Step A: Keyword pre-filter      │
 │  Step B: LLM semantic classify   │
 │         (can return "excluded")  │
 └──────┬───────────────────────────┘
        │
        ▼
 ┌──────────────┐
 │  Output       │  routing.csv, per-hub manifests,
 │  Artifacts    │  subhub-index.json
 └──────────────┘
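
The two-key dedup stage in the diagram can be sketched with a pair of HashSets: a skill is dropped if either its name or its description has been seen before, which is what catches renamed cross-repo clones. This is a sketch of the idea; the normalization the real pipeline applies may differ.

```rust
use std::collections::HashSet;

/// Two-key deduplication over (name, description) pairs: drop a skill if
/// EITHER key was seen before. (Sketch; real normalization may differ.)
fn dedup(skills: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen_names = HashSet::new();
    let mut seen_descs = HashSet::new();
    skills
        .into_iter()
        .filter(|(name, desc)| {
            let n = name.trim().to_lowercase();
            let d = desc.trim().to_lowercase();
            // `&` (not `&&`) so both inserts always run; insert() returns
            // false when the key was already present.
            seen_names.insert(n) & seen_descs.insert(d)
        })
        .collect()
}

fn main() {
    let kept = dedup(vec![
        ("scan".into(), "Scans code for flaws".into()),
        ("scan".into(), "A different description".into()), // name collision
        ("scanner".into(), "Scans code for flaws".into()), // cross-repo clone
    ]);
    assert_eq!(kept.len(), 1);
    println!("kept {} skill(s)", kept.len());
}
```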

🔍 Classification Improvements (v2.0+)

The keyword-based classification system includes three critical enhancements to eliminate false negatives and resolve sub-hub conflicts:

1. Repository Name Extraction (Substring Matching)

Problem: Repository names like mukul975-anthropic-cybersecurity-skills were not being matched because the system used exact token matching: the token "cybersecurity" never equals the keyword "security", so the domain signal in the repository name was missed.

Solution: Introduced an infer_hub_from_repo_name() function that:

  • Extracts the repository directory name from the path (the segment right after lib/ or src/)
  • Uses substring matching to catch domain signals (e.g., "cybersecurity-skills" → matches "security")
  • Runs before other inference logic (highest priority)
  • Supports domain keywords:
    • Security: security, cybersecurity, pentest, vulnerability, vibesec, bluebook
    • AI: prompt, agent-skill, llm, ai-skills
    • Mobile (iOS): swiftui, ios-, -ios, swift-patterns, apple-hig, app-store
    • Mobile (Android): android, kotlin
    • Frontend/UI: ui-ux, ui-skills
    • Testing/QA: playwright, testdino

Confidence Score: 98% (near-deterministic, reflects author intent)
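
The substring-matching idea can be sketched as below. The keyword table is abridged from the list above, and the sub-hub names other than "security" (e.g. "ios", "e2e") are placeholders, not necessarily the project's real sub-hub identifiers; the hub/sub-hub pair ("code-quality", "security") matches the example flow later in this README.

```rust
/// Sketch of repository-name hub inference via substring matching.
/// Returns (hub, sub_hub, confidence). Keyword table abridged; sub-hub
/// names other than "security" are placeholders.
fn infer_hub_from_repo_name(repo_dir: &str) -> Option<(&'static str, &'static str, u8)> {
    let name = repo_dir.to_lowercase();
    const RULES: &[(&str, (&str, &str))] = &[
        ("cybersecurity", ("code-quality", "security")),
        ("security", ("code-quality", "security")),
        ("pentest", ("code-quality", "security")),
        ("swiftui", ("mobile", "ios")),          // placeholder sub-hub name
        ("android", ("mobile", "android")),      // placeholder sub-hub name
        ("playwright", ("testing-qa", "e2e")),   // placeholder sub-hub name
    ];
    for &(kw, (hub, sub)) in RULES {
        if name.contains(kw) {
            return Some((hub, sub, 98)); // near-deterministic confidence
        }
    }
    None
}

fn main() {
    let hit = infer_hub_from_repo_name("mukul975-anthropic-cybersecurity-skills");
    assert_eq!(hit, Some(("code-quality", "security", 98)));
    println!("{:?}", hit);
}
```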

2. Sub-Hub Conflict Resolution

Problem: When a skill matched multiple sub-hubs (e.g., python AND security simultaneously), language hubs often won due to their anchor keywords, defeating domain-specialist classification.

Solution: Introduced a conflict resolution table (CONFLICT_RESOLUTION) that:

  • Defines precedence rules when multiple sub-hubs match: (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub)
  • Ensures domain specialists always win over languages:
    • security > python | javascript | typescript | rust | golang | java
    • testing-qa > python | javascript | typescript | rust
    • code-review > python | javascript
  • Applied in resolve_conflict() function when multiple candidates score within 5 points of the top score
  • Fallback: hub priority ordering if no explicit rule applies
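
The resolution logic can be sketched as follows. This simplifies the four-field (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub) table described above to two-field pairs, and the candidate scores in the example are invented; only the precedence pairs and the 5-point window come from the README.

```rust
/// Abridged precedence table: (loser, winner) — domain specialists beat
/// language sub-hubs. The real table carries four fields.
const CONFLICT_RESOLUTION: &[(&str, &str)] = &[
    ("python", "security"),
    ("javascript", "security"),
    ("python", "testing-qa"),
    ("python", "code-review"),
];

/// Among candidates within 5 points of the top score, prefer the winner
/// of any explicit precedence rule; otherwise keep the top scorer.
fn resolve_conflict<'a>(candidates: &[(&'a str, u32)]) -> Option<&'a str> {
    let top = candidates.iter().map(|&(_, s)| s).max()?;
    let near: Vec<&str> = candidates
        .iter()
        .filter(|&&(_, s)| top - s <= 5)
        .map(|&(n, _)| n)
        .collect();
    for &(loser, winner) in CONFLICT_RESOLUTION {
        if near.contains(&loser) && near.contains(&winner) {
            return Some(winner);
        }
    }
    candidates.iter().max_by_key(|&&(_, s)| s).map(|&(n, _)| n)
}

fn main() {
    // python outscores security by 2 — within the 5-point window, so the
    // domain specialist wins. (Scores are hypothetical.)
    assert_eq!(resolve_conflict(&[("python", 90), ("security", 88)]), Some("security"));
    println!("ok");
}
```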

3. Confidence Boost for Path-Based Inference

Problem: Repository name signals (inferred from path) were scored 95%, allowing lower-confidence LLM results (80%) to potentially override them.

Solution: Raised the confidence score for path-based inference from 95% to 98%:

  • Score 98 is now treated as near-deterministic (same tier as explicit canonicalize_assignment logic at 100)
  • Only scores ≥ 100 can override it
  • Prevents low-confidence LLM results from contradicting repository metadata
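
The override rule reduces to a small predicate. A minimal sketch assuming the thresholds described above (can_override is an invented name, not the project's actual function):

```rust
/// May a new classification replace an existing one? Scores ≥ 98 are
/// near-deterministic and only yield to explicit canonical rules (100).
/// (Hypothetical helper illustrating the thresholds above.)
fn can_override(existing_score: u8, new_score: u8) -> bool {
    if existing_score >= 98 {
        new_score >= 100
    } else {
        new_score > existing_score
    }
}

fn main() {
    assert!(!can_override(98, 80)); // LLM result cannot beat repo-name signal
    assert!(can_override(98, 100)); // canonical assignment still wins
    println!("ok");
}
```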

📊 Example Classification Flow

For a skill in lib/mukul975-anthropic-cybersecurity-skills/:

1. apply_rules() called
   ↓
2. canonicalize_assignment() → no match (0% confidence)
   ↓
3. infer_from_path() called
   ├─ infer_hub_from_repo_name() extracts "mukul975-anthropic-cybersecurity-skills"
   ├─ Finds substring match: "cybersecurity"
   └─ Returns ("code-quality", "security") with 98% confidence
   ↓
4. ✓ Final assignment: code-quality / security
   ✗ LLM classification skipped (98% > 80% threshold)

🔧 Troubleshooting

Issue: Skills not aggregating or taking too long

Check repository state:

cargo run --release -- doctor

This validates all repositories, checks Git remotes, and reports cache status.

Increase parallelism:

export PARALLEL_JOBS=16
cargo run --release -- aggregate

Issue: Sync failing with "junction or symlink" errors

Cause: Existing junctions in sync target directories.

Solution: The sync command automatically skips existing junctions. If conflicts persist:

# Inspect sync targets
dir %USERPROFILE%\.claude\skills   # Windows
ls ~/.claude/skills                # macOS/Linux

# Remove conflicting junctions/symlinks manually
rmdir /s %USERPROFILE%\.claude\skills\[hub-name]  # Windows
rm -rf ~/.claude/skills/[hub-name]                # macOS/Linux

# Retry sync
cargo run --release -- sync

Issue: LLM classification appears stuck

Check TUI progress:

cargo run --release -- tui

The TUI shows real-time LLM batch progress. If stuck for >5 minutes:

# Check if LLM service (Ollama/Claude) is running
# Restart aggregation with keyword-only fallback
cargo run --release -- aggregate --skip-llm

Issue: "Release gate" validation fails

Check output integrity:

cargo run --release -- release-gate

This validates:

  • All SKILL.md files were processed
  • No orphaned or missing references in routing.csv
  • Deduplication stats match cache state

If failures are reported, re-run aggregation:

rm -rf skills-aggregated/
cargo run --release -- aggregate

📈 Performance Characteristics

| Operation | Time | Dependencies |
| --- | --- | --- |
| First aggregate (100+ repos, 8000+ skills) | 10-20 min | Network speed, CPU count, LLM latency |
| Incremental aggregate (repos already cached) | 2-5 min | LLM classification speed (can skip with --skip-llm) |
| Sync to tools (10 tools, all hubs) | 30-60 sec | Disk I/O, junction creation speed |
| TUI startup | &lt;1 sec | Manifest parsing |
| LLM classification (8000 skills) | 3-8 min | Batch size, LLM throughput |

Optimization Tips:

  • Use PARALLEL_JOBS=auto for optimal CPU utilization
  • Set LLM_BATCH_SIZE=100 for faster LLM processing (requires more GPU/API quota)
  • Run on an SSD for 2-3x faster repository cloning
  • Use shallow clones (default) to reduce disk bandwidth

🤝 Contributing

Development Setup

# Clone and build
git clone <this-repo>
cd skills-bank
cargo build

# Run tests
cargo test

# Format code
cargo fmt

# Check for issues
cargo clippy

Reporting Issues

When reporting bugs, include:

  1. Output of cargo run --release -- doctor
  2. Contents of .skills-bank-cli-config.json (redact sensitive URLs if needed)
  3. Error message and stack trace (if any)
  4. Steps to reproduce

Extending Classification

To add new domain keywords or refine sub-hub routing:

  1. Edit the CONFLICT_RESOLUTION table or keyword rules in src/classify.rs
  2. Add test cases in tests/
  3. Run cargo test and cargo run --release -- aggregate
  4. Submit PR with classification examples

📄 License

MIT — See package.json for details.

About

AI Skills Bank is a unified, multi-tool platform designed to aggregate, manage, and route AI skills across various workflows and AI assistants (such as Antigravity, Claude Code, Cursor, and Copilot).
