
abdulsamed1/AI-skills-bank


skills-bank

High-performance skill aggregation, classification & routing platform for AI agents.



📋 Prerequisites

  • Rust 1.70+ (install via rustup)
  • Git (for repository cloning)
  • ~2GB disk space (for aggregated skills cache)

📖 Overview

skills-bank aggregates skills (workflows, tasks, specialized agents) from 100+ distributed repositories and provides a unified routing system for AI agents to discover, load, and invoke them efficiently.

Core Design Principles

  • Source-of-Truth Loading: Agents load canonical SKILL.md files directly from source repositories, not from catalogs. This eliminates hallucination risks and optimizes token usage.
  • Hybrid Classification: A dual-stage pipeline combines fast keyword rules (Step A) with LLM-powered semantic classification (Step B) to route skills into 12 domain hubs and 40+ sub-hubs.
  • Smart Deduplication: Skills are deduplicated by name OR description — catching both exact collisions and cross-repo clones with different names but identical content.
  • Multi-Tool Support: Skills sync to major AI tools including GitHub Copilot, Claude-code, free-code (claude-code), Hermes, Cursor, Gemini, Antigravity, OpenCode, Codex, and Windsurf.
  • Token Efficiency: Load minimal metadata first, then source files on-demand—not batch-loading entire catalogs.
  • Interactive TUI: A rich terminal UI (powered by Ratatui) provides real-time dashboard, skill explorer, and pipeline monitoring.

🚀 Quick Start

1. Build the CLI

cd skills-bank/
cargo build --release

2. Run the Full Pipeline

# Interactive setup (first run)
cargo run --release

# Or run all steps in sequence
cargo run --release -- run

# Launch the interactive TUI
cargo run --release -- tui

Interactive Setup (First Time)

cargo run --release -- setup

Launches an interactive wizard to configure:

  • Where skills should be synced (global, workspace, or both)
  • Which AI tools to sync to
  • Repository URLs to clone and aggregate
  • Excluded categories

🎮 Commands Reference

Core Pipeline Commands

| Command | Purpose | When to Use |
| --- | --- | --- |
| aggregate | Collect, deduplicate, classify, and route skills from configured repositories to skills-aggregated/ | First run or when repositories change |
| sync | Distribute aggregated skills to configured AI tool directories | After aggregation completes |
| run | Execute the full pipeline (aggregate → sync) in sequence | Daily updates or automated workflows |
| setup | Configure sync targets, repositories, and exclusions interactively | Initial setup only |
| add-repo &lt;URL&gt; | Add a new skill repository to the configuration | When onboarding new sources |
| doctor | Validate installation and report repository state | Troubleshooting or pre-cleanup inspection |
| release-gate | Validate aggregation output integrity | Before releases or production sync |
| cleanup-legacy-duplicates | Remove legacy repository folders from src/ or repos/ (only if a matching lib/ clone exists) | Migration from older versions |
| tui | Launch the interactive terminal dashboard with skill explorer and statistics | Real-time monitoring |

Example Workflows

First-time setup:

cargo run --release -- setup
cargo run --release -- run

Daily aggregation with monitoring:

cargo run --release -- aggregate   # with progress bar
cargo run --release -- tui         # monitor from a second terminal

Validate before production sync:

cargo run --release -- doctor
cargo run --release -- release-gate
cargo run --release -- sync

📁 Project Structure

Source Code & Configuration

  • src/ — Rust source code: TUI, fetcher, aggregator, sync engine, classification logic
  • Cargo.toml — Rust manifest (dependencies, metadata, build targets)
  • .skills-bank-cli-config.json — User configuration file (generated by setup, contains sync targets and repository URLs)
  • .env-example — Environment variable template

Generated Outputs (After Aggregation)

  • skills-aggregated/ — Single source of truth containing:
    • routing.csv — Skill-to-hub/sub-hub routing table
    • subhub-index.json — Hub and sub-hub registry
    • hub-manifests.csv — Master index of all skills
    • .skill-lock.json — Aggregation metadata and timestamps
    • Per-hub directories with skills-manifest.json files

Repository Cache

  • lib/ — Canonical cache for cloned skill repositories (populated by aggregate command)

Testing & Documentation

  • tests/ — Integration test suite for pipeline and TUI
  • archive/ — Legacy PowerShell scripts (original PoC phase)
  • package.json — Node.js manifest for npx distribution
  • readme.md — This file

📁 Repository Management

Cloning & Caching

Cache Location: lib/ (not src/) — This is the canonical directory for all cloned repositories.

Clone Strategy:

  • First clone: Shallow clone with git clone --depth 1 --single-branch --no-tags (faster, smaller disk footprint)
  • Subsequent runs: git pull in existing directories (avoid re-cloning)
  • Deduplication: Normalized remote URLs and repository names prevent duplicate clones
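
The clone strategy above can be sketched as follows. This is an illustrative sketch only: the actual fetcher in src/ may differ in flags, error handling, and parallelism, and the function name git_command is invented for this example.

```rust
use std::path::Path;
use std::process::Command;

/// Build the git invocation for one repository: shallow clone on first
/// run, incremental `git pull` thereafter. (Hypothetical helper; the real
/// fetcher in src/ may differ.)
fn git_command(url: &str, dest: &Path) -> Command {
    let mut cmd = Command::new("git");
    if dest.join(".git").exists() {
        // Subsequent runs: update in place, avoid re-cloning.
        cmd.arg("-C").arg(dest).args(["pull", "--ff-only"]);
    } else {
        // First clone: shallow, single branch, no tags.
        cmd.args(["clone", "--depth", "1", "--single-branch", "--no-tags", url])
            .arg(dest);
    }
    cmd
}

fn main() {
    let cmd = git_command("https://example.com/repo.git", Path::new("lib/repo"));
    println!("{:?}", cmd);
}
```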

Speed Optimization:

  • Parallel cloning via configurable PARALLEL_JOBS
  • Shallow clones reduce disk I/O by ~80% vs. full clones
  • Incremental updates via git pull

Legacy Repository Cleanup

If you have repositories in older locations (src/ or repos/), migrate them:

# Inspect current state
cargo run --release -- doctor

# Remove legacy folders (safe: only deletes if matching lib/ exists and Git remote matches)
cargo run --release -- cleanup-legacy-duplicates

⚠️ Warning: This is destructive. Always run doctor first to inspect repository state.

⚙️ Output Files & Configuration

Generated during aggregation into skills-aggregated/:

| File | Purpose |
| --- | --- |
| routing.csv | Skill-to-hub/sub-hub mappings (name, hub, sub-hub, src_path) |
| subhub-index.json | Complete hub and sub-hub registry |
| hub-manifests.csv | Master index of all skills across all hubs |
| .skill-lock.json | Aggregation metadata (timestamps, repo revisions, dedup stats) |
| [hub]/[sub-hub]/skills-manifest.json | Per-sub-hub skill metadata and LLM classification triggers |

These files are used by agents and the TUI for discovery and routing.
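
A consumer of routing.csv can read the four columns listed above with a few lines of Rust. This is a minimal sketch assuming plain comma-separated fields with no quoting (a real consumer would use a proper CSV parser); the example row is hypothetical.

```rust
/// One row of routing.csv (columns per the table above:
/// name, hub, sub-hub, src_path).
#[derive(Debug, PartialEq)]
struct Route {
    name: String,
    hub: String,
    sub_hub: String,
    src_path: String,
}

/// Parse routing.csv content. Sketch only: assumes no quoted fields
/// or embedded commas.
fn parse_routing(csv: &str) -> Vec<Route> {
    csv.lines()
        .skip(1) // header row
        .filter(|l| !l.trim().is_empty())
        .filter_map(|line| {
            let mut f = line.splitn(4, ',');
            Some(Route {
                name: f.next()?.to_string(),
                hub: f.next()?.to_string(),
                sub_hub: f.next()?.to_string(),
                src_path: f.next()?.to_string(),
            })
        })
        .collect()
}

fn main() {
    // Hypothetical row for illustration.
    let data = "name,hub,sub_hub,src_path\n\
                sql-injection-scan,code-quality,security,lib/example-repo/SKILL.md";
    let routes = parse_routing(data);
    assert_eq!(routes.len(), 1);
    assert_eq!(routes[0].hub, "code-quality");
    println!("{:?}", routes);
}
```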


🌐 Environment Variables

Copy .env-example to .env to override defaults:

cp .env-example .env

Common variables:

  • SKILLS_BANK_CONFIG — Path to CLI config file (default: .skills-bank-cli-config.json)
  • SKILLS_BANK_CACHE — Cache directory for repositories (default: lib/)
  • SKILLS_BANK_OUTPUT — Output directory for aggregated skills (default: skills-aggregated/)
  • LLM_BATCH_SIZE — Batch size for LLM classification (default: 50)
  • PARALLEL_JOBS — Number of parallel aggregation workers (default: auto-detect CPU count)

See .env-example for all available options.
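
Resolving these variables with their documented defaults is a one-liner per setting. A minimal sketch (the variable names follow the list above; the defaults are the documented ones, and env_or is an invented helper, not the project's actual API):

```rust
use std::env;

/// Read a setting from the environment, falling back to the documented
/// default when unset. (Hypothetical helper for illustration.)
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    let cache = env_or("SKILLS_BANK_CACHE", "lib/");
    let output = env_or("SKILLS_BANK_OUTPUT", "skills-aggregated/");
    let batch: usize = env_or("LLM_BATCH_SIZE", "50").parse().unwrap_or(50);
    println!("cache={cache} output={output} batch={batch}");
}
```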


🎯 Tool Integration Targets

Sync skills to any of these destinations:

| Tool | Project | Global |
| --- | --- | --- |
| Claude | .claude/skills/ | ~/.claude/skills/ |
| free-code (claude-code) | .free-code-config/skills/ | ~/.free-code-config/skills/ |
| Hermes | .hermes/skills/ | ~/.hermes/skills/ |
| Code (Codex) | .agents/skills/ | ~/.agents/skills/ |
| GitHub Copilot | .github/skills/ | ~/.copilot/skills/ |
| Cursor | .cursor/skills/ | ~/.cursor/skills/ |
| Gemini | .gemini/skills/ | ~/.gemini/skills/ |
| Antigravity | .agent/skills/ | ~/.gemini/antigravity/skills/ |
| OpenCode | .opencode/skills/ | ~/.config/opencode/skills/ |
| Windsurf | .windsurf/skills/ | ~/.codeium/windsurf/skills/ |

🏗️ Classification Architecture

The aggregation pipeline processes 8000+ SKILL.md files through a multi-stage classification system:

 SKILL.md files (8000+)
        │
        ▼
 ┌──────────────┐
 │  YAML Parse   │  Extract name, description, triggers
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Keyword      │  Fast token-based routing to hub/sub-hub
 │  Rules        │  (fallback if LLM unavailable)
 └──────┬───────┘
        │
        ▼
 ┌──────────────┐
 │  Dedup        │  Name OR Description HashSet
 │  (two-key)    │  Catches cross-repo clones
 └──────┬───────┘
        │
        ▼
 ┌──────────────────────────────────┐
 │  Hybrid Exclusion + LLM Classify │
 │  Step A: Keyword pre-filter      │
 │  Step B: LLM semantic classify   │
 │         (can return "excluded")  │
 └──────┬───────────────────────────┘
        │
        ▼
 ┌──────────────┐
 │  Output       │  routing.csv, per-hub manifests,
 │  Artifacts    │  subhub-index.json
 └──────────────┘
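
The two-key dedup stage in the diagram can be sketched with a pair of HashSets: a skill is dropped if either its name or its description has been seen before, which is what catches renamed cross-repo clones. This is a sketch of the idea; the normalization the real pipeline applies may differ.

```rust
use std::collections::HashSet;

/// Two-key deduplication over (name, description) pairs: drop a skill if
/// EITHER key was seen before. (Sketch; real normalization may differ.)
fn dedup(skills: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen_names = HashSet::new();
    let mut seen_descs = HashSet::new();
    skills
        .into_iter()
        .filter(|(name, desc)| {
            let n = name.trim().to_lowercase();
            let d = desc.trim().to_lowercase();
            // `&` (not `&&`) so both inserts always run; insert() returns
            // false when the key was already present.
            seen_names.insert(n) & seen_descs.insert(d)
        })
        .collect()
}

fn main() {
    let kept = dedup(vec![
        ("scan".into(), "Scans code for flaws".into()),
        ("scan".into(), "A different description".into()), // name collision
        ("scanner".into(), "Scans code for flaws".into()), // cross-repo clone
    ]);
    assert_eq!(kept.len(), 1);
    println!("kept {} skill(s)", kept.len());
}
```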

🔍 Classification Improvements (v2.0+)

The keyword-based classification system includes three critical enhancements to eliminate false negatives and resolve sub-hub conflicts:

1. Repository Name Extraction (Substring Matching)

Problem: Repository names like mukul975-anthropic-cybersecurity-skills were not being matched because the system used exact token matching: the token "cybersecurity" never equals the keyword "security", so the domain signal in the repository name was missed.

Solution: Introduced an infer_hub_from_repo_name() function that:

  • Extracts the repository directory name from the path (the segment right after lib/ or src/)
  • Uses substring matching to catch domain signals (e.g., "cybersecurity-skills" → matches "security")
  • Runs before other inference logic (highest priority)
  • Supports domain keywords:
    • Security: security, cybersecurity, pentest, vulnerability, vibesec, bluebook
    • AI: prompt, agent-skill, llm, ai-skills
    • Mobile (iOS): swiftui, ios-, -ios, swift-patterns, apple-hig, app-store
    • Mobile (Android): android, kotlin
    • Frontend/UI: ui-ux, ui-skills
    • Testing/QA: playwright, testdino

Confidence Score: 98% (near-deterministic, reflects author intent)
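
The substring-matching idea can be sketched as below. The keyword table is abridged from the list above, and the sub-hub names other than "security" (e.g. "ios", "e2e") are placeholders, not necessarily the project's real sub-hub identifiers; the hub/sub-hub pair ("code-quality", "security") matches the example flow later in this README.

```rust
/// Sketch of repository-name hub inference via substring matching.
/// Returns (hub, sub_hub, confidence). Keyword table abridged; sub-hub
/// names other than "security" are placeholders.
fn infer_hub_from_repo_name(repo_dir: &str) -> Option<(&'static str, &'static str, u8)> {
    let name = repo_dir.to_lowercase();
    const RULES: &[(&str, (&str, &str))] = &[
        ("cybersecurity", ("code-quality", "security")),
        ("security", ("code-quality", "security")),
        ("pentest", ("code-quality", "security")),
        ("swiftui", ("mobile", "ios")),          // placeholder sub-hub name
        ("android", ("mobile", "android")),      // placeholder sub-hub name
        ("playwright", ("testing-qa", "e2e")),   // placeholder sub-hub name
    ];
    for &(kw, (hub, sub)) in RULES {
        if name.contains(kw) {
            return Some((hub, sub, 98)); // near-deterministic confidence
        }
    }
    None
}

fn main() {
    let hit = infer_hub_from_repo_name("mukul975-anthropic-cybersecurity-skills");
    assert_eq!(hit, Some(("code-quality", "security", 98)));
    println!("{:?}", hit);
}
```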

2. Sub-Hub Conflict Resolution

Problem: When a skill matched multiple sub-hubs (e.g., python AND security simultaneously), language hubs often won due to their anchor keywords, defeating domain-specialist classification.

Solution: Introduced a conflict resolution table (CONFLICT_RESOLUTION) that:

  • Defines precedence rules when multiple sub-hubs match: (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub)
  • Ensures domain specialists always win over languages:
    • security > python | javascript | typescript | rust | golang | java
    • testing-qa > python | javascript | typescript | rust
    • code-review > python | javascript
  • Applied in resolve_conflict() function when multiple candidates score within 5 points of the top score
  • Fallback: hub priority ordering if no explicit rule applies
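
The resolution logic can be sketched as follows. This simplifies the four-field (losing_hub, losing_sub_hub, winning_hub, winning_sub_hub) table described above to two-field pairs, and the candidate scores in the example are invented; only the precedence pairs and the 5-point window come from the README.

```rust
/// Abridged precedence table: (loser, winner) — domain specialists beat
/// language sub-hubs. The real table carries four fields.
const CONFLICT_RESOLUTION: &[(&str, &str)] = &[
    ("python", "security"),
    ("javascript", "security"),
    ("python", "testing-qa"),
    ("python", "code-review"),
];

/// Among candidates within 5 points of the top score, prefer the winner
/// of any explicit precedence rule; otherwise keep the top scorer.
fn resolve_conflict<'a>(candidates: &[(&'a str, u32)]) -> Option<&'a str> {
    let top = candidates.iter().map(|&(_, s)| s).max()?;
    let near: Vec<&str> = candidates
        .iter()
        .filter(|&&(_, s)| top - s <= 5)
        .map(|&(n, _)| n)
        .collect();
    for &(loser, winner) in CONFLICT_RESOLUTION {
        if near.contains(&loser) && near.contains(&winner) {
            return Some(winner);
        }
    }
    candidates.iter().max_by_key(|&&(_, s)| s).map(|&(n, _)| n)
}

fn main() {
    // python outscores security by 2 — within the 5-point window, so the
    // domain specialist wins. (Scores are hypothetical.)
    assert_eq!(resolve_conflict(&[("python", 90), ("security", 88)]), Some("security"));
    println!("ok");
}
```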

3. Confidence Boost for Path-Based Inference

Problem: Repository name signals (inferred from path) were scored 95%, allowing lower-confidence LLM results (80%) to potentially override them.

Solution: Raised the confidence score for path-based inference from 95% to 98%:

  • Score 98 is now treated as near-deterministic (same tier as explicit canonicalize_assignment logic at 100)
  • Only scores ≥ 100 can override it
  • Prevents low-confidence LLM results from contradicting repository metadata
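
The override rule reduces to a small predicate. A minimal sketch assuming the thresholds described above (can_override is an invented name, not the project's actual function):

```rust
/// May a new classification replace an existing one? Scores ≥ 98 are
/// near-deterministic and only yield to explicit canonical rules (100).
/// (Hypothetical helper illustrating the thresholds above.)
fn can_override(existing_score: u8, new_score: u8) -> bool {
    if existing_score >= 98 {
        new_score >= 100
    } else {
        new_score > existing_score
    }
}

fn main() {
    assert!(!can_override(98, 80)); // LLM result cannot beat repo-name signal
    assert!(can_override(98, 100)); // canonical assignment still wins
    println!("ok");
}
```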

📊 Example Classification Flow

For a skill in lib/mukul975-anthropic-cybersecurity-skills/:

1. apply_rules() called
   ↓
2. canonicalize_assignment() → no match (0% confidence)
   ↓
3. infer_from_path() called
   ├─ infer_hub_from_repo_name() extracts "mukul975-anthropic-cybersecurity-skills"
   ├─ Finds substring match: "cybersecurity"
   └─ Returns ("code-quality", "security") with 98% confidence
   ↓
4. ✓ Final assignment: code-quality / security
   ✗ LLM classification skipped (98% > 80% threshold)

🔧 Troubleshooting

Issue: Skills not aggregating or taking too long

Check repository state:

cargo run --release -- doctor

This validates all repositories, checks Git remotes, and reports cache status.

Increase parallelism:

export PARALLEL_JOBS=16
cargo run --release -- aggregate

Issue: Sync failing with "junction or symlink" errors

Cause: Existing junctions in sync target directories.

Solution: The sync command automatically skips existing junctions. If conflicts persist:

# Inspect sync targets
dir %USERPROFILE%\.claude\skills   # Windows
ls ~/.claude/skills                # macOS/Linux

# Remove conflicting junctions/symlinks manually
rmdir /s %USERPROFILE%\.claude\skills\[hub-name]  # Windows
rm -rf ~/.claude/skills/[hub-name]                # macOS/Linux

# Retry sync
cargo run --release -- sync

Issue: LLM classification appears stuck

Check TUI progress:

cargo run --release -- tui

The TUI shows real-time LLM batch progress. If stuck for >5 minutes:

# Check if LLM service (Ollama/Claude) is running
# Restart aggregation with keyword-only fallback
cargo run --release -- aggregate --skip-llm

Issue: "Release gate" validation fails

Check output integrity:

cargo run --release -- release-gate

This validates:

  • All SKILL.md files were processed
  • No orphaned or missing references in routing.csv
  • Deduplication stats match cache state

If failures are reported, re-run aggregation:

rm -rf skills-aggregated/
cargo run --release -- aggregate

📈 Performance Characteristics

| Operation | Time | Dependencies |
| --- | --- | --- |
| First aggregate (100+ repos, 8000+ skills) | 10-20 min | Network speed, CPU count, LLM latency |
| Incremental aggregate (repos already cached) | 2-5 min | LLM classification speed (can skip with --skip-llm) |
| Sync to tools (10 tools, all hubs) | 30-60 sec | Disk I/O, junction creation speed |
| TUI startup | &lt;1 sec | Manifest parsing |
| LLM classification (8000 skills) | 3-8 min | Batch size, LLM throughput |

Optimization Tips:

  • Use PARALLEL_JOBS=auto for optimal CPU utilization
  • Set LLM_BATCH_SIZE=100 for faster LLM processing (requires more GPU/API quota)
  • Run on an SSD for 2-3x faster repository cloning
  • Use shallow clones (default) to reduce disk bandwidth

🤝 Contributing

Development Setup

# Clone and build
git clone <this-repo>
cd skills-bank
cargo build

# Run tests
cargo test

# Format code
cargo fmt

# Check for issues
cargo clippy

Reporting Issues

When reporting bugs, include:

  1. Output of cargo run --release -- doctor
  2. Contents of .skills-bank-cli-config.json (redact sensitive URLs if needed)
  3. Error message and stack trace (if any)
  4. Steps to reproduce

Extending Classification

To add new domain keywords or refine sub-hub routing:

  1. Edit the CONFLICT_RESOLUTION table or keyword rules in src/classify.rs
  2. Add test cases in tests/
  3. Run cargo test and cargo run --release -- aggregate
  4. Submit PR with classification examples

📄 License

MIT — See package.json for details.

About

AI Skills Bank is a unified, multi-tool platform designed to aggregate, manage, and route AI skills across various workflows and AI assistants (such as Antigravity, Claude Code, Cursor, and Copilot).
