
test(skills): integration tests for real-world skills.sh scripts #292

Merged: chaliy merged 5 commits into main from claude/analyze-skills-bash-MvtoF on Feb 26, 2026

chaliy commented on Feb 26, 2026

Summary

  • Add 10 real bash scripts from top skills.sh repos as test fixtures
  • Parse + execute them through bashkit with stubbed external binaries (az, helm, npm, curl, python3)
  • 10/10 parse, 6/10 execute, 4 ignored with tracked bugs

What's tested

| Script | Source | Bash features |
| --- | --- | --- |
| azure_discover_rank.sh | microsoft/github-copilot-for-azure | declare -A, ${!MAP[@]}, jq pipes, set -euo pipefail |
| helm_validate_chart.sh | wshobson/agents | functions, command -v, grep -q, awk, echo -e ANSI |
| jwt_test_setup.sh | giuseppe-trisciuoglio/developer-kit | ${var: -3} substring, local, trap EXIT, curl -w |
| stitch_fetch.sh | nichochar/stitch-skills | curl wrapper, $? check |
| stitch_download_asset.sh | nichochar/stitch-skills | dirname, mkdir -p, command -v, stat |
| find_polluter.sh | nichochar/superpowers | for/$(find), $(( )), `wc |

Bugs found

Test plan

  • cargo test --test skills_tests — 16 pass, 0 fail, 4 ignored
  • cargo test --test spec_tests — existing tests unaffected
  • cargo clippy -- -D warnings clean
  • cargo fmt --check clean

Analyze skills from skills.sh leaderboard across 12 repos to assess
bash feature coverage. Key findings:
- 66% are pure markdown (no scripts needed)
- 97%+ of bash features used are supported by bashkit
- Main gap is external binaries (LibreOffice, az CLI, etc.)
- Only missing builtins: base64, curl -F multipart

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

Key discoveries:
- 250 leaderboard entries map to ~80 unique skills from ~25 repos
  (google-stitch: 72 entries → 6 skills; baoyu: 75 entries → 16 skills)
- 63% pure markdown, 18% bash scripts, 14% TypeScript, 15% Python
- Bash feature coverage: effectively 100% for all scripts observed
- New pattern: TypeScript via `npx -y bun` (baoyu-skills, 97 .ts files)
- New pattern: SKILL.md lifecycle hooks with bash (planning-with-files)
- Missing builtins: base64, curl -F multipart, sed -i

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

sed -i is fully implemented (sed.rs:216-217, all 75 tests pass).
Removed from gaps list. Added note clarifying this.

Issues filed:
- #287: base64 builtin missing
- #288: curl -F multipart support missing

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

Extract 10 bash scripts from top skills.sh repos and run them through
bashkit parser + interpreter with stubbed external binaries (az, helm,
npm, curl, python3).

Results: 10/10 parse, 6/10 execute, 4 ignored with tracked bugs.

Parse tests verify every fixture parses cleanly. Execution tests use
custom builtins (BashBuilder::builtin) to mock az CLI, helm, npm, curl
etc. so we test bash feature coverage without real infrastructure.
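
The mocking idea described above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not bashkit's API: `StubRegistry`, `Output`, `demo_registry`, and the canned responses are all hypothetical names invented for this example, and the real tests register stubs through `BashBuilder::builtin` instead.

```rust
use std::collections::HashMap;

/// Result of running a stubbed command: captured stdout plus an exit code.
#[derive(Debug, PartialEq)]
struct Output {
    stdout: String,
    code: i32,
}

/// A stub receives the argument list and returns canned output.
type Stub = Box<dyn Fn(&[&str]) -> Output>;

/// Hypothetical registry mapping command names to stubs.
/// NOT bashkit's API; it only illustrates the mocking idea.
struct StubRegistry {
    stubs: HashMap<String, Stub>,
}

impl StubRegistry {
    fn new() -> Self {
        Self { stubs: HashMap::new() }
    }

    fn register(&mut self, name: &str, stub: Stub) {
        self.stubs.insert(name.to_string(), stub);
    }

    /// Dispatch to a stub, or mimic the shell's "command not found" status (127).
    fn run(&self, name: &str, args: &[&str]) -> Output {
        match self.stubs.get(name) {
            Some(stub) => stub(args),
            None => Output { stdout: String::new(), code: 127 },
        }
    }
}

/// Build a registry with two illustrative stubs (canned data, assumed for the example).
fn demo_registry() -> StubRegistry {
    let mut reg = StubRegistry::new();
    reg.register(
        "az",
        Box::new(|args: &[&str]| match args {
            // Canned JSON standing in for a real `az account show` call.
            ["account", "show"] => Output {
                stdout: "{\"name\":\"stub-subscription\"}\n".to_string(),
                code: 0,
            },
            _ => Output { stdout: String::new(), code: 1 },
        }),
    );
    reg.register(
        "helm",
        Box::new(|_args: &[&str]| Output {
            stdout: "stub helm output\n".to_string(),
            code: 0,
        }),
    );
    reg
}

fn main() {
    let reg = demo_registry();
    let out = reg.run("az", &["account", "show"]);
    println!("az exited {} with {}", out.code, out.stdout.trim());
}
```

Keeping stubs as closures keyed by command name means a script under test can call `az` or `helm` without any real infrastructure, while an unregistered command still fails in a recognizable way.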

Bugs found and filed:
- #289: backslash line continuation fails in some parser contexts
- #290: while/case arg parsing loop hits MaxLoopIterations
- #291: [ -f ] doesn't see VFS files after cd in script execution
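
Since `cargo test` reports an "ignored" count, the blocked execution tests are presumably marked with Rust's `#[ignore]` attribute. The sketch below shows that pattern under stated assumptions: the `parses` helper, the module name, and the test names are hypothetical stand-ins, not the contents of skills_tests.rs.

```rust
/// Hypothetical stand-in for "parse the fixture": it only checks that the
/// script text is non-empty. The real tests run the bashkit parser instead.
fn parses(script: &str) -> bool {
    !script.trim().is_empty()
}

#[cfg(test)]
mod skills_sketch {
    use super::*;

    // Runs on every `cargo test`.
    #[test]
    fn parse_example_fixture() {
        assert!(parses("echo hello"));
    }

    // Counted as "ignored" rather than failed; the reason string keeps the
    // link to the tracked issue visible in the test output.
    #[test]
    #[ignore = "blocked on #290: while/case arg parsing loop hits MaxLoopIterations"]
    fn exec_example_fixture() {
        unimplemented!("re-enable once the tracked bug is fixed");
    }
}

fn main() {
    println!("{}", parses("echo hello"));
}
```

Ignored tests can still be run on demand with `cargo test -- --ignored`, which makes it easy to check whether a fix for one of the tracked bugs unblocks its fixture.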

Scripts sourced from:
- microsoft/github-copilot-for-azure (azure_*.sh)
- vercel-labs/agent-skills (vercel_deploy.sh)
- google-labs-code/stitch-skills (stitch_*.sh)
- obra/superpowers (find_polluter.sh)
- wshobson/agents (helm_validate_chart.sh)
- giuseppe-trisciuoglio/developer-kit (jwt_test_setup.sh)

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

Drop specs/015-skills-analysis.md (pure analysis doc). The value lives
in the tests themselves — skills_tests.rs now has a full source table
linking each fixture to its upstream repo.
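
A source table of this kind could be expressed as a static slice. This is a sketch only: the repository names come from the "Scripts sourced from" list in the earlier commit message, while `FIXTURE_SOURCES`, `upstream_repo`, and the glob-style matching are assumptions made for illustration, not the actual layout in skills_tests.rs.

```rust
/// (fixture file pattern, upstream repository), as listed in the commit message.
const FIXTURE_SOURCES: &[(&str, &str)] = &[
    ("azure_*.sh", "microsoft/github-copilot-for-azure"),
    ("vercel_deploy.sh", "vercel-labs/agent-skills"),
    ("stitch_*.sh", "google-labs-code/stitch-skills"),
    ("find_polluter.sh", "obra/superpowers"),
    ("helm_validate_chart.sh", "wshobson/agents"),
    ("jwt_test_setup.sh", "giuseppe-trisciuoglio/developer-kit"),
];

/// Look up the upstream repo for a fixture file name.
/// A pattern ending in `*.sh` matches by prefix; any other pattern must match exactly.
fn upstream_repo(fixture: &str) -> Option<&'static str> {
    FIXTURE_SOURCES.iter().find_map(|(pattern, repo)| {
        let matched = match pattern.strip_suffix("*.sh") {
            Some(prefix) => fixture.starts_with(prefix) && fixture.ends_with(".sh"),
            None => *pattern == fixture,
        };
        matched.then_some(*repo)
    })
}

fn main() {
    for name in ["azure_discover_rank.sh", "stitch_fetch.sh", "unknown.sh"] {
        println!("{name}: {:?}", upstream_repo(name));
    }
}
```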

Also: fix clippy unused import, apply cargo fmt.

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

chaliy merged commit d770547 into main on Feb 26, 2026
16 checks passed
chaliy deleted the claude/analyze-skills-bash-MvtoF branch on February 26, 2026 04:48