test(skills): integration tests for real-world skills.sh scripts #292
Merged
Analyze skills from the skills.sh leaderboard across 12 repos to assess bash feature coverage. Key findings:
- 66% are pure markdown (no scripts needed)
- 97%+ of the bash features used are supported by bashkit
- Main gap is external binaries (LibreOffice, az CLI, etc.)
- Only missing builtin support: base64, curl -F multipart

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

Key discoveries:
- 250 leaderboard entries map to ~80 unique skills from ~25 repos (google-stitch: 72 entries → 6 skills; baoyu: 75 entries → 16 skills)
- 63% pure markdown, 18% bash scripts, 14% TypeScript, 15% Python (the shares sum to 110%, so some skills fall into more than one category)
- Bash feature coverage: effectively 100% for all scripts observed
- New pattern: TypeScript via `npx -y bun` (baoyu-skills, 97 .ts files)
- New pattern: SKILL.md lifecycle hooks with bash (planning-with-files)
- Missing builtins: base64, curl -F multipart, sed -i

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

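The 250 → ~80 reduction happens because several leaderboard entries point at the same skill. For example, google-stitch has 72 entries for 6 skills.

Below is a minimal sketch of that counting step. It assumes each entry can be keyed by a (repo, skill) pair. The `Entry` type and the sample rows are hypothetical, since the analysis does not describe its data format.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One leaderboard row. Hypothetical shape: this sketch assumes a
/// (repo, skill) pair identifies a skill.
struct Entry {
    repo: &'static str,
    skill: &'static str,
}

/// Group leaderboard entries by repo, keeping each skill name once.
fn unique_skills(entries: &[Entry]) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut by_repo: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for e in entries {
        by_repo.entry(e.repo).or_default().insert(e.skill);
    }
    by_repo
}

fn main() {
    // Made-up rows: the same skill listed twice, plus one other skill.
    let entries = [
        Entry { repo: "example/repo-a", skill: "deploy" },
        Entry { repo: "example/repo-a", skill: "deploy" },
        Entry { repo: "example/repo-b", skill: "lint" },
    ];
    let by_repo = unique_skills(&entries);
    let total: usize = by_repo.values().map(|s| s.len()).sum();
    // Prints: 3 entries -> 2 unique skills from 2 repos
    println!("{} entries -> {} unique skills from {} repos", entries.len(), total, by_repo.len());
}
```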
sed -i is fully implemented (sed.rs:216-217, all 75 tests pass). Removed from gaps list. Added note clarifying this.

Issues filed:
- #287: base64 builtin missing
- #288: curl -F multipart support missing

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

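Issue #287 tracks the missing `base64` builtin. Its core is the RFC 4648 encoding:
- Each 3-byte group becomes four 6-bit indices into a 64-character alphabet.
- A short final group is padded with `=`.

Below is a minimal sketch of the encoder alone. `base64_encode` is a hypothetical name, not bashkit code. A real builtin would also need decoding (`-d`), line wrapping, and stdin/file input, all omitted here.

```rust
/// Standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded base64 (hypothetical helper, not bashkit's API).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len().div_ceil(3) * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32 (missing bytes are 0).
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // A chunk of k bytes carries k + 1 meaningful 6-bit groups; the rest are '='.
        for i in 0..4 {
            if i <= chunk.len() {
                let idx = ((n >> (18 - 6 * i)) & 0x3f) as usize;
                out.push(ALPHABET[idx] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

fn main() {
    // Same result as: printf 'hello' | base64
    println!("{}", base64_encode(b"hello")); // aGVsbG8=
}
```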
Extract 10 bash scripts from top skills.sh repos and run them through the bashkit parser + interpreter with stubbed external binaries (az, helm, npm, curl, python3).

Results: 10/10 parse, 6/10 execute, 4 ignored with tracked bugs.

Parse tests verify every fixture parses cleanly. Execution tests use custom builtins (BashBuilder::builtin) to mock az CLI, helm, npm, curl, etc., so we test bash feature coverage without real infrastructure.

Bugs found and filed:
- #289: backslash line continuation fails in some parser contexts
- #290: while/case arg parsing loop hits MaxLoopIterations
- #291: [ -f ] doesn't see VFS files after cd in script execution

Scripts sourced from:
- microsoft/github-copilot-for-azure (azure_*.sh)
- vercel-labs/agent-skills (vercel_deploy.sh)
- google-labs-code/stitch-skills (stitch_*.sh)
- obra/superpowers (find_polluter.sh)
- wshobson/agents (helm_validate_chart.sh)
- giuseppe-trisciuoglio/developer-kit (jwt_test_setup.sh)

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o

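The stubbing approach can be pictured as a lookup table from command name to canned response. The interpreter consults it before reporting an unknown command.

The sketch below illustrates only that idea. `StubRegistry`, `Stub`, and the sample `az` output are hypothetical. They do not reproduce the `BashBuilder::builtin` signature, which bashkit itself defines. Unregistered commands fall back to exit status 127, the status bash uses for "command not found".

```rust
use std::collections::HashMap;

/// Canned result returned by a stubbed command instead of running a real binary.
struct Stub {
    stdout: String,
    exit_code: i32,
}

/// Boxed handler: receives the command's arguments, returns a canned result.
type Handler = Box<dyn Fn(&[&str]) -> Stub>;

/// Hypothetical registry mapping command names to stub handlers.
struct StubRegistry {
    stubs: HashMap<String, Handler>,
}

impl StubRegistry {
    fn new() -> Self {
        StubRegistry { stubs: HashMap::new() }
    }

    /// Register (or replace) the stub for `name`.
    fn register(&mut self, name: &str, handler: impl Fn(&[&str]) -> Stub + 'static) {
        self.stubs.insert(name.to_string(), Box::new(handler));
    }

    /// Run the stub for `name`, or mimic "command not found" (exit 127).
    fn run(&self, name: &str, args: &[&str]) -> Stub {
        match self.stubs.get(name) {
            Some(handler) => handler(args),
            None => Stub { stdout: String::new(), exit_code: 127 },
        }
    }
}

fn main() {
    let mut stubs = StubRegistry::new();
    // Fake `az account ...`: fixed JSON on success, exit 1 for anything else.
    stubs.register("az", |args| {
        if args.first() == Some(&"account") {
            Stub { stdout: "{\"name\":\"test-sub\"}\n".to_string(), exit_code: 0 }
        } else {
            Stub { stdout: String::new(), exit_code: 1 }
        }
    });
    let result = stubs.run("az", &["account", "show"]);
    print!("{}", result.stdout);
    println!("exit={}", result.exit_code);
}
```

Keeping stubs as plain closures lets each test decide, per argument list, what a fake `az` or `helm` prints. The script's own control flow (conditionals, exit-status checks) is then what gets exercised.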
Drop specs/015-skills-analysis.md (pure analysis doc). The value lives in the tests themselves: skills_tests.rs now has a full source table linking each fixture to its upstream repo.

Also: fix clippy unused import, apply cargo fmt.

https://claude.ai/code/session_01CVF1zwHgALVKQnDrTBie9o
Summary
What's tested
- `declare -A`, `${!MAP[@]}`, `jq` pipes, `set -euo pipefail`
- `command -v`, `grep -q`, `awk`, `echo -e` ANSI
- `${var: -3}` substring, `local`, `trap EXIT`, `curl -w`, `$?` check
- `dirname`, `mkdir -p`, `command -v`, `stat`
- `for`/`$(find)`, `$(( ))`, `wc` …

Bugs found
- `[ -f ]` doesn't see VFS files after `cd` in script execution

Test plan
- `cargo test --test skills_tests`: 16 pass, 0 fail, 4 ignored
- `cargo test --test spec_tests`: existing tests unaffected
- `cargo clippy -- -D warnings`: clean
- `cargo fmt --check`: clean
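Bug #291, listed under Bugs found, is that `[ -f ]` does not see VFS files after `cd`. Whatever the root cause inside bashkit, the expected behavior is clear: a relative operand is resolved against the shell's current directory before the file lookup.

Below is a minimal sketch of that rule. It assumes a toy VFS that is just a set of absolute paths. `resolve`, `test_f`, and the sample paths are hypothetical. `..` is folded lexically, so symlinks are ignored.

```rust
use std::collections::HashSet;
use std::path::{Component, Path, PathBuf};

/// Join `path` onto `cwd` (an absolute `path` replaces `cwd`) and fold `..` lexically.
fn resolve(cwd: &Path, path: &str) -> PathBuf {
    let mut out = PathBuf::from("/");
    for comp in cwd.join(path).components() {
        match comp {
            // `..` drops the last component; popping "/" leaves it unchanged.
            Component::ParentDir => {
                out.pop();
            }
            Component::Normal(part) => out.push(part),
            // RootDir is already represented by `out`; CurDir is a no-op.
            _ => {}
        }
    }
    out
}

/// Toy `[ -f PATH ]`: true if the cwd-resolved path is a known file in the VFS.
fn test_f(files: &HashSet<PathBuf>, cwd: &Path, path: &str) -> bool {
    files.contains(&resolve(cwd, path))
}

fn main() {
    let mut files = HashSet::new();
    files.insert(PathBuf::from("/work/chart/Chart.yaml"));
    // Script equivalent: cd /work/chart && [ -f Chart.yaml ]
    let cwd = Path::new("/work/chart");
    println!("{}", test_f(&files, cwd, "Chart.yaml")); // true
}
```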