Releases: Zandereins/schliff
v7.2.0 — security hardening + scoring robustness
[7.2.0] - 2026-04-24
Security
- Prompt-injection hardening in schliff evolve: user-authored skill
content is wrapped in explicit XML tags with a per-call random nonce
before being passed to LLM prompts. Earlier versions fed raw content
into the meta-prompt, letting a crafted SKILL.md inject directives.
A sanitizer rejects XML-tag injection attempts and an explicit
<user_content>…</user_content> boundary isolates user input.
- CLI error-handling with no traceback leaks: schliff score on a
directory or oversized file no longer leaks a raw Python traceback.
read_skill_safe rejects directories explicitly with a clear
ValueError; cli.main() wraps handler dispatch in one
(OSError, ValueError) try/except that renders a short Error: …
line on stderr and exits 1.
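The nonce-wrapped boundary described in the first Security bullet could look roughly like the sketch below. This is not schliff's actual code: the function name `wrap_user_content`, the `nonce` attribute, and the sanitizer regex are all assumptions made for illustration.

```python
import re
import secrets

# Hypothetical sanitizer: refuse content that tries to open or close the
# boundary tag itself. The real sanitizer may check more than this.
_BOUNDARY_TAG = re.compile(r"</?\s*user_content\b", re.IGNORECASE)


def wrap_user_content(text: str) -> str:
    """Wrap untrusted skill content in a nonce-tagged boundary (illustrative)."""
    if _BOUNDARY_TAG.search(text):
        raise ValueError("user content contains a boundary tag")
    # A fresh nonce per call means the attacker cannot predict the tag.
    nonce = secrets.token_hex(8)
    return f'<user_content nonce="{nonce}">\n{text}\n</user_content>'
```

Because the nonce changes on every call, a crafted SKILL.md cannot embed a matching tag in advance, and the sanitizer rejects the unadorned tag outright.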
Fixed
- Scoring robustness across all dimensions: all five scorers that
consume user-authored eval-suite JSON (edges, triggers, quality,
runtime, coherence) now guard their list-valued fields with
isinstance(…, list) checks. Pre-fix, a truthy non-list
({"edge_cases": 42}, {"triggers": "abc"}) crashed the scorer with
TypeError from len() or AttributeError from .get() on a string
character. Post-fix each scorer returns its standard sentinel (score
-1 / bonus 0) on malformed input; inner assertions and test_case
items are filtered via _assertion_dicts helpers.
- score_edges category guard: malformed category entries (ints,
nulls, lists) no longer crash .startswith() during known-category
coverage.
- install.sh and analyze-skill.sh POSIX portability: replaced
GNU-only \s / \S in grep -E patterns with POSIX character classes
[[:space:]] / [^[:space:]]. On older macOS (classic BSD grep),
the installer previously printed "Schliff v unknown" and analyze-skill.sh
missed name / example detection.
- Score-to-grade consistency: playground, evolve, GitHub Action, and
CI badges now share the canonical E-band (35–49) from
terminal_art.score_to_grade; previously each surface drifted.
Changed
- E-band grade now emitted in badge/CI output. Consumers that parse
a grade field with a closed set of {S, A, B, C, D, F} must now accept
E as well. Breaking for JSON consumers that did exhaustive grade
switching; non-breaking for score-based consumers.
- install.sh reads VERSION from pyproject.toml at install time
instead of carrying a hard-coded literal. Release process simplified
accordingly in RELEASING.md.
- EXCLUDED_DIRS centralized in shared.py; doctor and related
scanners share one canonical list.
- Scorer signatures cleaned up: unused **kw parameters and dead
ImportError fallbacks removed from several scoring / pattern modules.
- verify uses terminal_art.score_to_grade instead of a local
duplicate, keeping grade mapping in one place.
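For consumers affected by the new E grade, a tolerant parser might look like the following sketch. The payload key `"grade"` and the function name `parse_grade` are assumptions; the set of grades is the one listed above plus E.

```python
from typing import Any, Mapping

# Closed set from the notes above, extended with the new E band.
VALID_GRADES = frozenset({"S", "A", "B", "C", "D", "E", "F"})


def parse_grade(payload: Mapping[str, Any]) -> str:
    """Validate the grade in a JSON payload (hypothetical consumer code)."""
    grade = payload.get("grade")
    if not isinstance(grade, str) or grade not in VALID_GRADES:
        raise ValueError(f"unexpected grade: {grade!r}")
    return grade
```

Consumers that switch exhaustively on the grade would add an E branch in the same place.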
Added
- RELEASING.md pre-release checklist documents the full release
procedure (version bump, CHANGELOG draft, tag, publish, badge cache-bust).
- Cross-platform CI expansion: GitHub Actions matrix covers Python
3.9–3.13 on ubuntu-latest and adds a dedicated test-macos job gating
badge generation / report publishing.
- ~100 new regression tests covering non-list eval-suite fields, CLI
error-handling paths, BSD-grep portability of shipped shell scripts,
prompt-injection sanitization, UTF edge cases, runtime enabled path,
and score_edges error branches.
- setuptools upper bound pinned in pyproject.toml to avoid build
breakage from future major releases; test files excluded from the wheel.
Test coverage
- Total: 1017 → 1117 (+100) / 0 skipped / 0 failed
- New files: test_scoring_type_guards.py, test_cli_error_handling.py,
test_install_version.py; expanded test_scoring_edges_malformed.py
and new test_evolve_prompt_injection.py / test_evolve_sanitize.py.
v7.1.1
Patch release. Ships the _RE_ACTIONABLE_LINES pattern fix to PyPI so users installing via pip install schliff receive the corrected scorer.
What's fixed
Bullet-marker support in actionable-line patterns
_RE_ACTIONABLE_LINES and three sibling patterns (_RE_RUN_PATTERN, _RE_DIFF_SIGNAL, _RE_IMPERATIVE_INSTRUCTION) previously only matched numbered list prefixes (1. Run X) or bare imperatives. Markdown bullets — - Run X, * Use Y, + Install Z — fell through silently. A shared _LIST_MARKER alternation now applies to all four.
Regression guards: 10 new test cases in TestListMarkerSupport covering supported markers, bare-imperative regression, nested indentation, word-boundary guards, and marker-without-verb cases. Full suite: 1017 passed (up from 1007).
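The shared list-marker alternation can be sketched as follows. The pattern text and the three-verb list are illustrative reconstructions, not the shipped `_RE_ACTIONABLE_LINES`; only the idea of one `_LIST_MARKER` prefix reused across patterns comes from the notes above.

```python
import re

# Hypothetical reconstruction: a numbered prefix ("1.") or a Markdown
# bullet ("-", "*", "+"), shared by every actionable-line pattern.
_LIST_MARKER = r"(?:\d+\.|[-*+])"
_VERBS = r"(?:Run|Use|Install)"  # illustrative verb list only

_ACTIONABLE = re.compile(
    rf"^[ \t]*(?:{_LIST_MARKER}[ \t]+)?{_VERBS}\b",
    re.MULTILINE,
)


def count_actionable(text: str) -> int:
    """Count lines that start with an optional list marker and a verb."""
    return len(_ACTIONABLE.findall(text))
```

The trailing `\b` is the word-boundary guard: `- Runner is fast` does not count, while `- Run X` and a bare `Run tests` both do.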
Real-world impact
Measured against the root CLAUDE.md merged into modelcontextprotocol/servers (PR #3733):
| Dimension | Before | After |
|---|---|---|
| efficiency | 57 | 64 (+7) |
| composite | 59.2 | 61.0 (+1.8) |
Context
This release ships the fix that was publicly audited in "Scoring My Own MCP Contribution" and closed in "The blindspot, fixed". The pattern fix landed in #29; this release is #30.
Install
pip install --upgrade schliff
schliff version
# schliff 7.1.1
Full diff: v7.1.0...v7.1.1
v7.1.0 — Report, Drift, Sync, Track, Web Playground
What's New
5 new CLI commands for deeper skill file analysis:
| Command | Purpose |
|---|---|
| schliff report <path> | Markdown quality report (--gist for shareable link) |
| schliff drift --repo <dir> | Find stale paths, scripts, and make targets |
| schliff sync <dir> | Cross-file consistency: contradictions, gaps, redundancies |
| schliff track <path> | Score history over time with sparkline + regression detection |
| schliff score --tokens | Section-by-section token breakdown with format budgets |
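To give a feel for what "sparkline + regression detection" in the track row could mean, here is a minimal sketch. The function names, the `tolerance` parameter, and the rule "latest score lower than the previous one" are assumptions, not the behavior of schliff track itself.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(scores):
    """Render a score history as one block character per entry."""
    if not scores:
        return ""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1  # avoid division by zero on a flat history
    return "".join(
        BARS[int((s - lo) / span * (len(BARS) - 1))] for s in scores
    )


def regressed(scores, tolerance=0.0):
    """Assumed rule: the latest score dropped below the previous one."""
    return len(scores) >= 2 and scores[-1] < scores[-2] - tolerance
```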
Web infrastructure:
- Interactive Playground — try schliff in the browser before installing
- Community Leaderboard scaffold with serverless API
Doctor upgrade: now discovers all instruction files (CLAUDE.md, .cursorrules, AGENTS.md) and runs drift analysis on them.
Stats
- 732 tests (140 new, up from 592)
- 6 audit iterations across 3 parallel worktree branches
- Zero dependencies — still Python 3.9+ stdlib only
Security
- Path traversal prevention in drift detector
- CSP headers on web properties
- Control character / bidi override rejection
- Temp file cleanup, version field bounds
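The control character / bidi override rejection listed above could be implemented along these lines. This is a sketch under assumptions: the exact set of rejected code points and the whitespace exceptions in schliff are not stated in these notes.

```python
import unicodedata

# Bidirectional embedding/override controls (U+202A..U+202E) and
# isolates (U+2066..U+2069). These are category Cf, not Cc, so they
# need their own check.
_BIDI_CONTROLS = {
    chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}
_ALLOWED_CONTROLS = {"\n", "\r", "\t"}  # assumed to be acceptable


def has_unsafe_chars(text: str) -> bool:
    """Return True if text contains bidi overrides or stray control chars."""
    for ch in text:
        if ch in _BIDI_CONTROLS:
            return True
        if unicodedata.category(ch) == "Cc" and ch not in _ALLOWED_CONTROLS:
            return True
    return False
```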
Install / Upgrade
pip install --upgrade schliff
Full Changelog
See CHANGELOG.md for the complete list.
v7.0.0 — Multi-format, security, compare, suggest, --url
What's New
Score any agent instruction file — not just SKILL.md. Schliff v7.0 supports CLAUDE.md, .cursorrules, AGENTS.md, and generic markdown out of the box.
New Features
- Multi-format support: Auto-detection from filename, --format override. Content normalization means zero scorer changes, zero regression risk.
- Security dimension: 10 regex patterns, 6 categories (injection, exfiltration, obfuscation, dangerous commands, overpermission, missing boundaries). Negation-aware matching, meta-discourse false-positive mitigation. Opt-in: --security
- schliff compare: Side-by-side quality comparison with dimension deltas
- schliff suggest: Ranked actionable fixes with estimated score impact
- schliff score --url: Score remote files from GitHub (HTTPS-only, SSRF protection)
- Web Playground: Browser-based scorer (playground/)
- GitHub Action: PR comments with score tables, published to Marketplace
Security Hardening
- SSRF redirect protection, YAML injection prevention, path traversal guard
- Shell injection prevention, Content-Length guard, JSONDecodeError handling
- 60+ agent reviews across 3 feature branches (4 rounds × 6 specialized agents each)
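A URL check in the spirit of "HTTPS-only, SSRF protection" might look like the sketch below. The host allowlist and the function name are assumptions; the notes above do not say how schliff decides which hosts are acceptable. To cover redirects, the same check would be applied to every redirect target as well as the initial URL.

```python
from urllib.parse import urlsplit

# Assumed allowlist; the real set of accepted hosts is not documented here.
ALLOWED_HOSTS = frozenset({"github.com", "raw.githubusercontent.com"})


def is_safe_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lower-cased by urllib, None if missing
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme != "https" or host is None:
        return False
    return host in ALLOWED_HOSTS
```

An exact-match allowlist also rejects literal IP addresses and look-alike hosts such as `github.com.evil.example`.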
Stats
- 592 tests (up from 540), all green
- Self-score: 99.0 [S] (zero regression from v6.3.0)
- 3 parallel worktrees, 22 commits, ~2,200 LOC added
Install
pip install --upgrade schliff
schliff demo
🤖 Generated with Claude Code
v6.3.0 — schliff diff, scoring bug fixes, Show HN README
What's New
schliff diff command
Compare skill scores between git commits. Shows per-dimension deltas with signal/noise analysis.
schliff diff SKILL.md # vs previous commit
schliff diff SKILL.md --ref main # vs any ref
schliff diff SKILL.md --json # machine-readable
Security hardened: ref validation, path containment check, size limit guard.
3 scoring bug fixes
- triggers: precision/recall reported 100% with zero predictions → now 0%
- clarity: ambiguous pronoun detection skipped first line → fixed
- efficiency: score returned float instead of int → consistent with all other dimensions
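The triggers fix above amounts to defining the empty-denominator case as 0 rather than 1. A minimal sketch, with a hypothetical function name and the usual true-positive / false-positive / false-negative counts as inputs:

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Return (precision, recall); an empty denominator scores 0.0."""
    predicted = tp + fp  # everything the skill triggered on
    actual = tp + fn     # everything it should have triggered on
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

With zero predictions, `precision_recall(0, 0, 5)` gives `(0.0, 0.0)` instead of a perfect score.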
README overhaul for Show HN
- Context bridge explaining Claude Code for non-users
- Commands table split into CLI (standalone) vs Claude Code (require integration)
- "Where Schliff fits" ecosystem diagram moved to Quick Start
- Honest self-score framing, anti-gaming benchmark context
- Test counts with links (540 unit + 99 integration)
DX
- schliff without args now shows quick-start hints
Tests
- 85 new tests (18 cmd_diff, 33 composite weights, 34 diff scoring)
- Total: 540 unit + 99 integration = 639
Full changelog: CHANGELOG.md
Install: pip install schliff==6.3.0
v6.2.0 — Pre-launch hardening
Schliff v6.2.0 — the launch release. Deterministic linter for Claude Code SKILL.md files.
Try it
pip install schliff
schliff demo # see it in action
schliff doctor # scan your installed skills
Highlights
- schliff demo: try it in 5 seconds, no skill files needed
- schliff badge: generate a score badge for your README
- Pre-commit hook: automatic quality gate on every commit
- Anti-gaming: 6/6 gaming patterns detected in benchmark suite
- 500+ tests, zero dependencies, Python 3.9+ stdlib only
Case Study
@wan-huiyan improved agent-review-panel from 64 [D] to 85.6 [A] — 75% token reduction, A/B validated.
What's New
Added
- schliff demo command: score a built-in bad skill to see schliff in action instantly
- schliff badge <path> command: generate copy-paste markdown badge
- Pre-commit hook support (.pre-commit-hooks.yaml)
- Doctor --verbose flag with references/ extraction recommendations
- Community case study
Fixed
- Security: ReDoS in _RE_ERROR_BEHAVIOR, OOM-safe eval-suite loading, symlink rejection
- Scoring: clarity auto-injection with custom weights, no_real_examples suppression fix
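"OOM-safe eval-suite loading" and "symlink rejection" can be pictured as checks that run before any bytes are parsed. The sketch below is an assumption-laden illustration: the 1 MB cap, the function name, and the error messages are invented, and a production version would also need to consider races between the checks and the open call.

```python
import json
import os

MAX_BYTES = 1_000_000  # assumed size cap, not schliff's actual limit


def load_eval_suite(path: str):
    """Load eval-suite JSON after cheap safety checks (illustrative)."""
    if os.path.islink(path):
        raise ValueError("symlinks are not accepted")
    if os.path.isdir(path):
        raise ValueError("expected a file, got a directory")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("eval suite exceeds size limit")
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)
```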
First time? pip install schliff && schliff demo
Already have skills? schliff doctor scans them all.
Share your results: Show Your Score discussion
v6.1.0 — Launch Repositioning
The deterministic skill linter for Claude Code
Schliff v6.1.0 repositions the project as a measurement tool first — the Ruff for SKILL.md files.
New
- schliff verify: CI gate with exit codes 0/1/2, --min-score, --regression, history tracking
- Anti-gaming benchmark: 6/6 gaming attempts detected (benchmarks/anti-gaming/)
- Repetition detection in efficiency scorer (copy-paste examples: 94 -> 43)
- Screenshot-ready schliff score output with per-dimension bars and status words
Fixed
- Structural markers (code fences, headers, rules) excluded from repetition count
- Code block content excluded from repetition counting
- verify handles corrupted history and missing files gracefully
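The two repetition fixes above (structural markers and code block content excluded from the count) suggest a counter shaped roughly like this sketch. The function name and the exact notion of a "structural marker" are assumptions; the efficiency scorer's real logic is not shown in these notes.

```python
from collections import Counter

FENCE = "`" * 3  # Markdown code fence marker


def repeated_line_count(text: str) -> int:
    """Count duplicate prose lines, ignoring code blocks and structure."""
    counts = Counter()
    in_fence = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            in_fence = not in_fence  # the fence line itself is structural
            continue
        if in_fence or not line:
            continue
        if line.startswith("#") or set(line) <= set("-*_="):
            continue  # headers and horizontal rules
        counts[line] += 1
    # Each line contributes its number of extra occurrences.
    return sum(n - 1 for n in counts.values() if n > 1)
```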
Docs
- README rewritten: competitive positioning, comparison table, real-world results, architecture diagram
- YAML issue templates replace markdown templates
- SKILL.md and pyproject.toml metadata updated for discoverability
540+ tests. Zero dependencies. pip install schliff.
v6.0.1 — Pre-Launch Audit
Comprehensive pre-launch audit: 40+ bugs fixed, 443 tests, 55 security fixes. See full changelog.
v6.0.0 — Schliff
Schliff v6.0.0 — The finishing cut for Claude Code skills
Major release: Rebrand from SkillForge to Schliff (German: "the finishing cut").
New Features
- Clarity as default dimension: contradictions, vague references, ambiguity detected by default (5% weight)
- Token cost estimation: Doctor shows per-skill token cost + fleet total
- GitHub Action: Zandereins/schliff@v6 scores skills in CI, comments on PRs, blocks merges
- pip CLI: schliff score SKILL.md works without Claude Code
- Actionable Doctor: copy-paste commands with full skill paths
Scoring Improvements
- Trigger confidence cap: eval suites with <8 triggers capped at score 60
- Context-aware contradiction detection (verb+object+modifier tuples)
- Anti-gaming: empty headers don't count, efficiency signal caps, trigger threshold floor
- Missing dimension warnings always shown
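The trigger confidence cap in the first bullet is simple enough to state as code. A minimal sketch, with a hypothetical function name and the thresholds taken from the bullet (fewer than 8 triggers, cap at 60):

```python
MIN_TRIGGERS = 8  # below this, the eval suite is considered low-confidence
SCORE_CAP = 60


def cap_trigger_score(raw_score: int, trigger_count: int) -> int:
    """Cap the trigger score when the eval suite is too small to trust."""
    if trigger_count < MIN_TRIGGERS:
        return min(raw_score, SCORE_CAP)
    return raw_score
```

The cap only ever lowers a score, so a small suite that already scores below 60 is unaffected.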
Quality
- 123 unit tests, 99 integration tests, 20 self-tests
- 40 security fixes (shell injection, prompt injection, ReDoS, supply chain)
- Self-score: 95.4/100 [S]
Breaking Changes
- --clarity flag removed (clarity is now always-on; use --no-clarity to opt out)
- All /skillforge:* commands renamed to /schliff:*
Full Changelog: v5.1.0...v6.0.0
v5.1.0 — Honest Scoring, Beam Search, LSH, Doctor
What's New in v5.1
Added
- Honest Scoring: "Structural Score" label everywhere, replacing misleading "Quality Score"
- Stemming Tokenizer: suffix-stripping replaces fixed synonym tables for better keyword matching
- Beam Search: top-3 exploration instead of greedy top-1 from iteration 4 onward
- EMA Plateau Detection: Exponential Moving Average replaces fixed-window ROI stopping
- MinHash + LSH: O(n) mesh analysis instead of O(n²) for 50+ skills
- Context-aware Patches: generates meaningful descriptions instead of TODOs
- Doctor Command (doctor.py): scans all installed skills, shows health summary with grades
- Dimension Guard: prevents patches that tank a single dimension by >15 points
- Coherence Check: instruction-assertion alignment as quality bonus
- 40+ Pre-compiled Regex: performance optimization across the scorer
- Public Cache API: invalidate_cache() replaces direct _file_cache.pop()
- Underscore Alias Modules for Python import compatibility
- 21 Runtime Assertions: response_excludes assertions across 4 test cases for validated scoring
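EMA plateau detection, as named in the list above, can be sketched as smoothing the per-iteration score gains and stopping once the smoothed gain falls under a threshold. The `alpha` and `threshold` values and both function names are assumptions; the notes do not give the parameters the improve loop actually uses.

```python
def ema(values, alpha=0.3):
    """Exponential moving average; the first value seeds the series."""
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


def plateaued(deltas, alpha=0.3, threshold=0.5):
    """Assumed stopping rule: the smoothed score gain fell below threshold."""
    if not deltas:
        return False
    return ema(deltas, alpha)[-1] < threshold
```

Unlike a fixed window, the average decays gradually, so one late improvement delays the stop without resetting the history.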
Fixed
- State truncation bug in auto-improve loop
- EMA indexing off-by-one in plateau detection
- Deterministic hash for MinHash reproducibility
- 2 trigger false positives eliminated (41/41 = 100%)
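On the deterministic-hash fix: Python's built-in `hash()` for strings is salted per process unless `PYTHONHASHSEED` is set, so MinHash signatures built on it differ between runs. Whether that was the exact cause here is an assumption; the sketch below shows one reproducible alternative using `hashlib`, with hypothetical function names.

```python
import hashlib


def stable_hash(token: str, seed: int = 0) -> int:
    """64-bit hash that is identical across processes and platforms."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),  # one salt per hash function
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_perm=16):
    """MinHash signature: the minimum hash per seeded hash function."""
    tokens = list(tokens)
    if not tokens:
        raise ValueError("need at least one token")
    return [
        min(stable_hash(t, seed) for t in tokens) for seed in range(num_perm)
    ]
```

Two skills with the same token set now always produce the same signature, regardless of token order or interpreter run.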
Stats
| Metric | Value |
|---|---|
| Structural Score | 99.9/100 [S] |
| Runtime Score | 100/100 [S] (with --runtime) |
| Composite (all 7 dims) | 99.9/100 [S] |
| Tests | 99/99 passing (87 integration + 12 self-tests) |
| Runtime Assertions | 21/21 passing |
| Security | 27 fixes from 15-agent deep audit, 0 regressions |
Dimension Breakdown
structure ██████████ 100/100
triggers ██████████ 100/100
quality ██████████ 100/100
edges ██████████ 100/100
efficiency █████████░ 93/100
composability ██████████ 100/100
runtime ██████████ 100/100
Quick Start
git clone https://github.com/Zandereins/skillforge.git
cp -r skillforge/skills/skillforge ~/.claude/skills/
cp -r skillforge/commands/skillforge ~/.claude/commands/
Full docs: README