Releases: Zandereins/schliff

v7.2.0 — security hardening + scoring robustness

24 Apr 18:30
d63161b

[7.2.0] - 2026-04-24

Security

  • Prompt-injection hardening in schliff evolve: user-authored skill
    content is wrapped in explicit XML tags with a per-call random nonce
    before being passed to LLM prompts. Earlier versions fed raw content
    into the meta-prompt, letting a crafted SKILL.md inject directives.
    A sanitizer rejects XML-tag injection attempts and an explicit
    <user_content>…</user_content> boundary isolates user input.
  • CLI error handling without traceback leaks: running schliff score on a
    directory or oversized file no longer prints a raw Python traceback.
    read_skill_safe rejects directories explicitly with a clear
    ValueError; cli.main() wraps handler dispatch in one
    (OSError, ValueError) try/except that renders a short Error: …
    line on stderr and exits with status 1.
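
A minimal sketch of the nonce-wrapped boundary described in the first item, not schliff's actual code. The helper names (sanitize, wrap_user_content), the nonce attribute, and the rejection regex are assumptions for illustration; only the <user_content> tag name and the per-call random nonce come from the notes.

```python
import re
import secrets

# Hypothetical pattern: matches any attempt to open or close the boundary tag.
_BOUNDARY_TAG = re.compile(r"</?\s*user_content\b[^>]*>", re.IGNORECASE)


def sanitize(content: str) -> str:
    """Reject content that contains the reserved boundary tag."""
    if _BOUNDARY_TAG.search(content):
        raise ValueError("user content contains a reserved boundary tag")
    return content


def wrap_user_content(content: str) -> str:
    """Wrap sanitized content in a boundary tag carrying a fresh nonce."""
    nonce = secrets.token_hex(8)  # 16 hex characters, regenerated on every call
    body = sanitize(content)
    return f'<user_content nonce="{nonce}">\n{body}\n</user_content>'
```

Because the nonce changes on every call, a crafted SKILL.md cannot predict the exact opening tag, and the sanitizer refuses content that tries to close the boundary early.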

Fixed

  • Scoring robustness across all dimensions: all five scorers that
    consume user-authored eval-suite JSON (edges, triggers, quality,
    runtime, coherence) now guard their list-valued fields with
    isinstance(…, list) checks. Before the fix, a truthy non-list value
    ({"edge_cases": 42}, {"triggers": "abc"}) crashed the scorer with a
    TypeError from len() or an AttributeError from calling .get() on a
    single string character. After the fix, each scorer returns its
    standard sentinel (score −1 / bonus 0) on malformed input; inner
    assertions and test_case items are filtered via _assertion_dicts helpers.
  • score_edges category guard: malformed category entries (ints,
    nulls, lists) no longer crash .startswith() during known-category
    coverage.
  • install.sh and analyze-skill.sh POSIX portability: replaced
    GNU-only \s / \S in grep -E patterns with POSIX character classes
    [[:space:]] / [^[:space:]]. On older macOS (classic BSD grep),
    the installer previously printed "Schliff v unknown" and analyze-skill.sh
    missed name / example detection.
  • Score-to-grade consistency: playground, evolve, GitHub Action, and
    CI badges now share the canonical E-band (35–49) from
    terminal_art.score_to_grade; previously each surface drifted.
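
The isinstance guard from the first item can be sketched as follows. This is an illustrative toy scorer, not the shipped one: score_edges_sketch, assertion_dicts, and the coverage formula are assumptions; only the list check and the −1 sentinel come from the notes.

```python
from typing import Any

SENTINEL_SCORE = -1  # sentinel the notes mention for malformed input


def assertion_dicts(items: Any) -> list:
    """Keep only dict items; return [] when the container is not a list."""
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]


def score_edges_sketch(eval_suite: Any) -> int:
    """Toy scorer: percent of edge cases carrying at least one dict assertion."""
    if not isinstance(eval_suite, dict):
        return SENTINEL_SCORE
    edge_cases = eval_suite.get("edge_cases")
    # A truthy non-list such as 42 or "abc" stops here instead of crashing later.
    if not isinstance(edge_cases, list) or not edge_cases:
        return SENTINEL_SCORE
    cases = assertion_dicts(edge_cases)
    covered = sum(1 for case in cases if assertion_dicts(case.get("assertions")))
    return round(100 * covered / len(edge_cases))
```

Checking the type before the first len() or .get() call is what turns a crash into a sentinel return.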

Changed

  • E-band grade now emitted in badge/CI output. Consumers that parse
    a grade field with a closed set of {S, A, B, C, D, F} must now accept
    E as well. Breaking for JSON consumers that did exhaustive grade
    switching; non-breaking for score-based consumers.
  • install.sh reads VERSION from pyproject.toml at install time
    instead of carrying a hard-coded literal. Release process simplified
    accordingly in RELEASING.md.
  • EXCLUDED_DIRS centralized in shared.py; doctor and related
    scanners share one canonical list.
  • Scorer signatures cleaned up: unused **kw parameters and dead
    ImportError fallbacks removed from several scoring / pattern modules.
  • verify uses terminal_art.score_to_grade instead of a local
    duplicate, keeping grade mapping in one place.
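
For consumers affected by the E-band change, a hedged sketch of a parser that accepts the new grade. The payload shape and the parse_grade name are assumptions; the notes only state that a grade field exists and that E joins the previous closed set.

```python
import json

# Previous closed set plus E, which is emitted from 7.2.0 onward.
KNOWN_GRADES = {"S", "A", "B", "C", "D", "E", "F"}


def parse_grade(payload: str) -> str:
    """Return the grade from a JSON payload, rejecting unknown values."""
    data = json.loads(payload)
    grade = data.get("grade")
    if grade not in KNOWN_GRADES:
        raise ValueError(f"unexpected grade: {grade!r}")
    return grade
```

Consumers that branch on the numeric score instead of the grade letter need no change.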

Added

  • RELEASING.md pre-release checklist documents the full release
    procedure (version bump, CHANGELOG draft, tag, publish, badge cache-bust).
  • Cross-platform CI expansion: GitHub Actions matrix covers Python
    3.9–3.13 on ubuntu-latest and adds a dedicated test-macos job gating
    badge generation / report publishing.
  • ~100 new regression tests covering non-list eval-suite fields, CLI
    error-handling paths, BSD-grep portability of shipped shell scripts,
    prompt-injection sanitization, UTF edge cases, runtime enabled path,
    and score_edges error branches.
  • setuptools upper bound pinned in pyproject.toml to avoid build
    breakage from future major releases; test files excluded from the wheel.

Test coverage

  • Total: 1017 → 1117 (+100) / 0 skipped / 0 failed
  • New files: test_scoring_type_guards.py, test_cli_error_handling.py,
    test_install_version.py, test_evolve_prompt_injection.py, and
    test_evolve_sanitize.py; expanded: test_scoring_edges_malformed.py.

v7.1.1

18 Apr 18:07
79cf468

Patch release. Ships the _RE_ACTIONABLE_LINES pattern fix to PyPI so users installing via pip install schliff receive the corrected scorer.

What's fixed

Bullet-marker support in actionable-line patterns

_RE_ACTIONABLE_LINES and three sibling patterns (_RE_RUN_PATTERN, _RE_DIFF_SIGNAL, _RE_IMPERATIVE_INSTRUCTION) previously only matched numbered list prefixes (1. Run X) or bare imperatives. Markdown bullets — - Run X, * Use Y, + Install Z — fell through silently. A shared _LIST_MARKER alternation now applies to all four.
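
A rough reconstruction of the shared-marker idea, assuming a tiny verb list. The pattern bodies below are hypothetical and simpler than the real _RE_ACTIONABLE_LINES; only the concept of one _LIST_MARKER alternation covering numbered and bulleted prefixes comes from the notes.

```python
import re

# Hypothetical alternation: numbered prefixes ("1.") or Markdown bullets (-, *, +).
_LIST_MARKER = r"(?:\d+\.|[-*+])"
_VERBS = r"(?:Run|Use|Install)"  # illustrative subset only

# Optional indentation, optional list marker, then an imperative verb as a whole word.
_RE_ACTIONABLE_SKETCH = re.compile(
    rf"^[ \t]*(?:{_LIST_MARKER}[ \t]+)?{_VERBS}\b",
    re.MULTILINE,
)


def count_actionable(text: str) -> int:
    """Count lines that start with an imperative, with or without a list marker."""
    return len(_RE_ACTIONABLE_SKETCH.findall(text))
```

The trailing \b is the word-boundary guard: a line such as "- Runner up" does not count as an imperative.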

Regression guards: 10 new test cases in TestListMarkerSupport covering supported markers, bare-imperative regression, nested indentation, word-boundary guards, and marker-without-verb cases. Full suite: 1017 passed (up from 1007).

Real-world impact

Measured against the root CLAUDE.md merged into modelcontextprotocol/servers (PR #3733):

Dimension    Before   After
efficiency   57       64 (+7)
composite    59.2     61.0 (+1.8)

Context

This release ships the fix that was publicly audited in "Scoring My Own MCP Contribution" and closed in "The blindspot, fixed". The pattern fix landed in #29; this release is #30.

Install

pip install --upgrade schliff
schliff version
# schliff 7.1.1

Full diff: v7.1.0...v7.1.1

v7.1.0 — Report, Drift, Sync, Track, Web Playground

27 Mar 14:04
60e1f87

What's New

5 new CLI commands for deeper skill file analysis:

Command                      Purpose
schliff report <path>        Markdown quality report (--gist for shareable link)
schliff drift --repo <dir>   Find stale paths, scripts, and make targets
schliff sync <dir>           Cross-file consistency: contradictions, gaps, redundancies
schliff track <path>         Score history over time with sparkline + regression detection
schliff score --tokens       Section-by-section token breakdown with format budgets

Web infrastructure:

  • Interactive Playground — try schliff in the browser before installing
  • Community Leaderboard scaffold with serverless API

Doctor upgrade: now discovers all instruction files (CLAUDE.md, .cursorrules, AGENTS.md) and runs drift analysis on them.

Stats

  • 732 tests (140 new, up from 592)
  • 6 audit iterations across 3 parallel worktree branches
  • Zero dependencies — still Python 3.9+ stdlib only

Security

  • Path traversal prevention in drift detector
  • CSP headers on web properties
  • Control character / bidi override rejection
  • Temp file cleanup, version field bounds
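
As an illustration of the control character / bidi override rejection, here is a small standalone check. It is a sketch under assumptions: the function name, the allowed whitespace set, and the exact list of rejected code points are not taken from schliff.

```python
import unicodedata

# Explicit bidirectional formatting characters (embeddings, overrides, isolates).
_BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}
_ALLOWED_CONTROLS = {"\n", "\r", "\t"}  # ordinary whitespace stays legal


def find_unsafe_chars(text: str) -> list:
    """Return (index, codepoint) pairs for control or bidi-override characters."""
    hits = []
    for index, char in enumerate(text):
        if char in _ALLOWED_CONTROLS:
            continue
        # Category "Cc" covers the C0 and C1 control characters.
        if char in _BIDI_CONTROLS or unicodedata.category(char) == "Cc":
            hits.append((index, f"U+{ord(char):04X}"))
    return hits
```

A caller could reject the input whenever the returned list is non-empty and report the offending positions.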

Install / Upgrade

pip install --upgrade schliff

Full Changelog

See CHANGELOG.md for the complete list.

v7.0.0 — Multi-format, security, compare, suggest, --url

26 Mar 19:36

What's New

Score any agent instruction file — not just SKILL.md. Schliff v7.0 supports CLAUDE.md, .cursorrules, AGENTS.md, and generic markdown out of the box.

New Features

  • Multi-format support — Auto-detection from filename, --format override. Content normalization means zero scorer changes, zero regression risk.
  • Security dimension — 10 regex patterns, 6 categories (injection, exfiltration, obfuscation, dangerous commands, overpermission, missing boundaries). Negation-aware matching, meta-discourse false-positive mitigation. Opt-in: --security
  • schliff compare — Side-by-side quality comparison with dimension deltas
  • schliff suggest — Ranked actionable fixes with estimated score impact
  • schliff score --url — Score remote files from GitHub (HTTPS-only, SSRF protection)
  • Web Playground — Browser-based scorer (playground/)
  • GitHub Action — PR comments with score tables, published to Marketplace
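
Filename-based auto-detection with an explicit override could look roughly like this. The format labels and the detect_format name are hypothetical; the notes list the supported file types and the --format override but not the internal identifiers.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical labels keyed by lowercase filename.
_FORMAT_BY_NAME = {
    "skill.md": "skill",
    "claude.md": "claude",
    ".cursorrules": "cursorrules",
    "agents.md": "agents",
}


def detect_format(path: str, override: Optional[str] = None) -> str:
    """Pick a format from the filename unless the caller overrides it."""
    if override:
        return override  # mirrors an explicit --format choice
    name = PurePath(path).name.lower()
    # Anything unrecognized falls back to generic markdown.
    return _FORMAT_BY_NAME.get(name, "markdown")
```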

Security Hardening

  • SSRF redirect protection, YAML injection prevention, path traversal guard
  • Shell injection prevention, Content-Length guard, JSONDecodeError handling
  • 60+ agent reviews across 3 feature branches (4 rounds × 6 specialized agents each)

Stats

  • 592 tests (up from 540), all green
  • Self-score: 99.0 [S] (zero regression from v6.3.0)
  • 3 parallel worktrees, 22 commits, ~2,200 LOC added

Install

pip install --upgrade schliff
schliff demo

🤖 Generated with Claude Code

v6.3.0 — schliff diff, scoring bug fixes, Show HN README

26 Mar 13:45
7e2f625

What's New

schliff diff command

Compare skill scores between git commits. Shows per-dimension deltas with signal/noise analysis.

schliff diff SKILL.md              # vs previous commit
schliff diff SKILL.md --ref main   # vs any ref
schliff diff SKILL.md --json       # machine-readable

Security hardened: ref validation, path containment check, size limit guard.
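
A path containment check of the kind mentioned here can be sketched with pathlib. The is_contained helper is an assumption for illustration and says nothing about how schliff implements its guard.

```python
from pathlib import Path


def is_contained(repo_root: str, candidate: str) -> bool:
    """True when candidate, resolved against repo_root, stays inside repo_root."""
    root = Path(repo_root).resolve()
    # Joining with an absolute candidate discards root, so absolute paths
    # outside the repository are rejected as well.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

Resolving both paths before comparing them is what defeats ../ sequences and symlinked prefixes.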

3 scoring bug fixes

  • triggers: precision/recall reported 100% with zero predictions → now 0%
  • clarity: ambiguous pronoun detection skipped first line → fixed
  • efficiency: score returned float instead of int → consistent with all other dimensions
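
The triggers fix amounts to choosing 0 instead of 1 when a denominator is empty. A minimal sketch, with a hypothetical function name and signature:

```python
def precision_recall(true_positives: int, predicted: int, expected: int) -> tuple:
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    # With zero predictions there is no evidence of precision, so report 0.0
    # instead of a vacuous 100%.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall
```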

README overhaul for Show HN

  • Context bridge explaining Claude Code for non-users
  • Commands table split into CLI (standalone) vs Claude Code (require integration)
  • "Where Schliff fits" ecosystem diagram moved to Quick Start
  • Honest self-score framing, anti-gaming benchmark context
  • Test counts with links (540 unit + 99 integration)

DX

  • schliff without args now shows quick-start hints

Tests

  • 85 new tests (18 cmd_diff, 33 composite weights, 34 diff scoring)
  • Total: 540 unit + 99 integration = 639

Full changelog: CHANGELOG.md

Install: pip install schliff==6.3.0

v6.2.0 — Pre-launch hardening

25 Mar 15:38
8e77fa9

Schliff v6.2.0 — the launch release. Deterministic linter for Claude Code SKILL.md files.

Try it

pip install schliff
schliff demo          # see it in action
schliff doctor        # scan your installed skills

Highlights

  • schliff demo — try it in 5 seconds, no skill files needed
  • schliff badge — generate a score badge for your README
  • Pre-commit hook — automatic quality gate on every commit
  • Anti-gaming — 6/6 gaming patterns detected in benchmark suite
  • 500+ tests, zero dependencies, Python 3.9+ stdlib only

Case Study

@wan-huiyan improved agent-review-panel from 64 [D] to 85.6 [A] — 75% token reduction, A/B validated.

What's New

Added

  • schliff demo command — score a built-in bad skill to see schliff in action instantly
  • schliff badge <path> command — generate copy-paste markdown badge
  • Pre-commit hook support (.pre-commit-hooks.yaml)
  • Doctor --verbose flag with references/ extraction recommendations
  • Community case study

Fixed

  • Security: ReDoS in _RE_ERROR_BEHAVIOR, OOM-safe eval-suite loading, symlink rejection
  • Scoring: clarity auto-injection with custom weights, no_real_examples suppression fix

First time? pip install schliff && schliff demo
Already have skills? schliff doctor scans them all.
Share your results: Show Your Score discussion

v6.1.0 — Launch Repositioning

24 Mar 21:54
562a184

The deterministic skill linter for Claude Code

Schliff v6.1.0 repositions the project as a measurement tool first — the Ruff for SKILL.md files.

New

  • schliff verify — CI gate with exit codes 0/1/2, --min-score, --regression, history tracking
  • Anti-gaming benchmark — 6/6 gaming attempts detected (benchmarks/anti-gaming/)
  • Repetition detection in efficiency scorer (copy-paste examples: 94 -> 43)
  • Screenshot-ready schliff score output with per-dimension bars and status words

Fixed

  • Structural markers (code fences, headers, rules) excluded from repetition count
  • Code block content excluded from repetition counting
  • verify handles corrupted history and missing files gracefully

Docs

  • README rewritten: competitive positioning, comparison table, real-world results, architecture diagram
  • YAML issue templates replace markdown templates
  • SKILL.md and pyproject.toml metadata updated for discoverability

540+ tests. Zero dependencies. pip install schliff.

v6.0.1 — Pre-Launch Audit

24 Mar 16:19

Comprehensive pre-launch audit: 40+ bugs fixed, 443 tests, 55 security fixes. See full changelog.

v6.0.0 — Schliff

24 Mar 09:32

Schliff v6.0.0 — The finishing cut for Claude Code skills

Major release: Rebrand from SkillForge to Schliff (German: "the finishing cut").

New Features

  • Clarity as default dimension — contradictions, vague references, ambiguity detected by default (5% weight)
  • Token cost estimation — Doctor shows per-skill token cost + fleet total
  • GitHub Action: Zandereins/schliff@v6 scores skills in CI, comments on PRs, blocks merges
  • pip CLI: schliff score SKILL.md works without Claude Code
  • Actionable Doctor — copy-paste commands with full skill paths

Scoring Improvements

  • Trigger confidence cap: eval suites with <8 triggers capped at score 60
  • Context-aware contradiction detection (verb+object+modifier tuples)
  • Anti-gaming: empty headers don't count, efficiency signal caps, trigger threshold floor
  • Missing dimension warnings always shown
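
The trigger confidence cap can be expressed in a few lines. The constant and function names are assumptions; the threshold of 8 triggers and the cap of 60 are taken from the first item above.

```python
MIN_TRIGGERS = 8   # suites with fewer triggers are considered low-confidence
CAPPED_SCORE = 60  # highest score such a suite may receive


def cap_trigger_score(raw_score: float, trigger_count: int) -> float:
    """Limit the trigger score when the eval suite has too few triggers."""
    if trigger_count < MIN_TRIGGERS:
        return min(raw_score, CAPPED_SCORE)
    return raw_score
```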

Quality

  • 123 unit tests, 99 integration tests, 20 self-tests
  • 40 security fixes (shell injection, prompt injection, ReDoS, supply chain)
  • Self-score: 95.4/100 [S]

Breaking Changes

  • --clarity flag removed (clarity is now on by default; use --no-clarity to opt out)
  • All /skillforge:* commands renamed to /schliff:*

Full Changelog: v5.1.0...v6.0.0

v5.1.0 — Honest Scoring, Beam Search, LSH, Doctor

22 Mar 18:42

What's New in v5.1

Added

  • Honest Scoring — "Structural Score" label everywhere, replacing misleading "Quality Score"
  • Stemming Tokenizer — suffix-stripping replaces fixed synonym tables for better keyword matching
  • Beam Search — top-3 exploration instead of greedy top-1 from iteration 4 onward
  • EMA Plateau Detection — Exponential Moving Average replaces fixed-window ROI stopping
  • MinHash + LSH — O(n) mesh analysis instead of O(n²) for 50+ skills
  • Context-aware Patches — generates meaningful descriptions instead of TODOs
  • Doctor Command (doctor.py) — scans all installed skills, shows health summary with grades
  • Dimension Guard — prevents patches that tank a single dimension by >15 points
  • Coherence Check — instruction-assertion alignment as quality bonus
  • 40+ Pre-compiled Regex — performance optimization across the scorer
  • Public Cache API: invalidate_cache() replaces direct _file_cache.pop()
  • Underscore Alias Modules for Python import compatibility
  • 21 Runtime Assertions — response_excludes assertions across 4 test cases for validated scoring
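
To make the EMA plateau item concrete, here is one way such a stopping rule might work. It is a sketch only: the smoothing factor, the gain threshold, and both function names are assumptions, and the real loop may smooth a different signal.

```python
def ema_series(values, alpha=0.5):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def has_plateaued(scores, alpha=0.5, min_gain=0.5):
    """True when the EMA of per-iteration score gains falls below min_gain."""
    if len(scores) < 3:
        return False  # too little history to judge
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return ema_series(gains, alpha)[-1] < min_gain
```

Unlike a fixed window, the average decays gradually, so one large early gain keeps the loop running for a few flat iterations before it stops.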

Fixed

  • State truncation bug in auto-improve loop
  • EMA indexing off-by-one in plateau detection
  • Deterministic hash for MinHash reproducibility
  • 2 trigger false positives eliminated (41/41 = 100%)

Stats

Metric                   Value
Structural Score         99.9/100 [S]
Runtime Score            100/100 [S] (with --runtime)
Composite (all 7 dims)   99.9/100 [S]
Tests                    99/99 passing (87 integration + 12 self-tests)
Runtime Assertions       21/21 passing
Security                 27 fixes from 15-agent deep audit, 0 regressions

Dimension Breakdown

structure       ██████████  100/100
triggers        ██████████  100/100
quality         ██████████  100/100
edges           ██████████  100/100
efficiency      █████████░   93/100
composability   ██████████  100/100
runtime         ██████████  100/100

Quick Start

git clone https://github.com/Zandereins/skillforge.git
cp -r skillforge/skills/skillforge ~/.claude/skills/
cp -r skillforge/commands/skillforge ~/.claude/commands/

Full docs: README