Releases · Zandereins/schliff

24 Apr 18:30

Zandereins

v7.2.0

d63161b

v7.2.0 — security hardening + scoring robustness Latest


[7.2.0] - 2026-04-24

Security

Prompt-injection hardening in schliff evolve: user-authored skill content is wrapped in explicit XML tags with a per-call random nonce before being passed to LLM prompts. Earlier versions fed raw content into the meta-prompt, letting a crafted SKILL.md inject directives. A sanitizer rejects XML-tag injection attempts and an explicit <user_content>…</user_content> boundary isolates user input.
CLI error-handling with no traceback leaks: schliff score on a directory or oversized file no longer leaks a raw Python traceback. read_skill_safe rejects directories explicitly with a clear ValueError; cli.main() wraps handler dispatch in one (OSError, ValueError) try/except that renders a short Error: … line on stderr and exits 1.
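
A minimal Python sketch of the wrapping step in the prompt-injection item above. The function names, the nonce-as-attribute layout, and the rejection regex are illustrative assumptions; these notes do not show the actual schliff evolve code.

```python
import re
import secrets

# Matches any opening or closing <user_content> tag (boundary tag name from the notes).
_BOUNDARY_TAG = re.compile(r"</?\s*user_content\b[^>]*>", re.IGNORECASE)


def sanitize_user_content(text: str) -> str:
    """Reject content that tries to open or close the boundary tag itself."""
    if _BOUNDARY_TAG.search(text):
        raise ValueError("user content must not contain <user_content> tags")
    return text


def wrap_user_content(text: str) -> tuple:
    """Return (wrapped_text, nonce) with a fresh random nonce per call."""
    nonce = secrets.token_hex(8)  # 16 hex characters, new on every call
    body = sanitize_user_content(text)
    # Hypothetical layout: the nonce is carried as an attribute on the opening tag.
    wrapped = f'<user_content nonce="{nonce}">\n{body}\n</user_content>'
    return wrapped, nonce
```

Raising ValueError here fits the CLI item above, where (OSError, ValueError) is rendered as a short Error: … line; whether the real sanitizer raises or strips is not stated.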

Fixed

Scoring robustness across all dimensions: all five scorers that consume user-authored eval-suite JSON (edges, triggers, quality, runtime, coherence) now guard their list-valued fields with isinstance(…, list) checks. Before the fix, a truthy non-list value ({"edge_cases": 42}, {"triggers": "abc"}) crashed the scorer with TypeError from len() or AttributeError from .get() on a string character. After the fix, each scorer returns its standard sentinel (score −1 / bonus 0) on malformed input; inner assertions and test_case items are filtered via _assertion_dicts helpers.
score_edges category guard: malformed category entries (ints, nulls, lists) no longer crash .startswith() during known-category coverage.
install.sh and analyze-skill.sh POSIX portability: replaced GNU-only \s / \S in grep -E patterns with POSIX character classes [[:space:]] / [^[:space:]]. On older macOS (classic BSD grep), the installer previously printed "Schliff v unknown" and analyze-skill.sh missed name / example detection.
Score-to-grade consistency: playground, evolve, GitHub Action, and CI badges now share the canonical E-band (35–49) from terminal_art.score_to_grade; previously the mappings on each surface had drifted apart.
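
The type guards in the first two items above can be sketched roughly as follows. This is an illustration under assumptions: assertion_dicts and score_edges_sketch are hypothetical stand-ins, the returned dict shape is invented for the example, and the score formula is a placeholder rather than schliff's real scoring.

```python
from typing import Any


def assertion_dicts(items: Any) -> list:
    """Keep only dict entries; non-list input yields an empty list."""
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]


def score_edges_sketch(eval_suite: Any) -> dict:
    """Return score -1 on malformed input instead of raising."""
    sentinel = {"score": -1, "valid_cases": 0, "categories": []}
    if not isinstance(eval_suite, dict):
        return sentinel
    edge_cases = eval_suite.get("edge_cases")
    # A truthy non-list such as 42 or "abc" is rejected before len() or iteration.
    if not isinstance(edge_cases, list) or not edge_cases:
        return sentinel
    cases = assertion_dicts(edge_cases)
    # Category guard: ints, nulls, and lists are skipped, never passed to str methods.
    categories = sorted(
        {c["category"] for c in cases if isinstance(c.get("category"), str)}
    )
    # Placeholder scoring (an assumption): share of well-formed cases, as an int.
    score = 100 * len(cases) // len(edge_cases)
    return {"score": score, "valid_cases": len(cases), "categories": categories}
```

Checking the type before truthiness is the point: a bare `if edge_cases:` lets 42 or "abc" through to code that expects a list of dicts.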

Changed

E-band grade now emitted in badge/CI output. Consumers that parse a grade field with a closed set of {S, A, B, C, D, F} must now accept E as well. Breaking for JSON consumers that did exhaustive grade switching; non-breaking for score-based consumers.
install.sh reads VERSION from pyproject.toml at install time instead of carrying a hard-coded literal. Release process simplified accordingly in RELEASING.md.
EXCLUDED_DIRS centralized in shared.py; doctor and related scanners share one canonical list.
Scorer signatures cleaned up: unused **kw parameters and dead ImportError fallbacks removed from several scoring / pattern modules.
verify uses terminal_art.score_to_grade instead of a local duplicate, keeping grade mapping in one place.
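
For consumers affected by the E-band change above, a tolerant parser might look like the sketch below. It assumes the JSON payload is a dict with a grade field, as the note implies; read_grade and the "unknown" fallback are hypothetical choices, not part of schliff.

```python
from typing import Any

# Grade letters named in the release note, including the newly emitted E.
KNOWN_GRADES = frozenset({"S", "A", "B", "C", "D", "E", "F"})


def read_grade(payload: dict) -> str:
    """Return the grade letter, or "unknown" for values outside the known set."""
    grade: Any = payload.get("grade")
    if isinstance(grade, str) and grade in KNOWN_GRADES:
        return grade
    # Falling back instead of raising keeps the consumer working if bands change again.
    return "unknown"
```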

Added

RELEASING.md pre-release checklist documents the full release procedure (version bump, CHANGELOG draft, tag, publish, badge cache-bust).
Cross-platform CI expansion: GitHub Actions matrix covers Python 3.9–3.13 on ubuntu-latest and adds a dedicated test-macos job gating badge generation / report publishing.
~100 new regression tests covering non-list eval-suite fields, CLI error-handling paths, BSD-grep portability of shipped shell scripts, prompt-injection sanitization, UTF edge cases, runtime enabled path, and score_edges error branches.
setuptools upper bound pinned in pyproject.toml to avoid build breakage from future major releases; test files excluded from the wheel.

Test coverage

Total: 1017 → 1117 (+100) / 0 skipped / 0 failed
New files: test_scoring_type_guards.py, test_cli_error_handling.py, test_install_version.py, test_evolve_prompt_injection.py, and test_evolve_sanitize.py. Expanded: test_scoring_edges_malformed.py.


18 Apr 18:07

Zandereins

v7.1.1

79cf468

v7.1.1

Patch release. Ships the _RE_ACTIONABLE_LINES pattern fix to PyPI so users installing via pip install schliff receive the corrected scorer.

What's fixed

Bullet-marker support in actionable-line patterns

_RE_ACTIONABLE_LINES and three sibling patterns (_RE_RUN_PATTERN, _RE_DIFF_SIGNAL, _RE_IMPERATIVE_INSTRUCTION) previously only matched numbered list prefixes (1. Run X) or bare imperatives. Markdown bullets — - Run X, * Use Y, + Install Z — fell through silently. A shared _LIST_MARKER alternation now applies to all four.

Regression guards: 10 new test cases in TestListMarkerSupport covering supported markers, bare-imperative regression, nested indentation, word-boundary guards, and marker-without-verb cases. Full suite: 1017 passed (up from 1007).
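
A rough Python sketch of a shared list-marker alternation. Only the name _LIST_MARKER and the marker kinds (numbered prefixes and the -, *, + bullets) come from these notes; the regex bodies and the three-verb list are assumptions.

```python
import re

# Assumed shape: a numbered prefix ("1.") or a -, *, + bullet, then whitespace.
_LIST_MARKER = r"(?:\d+\.|[-*+])[ \t]+"

# Illustrative verb list only; the real patterns cover far more than three verbs.
_RE_ACTIONABLE_SKETCH = re.compile(
    rf"^[ \t]*(?:{_LIST_MARKER})?(?:Run|Use|Install)\b",
    re.MULTILINE,
)


def count_actionable_lines(text: str) -> int:
    """Count lines that start with an optional list marker and an imperative verb."""
    return len(_RE_ACTIONABLE_SKETCH.findall(text))
```

The marker group is optional so bare imperatives still match, and the trailing \b keeps words like "Runner" from counting as "Run".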

Real-world impact

Measured against the root CLAUDE.md merged into modelcontextprotocol/servers (PR #3733):

Dimension	Before	After
efficiency	57	64 (+7)
composite	59.2	61.0 (+1.8)

Context

This release ships the fix that was publicly audited in "Scoring My Own MCP Contribution" and closed in "The blindspot, fixed". The pattern fix landed in #29; this release is #30.

Install

pip install --upgrade schliff
schliff version
# schliff 7.1.1

Full diff: v7.1.0...v7.1.1


27 Mar 14:04

Zandereins

v7.1.0

60e1f87

v7.1.0 — Report, Drift, Sync, Track, Web Playground

What's New

5 new CLI commands for deeper skill file analysis:

Command	Purpose
`schliff report <path>`	Markdown quality report (`--gist` for shareable link)
`schliff drift --repo <dir>`	Find stale paths, scripts, and make targets
`schliff sync <dir>`	Cross-file consistency: contradictions, gaps, redundancies
`schliff track <path>`	Score history over time with sparkline + regression detection
`schliff score --tokens`	Section-by-section token breakdown with format budgets

Web infrastructure:

Interactive Playground — try schliff in the browser before installing
Community Leaderboard scaffold with serverless API

Doctor upgrade: now discovers all instruction files (CLAUDE.md, .cursorrules, AGENTS.md) and runs drift analysis on them.

Stats

732 tests (140 new, up from 592)
6 audit iterations across 3 parallel worktree branches
Zero dependencies — still Python 3.9+ stdlib only

Security

Path traversal prevention in drift detector
CSP headers on web properties
Control character / bidi override rejection
Temp file cleanup, version field bounds
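
As an illustration of the path traversal item above, one common containment check in Python is sketched below. resolve_inside is a hypothetical helper; how the drift detector actually implements its guard is not described in these notes.

```python
from pathlib import Path


def resolve_inside(repo_root: str, candidate: str) -> Path:
    """Resolve candidate under repo_root; raise ValueError if it escapes the root."""
    root = Path(repo_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the comparison.
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes repository: {candidate}")
    return target
```

Comparing resolved paths rather than raw strings also handles absolute candidates, because joining an absolute path onto the root discards the root.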

Install / Upgrade

pip install --upgrade schliff

Full Changelog

See CHANGELOG.md for the complete list.


26 Mar 19:36

Zandereins

v7.0.0

3e6f138

v7.0.0 — Multi-format, security, compare, suggest, --url

What's New

Score any agent instruction file — not just SKILL.md. Schliff v7.0 supports CLAUDE.md, .cursorrules, AGENTS.md, and generic markdown out of the box.

New Features

Multi-format support — Auto-detection from filename, --format override. Content normalization means zero scorer changes, zero regression risk.
Security dimension — 10 regex patterns, 6 categories (injection, exfiltration, obfuscation, dangerous commands, overpermission, missing boundaries). Negation-aware matching, meta-discourse false-positive mitigation. Opt-in: --security
schliff compare — Side-by-side quality comparison with dimension deltas
schliff suggest — Ranked actionable fixes with estimated score impact
schliff score --url — Score remote files from GitHub (HTTPS-only, SSRF protection)
Web Playground — Browser-based scorer (playground/)
GitHub Action — PR comments with score tables, published to Marketplace

Security Hardening

SSRF redirect protection, YAML injection prevention, path traversal guard
Shell injection prevention, Content-Length guard, JSONDecodeError handling
60+ agent reviews across 3 feature branches (4 rounds × 6 specialized agents each)

Stats

592 tests (up from 540), all green
Self-score: 99.0 [S] (zero regression from v6.3.0)
3 parallel worktrees, 22 commits, ~2,200 LOC added

Install

pip install --upgrade schliff
schliff demo

🤖 Generated with Claude Code


26 Mar 13:45

Zandereins

v6.3.0

7e2f625

v6.3.0 — schliff diff, scoring bug fixes, Show HN README

What's New

`schliff diff` command

Compare skill scores between git commits. Shows per-dimension deltas with signal/noise analysis.

schliff diff SKILL.md              # vs previous commit
schliff diff SKILL.md --ref main   # vs any ref
schliff diff SKILL.md --json       # machine-readable

Security hardened: ref validation, path containment check, size limit guard.

3 scoring bug fixes

triggers: precision/recall reported 100% with zero predictions → now 0%
clarity: ambiguous pronoun detection skipped first line → fixed
efficiency: score returned float instead of int → consistent with all other dimensions
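
The triggers fix above amounts to handling the empty-prediction case explicitly. A minimal sketch, assuming predictions and expectations are sets of trigger strings; the function name and the treatment of an empty expected set are assumptions.

```python
def precision_recall(predicted: set, expected: set) -> tuple:
    """Return (precision, recall) as fractions in [0, 1]."""
    true_positives = len(predicted & expected)
    # Zero predictions means nothing was shown to be precise: report 0.0, not 1.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Assumption: an empty expected set is also reported as 0.0.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall
```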

README overhaul for Show HN

Context bridge explaining Claude Code for non-users
Commands table split into CLI (standalone) vs Claude Code (require integration)
"Where Schliff fits" ecosystem diagram moved to Quick Start
Honest self-score framing, anti-gaming benchmark context
Test counts with links (540 unit + 99 integration)

DX

schliff without args now shows quick-start hints

Tests

85 new tests (18 cmd_diff, 33 composite weights, 34 diff scoring)
Total: 540 unit + 99 integration = 639

Full changelog: CHANGELOG.md

Install: pip install schliff==6.3.0


25 Mar 15:38

Zandereins

v6.2.0

8e77fa9

v6.2.0 — Pre-launch hardening

Schliff v6.2.0 — the launch release. Deterministic linter for Claude Code SKILL.md files.

Try it

pip install schliff
schliff demo          # see it in action
schliff doctor        # scan your installed skills

Highlights

schliff demo — try it in 5 seconds, no skill files needed
schliff badge — generate a score badge for your README
Pre-commit hook — automatic quality gate on every commit
Anti-gaming — 6/6 gaming patterns detected in benchmark suite
500+ tests, zero dependencies, Python 3.9+ stdlib only

Case Study

@wan-huiyan improved agent-review-panel from 64 [D] to 85.6 [A] — 75% token reduction, A/B validated.

What's New

Added

schliff demo command — score a built-in bad skill to see schliff in action instantly
schliff badge <path> command — generate copy-paste markdown badge
Pre-commit hook support (.pre-commit-hooks.yaml)
Doctor --verbose flag with references/ extraction recommendations
Community case study

Fixed

Security: ReDoS in _RE_ERROR_BEHAVIOR, OOM-safe eval-suite loading, symlink rejection
Scoring: clarity auto-injection with custom weights, no_real_examples suppression fix

First time? pip install schliff && schliff demo
Already have skills? schliff doctor scans them all.
Share your results: Show Your Score discussion


24 Mar 21:54

Zandereins

v6.1.0

562a184

v6.1.0 — Launch Repositioning

The deterministic skill linter for Claude Code

Schliff v6.1.0 repositions the project as a measurement tool first — the Ruff for SKILL.md files.

New

schliff verify — CI gate with exit codes 0/1/2, --min-score, --regression, history tracking
Anti-gaming benchmark — 6/6 gaming attempts detected (benchmarks/anti-gaming/)
Repetition detection in efficiency scorer (copy-paste examples: 94 -> 43)
Screenshot-ready schliff score output with per-dimension bars and status words

Fixed

Structural markers (code fences, headers, rules) excluded from repetition count
Code block content excluded from repetition counting
verify handles corrupted history and missing files gracefully

Docs

README rewritten: competitive positioning, comparison table, real-world results, architecture diagram
YAML issue templates replace markdown templates
SKILL.md and pyproject.toml metadata updated for discoverability

540+ tests. Zero dependencies. pip install schliff.


24 Mar 16:19

Zandereins

v6.0.1

e1179ad

v6.0.1 — Pre-Launch Audit

Comprehensive pre-launch audit: 40+ bugs fixed, 443 tests, 55 security fixes. See full changelog.


24 Mar 09:32

Zandereins

v6.0.0

764ed4f

v6.0.0 — Schliff

Schliff v6.0.0 — The finishing cut for Claude Code skills

Major release: Rebrand from SkillForge to Schliff (German: "the finishing cut").

New Features

Clarity as default dimension — contradictions, vague references, ambiguity detected by default (5% weight)
Token cost estimation — Doctor shows per-skill token cost + fleet total
GitHub Action — Zandereins/schliff@v6 scores skills in CI, comments on PRs, blocks merges
pip CLI — schliff score SKILL.md works without Claude Code
Actionable Doctor — copy-paste commands with full skill paths

Scoring Improvements

Trigger confidence cap: eval suites with <8 triggers capped at score 60
Context-aware contradiction detection (verb+object+modifier tuples)
Anti-gaming: empty headers don't count, efficiency signal caps, trigger threshold floor
Missing dimension warnings always shown
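
The trigger confidence cap listed above reduces to a small clamp. The thresholds (fewer than 8 triggers, cap at 60) come from the note; the function and constant names are hypothetical.

```python
MIN_TRIGGERS_FOR_FULL_SCORE = 8  # from the note: suites with <8 triggers are capped
CAPPED_SCORE = 60


def cap_trigger_score(raw_score: int, trigger_count: int) -> int:
    """Limit the triggers score when the eval suite is too small to be trusted."""
    if trigger_count < MIN_TRIGGERS_FOR_FULL_SCORE:
        return min(raw_score, CAPPED_SCORE)
    return raw_score
```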

Quality

123 unit tests, 99 integration tests, 20 self-tests
40 security fixes (shell injection, prompt injection, ReDoS, supply chain)
Self-score: 95.4/100 [S]

Breaking Changes

--clarity flag removed (clarity is now always-on; use --no-clarity to opt out)
All /skillforge:* commands renamed to /schliff:*

Full Changelog: v5.1.0...v6.0.0


22 Mar 18:42

Zandereins

v5.1.0

14d820c

v5.1.0 — Honest Scoring, Beam Search, LSH, Doctor

What's New in v5.1

Added

Honest Scoring — "Structural Score" label everywhere, replacing misleading "Quality Score"
Stemming Tokenizer — suffix-stripping replaces fixed synonym tables for better keyword matching
Beam Search — top-3 exploration instead of greedy top-1 from iteration 4 onward
EMA Plateau Detection — Exponential Moving Average replaces fixed-window ROI stopping
MinHash + LSH — O(n) mesh analysis instead of O(n²) for 50+ skills
Context-aware Patches — generates meaningful descriptions instead of TODOs
Doctor Command (doctor.py) — scans all installed skills, shows health summary with grades
Dimension Guard — prevents patches that tank a single dimension by >15 points
Coherence Check — instruction-assertion alignment as quality bonus
40+ Pre-compiled Regex — performance optimization across the scorer
Public Cache API — invalidate_cache() replaces direct _file_cache.pop()
Underscore Alias Modules for Python import compatibility
21 Runtime Assertions — response_excludes assertions across 4 test cases for validated scoring
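
To illustrate the EMA plateau detection item above, here is a sketch that smooths per-iteration score gains and stops once the smoothed gain falls below a threshold. The alpha and min_gain defaults, and the choice to smooth gains rather than raw scores, are assumptions made for the example.

```python
from typing import Optional, Sequence


def ema_plateau(
    scores: Sequence[float], alpha: float = 0.5, min_gain: float = 0.5
) -> Optional[int]:
    """Return the first index where the EMA of score gains drops below min_gain."""
    ema = None
    for i in range(1, len(scores)):
        gain = scores[i] - scores[i - 1]
        # The first gain seeds the average; later gains are blended in with weight alpha.
        ema = gain if ema is None else alpha * gain + (1 - alpha) * ema
        if ema < min_gain:
            return i
    return None  # no plateau detected in the given history
```

Unlike a fixed window, the average decays gradually, so one flat iteration after a large gain does not stop the loop immediately.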

Fixed

State truncation bug in auto-improve loop
EMA indexing off-by-one in plateau detection
Deterministic hash for MinHash reproducibility
2 trigger false positives eliminated (41/41 = 100%)
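
On the deterministic-hash fix above: Python's built-in hash() for strings is randomized per process unless PYTHONHASHSEED is set, so MinHash signatures built on it differ between runs. The sketch below uses hashlib instead. It covers only the MinHash half (LSH banding is omitted), and every name and parameter in it is an assumption, not schliff's implementation.

```python
import hashlib
from typing import Iterable, List


def stable_hash(token: str, seed: int) -> int:
    """64-bit hash that is identical across processes and machines."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens: Iterable[str], num_perm: int = 32) -> List[int]:
    """One minimum per seeded hash function; each seed stands in for a permutation."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature from an empty token set")
    return [min(stable_hash(t, seed) for t in unique) for seed in range(num_perm)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```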

Stats

Metric	Value
Structural Score	99.9/100 [S]
Runtime Score	100/100 [S] (with `--runtime`)
Composite (all 7 dims)	99.9/100 [S]
Tests	99/99 passing (87 integration + 12 self-tests)
Runtime Assertions	21/21 passing
Security	27 fixes from 15-agent deep audit, 0 regressions

Dimension Breakdown

structure       ██████████  100/100
triggers        ██████████  100/100
quality         ██████████  100/100
edges           ██████████  100/100
efficiency      █████████░   93/100
composability   ██████████  100/100
runtime         ██████████  100/100

Quick Start

git clone https://github.com/Zandereins/skillforge.git
cp -r skillforge/skills/skillforge ~/.claude/skills/
cp -r skillforge/commands/skillforge ~/.claude/commands/

Full docs: README

