Meta: Migrate from Git Hash Versioning to Semantic Versioning #62
Meta-Issue: Migrate from Git Hash Versioning to Semantic Versioning
Summary
PinchBench uses git commit hashes (short SHA) to identify benchmark versions. This creates a confusing user experience—users see opaque strings like a1b2c3d instead of meaningful version numbers like 1.0.0.
This issue tracks the migration to semantic versioning (SemVer) across all PinchBench components.
Part 1: Current State Research
Where the Git Hash Originates
skill repo — scripts/benchmark.py (lines 275-288):
The benchmark version is generated at runtime by reading the current git HEAD:
```python
import subprocess
from pathlib import Path


def _get_git_version(script_dir: Path) -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True,
        text=True,
        timeout=2,
        cwd=script_dir,
    )
    return result.stdout.strip()  # e.g., "a1b2c3d"
```

This is called at benchmark completion (line 718) and included in the results JSON:

```python
"benchmark_version": _get_git_version(skill_root),
```

How the Git Hash Flows Through the System
1. Submission Upload (skill → api)
skill repo — scripts/lib_upload.py (lines 257-258):

```python
"benchmark_version": raw.get("benchmark_version"),
```

The benchmark_version from the results JSON is passed through to the API submission payload unchanged.
2. API Storage (api repo)
api repo — schema.sql (lines 36-50):
```sql
CREATE TABLE IF NOT EXISTS submissions (
  ...
  benchmark_version TEXT,
  ...
);

CREATE TABLE IF NOT EXISTS benchmark_versions (
  id TEXT PRIMARY KEY, -- The git hash
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);
```

api repo — src/routes/results.ts (lines 238-242):
On the first submission with a new version, the API auto-inserts a row into benchmark_versions:
```ts
if (payload.benchmark_version) {
  await c.env.prod_pinchbench
    .prepare("INSERT OR IGNORE INTO benchmark_versions (id, current) VALUES (?, 0)")
    .bind(payload.benchmark_version)
    .run();
}
```

3. Version API Endpoint (api repo)
api repo — src/routes/benchmarkVersions.ts:
- GET /api/benchmark_versions: Returns all non-hidden versions ordered by created_at DESC
- GET /api/benchmark_versions/latest: Returns the current version (where current = 1)
Response shape:
```json
{
  "versions": [
    { "id": "a1b2c3d", "created_at": "...", "is_current": true, "submission_count": 42 }
  ]
}
```

4. Frontend Display (leaderboard repo)
leaderboard repo — lib/types.ts (lines 220-226):
```ts
export interface BenchmarkVersion {
  id: string; // The git hash
  created_at: string;
  is_current: boolean;
  submission_count: number;
}
```

leaderboard repo — components/version-selector.tsx:
- Displays v.id.slice(0, 8) (first 8 chars of the git hash)
- GitCommit icon reinforces that it's a git hash
- Shows a "Current" badge for versions where is_current === true
leaderboard repo — Other display locations:
- app/submission/[id]/page.tsx (line 180): Shows the full benchmark_version
- app/runs/page.tsx (line 121): Shows benchmark_version in the runs table
- app/about/page.tsx (lines 237-284): Documents the git hash versioning scheme
Files That Reference benchmark_version
| Repo | File | Usage |
|---|---|---|
| skill | scripts/benchmark.py | Generates the hash via _get_git_version() |
| skill | scripts/lib_upload.py | Passes hash in submission payload |
| api | schema.sql | Defines submissions.benchmark_version TEXT field |
| api | schema.sql | Defines benchmark_versions.id TEXT PRIMARY KEY |
| api | src/routes/results.ts | Stores hash, auto-creates version record |
| api | src/routes/benchmarkVersions.ts | Returns versions list |
| api | src/routes/leaderboard.ts | Filters by version |
| api | src/routes/submissions.ts | Filters by version |
| api | src/routes/users.ts | Filters user submissions by version |
| api | src/routes/providers.ts | Uses version in queries |
| api | src/routes/admin.ts | Admin version management |
| api | src/utils/query.ts | Builds version filter queries |
| leaderboard | lib/types.ts | Defines BenchmarkVersion interface |
| leaderboard | lib/api.ts | Passes version query param |
| leaderboard | components/version-selector.tsx | Displays version picker |
| leaderboard | app/page.tsx | Passes version to leaderboard |
| leaderboard | app/runs/page.tsx | Displays version in runs |
| leaderboard | app/submission/[id]/page.tsx | Shows submission version |
| leaderboard | app/about/page.tsx | Documents versioning scheme |
Part 2: Implementation Plan
Goal
Replace opaque git hashes with semantic versions (e.g., 1.0.0, 1.1.0, 2.0.0) while maintaining backward compatibility with existing submissions. GitHub releases are the single source of truth — versions are managed through GitHub releases of the github.com/pinchbench/skill repository.
How the Benchmark Knows Its Version (skill repo)
Approach: Hybrid with setuptools-scm + BENCHMARK_VERSION Fallback
This handles all user scenarios:
| Scenario | Version Source |
|---|---|
| Installed via pip | importlib.metadata.version() (from git tag via setuptools-scm) |
| Cloned and run directly | BENCHMARK_VERSION file |
| Downloaded zip (no git) | BENCHMARK_VERSION file |
| Dev environment (untagged) | BENCHMARK_VERSION file if present, otherwise git describe, then git hash fallback |
pyproject.toml Changes
```toml
[build-system]
requires = ["setuptools>=61", "setuptools-scm>=8"]
build-backend = "setuptools.build_meta"

[project]
name = "pinchbench-skill"
dynamic = ["version"]  # Version comes from setuptools-scm
description = "PinchBench - OpenClaw Agent Benchmarking System"
readme = "SKILL.md"
requires-python = ">=3.10"
dependencies = [
    "pyyaml>=6.0.1",
    "fabric>=3.2.2",
    "paramiko>=3.0.0",
]

[tool.setuptools_scm]
# Derives the version from git tags like v1.0.0
# Untagged commits get a dev version such as 1.0.1.devN+g<hash>
```

New File: BENCHMARK_VERSION
Plain text file in repo root containing just the version string:
1.0.0
This file is auto-updated by GitHub Actions on each release.
New Function: _get_benchmark_version()
Replace the _get_git_version() call in scripts/benchmark.py with the new function below, keeping _get_git_version() itself as the final fallback:
```python
import subprocess
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def _get_benchmark_version(script_dir: Path) -> str:
    """Get benchmark version with multiple fallback strategies.

    Resolution order:
    1. importlib.metadata (if package is installed)
    2. BENCHMARK_VERSION file (for cloned/downloaded usage)
    3. Git tag via `git describe`
    4. Git short hash (ultimate fallback)
    """
    # 1. Try importlib.metadata (works if package is installed via pip)
    try:
        return version("pinchbench-skill")
    except PackageNotFoundError:
        pass

    # 2. Try BENCHMARK_VERSION file (lives in the repo root, so callers
    #    should pass skill_root)
    version_file = script_dir / "BENCHMARK_VERSION"
    if version_file.exists():
        v = version_file.read_text().strip()
        if v:
            return v

    # 3. Try git tag (e.g., "v1.0.0" -> "1.0.0"). Between tags, git describe
    #    returns e.g. "v1.0.0-3-gabc1234"; with no tags, --always gives a hash.
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],
            capture_output=True,
            text=True,
            timeout=2,
            check=False,
            cwd=script_dir,
        )
        if result.returncode == 0:
            tag = result.stdout.strip()
            # Strip "v" prefix if present
            return tag[1:] if tag.startswith("v") else tag
    except (subprocess.SubprocessError, FileNotFoundError, OSError):
        pass

    # 4. Ultimate fallback: git short hash (existing helper)
    return _get_git_version(script_dir)
```

GitHub Actions Workflow: .github/workflows/release.yml
Auto-updates BENCHMARK_VERSION file when a GitHub release is published:
```yaml
name: Update Version on Release

on:
  release:
    types: [published]

jobs:
  update-version-file:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: main

      - name: Update BENCHMARK_VERSION file
        run: |
          # Extract version from tag (strip "v" prefix)
          VERSION="${GITHUB_REF_NAME#v}"
          echo "$VERSION" > BENCHMARK_VERSION
          echo "Updated BENCHMARK_VERSION to $VERSION"

      - name: Commit and push
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add BENCHMARK_VERSION
          # Skip the commit (instead of failing) if the file already held this version
          git diff --cached --quiet || git commit -m "chore: update BENCHMARK_VERSION to ${GITHUB_REF_NAME}"
          git push
```

Data Model Changes
benchmark_versions Table (api repo)
Current schema:
```sql
CREATE TABLE benchmark_versions (
  id TEXT PRIMARY KEY, -- git hash (e.g., "a1b2c3d")
  created_at TEXT,
  current INTEGER,
  hidden INTEGER
);
```

New schema:
```sql
CREATE TABLE benchmark_versions (
  id TEXT PRIMARY KEY,  -- Can be git hash OR semver
  semver TEXT,          -- Normalized semver: "1.0.0", "1.1.0"
  label TEXT,           -- Display label: "1.0.0" or "1.0.0-beta.5"
  release_notes TEXT,   -- Markdown changelog (from GitHub release body)
  release_url TEXT,     -- Link to GitHub release
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);
```

Data migration for existing versions:
All existing git-hash versions will be assigned a 1.0.0-beta.N semver and label (their id stays the git hash), where N is an incrementing number based on created_at order, starting at 1 for the oldest. Example:
| id (hash) | semver | label |
|---|---|---|
| ad1c230 | 1.0.0-beta.1 | 1.0.0-beta.1 |
| 7df28f6 | 1.0.0-beta.2 | 1.0.0-beta.2 |
| ... | ... | ... |
| 1e2ba6b | 1.0.0-beta.17 | 1.0.0-beta.17 |
The first true semver release becomes 1.0.0.
Migration script (to run manually on production):
```sql
-- migrations/YYYYMMDD_add_semver_columns.sql

-- Step 1: Add new columns
ALTER TABLE benchmark_versions ADD COLUMN semver TEXT;
ALTER TABLE benchmark_versions ADD COLUMN label TEXT;
ALTER TABLE benchmark_versions ADD COLUMN release_notes TEXT;
ALTER TABLE benchmark_versions ADD COLUMN release_url TEXT;

-- Step 2: Create index
CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);
```

```sql
-- migrations/YYYYMMDD_backfill_legacy_versions.sql
-- Run this AFTER the column migration
-- This needs to be generated dynamically based on existing versions
-- The implementation should query existing versions ordered by created_at
-- and generate UPDATE statements like:
--
-- UPDATE benchmark_versions SET semver = '1.0.0-beta.1', label = '1.0.0-beta.1' WHERE id = 'ad1c230';
-- UPDATE benchmark_versions SET semver = '1.0.0-beta.2', label = '1.0.0-beta.2' WHERE id = '7df28f6';
-- ... etc
```
⚠️ Note: The implementation must produce SQL migration files that can be run manually against the production D1 database.
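The backfill file could be generated along these lines. This is a minimal sketch, assuming the existing (id, created_at) rows have already been exported from the production D1 database (the export step is not shown) and that created_at uses SQLite's datetime('now') format, which sorts chronologically as plain text. The helper names sql_literal, backfill_statements, and write_migration are hypothetical.

```python
from collections.abc import Iterable


def sql_literal(value: str) -> str:
    """Quote a Python string as a SQLite string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def backfill_statements(rows: Iterable[tuple[str, str]]) -> list[str]:
    """Build one UPDATE per legacy version, numbering them 1.0.0-beta.N.

    rows: (id, created_at) pairs, e.g. the result of
    SELECT id, created_at FROM benchmark_versions on production.
    The earliest created_at becomes beta.1; ties fall back to id.
    """
    ordered = sorted(rows, key=lambda row: (row[1], row[0]))
    statements = []
    for n, (version_id, _created_at) in enumerate(ordered, start=1):
        label = sql_literal(f"1.0.0-beta.{n}")
        statements.append(
            f"UPDATE benchmark_versions SET semver = {label}, label = {label} "
            f"WHERE id = {sql_literal(version_id)};"
        )
    return statements


def write_migration(path: str, rows: Iterable[tuple[str, str]]) -> None:
    """Write the statements to a .sql file to be run manually against D1."""
    lines = ["-- Run this AFTER the column migration", *backfill_statements(rows)]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Breaking created_at ties by id keeps the numbering deterministic if the script is re-run against the same export.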
BenchmarkVersion API Response (api repo)
Current:
```json
{ "id": "a1b2c3d", "created_at": "...", "is_current": true, "submission_count": 42 }
```

New:

```json
{
  "id": "1.0.0",
  "semver": "1.0.0",
  "label": "1.0.0",
  "release_notes": "## What's Changed\n- Initial stable release...",
  "release_url": "https://github.com/pinchbench/skill/releases/tag/v1.0.0",
  "created_at": "...",
  "is_current": true,
  "submission_count": 42
}
```

For legacy versions:

```json
{
  "id": "a1b2c3d",
  "semver": "1.0.0-beta.5",
  "label": "1.0.0-beta.5",
  "release_notes": null,
  "release_url": null,
  "created_at": "...",
  "is_current": false,
  "submission_count": 15
}
```

BenchmarkVersion TypeScript Type (leaderboard repo)
```ts
export interface BenchmarkVersion {
  id: string;
  semver: string | null; // null before the migration runs, and for hashes submitted by old clients
  label: string | null; // display string; null when no semver is assigned (UI falls back to id)
  release_notes: string | null;
  release_url: string | null; // link to GitHub release
  created_at: string;
  is_current: boolean;
  submission_count: number;
}
```

Changes by Repository
skill repo
| File | Change |
|---|---|
| pyproject.toml | Add setuptools-scm, set dynamic = ["version"] |
| BENCHMARK_VERSION | NEW FILE: Plain text version file (e.g., 1.0.0) |
| .github/workflows/release.yml | NEW FILE: Auto-update BENCHMARK_VERSION on release |
| scripts/benchmark.py | Replace the _get_git_version() call with _get_benchmark_version() using multi-strategy resolution (_get_git_version() remains as its final fallback) |
api repo
| File | Change |
|---|---|
| schema.sql | Add semver, label, release_notes, release_url columns |
| migrations/YYYYMMDD_add_semver_columns.sql | NEW: Migration to add columns |
| migrations/YYYYMMDD_backfill_legacy_versions.sql | NEW: Migration to backfill existing versions |
| src/routes/benchmarkVersions.ts | Return new fields, sort by semver when available |
| src/routes/results.ts | Accept any version format (no enforcement; old clients may still submit hashes) |
| src/types.ts | Update BenchmarkVersion type |
No semver enforcement: Since old skill clients will continue to submit git hashes, the API must accept any string as benchmark_version. Semver format is encouraged but not required.
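Because the API must accept any string, one way to fill the new columns when results.ts auto-creates a version row is to populate semver and label only when the submitted string parses as SemVer 2.0.0, leaving them NULL otherwise (matching the "null semver" behavior for old clients). This is a hypothetical Python sketch of that rule (the API itself is TypeScript); the helper name new_version_row is illustrative, and the regex is the one published on semver.org with its ^/$ anchors dropped because fullmatch() anchors it already.

```python
import re
from typing import Optional

# SemVer 2.0.0 regex from semver.org (numbered-group variant), unanchored
# because fullmatch() requires the whole string to match.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def new_version_row(benchmark_version: str) -> tuple[str, Optional[str], Optional[str]]:
    """Return (id, semver, label) for a benchmark_versions row being auto-created.

    Any string is accepted as the id. Only valid SemVer strings populate
    semver and label; anything else (e.g. a git hash) gets None for both,
    which would be stored as NULL.
    """
    if SEMVER_RE.fullmatch(benchmark_version):
        return benchmark_version, benchmark_version, benchmark_version
    return benchmark_version, None, None
```

Under this rule, a setuptools-scm dev version such as 1.0.1.dev3+gabc1234 is not valid SemVer and would be stored like a git hash, with NULL semver.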
leaderboard repo
| File | Change |
|---|---|
| lib/types.ts | Add semver, label, release_notes, release_url to BenchmarkVersion |
| components/version-selector.tsx | Display label instead of id.slice(0,8), remove GitCommit icon for semver versions |
| app/about/page.tsx | Update versioning documentation |
UI display logic:
```ts
const displayVersion = version.label ?? version.id.slice(0, 8);
// Excludes backfilled 1.0.0-beta.N legacy versions (and any other "beta" prerelease)
const isSemver = version.semver !== null && !version.semver.includes('beta');
```

Sorting Strategy
Versions will be sorted with semver versions first (sorted by semver rules), then git-hash versions by date:
- Semver versions: Sorted by semver comparison (2.0.0 > 1.1.0 > 1.0.0 > 1.0.0-beta.17 > 1.0.0-beta.1)
- Git-hash versions without semver: Sorted by created_at DESC
Since old clients may continue submitting git hashes indefinitely, the system must gracefully handle mixed version types.
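SQLite compares TEXT byte by byte, so a plain ORDER BY semver would rank 1.0.0-beta.17 below 1.0.0-beta.2 and 1.0.0 below its own betas. A minimal sketch of the described ordering, assuming rows are sorted in application code after the query (function names are hypothetical; benchmarkVersions.ts would need a TypeScript equivalent). It follows the SemVer 2.0.0 precedence rules: numeric core first, a release outranks its prereleases, prerelease identifiers compare numerically when both are numeric, and build metadata is ignored.

```python
import re
from functools import cmp_to_key

# Loose parser: core version, optional prerelease, optional (ignored) build metadata.
_SEMVER = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?")


def _parse(v: str) -> tuple[tuple[int, int, int], list[str]]:
    m = _SEMVER.fullmatch(v)
    if m is None:
        raise ValueError(f"not a semver string: {v!r}")
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    prerelease = m.group(4).split(".") if m.group(4) else []
    return (major, minor, patch), prerelease


def _cmp_identifier(a: str, b: str) -> int:
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers rank below alphanumeric
    return (a > b) - (a < b)  # ASCII order for alphanumeric identifiers


def compare_semver(a: str, b: str) -> int:
    """Return -1, 0 or 1 following SemVer 2.0.0 precedence."""
    (core_a, pre_a), (core_b, pre_b) = _parse(a), _parse(b)
    if core_a != core_b:
        return -1 if core_a < core_b else 1
    if not pre_a or not pre_b:
        # A release (no prerelease) outranks any prerelease of the same core.
        return int(not pre_a) - int(not pre_b)
    for x, y in zip(pre_a, pre_b):
        result = _cmp_identifier(x, y)
        if result:
            return result
    # All shared identifiers equal: the longer prerelease ranks higher.
    return (len(pre_a) > len(pre_b)) - (len(pre_a) < len(pre_b))


def sort_versions(rows: list[dict]) -> list[dict]:
    """Semver rows first (highest first), then hash-only rows by created_at DESC."""
    with_semver = [r for r in rows if r.get("semver")]
    hash_only = [r for r in rows if not r.get("semver")]
    with_semver.sort(
        key=cmp_to_key(lambda a, b: compare_semver(a["semver"], b["semver"])),
        reverse=True,
    )
    hash_only.sort(key=lambda r: r["created_at"], reverse=True)
    return with_semver + hash_only
```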
Backward Compatibility
| Scenario | Handling |
|---|---|
| Old skill submits git hash | API accepts, stores as before, auto-creates version with null semver |
| New skill submits semver | API stores with id = semver |
| Frontend receives null semver | Falls back to id.slice(0,8) display |
| API filters by version | Works with both hash and semver IDs |
| Existing submissions | Unchanged, still link to their git-hash version |
Release Process (post-migration)
- Maintainer creates a GitHub release with tag v1.1.0 and release notes
- GitHub Actions automatically updates the BENCHMARK_VERSION file to 1.1.0
- Users who clone get the new BENCHMARK_VERSION file
- Users who pip install get the version from package metadata (via setuptools-scm)
- The first submission with the new version auto-creates the benchmark_versions record
- Admin sets current = 1 for the new version when ready to make it the default
- Admin optionally adds release_notes and release_url (or automates this via a webhook later)
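Assuming GET /api/benchmark_versions/latest expects exactly one row with current = 1, promoting a new version should clear the old flag in the same transaction. A minimal sketch using Python's sqlite3 against a local SQLite copy of the schema; the function name set_current_version is hypothetical, and in the Worker the two UPDATE statements would instead need to run together (for example in one D1 batch).

```python
import sqlite3


def set_current_version(conn: sqlite3.Connection, version_id: str) -> None:
    """Make version_id the only benchmark_versions row with current = 1."""
    # "with conn" commits on success and rolls back if anything raises.
    with conn:
        exists = conn.execute(
            "SELECT 1 FROM benchmark_versions WHERE id = ?", (version_id,)
        ).fetchone()
        if exists is None:
            raise KeyError(f"unknown benchmark version: {version_id}")
        conn.execute("UPDATE benchmark_versions SET current = 0 WHERE current = 1")
        conn.execute(
            "UPDATE benchmark_versions SET current = 1 WHERE id = ?", (version_id,)
        )
```

Checking for the id before touching any rows means a typo in the version id leaves the existing current version untouched.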
Decisions Made
- Version source of truth: GitHub releases (git tags). The BENCHMARK_VERSION file is auto-updated by CI.
- Legacy labeling: Existing git-hash versions get 1.0.0-beta.N semver labels (not a "Legacy" bucket). This keeps everything in semver format for consistent sorting.
- Semver enforcement: No enforcement. Old clients will continue submitting git hashes, so the API accepts any string. New skill versions are encouraged to use semver via the BENCHMARK_VERSION file.
- Release metadata: release_notes field for the markdown changelog + release_url field for the link to the GitHub release.
- Sorting: Semver-aware sorting. Proper semver versions first (sorted by semver rules), then git-hash versions by date.
- Version resolution: Hybrid approach with setuptools-scm for installed packages + BENCHMARK_VERSION file fallback for direct usage.
Acceptance Criteria
- pyproject.toml updated with setuptools-scm for dynamic versioning
- BENCHMARK_VERSION file created and populated
- GitHub Actions workflow auto-updates BENCHMARK_VERSION on release
- _get_benchmark_version() function implements multi-strategy resolution
- Version dropdown shows 1.0.0, 1.1.0, etc. for new versions
- Legacy versions display as 1.0.0-beta.N
- API returns semver, label, release_notes, release_url fields
- Database migration scripts produced for manual production run
- Sorting handles mixed semver/hash versions correctly
- Documentation updated in about page
- Release process documented for maintainers