Meta: Migrate from Git Hash Versioning to Semantic Versioning #62

@ScuttleBot

Meta-Issue: Migrate from Git Hash Versioning to Semantic Versioning

Summary

PinchBench currently identifies benchmark versions by git commit hash (short SHA). This confuses users, who see opaque strings like a1b2c3d instead of meaningful version numbers like 1.0.0.

This issue tracks the migration to semantic versioning (SemVer) across all PinchBench components.


Part 1: Current State Research

Where the Git Hash Originates

skill repo: scripts/benchmark.py (lines 275-288):

The benchmark version is generated at runtime by reading the current git HEAD:

def _get_git_version(script_dir: Path) -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True,
        text=True,
        timeout=2,
        cwd=script_dir,
    )
    return result.stdout.strip()  # e.g., "a1b2c3d"

This is called at benchmark completion (line 718) and included in the results JSON:

"benchmark_version": _get_git_version(skill_root),

How the Git Hash Flows Through the System

1. Submission Upload (skill → api)

skill repo: scripts/lib_upload.py (lines 257-258):

"benchmark_version": raw.get("benchmark_version"),

The benchmark_version from the results JSON is passed through to the API submission payload unchanged.

2. API Storage (api repo)

api repo: schema.sql (lines 36-50):

CREATE TABLE IF NOT EXISTS submissions (
  ...
  benchmark_version TEXT,
  ...
);

CREATE TABLE IF NOT EXISTS benchmark_versions (
  id TEXT PRIMARY KEY,           -- The git hash
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);

api repo: src/routes/results.ts (lines 238-242):

On first submission with a new version, auto-inserts into benchmark_versions:

if (payload.benchmark_version) {
  await c.env.prod_pinchbench
    .prepare("INSERT OR IGNORE INTO benchmark_versions (id, current) VALUES (?, 0)")
    .bind(payload.benchmark_version)
    .run();
}

3. Version API Endpoint (api repo)

api repo: src/routes/benchmarkVersions.ts:

  • GET /api/benchmark_versions — Returns all non-hidden versions ordered by created_at DESC
  • GET /api/benchmark_versions/latest — Returns the current version (where current = 1)

Response shape:

{
  "versions": [
    { "id": "a1b2c3d", "created_at": "...", "is_current": true, "submission_count": 42 }
  ]
}

4. Frontend Display (leaderboard repo)

leaderboard repo: lib/types.ts (lines 220-226):

export interface BenchmarkVersion {
  id: string;           // The git hash
  created_at: string;
  is_current: boolean;
  submission_count: number;
}

leaderboard repo: components/version-selector.tsx:

  • Displays v.id.slice(0, 8) (first 8 chars of git hash)
  • GitCommit icon reinforces it's a git hash
  • Shows "Current" badge for versions where is_current === true

leaderboard repo — Other display locations:

  • app/submission/[id]/page.tsx (line 180): Shows full benchmark_version
  • app/runs/page.tsx (line 121): Shows benchmark_version in runs table
  • app/about/page.tsx (lines 237-284): Documents the git hash versioning scheme

Files That Reference benchmark_version

| Repo | File | Usage |
| --- | --- | --- |
| skill | scripts/benchmark.py | Generates the hash via _get_git_version() |
| skill | scripts/lib_upload.py | Passes hash in submission payload |
| api | schema.sql | Defines submissions.benchmark_version TEXT field |
| api | schema.sql | Defines benchmark_versions.id TEXT PRIMARY KEY |
| api | src/routes/results.ts | Stores hash, auto-creates version record |
| api | src/routes/benchmarkVersions.ts | Returns versions list |
| api | src/routes/leaderboard.ts | Filters by version |
| api | src/routes/submissions.ts | Filters by version |
| api | src/routes/users.ts | Filters user submissions by version |
| api | src/routes/providers.ts | Uses version in queries |
| api | src/routes/admin.ts | Admin version management |
| api | src/utils/query.ts | Builds version filter queries |
| leaderboard | lib/types.ts | Defines BenchmarkVersion interface |
| leaderboard | lib/api.ts | Passes version query param |
| leaderboard | components/version-selector.tsx | Displays version picker |
| leaderboard | app/page.tsx | Passes version to leaderboard |
| leaderboard | app/runs/page.tsx | Displays version in runs |
| leaderboard | app/submission/[id]/page.tsx | Shows submission version |
| leaderboard | app/about/page.tsx | Documents versioning scheme |

Part 2: Implementation Plan

Goal

Replace opaque git hashes with semantic versions (e.g., 1.0.0, 1.1.0, 2.0.0) while maintaining backward compatibility with existing submissions. GitHub releases are the single source of truth — versions are managed through GitHub releases of the github.com/pinchbench/skill repository.


How the Benchmark Knows Its Version (skill repo)

Approach: Hybrid with setuptools-scm + BENCHMARK_VERSION Fallback

This handles all user scenarios:

| Scenario | Version Source |
| --- | --- |
| Installed via pip | importlib.metadata.version() (from git tag via setuptools-scm) |
| Cloned and run directly | BENCHMARK_VERSION file |
| Downloaded zip (no git) | BENCHMARK_VERSION file |
| Dev environment (untagged) | Git hash fallback |

pyproject.toml Changes

[build-system]
requires = ["setuptools>=61", "setuptools-scm>=8"]
build-backend = "setuptools.build_meta"

[project]
name = "pinchbench-skill"
dynamic = ["version"]  # Version comes from setuptools-scm
description = "PinchBench - OpenClaw Agent Benchmarking System"
readme = "SKILL.md"
requires-python = ">=3.10"
dependencies = [
    "pyyaml>=6.0.1",
    "fabric>=3.2.2",
    "paramiko>=3.0.0",
]

[tool.setuptools_scm]
# Extracts version from git tags like v1.0.0
# Untagged commits get a development version, e.g. 1.0.1.dev3+gabc1234 after v1.0.0
# (or 0.1.devN+g<hash> when the repository has no tags at all)

New File: BENCHMARK_VERSION

Plain text file in repo root containing just the version string:

1.0.0

This file is auto-updated by GitHub Actions on each release.

New Function: _get_benchmark_version()

Replace _get_git_version() in scripts/benchmark.py:

def _get_benchmark_version(script_dir: Path) -> str:
    """Get benchmark version with multiple fallback strategies.
    
    Resolution order:
    1. importlib.metadata (if package is installed)
    2. BENCHMARK_VERSION file (for cloned/downloaded usage)
    3. Git tag via `git describe`
    4. Git short hash (ultimate fallback)
    """
    
    # 1. Try importlib.metadata (works if package is installed via pip)
    try:
        from importlib.metadata import version
        return version("pinchbench-skill")
    except Exception:
        pass
    
    # 2. Try BENCHMARK_VERSION file
    version_file = script_dir / "BENCHMARK_VERSION"
    if version_file.exists():
        v = version_file.read_text().strip()
        if v:
            return v
    
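    # 3. Try git describe (exact tag "v1.0.0" -> "1.0.0"; past a tag the output
    #    looks like "v1.0.0-3-gabc1234", and --always yields a bare hash if no tag exists)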
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],
            capture_output=True,
            text=True,
            timeout=2,
            check=False,
            cwd=script_dir,
        )
        if result.returncode == 0:
            tag = result.stdout.strip()
            # Strip "v" prefix if present
            if tag.startswith("v"):
                return tag[1:]
            return tag
    except (subprocess.SubprocessError, FileNotFoundError, OSError):
        pass
    
    # 4. Ultimate fallback: git short hash
    return _get_git_version(script_dir)
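
Because step 3 passes --always and only strips a leading "v", a commit three revisions past v1.0.0 would report 1.0.0-3-gabc1234, which SemVer parses as a pre-release of 1.0.0. One possible way to handle this, sketched below, is to move the commit distance and hash into SemVer build metadata. The helper name normalize_describe is hypothetical and not part of the existing code.

```python
import re

# Matches `git describe --tags` output for a commit past a tag: <tag>-<count>-g<hash>
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def normalize_describe(output: str) -> str:
    """Convert `git describe --tags --always` output into a version string.

    Hypothetical helper for illustration; not part of the PinchBench code base.
    """
    text = output.strip()
    match = _DESCRIBE_RE.match(text)
    if match:
        tag = match.group("tag").removeprefix("v")
        # Past a tag: keep distance and hash as build metadata ("+3.gabc1234")
        return f"{tag}+{match.group('count')}.g{match.group('sha')}"
    # Exact tag ("v1.0.0") or the bare hash produced by --always ("a1b2c3d")
    return text.removeprefix("v")
```

With this sketch, normalize_describe("v1.0.0") gives "1.0.0", normalize_describe("v1.0.0-3-gabc1234") gives "1.0.0+3.gabc1234", and a bare hash such as "a1b2c3d" passes through unchanged.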

GitHub Actions Workflow: .github/workflows/release.yml

Auto-updates BENCHMARK_VERSION file when a GitHub release is published:

name: Update Version on Release

on:
  release:
    types: [published]

jobs:
  update-version-file:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: main
          
      - name: Update BENCHMARK_VERSION file
        run: |
          # Extract version from tag (strip "v" prefix)
          VERSION="${GITHUB_REF_NAME#v}"
          echo "$VERSION" > BENCHMARK_VERSION
          echo "Updated BENCHMARK_VERSION to $VERSION"
          
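      - name: Commit and push
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add BENCHMARK_VERSION
          # Skip the commit when the file already holds this version;
          # otherwise `git commit` exits non-zero and fails the job
          if git diff --cached --quiet; then
            echo "BENCHMARK_VERSION already up to date"
          else
            git commit -m "chore: update BENCHMARK_VERSION to ${GITHUB_REF_NAME}"
            git push
          fi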

Data Model Changes

benchmark_versions Table (api repo)

Current schema:

CREATE TABLE benchmark_versions (
  id TEXT PRIMARY KEY,      -- git hash (e.g., "a1b2c3d")
  created_at TEXT,
  current INTEGER,
  hidden INTEGER
);

New schema:

CREATE TABLE benchmark_versions (
  id TEXT PRIMARY KEY,           -- Can be git hash OR semver
  semver TEXT,                   -- Normalized semver: "1.0.0", "1.1.0"
  label TEXT,                    -- Display label: "1.0.0" or "1.0.0-beta.5"
  release_notes TEXT,            -- Markdown changelog (from GitHub release body)
  release_url TEXT,              -- Link to GitHub release
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);

Data migration for existing versions:

All existing git-hash versions will be migrated to 1.0.0-beta.N semver format, where N is an incrementing number based on created_at order. Example:

| id (hash) | semver | label |
| --- | --- | --- |
| ad1c230 | 1.0.0-beta.1 | 1.0.0-beta.1 |
| 7df28f6 | 1.0.0-beta.2 | 1.0.0-beta.2 |
| ... | ... | ... |
| 1e2ba6b | 1.0.0-beta.17 | 1.0.0-beta.17 |

First true semver release becomes 1.0.0.

Migration script (to run manually on production):

-- migrations/YYYYMMDD_add_semver_columns.sql

-- Step 1: Add new columns
ALTER TABLE benchmark_versions ADD COLUMN semver TEXT;
ALTER TABLE benchmark_versions ADD COLUMN label TEXT;
ALTER TABLE benchmark_versions ADD COLUMN release_notes TEXT;
ALTER TABLE benchmark_versions ADD COLUMN release_url TEXT;

-- Step 2: Create index
CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);

-- migrations/YYYYMMDD_backfill_legacy_versions.sql
-- Run this AFTER the column migration

-- This needs to be generated dynamically based on existing versions
-- The implementation should query existing versions ordered by created_at
-- and generate UPDATE statements like:
--
-- UPDATE benchmark_versions SET semver = '1.0.0-beta.1', label = '1.0.0-beta.1' WHERE id = 'ad1c230';
-- UPDATE benchmark_versions SET semver = '1.0.0-beta.2', label = '1.0.0-beta.2' WHERE id = '7df28f6';
-- ... etc

⚠️ Note: The implementation must produce SQL migration files that can be run manually against the production D1 database.
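
The backfill file described above could be generated by a short one-off script. The following is a minimal sketch, assuming the existing rows have already been exported as (id, created_at) pairs; the function name build_backfill_sql and the way rows are obtained are assumptions, not part of the existing repositories.

```python
def build_backfill_sql(rows: list[tuple[str, str]]) -> list[str]:
    """Return UPDATE statements assigning 1.0.0-beta.N in created_at order.

    `rows` holds (id, created_at) pairs read from benchmark_versions.
    Hypothetical helper for illustration only.
    """
    # Timestamps written by datetime('now') use "YYYY-MM-DD HH:MM:SS", which
    # sorts chronologically as plain text; id breaks ties deterministically.
    ordered = sorted(rows, key=lambda row: (row[1], row[0]))
    statements = []
    for n, (version_id, _created_at) in enumerate(ordered, start=1):
        label = f"1.0.0-beta.{n}"
        safe_id = version_id.replace("'", "''")  # escape quotes for a SQL literal
        statements.append(
            "UPDATE benchmark_versions "
            f"SET semver = '{label}', label = '{label}' "
            f"WHERE id = '{safe_id}';"
        )
    return statements


if __name__ == "__main__":
    # Timestamps here are invented for the example
    sample = [("7df28f6", "2025-01-02 00:00:00"), ("ad1c230", "2025-01-01 00:00:00")]
    print("\n".join(build_backfill_sql(sample)))
```

The output can be written to migrations/YYYYMMDD_backfill_legacy_versions.sql and reviewed before it is run against production.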

BenchmarkVersion API Response (api repo)

Current:

{ "id": "a1b2c3d", "created_at": "...", "is_current": true, "submission_count": 42 }

New:

{
  "id": "1.0.0",
  "semver": "1.0.0",
  "label": "1.0.0",
  "release_notes": "## What's Changed\n- Initial stable release...",
  "release_url": "https://github.com/pinchbench/skill/releases/tag/v1.0.0",
  "created_at": "...",
  "is_current": true,
  "submission_count": 42
}

For legacy versions:

{
  "id": "a1b2c3d",
  "semver": "1.0.0-beta.5",
  "label": "1.0.0-beta.5",
  "release_notes": null,
  "release_url": null,
  "created_at": "...",
  "is_current": false,
  "submission_count": 15
}

BenchmarkVersion TypeScript Type (leaderboard repo)

export interface BenchmarkVersion {
  id: string;
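  semver: string | null;        // null for hash-only versions (not yet backfilled, or auto-created from an old client)
  label: string | null;         // display string; when null the UI falls back to id.slice(0, 8)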
  release_notes: string | null;
  release_url: string | null;   // link to GitHub release
  created_at: string;
  is_current: boolean;
  submission_count: number;
}

Changes by Repository

skill repo

| File | Change |
| --- | --- |
| pyproject.toml | Add setuptools-scm, set dynamic = ["version"] |
| BENCHMARK_VERSION | NEW FILE: Plain text version file (e.g., 1.0.0) |
| .github/workflows/release.yml | NEW FILE: Auto-update BENCHMARK_VERSION on release |
| scripts/benchmark.py | Replace _get_git_version() with _get_benchmark_version() using multi-strategy resolution |

api repo

| File | Change |
| --- | --- |
| schema.sql | Add semver, label, release_notes, release_url columns |
| migrations/YYYYMMDD_add_semver_columns.sql | NEW: Migration to add columns |
| migrations/YYYYMMDD_backfill_legacy_versions.sql | NEW: Migration to backfill existing versions |
| src/routes/benchmarkVersions.ts | Return new fields, sort by semver when available |
| src/routes/results.ts | Accept any version format (no enforcement; old clients may still submit hashes) |
| src/types.ts | Update BenchmarkVersion type |

No semver enforcement: Since old skill clients will continue to submit git hashes, the API must accept any string as benchmark_version. Semver format is encouraged but not required.

leaderboard repo

| File | Change |
| --- | --- |
| lib/types.ts | Add semver, label, release_notes, release_url to BenchmarkVersion |
| components/version-selector.tsx | Display label instead of id.slice(0,8), remove GitCommit icon for semver versions |
| app/about/page.tsx | Update versioning documentation |

UI display logic:

const displayVersion = version.label ?? version.id.slice(0, 8);
const isSemver = version.semver !== null && !version.semver.includes('beta');

Sorting Strategy

Versions will be sorted with semver versions first (sorted by semver rules), then git-hash versions by date:

  1. Semver versions: Sorted by semver comparison (2.0.0 > 1.1.0 > 1.0.0 > 1.0.0-beta.17 > 1.0.0-beta.1)
  2. Git-hash versions without semver: Sorted by created_at DESC

Since old clients may continue submitting git hashes indefinitely, the system must gracefully handle mixed version types.
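
The ordering above can be sketched as follows. This is an illustration in Python rather than the API's TypeScript, it assumes every non-null semver value is well formed, and the names semver_key and sort_versions are hypothetical. Plain string sorting would not work here, because it would place 1.0.0-beta.17 before 1.0.0-beta.2.

```python
def semver_key(semver: str) -> tuple:
    """Sort key following SemVer precedence (build metadata is ignored)."""
    core, _, _build = semver.partition("+")
    numbers, _, prerelease = core.partition("-")
    major, minor, patch = (int(part) for part in numbers.split("."))
    if not prerelease:
        # A release outranks every pre-release of the same MAJOR.MINOR.PATCH
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as numbers and rank below alphanumeric ones
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


def sort_versions(versions: list[dict]) -> list[dict]:
    """Semver versions first (highest first), then hash-only versions by date."""
    with_semver = [v for v in versions if v.get("semver")]
    hash_only = [v for v in versions if not v.get("semver")]
    with_semver.sort(key=lambda v: semver_key(v["semver"]), reverse=True)
    hash_only.sort(key=lambda v: v["created_at"], reverse=True)
    return with_semver + hash_only
```

Splitting the list into two groups keeps hash-only versions from ever being compared with semver ones, which matches the requirement that mixed version types be handled gracefully.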


Backward Compatibility

| Scenario | Handling |
| --- | --- |
| Old skill submits git hash | API accepts, stores as before, auto-creates version with null semver |
| New skill submits semver | API stores with id = semver |
| Frontend receives null semver | Falls back to id.slice(0,8) display |
| API filters by version | Works with both hash and semver IDs |
| Existing submissions | Unchanged, still link to their git-hash version |

Release Process (post-migration)

  1. Maintainer creates GitHub release with tag v1.1.0 and release notes
  2. GitHub Actions automatically updates BENCHMARK_VERSION file to 1.1.0
  3. Users who clone get the new BENCHMARK_VERSION file
  4. Users who pip install get version from package metadata (via setuptools-scm)
  5. First submission with new version auto-creates benchmark_versions record
  6. Admin sets current = 1 for new version when ready to make it default
  7. Admin optionally adds release_notes and release_url (or automate via webhook later)

Decisions Made

  1. Version source of truth: GitHub releases (git tags). The BENCHMARK_VERSION file is auto-updated by CI.

  2. Legacy labeling: Existing git-hash versions get 1.0.0-beta.N semver labels (rather than a single "Legacy" bucket). This keeps everything in semver format for consistent sorting.

  3. Semver enforcement: No enforcement. Old clients will continue submitting git hashes, so the API accepts any string. New skill versions are encouraged to use semver via the BENCHMARK_VERSION file.

  4. Release metadata: release_notes field for markdown changelog + release_url field for link to GitHub release.

  5. Sorting: Semver-aware sorting. Proper semver versions first (sorted by semver rules), then git-hash versions by date.

  6. Version resolution: Hybrid approach with setuptools-scm for installed packages + BENCHMARK_VERSION file fallback for direct usage.


Acceptance Criteria

  • pyproject.toml updated with setuptools-scm for dynamic versioning
  • BENCHMARK_VERSION file created and populated
  • GitHub Actions workflow auto-updates BENCHMARK_VERSION on release
  • _get_benchmark_version() function implements multi-strategy resolution
  • Version dropdown shows 1.0.0, 1.1.0, etc. for new versions
  • Legacy versions display as 1.0.0-beta.N
  • API returns semver, label, release_notes, release_url fields
  • Database migration scripts produced for manual production run
  • Sorting handles mixed semver/hash versions correctly
  • Documentation updated in about page
  • Release process documented for maintainers

Labels: enhancement (New feature or request)
