
Conversation

@flyingrobots

This PR completes Milestone 5 of the db8 roadmap:

M5: Scoring & Reputation

  • Rubric Scoring: Implemented scores table and score_submit RPC for E/R/C/V/Y evaluation.
  • Elo System: Developed reputation_update_round function for deterministic global and tag-specific Elo updates.
  • Aggregation: Created view_score_aggregates with security-barrier hardening and weighted composite scoring.
  • API: Integrated scoring and reputation retrieval into /rpc endpoints with numeric coercion for accuracy.
  • Tests: Added comprehensive unit and integration tests for scoring, reputation, and Elo math, using unique UUID ranges to ensure isolation.

All 83 tests passing.
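The Elo updates themselves live in the reputation_update_round SQL function, but the underlying math is standard Elo. A minimal JavaScript sketch, assuming the conventional 400-point scale and the 1200 default rating; the K-factor of 32 is an illustrative assumption, not a constant confirmed by this PR:

```javascript
// Standard Elo expected-score and rating-update formulas.
// K = 32 is an illustrative assumption, not db8's actual constant.
function expectedScore(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

function eloUpdate(ratingA, ratingB, scoreA, k = 32) {
  // scoreA is 1 for a win, 0.5 for a draw, 0 for a loss.
  return ratingA + k * (scoreA - expectedScore(ratingA, ratingB));
}

// Two participants at the default 1200 rating: a win moves the
// winner up by exactly K/2 = 16 points, deterministically.
console.log(eloUpdate(1200, 1200, 1)); // 1216
```

Because the update is pure arithmetic over stored ratings, the same inputs always produce the same outputs, which is what makes the round update deterministic and testable.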

flyingrobots and others added 30 commits October 7, 2025 16:39
docs(ci): DB integration workflow + README milestone focus
Signed-off-by: James Ross <james@flyingrobots.dev>
…outes, CLI commands, web summary, tests, docs
…s; keep manual + weekly schedule; add concurrency
…bmit ON CONFLICT to include client_nonce; summary reads via view; fix tests and web polling; map CLI HTTP errors to exit codes
…use dorny/paths-filter to run only when web/** changes (or on push to main)
…web paths to fix concurrency + resolver issues
…e; prove sequential DB tests incl. verify run
@coderabbitai

coderabbitai bot commented Dec 23, 2025

Summary by CodeRabbit

  • New Features

    • Final-vote and “continue” voting flows; per-round final tallies and UI dialogs
    • Scoring/rubric submission with aggregated score views
    • Per-participant reputation (Elo) and reputation-by-tag lookups
    • Verification: submit per-claim/verdicts and a Verification Summary view/endpoint
    • SSH challenge/verify auth endpoints and CLI commands for verification
  • UI

    • Verification summary card and verify/flag actions in Room page; improved attribution masking
  • Tests

    • Many new integration and unit tests covering voting, verification, scoring, reputation and auth
  • Documentation

    • New Verification docs and extensive debate/process documentation; CLI quickstart updates


Walkthrough

This PR adds verification, final-vote and scoring features across DB, server, CLI, and web: new tables/views/RPCs for verification, final votes, scores, reputation (Elo), RLS policies, CLI commands, web UI dialogs, audit logging, and extensive integration + pgTAP tests.

Changes

Cohort / File(s) Summary
CLI
bin/db8.js
New commands: vote:continue, vote:final, verify:submit, verify:summary, auth:challenge, auth:verify; added EXIT.FAIL, validation hooks, JSON output, and error-exit handling.
Server RPC & Schemas
server/rpc.js, server/schemas.js
New endpoints: /rpc/vote.final, /rpc/score.submit, /rpc/scores.get, /rpc/reputation.update, /rpc/reputation.get, /rpc/verify.submit, /rpc/verify.summary, /auth/challenge, /auth/verify; added in-memory fallbacks, SSE events, SSH auth flows, and Zod schemas FinalVote, ScoreSubmit, ScoreGet, ReputationGet, VerifySubmit, AuthChallengeIn, AuthVerifyIn.
Database schema
db/schema.sql
Added tables: final_votes, scores, reputation, reputation_tag, verification_verdicts; added rooms.status and rooms.config; added db8_current_participant_id() helper (moved here). Indexes and uniqueness/dedup via client_nonce.
Database RPC & Views
db/rpc.sql
Added RPCs: verify_submit, verify_summary, vote_final_submit, score_submit, reputation_update_round; added views: view_score_aggregates, view_final_tally, verification_verdicts_view, participants_view, rounds_view; added audit logging calls and security_barrier flags; attribution-aware submissions_view/submissions_with_flags_view changes.
Row-Level Security
db/rls.sql
Enabled RLS on new tables (verification_verdicts, final_votes, scores, reputation, reputation_tag); added granular read policies and default-deny write patterns; removed db8_current_participant_id() (relocated).
Web UI
web/app/room/[roomId]/page.jsx
UI: continue/final vote modals (showContinueVote, showFinalVote), handlers onContinueVote, onFinalVote, verification summary polling, Verify/Flag dialogs, anonymity/author masking display, tally badges and components (ConfidenceBadge, VerdictBar).
Tests — integration & unit
server/test/*.test.js, db/test/*.pgtap
Many new tests: verification, final-vote, final_tally, scoring/reputation, auth.ssh, attribution, audit integration, lifecycle, SSE/events updates, CLI verify test, and multiple pgTAP scripts for RLS and verification invariants. Most tests use __setDbPool injection for DB-backed paths.
Watcher / background
server/watcher.js
Switched CTE source to rounds_view (reads via new view).
CI / tooling / docs
.github/workflows/*, web/next.config.js, eslint.config.js, docs/**
Added db-tests workflow, web-build steps, Next.js lint ignoreDuringBuilds, ESLint resolver tweak, many docs (Verification.md, debate docs), markdown fencing fixes, cspell additions, and commit-msg hook merge pattern.
Misc tests & infra tweaks
server/test/*
Multiple test harness adjustments: robust JSON parsing, pool injection patterns, guard server shutdown, room_create RPC signature update in tests.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI (bin/db8.js)
    participant Server as Server RPC (server/rpc.js)
    participant DB as Database (rpc/sql + schema)
    participant Audit as Admin Audit Log

    rect rgba(100,150,200,0.15)
    Note over CLI,DB: Final Vote Submission
    CLI->>Server: POST /rpc/vote.final {round_id, voter_id, approval, ranking, client_nonce}
    Server->>DB: CALL vote_final_submit(...)
    DB->>DB: Insert/upsert final_votes (client_nonce dedupe) 
    DB->>DB: Possibly update room.status = 'closed'
    DB->>Audit: admin_audit_log_write(entity='final_vote', action='vote', actor=voter_id,...)
    DB-->>Server: {id}
    Server-->>CLI: {ok: true, id}
    end

    par Asynchronous
        DB->>DB: reputation_update_round(round_id)  -- compute Elo, update reputation/reputation_tag
        DB->>Audit: admin_audit_log_write(entity='reputation', action='update', ...)
    end
sequenceDiagram
    autonumber
    participant Web as Browser UI
    participant Server as Server RPC
    participant DB as Database
    participant SSE as SSE / Event stream

    rect rgba(150,200,150,0.12)
    Note over Web,DB: Verification Submit Flow
    Web->>Server: POST /rpc/verify.submit {round_id, reporter_id, submission_id, claim_id, verdict, rationale, client_nonce}
    Server->>DB: CALL verify_submit(...)
    DB->>DB: Upsert verification_verdicts (idempotent by keys)
    DB->>Audit: admin_audit_log_write(entity='verification_verdicts', action='create'/'update', ...)
    DB-->>Server: {id}
    Server->>SSE: publish verdict event
    Server-->>Web: {ok: true, id}
    Web->>Server: GET /rpc/verify.summary?round_id=... (poll)
    Server->>DB: SELECT verify_summary(...)
    DB-->>Server: aggregated rows
    Server-->>Web: summary JSON
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🧪 Tables, RPCs, and modals hum, audits keep the score,
Verdicts inked idempotent, nonces guard the door.
Votes fold, reputations climb, views mask then show,
Tests sprint through the pipeline — let the final tallies flow. 🎉

⚠️ Critical Review Notes (short, pedantic)

  • Function relocation: confirm all references updated after moving/duplicating db8_current_participant_id() between rls/schema.
  • RLS coverage: inspect policies for final_votes/scores/reputation to avoid cross-room leaks.
  • Client-nonce uniqueness semantics: verify dedupe keys and retry behavior documented in CLI/web.
  • Concurrency: final-vote → room.status closure needs race testing under concurrent submissions and UI state refresh.
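The client-nonce semantics in the third note can be modeled in a few lines. This is a hypothetical in-memory sketch of the dedupe contract; the real path is a Postgres ON CONFLICT upsert, and the Map store and function name here are illustrative only:

```javascript
// Idempotency contract: a retry carrying the same client_nonce must
// return the original row rather than inserting a second vote.
const votesByNonce = new Map();

function voteFinalSubmit(roundId, voterId, approval, clientNonce) {
  const key = `${roundId}:${voterId}:${clientNonce}`;
  if (votesByNonce.has(key)) return votesByNonce.get(key); // dedupe hit
  const row = { id: votesByNonce.size + 1, roundId, voterId, approval };
  votesByNonce.set(key, row);
  return row;
}

const first = voteFinalSubmit('r1', 'p1', true, 'n1');
const retry = voteFinalSubmit('r1', 'p1', true, 'n1'); // network retry
console.log(first.id === retry.id); // true: the retry did not double-count
```

Documenting exactly this behavior (same nonce, same result, no duplicate row) in the CLI and web docs would resolve the note above.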

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 19.05%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'feat: Milestone 5 - Scoring & Reputation' is directly related to the main changeset, which implements the scoring rubric, Elo reputation system, aggregation views, and related APIs.
  • Description check ✅ Passed: The description substantively covers the key accomplishments: rubric scoring, Elo system, aggregation, API integration, and test coverage, all of which are present in the changeset.


@coderabbitai coderabbitai bot added area/cli CLI area/db Database (schema/RLS/RPC) area/server Worker/API/Watcher area/web Frontend (Next.js) priority/p1 High status/in-review PR open / In review type/docs Docs type/feat Feature labels Dec 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 31

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
server/test/audit.integration.test.js (2)

21-49: CI is screaming: FK constraint violations because your seeding is silently failing.

Pipeline failures reveal:

  1. Line 41: submissions_round_id_fkey violation — the round doesn't exist when submission_upsert runs.
  2. Line 69: round not found: 33343334-0000-0000-0000-000000000002 — confirming the round was never inserted.

Root cause: Your ON CONFLICT DO NOTHING strategy (lines 27, 32, 36) is a silent failure factory. If any constraint conflicts (perhaps from previous test runs or stale state), the inserts are skipped and you proceed blissfully unaware.

The unforgivable sin: You never verify your seeds succeeded. You fire-and-forget INSERT statements, then act surprised when downstream queries fail.

🔎 Proposed fix: Verify seeds or use RETURNING
   it('room_create should be audit-logged (implied via watcher or manual call)', async () => {
     const roomId = '33343334-0000-0000-0000-000000000001';
     const roundId = '33343334-0000-0000-0000-000000000002';
     const participantId = '33343334-0000-0000-0000-000000000003';
 
     // Seed data
-    await pool.query('insert into rooms(id, title) values ($1, $2) on conflict do nothing', [
+    const roomRes = await pool.query(
+      `insert into rooms(id, title) values ($1, $2) 
+       on conflict (id) do update set title = excluded.title 
+       returning id`,
+      [roomId, 'Audit Room Unique']
+    );
+    expect(roomRes.rows.length).toBe(1);
-      roomId,
-      'Audit Room Unique'
-    ]);
-    await pool.query(
-      'insert into rounds(id, room_id, idx, phase) values ($1, $2, 0, $3) on conflict do nothing',
-      [roundId, roomId, 'submit']
+    
+    const roundRes = await pool.query(
+      `insert into rounds(id, room_id, idx, phase) values ($1, $2, 0, $3) 
+       on conflict (id) do update set phase = excluded.phase 
+       returning id`,
+      [roundId, roomId, 'submit']
     );
+    expect(roundRes.rows.length).toBe(1);
+
-    await pool.query(
-      'insert into participants(id, room_id, anon_name) values ($1, $2, $3) on conflict do nothing',
-      [participantId, roomId, 'audit_anon_unique']
+    const partRes = await pool.query(
+      `insert into participants(id, room_id, anon_name) values ($1, $2, $3) 
+       on conflict (id) do update set anon_name = excluded.anon_name 
+       returning id`,
+      [participantId, roomId, 'audit_anon_unique']
     );
+    expect(partRes.rows.length).toBe(1);

61-84: Test isolation nightmare: reusing stale IDs across tests.

This test at line 61 reuses roundId and participantId from the first test (lines 62-63), assuming they still exist. But:

  1. If the first test's seeding failed (as CI shows), this test inherits the failure.
  2. You update the round to published (line 66), but if the round doesn't exist, this UPDATE affects 0 rows and you don't check.
  3. The vote_submit call then fails with "round not found."

Each test should be self-contained. Don't rely on side effects from previous tests.

🔎 Proposed fix: Self-contained seeding per test
   it('vote_submit should be audit-logged', async () => {
-    const roundId = '33343334-0000-0000-0000-000000000002';
-    const participantId = '33343334-0000-0000-0000-000000000003';
+    const roomId = '33343334-0000-0000-0000-000000000010';
+    const roundId = '33343334-0000-0000-0000-000000000011';
+    const participantId = '33343334-0000-0000-0000-000000000012';
+
+    // Self-contained seeding
+    await pool.query(
+      `insert into rooms(id, title) values ($1, $2) 
+       on conflict (id) do update set title = excluded.title`,
+      [roomId, 'Vote Test Room']
+    );
+    await pool.query(
+      `insert into rounds(id, room_id, idx, phase) values ($1, $2, 0, $3) 
+       on conflict (id) do update set phase = excluded.phase`,
+      [roundId, roomId, 'published']
+    );
+    await pool.query(
+      `insert into participants(id, room_id, anon_name) values ($1, $2, $3) 
+       on conflict (id) do update set anon_name = excluded.anon_name`,
+      [participantId, roomId, 'vote_test_anon']
+    );
 
-    // Set round to published
-    await pool.query('update rounds set phase = $1 where id = $2', ['published', roundId]);
server/rpc.js (1)

974-984: author_anon_name is referenced but never selected from the database.

Line 978 maps row.author_anon_name into the transcript, but the SELECT list at lines 956-967 never requests that column. The view being queried, submissions_with_flags_view, does expose author_anon_name (db/rpc.sql line 436):

select id, author_id, content, canonical_sha256, submitted_at, flag_count, flag_details
from submissions_with_flags_view

The column exists in the view but is missing from the SELECT list, so author_anon_name will be undefined for every transcript entry.

🔎 Proposed fix: Add author_anon_name to SELECT
           db.query(
             `select id,
                     author_id,
+                    author_anon_name,
                     content,
                     canonical_sha256,
                     submitted_at,
                     flag_count,
                     flag_details
                from submissions_with_flags_view
               where round_id = $1
               order by submitted_at asc nulls last, id asc`,
             [roundRow.round_id]
           ),
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7de3a8f and 40bdad9.

📒 Files selected for processing (15)
  • bin/db8.js
  • db/rls.sql
  • db/rpc.sql
  • db/schema.sql
  • server/rpc.js
  • server/schemas.js
  • server/test/attribution.test.js
  • server/test/audit.integration.test.js
  • server/test/auth.ssh.test.js
  • server/test/final_tally.test.js
  • server/test/lifecycle.test.js
  • server/test/scoring.test.js
  • server/test/sse.db.events.test.js
  • server/test/voting.final.test.js
  • web/app/room/[roomId]/page.jsx
🧰 Additional context used
📓 Path-based instructions (7)
{web,server,bin}/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

Use only JavaScript across web, server, and CLI; no TypeScript

Files:

  • server/test/sse.db.events.test.js
  • server/test/lifecycle.test.js
  • server/test/auth.ssh.test.js
  • server/schemas.js
  • server/test/attribution.test.js
  • server/test/final_tally.test.js
  • server/test/voting.final.test.js
  • bin/db8.js
  • server/test/scoring.test.js
  • server/rpc.js
  • server/test/audit.integration.test.js
{server,bin}/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

{server,bin}/**/*.js: Validate inputs with Zod at the edges (endpoints, CLI)
Canonical JSON must use JCS (RFC 8785) by default; legacy sorted-key option available via CANON_MODE=sorted

Files:

  • server/test/sse.db.events.test.js
  • server/test/lifecycle.test.js
  • server/test/auth.ssh.test.js
  • server/schemas.js
  • server/test/attribution.test.js
  • server/test/final_tally.test.js
  • server/test/voting.final.test.js
  • bin/db8.js
  • server/test/scoring.test.js
  • server/rpc.js
  • server/test/audit.integration.test.js
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use ESLint + Prettier for code style (see eslint.config.js and .prettierrc)

Files:

  • server/test/sse.db.events.test.js
  • server/test/lifecycle.test.js
  • server/test/auth.ssh.test.js
  • server/schemas.js
  • server/test/attribution.test.js
  • server/test/final_tally.test.js
  • web/app/room/[roomId]/page.jsx
  • server/test/voting.final.test.js
  • bin/db8.js
  • server/test/scoring.test.js
  • server/rpc.js
  • server/test/audit.integration.test.js
server/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

server/**/*.js: Use Zod schemas at the edges of Express endpoints for request validation
Server endpoints must include Zod validation, in-memory fallback patterns, and optional DATABASE_URL persistence
Signatures (Ed25519) must include strict author binding when participants.ssh_fingerprint is configured; return 400 with expected/got on mismatch

Files:

  • server/test/sse.db.events.test.js
  • server/test/lifecycle.test.js
  • server/test/auth.ssh.test.js
  • server/schemas.js
  • server/test/attribution.test.js
  • server/test/final_tally.test.js
  • server/test/voting.final.test.js
  • server/test/scoring.test.js
  • server/rpc.js
  • server/test/audit.integration.test.js
db/**/*.sql

📄 CodeRabbit inference engine (AGENTS.md)

db/**/*.sql: Database schema migrations must be tracked and schema must maintain idempotency via ON CONFLICT for RPCs
Submission verification (verify_submit) must enforce judge/host roles and published/final round phases
Database RLS policies must restrict visibility by role and status; use SECURITY BARRIER on views when pre-publish leakage is a risk

Files:

  • db/schema.sql
  • db/rpc.sql
  • db/rls.sql
web/**/*.{jsx,js}

📄 CodeRabbit inference engine (AGENTS.md)

Web/React components must use AbortController for async operations to prevent setState after unmount

Files:

  • web/app/room/[roomId]/page.jsx
bin/**/*.js

📄 CodeRabbit inference engine (AGENTS.md)

CLI commands must use process.execPath and isolate temporary files; validate with Zod at entry points

Files:

  • bin/db8.js
🧠 Learnings (7)
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/test/**/*.sql : pgTAP invariants must cover tables, uniques, views, and RPC contracts with coverage for happy path, idempotency reuse, boundary acceptance, and failure cases

Applied to files:

  • server/test/lifecycle.test.js
  • server/test/attribution.test.js
  • server/test/final_tally.test.js
  • server/test/voting.final.test.js
  • server/test/scoring.test.js
  • server/test/audit.integration.test.js
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to server/**/*.js : Signatures (Ed25519) must include strict author binding when participants.ssh_fingerprint is configured; return 400 with expected/got on mismatch

Applied to files:

  • server/test/auth.ssh.test.js
  • server/rpc.js
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to server/**/*.js : Server endpoints must include Zod validation, in-memory fallback patterns, and optional DATABASE_URL persistence

Applied to files:

  • server/test/auth.ssh.test.js
  • server/schemas.js
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to server/**/*.js : Use Zod schemas at the edges of Express endpoints for request validation

Applied to files:

  • server/schemas.js
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/**/*.sql : Submission verification (verify_submit) must enforce judge/host roles and published/final round phases

Applied to files:

  • db/schema.sql
  • db/rpc.sql
  • server/test/voting.final.test.js
  • server/rpc.js
  • server/test/audit.integration.test.js
  • db/rls.sql
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/**/*.sql : Database RLS policies must restrict visibility by role and status; use SECURITY BARRIER on views when pre-publish leakage is a risk

Applied to files:

  • db/rls.sql
📚 Learning: 2025-12-23T09:54:30.371Z
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/**/*.sql : Database schema migrations must be tracked and schema must maintain idempotency via ON CONFLICT for RPCs

Applied to files:

  • db/rls.sql
🧬 Code graph analysis (8)
server/test/sse.db.events.test.js (1)
server/test/sse.db.journal.test.js (3)
  • server (18-18)
  • port (19-19)
  • roomId (20-20)
server/test/auth.ssh.test.js (2)
server/test/audit.integration.test.js (2)
  • dbUrl (7-10)
  • pool (6-6)
server/rpc.js (1)
  • __setDbPool (43-45)
server/test/final_tally.test.js (2)
server/test/lifecycle.test.js (2)
  • pool (6-6)
  • dbUrl (7-10)
server/test/attribution.test.js (2)
  • pool (6-6)
  • dbUrl (7-10)
web/app/room/[roomId]/page.jsx (5)
web/app/journal/[roomId]/page.jsx (1)
  • roomId (35-35)
web/app/spectate/[roomId]/page.jsx (2)
  • roomId (9-9)
  • state (10-10)
web/components/ui/button.jsx (1)
  • Button (19-22)
web/components/ui/badge.jsx (1)
  • Badge (3-15)
web/components/ui/card.jsx (2)
  • Card (3-5)
  • CardContent (11-13)
server/test/voting.final.test.js (2)
server/test/lifecycle.test.js (2)
  • pool (6-6)
  • dbUrl (7-10)
server/rpc.js (1)
  • __setDbPool (43-45)
server/test/scoring.test.js (2)
scripts/prepare-db.js (1)
  • pg (7-7)
server/rpc.js (7)
  • __setDbPool (43-45)
  • roomId (254-254)
  • roomId (933-933)
  • roomId (1106-1106)
  • roomId (1357-1357)
  • roomId (1393-1393)
  • roundId (847-847)
server/rpc.js (1)
server/schemas.js (8)
  • FinalVote (132-138)
  • FinalVote (132-138)
  • ScoreSubmit (140-150)
  • ScoreSubmit (140-150)
  • ScoreGet (152-154)
  • ScoreGet (152-154)
  • ReputationGet (156-159)
  • ReputationGet (156-159)
server/test/audit.integration.test.js (1)
server/rpc.js (1)
  • __setDbPool (43-45)
🪛 GitHub Actions: ci
server/test/audit.integration.test.js

[error] 41-41: insert or update on table "submissions" violates foreign key constraint "submissions_round_id_fkey" during audit-logged room_create test.


[error] 69-69: round not found: 33343334-0000-0000-0000-000000000002 during audit-logged vote_submit test.

🔇 Additional comments (24)
db/rls.sql (3)

131-145: LGTM - Policy logic is consistent.

The read policy correctly mirrors the verification_verdicts pattern: visible to the voter immediately, visible to all after published/final. Write denial via SECURITY DEFINER RPCs is the correct approach.


153-167: Correct restriction to 'final' phase only.

Unlike final_votes which reveals after published, scores are appropriately restricted to final phase only. This prevents score leakage during the voting period. The judge-sees-own pattern is consistent.


175-193: LGTM - Public read for reputation data is appropriate.

Reputation and tag-based Elo are public metrics. Unrestricted read access with write denial is the correct posture. Updates flow through reputation_update_round via SECURITY DEFINER.

server/test/sse.db.events.test.js (2)

29-35: LGTM - Room creation via RPC is the correct pattern.

Using room_create RPC with a unique nonce ensures idempotency and follows the established pattern. The beforeAll will throw on failure, which is acceptable for test setup.


51-55: LGTM - Defensive teardown.

The if (server) guard prevents crashes when beforeAll fails before server instantiation. Clean.

server/test/auth.ssh.test.js (1)

127-169: LGTM - Validates participant binding enforcement with DB.

This test correctly verifies that /auth/verify returns 404 when the participant doesn't exist in the room. The try/finally ensures __setDbPool(null) and pool.end() run even on failure. Per coding guidelines, author binding must be enforced when fingerprint is configured.

server/test/voting.final.test.js (2)

58-63: LGTM - Audit log verification is valuable.

Testing that admin_audit_log is populated with action='vote' for final votes ensures the audit trail is functional. Good coverage.


31-34: Phase validation missing in vote_final_submit RPC function.

The vote_final_submit function (db/rpc.sql:739+) verifies participant status but performs zero phase validation. It accepts final votes during submit phase when they should only be accepted during final phase. The test incorrectly creates a round with phase='submit' and expects vote submission to succeed—this should fail.

Add phase validation to vote_final_submit to enforce the round is in final phase before accepting votes, matching the phase-aware principle applied to submission verification operations.

⛔ Skipped due to learnings
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/**/*.sql : Submission verification (verify_submit) must enforce judge/host roles and published/final round phases
server/test/final_tally.test.js (1)

22-48: No idempotency test for vote_final_submit with duplicate nonce.

Per coding guidelines, pgTAP invariants should cover idempotency. This test only exercises the happy path. A duplicate client_nonce should be silently ignored (idempotent reuse). Consider adding:

// Idempotency: resubmit with same nonce should not double-count
await pool.query("select vote_final_submit($1, $2, true, '[]', 'n1')", [roundId, p1]);
const res2 = await pool.query('select * from view_final_tally where round_id = $1', [roundId]);
expect(res2.rows[0].approves).toBe('2'); // Still 2, not 3
⛔ Skipped due to learnings
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/test/**/*.sql : pgTAP invariants must cover tables, uniques, views, and RPC contracts with coverage for happy path, idempotency reuse, boundary acceptance, and failure cases
Learnt from: CR
Repo: flyingrobots/db8 PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-23T09:54:30.371Z
Learning: Applies to db/**/*.sql : Submission verification (verify_submit) must enforce judge/host roles and published/final round phases
bin/db8.js (1)

98-99: Commands registered correctly.

The new vote:continue and vote:final commands are properly added to the allowed set.

server/rpc.js (4)

15-19: Schema imports are correct and complete.

All new schemas (FinalVote, ScoreSubmit, ScoreGet, ReputationGet) are properly imported.


43-45: Test injection helper is clean.

__setDbPool allows tests to inject a mock pool. The double-underscore prefix correctly signals internal/test-only usage.


150-159: Auth verification now properly scopes participant lookup to room.

The query at lines 152-155 correctly validates that the participant exists in the specified room, returning 404 if not found. This closes a potential authorization gap where a participant from room A could authenticate for room B.


1248-1252: UNLISTEN for db8_final_vote added correctly.

Proper cleanup of the new LISTEN channel on connection close.

server/schemas.js (1)

152-159: Read schemas look correct.

ScoreGet and ReputationGet are simple query schemas with appropriate UUID validation.

web/app/room/[roomId]/page.jsx (2)

53-54: New modal state hooks are correct.

Clean boolean state for controlling vote modal visibility.


655-657: Fallback from author_anon_name to author_id is correct.

Good defensive coding—displays the anon name when available, falls back to UUID otherwise.

db/schema.sql (3)

7-11: Session participant helper is correctly implemented.

The use of current_setting(..., true) with NULLIF gracefully handles missing settings by returning NULL instead of throwing. This enables RLS policies to safely reference the current participant.


17-18: Room status and config columns are well-designed.

The CHECK constraint on status limits values to a known set, and config as JSONB with default '{}' provides flexibility for future expansion.


135-148: Reputation tables are correctly structured.

  • reputation uses participant_id as PK — correct for global Elo
  • reputation_tag uses composite PK (participant_id, tag) — correct for per-topic Elo
  • Default Elo of 1200.0 follows standard conventions
db/rpc.sql (4)

345-358: Room closure on final transition is a good addition.

When a round transitions to 'final', the room status is now set to 'closed' with proper audit logging. This provides clear lifecycle semantics.


383-399: Attribution masking logic is security-conscious.

The CASE expression at lines 383-391 correctly:

  1. Hides author_id during submit phase when attribution_mode='masked' and viewer isn't the author
  2. Returns the ID (for anon_name lookup) after submit phase
  3. Returns full author_id when not masked

Combined with security_barrier = true, this prevents qual pushdown attacks.
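A rough JavaScript model of that CASE logic may help reviewers sanity-check it; field and function names here are illustrative, not the view's actual SQL:

```javascript
// Sketch of the masking rule: hide the author only while the round is
// in 'submit' phase, attribution_mode is 'masked', and the viewer is
// not the author; otherwise return author_id for display/anon lookup.
function visibleAuthorId(row, viewerId) {
  const hidden =
    row.phase === 'submit' &&
    row.attribution_mode === 'masked' &&
    row.author_id !== viewerId;
  return hidden ? null : row.author_id;
}

const row = { phase: 'submit', attribution_mode: 'masked', author_id: 'a1' };
console.log(visibleAuthorId(row, 'someone-else')); // null (masked pre-publish)
console.log(visibleAuthorId(row, 'a1'));           // 'a1' (author sees self)
```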


842-857: Score aggregation view is correct.

The weighted composite formula at lines 851-852:

E*0.35 + R*0.30 + C*0.20 + V*0.05 + Y*0.10

Weights sum to 1.0 ✓. Evidence and Reasoning are weighted highest, which aligns with typical debate scoring priorities.

security_barrier = true is correctly applied.


859-868: Final tally view is correct and simple.

FILTER clauses for counting approves/rejects are the idiomatic PostgreSQL approach.

Comment on lines +838 to +877
case 'vote:final': {
const approval = args.approve !== undefined ? Boolean(args.approve !== 'false') : true;
const ranking = args.rank
? String(args.rank)
.split(',')
.map((s) => s.trim())
: [];
if (!room || !participant || !jwt) {
printerr('Missing room/participant credentials. Run db8 login or set env.');
return EXIT.AUTH;
}
const cn = String(args.nonce || randomNonce());
try {
const res = await fetch(`${apiUrl.replace(/\/$/, '')}/rpc/vote.final`, {
method: 'POST',
headers: {
'content-type': 'application/json',
authorization: `Bearer ${jwt}`
},
body: JSON.stringify({
round_id: '00000000-0000-0000-0000-000000000002', // loose stub
voter_id: participant,
approval,
ranking,
client_nonce: cn
})
});
const body = await res.json().catch(() => ({}));
if (!res.ok) {
printerr(body?.error || `Server error ${res.status}`);
return EXIT.NETWORK;
}
if (args.json) print(JSON.stringify(body));
else print('ok');
return EXIT.OK;
} catch (e) {
printerr(e?.message || String(e));
return EXIT.NETWORK;
}
}

⚠️ Potential issue | 🔴 Critical

Same nightmare: hardcoded round_id and no validation for vote:final.

  1. Line 858: Another hardcoded round_id. This code will never work correctly in production.

  2. Lines 839-844: The approval derivation is clever but dangerously permissive—args.approve !== 'false' means any truthy string becomes true. This is begging for user error. Use Zod with explicit boolean coercion.

  3. Lines 840-844: The ranking parsing has no UUID validation. Users can pass garbage like --rank "foo,bar,baz" and it will be sent to the server unvalidated, only to fail there. Fail early at the CLI edge.

  4. Missing --round requirement: Unlike other commands, there's no enforcement that --round is required.
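Point 2 is easy to demonstrate in plain Node (standalone snippet mirroring the current derivation, not the CLI itself):

```javascript
// Mirrors the current derivation: anything except the literal string 'false'
// becomes true, including '0' and 'no'.
const derive = (approve) => (approve !== undefined ? Boolean(approve !== 'false') : true);

const cases = ['false', '0', 'no', 'true', undefined].map(derive);
// → [false, true, true, true, true]
```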

🔎 Proposed fix with proper validation
     case 'vote:final': {
-      const approval = args.approve !== undefined ? Boolean(args.approve !== 'false') : true;
-      const ranking = args.rank
-        ? String(args.rank)
-            .split(',')
-            .map((s) => s.trim())
-        : [];
+      const FinalVoteArgs = z.object({
+        round: z.string().uuid(),
+        approve: z.preprocess(
+          (v) => v === undefined ? true : v !== 'false' && v !== '0',
+          z.boolean()
+        ),
+        rank: z.preprocess(
+          (v) => v ? String(v).split(',').map(s => s.trim()).filter(Boolean) : [],
+          z.array(z.string().uuid())
+        ).optional()
+      });
+      const parsed = FinalVoteArgs.safeParse({
+        round: args.round,
+        approve: args.approve,
+        rank: args.rank
+      });
+      if (!parsed.success) {
+        printerr('vote final requires --round <uuid>; --rank must be comma-separated UUIDs');
+        return EXIT.VALIDATION;
+      }
+      const { round: roundId, approve: approval, rank: ranking } = parsed.data;
       if (!room || !participant || !jwt) {
         printerr('Missing room/participant credentials. Run db8 login or set env.');
         return EXIT.AUTH;
       }
       const cn = String(args.nonce || randomNonce());
       try {
         const res = await fetch(`${apiUrl.replace(/\/$/, '')}/rpc/vote.final`, {
           method: 'POST',
           headers: {
             'content-type': 'application/json',
             authorization: `Bearer ${jwt}`
           },
           body: JSON.stringify({
-            round_id: '00000000-0000-0000-0000-000000000002', // loose stub
+            round_id: roundId,
             voter_id: participant,
             approval,
-            ranking,
+            ranking: ranking || [],
             client_nonce: cn
           })
         });
🤖 Prompt for AI Agents
In bin/db8.js around lines 838 to 877, replace the hardcoded round_id and add
strict CLI-side validation: require a --round argument and validate it as a UUID
(use Zod z.string().uuid()) instead of the fixed '0000...' stub; change approval
parsing to use Zod coercion (z.coerce.boolean()) so only valid boolean inputs
are accepted and ambiguous strings are rejected; parse ranking by splitting and
trimming, then validate each element as a UUID (z.string().uuid()) and return an
error/EXIT.INPUT if any rank entry is invalid; ensure validated values
(round_id, voter_id/participant, approval boolean, and ranking array) are used
in the POST body and that validation failures short-circuit with a clear error
and non-OK exit code.

Comment on lines +768 to +772
INSERT INTO final_votes (round_id, voter_id, approval, ranking, client_nonce)
VALUES (p_round_id, p_voter_id, p_approval, COALESCE(p_ranking, '[]'::jsonb), COALESCE(p_client_nonce, gen_random_uuid()::text))
ON CONFLICT (round_id, voter_id, client_nonce)
DO UPDATE SET approval = EXCLUDED.approval, ranking = EXCLUDED.ranking
RETURNING id INTO v_id;

⚠️ Potential issue | 🔴 Critical

ON CONFLICT uses the problematic unique constraint.

Line 770: ON CONFLICT (round_id, voter_id, client_nonce)

This is consistent with the schema, but as noted in the schema review, this allows a voter to submit multiple conflicting votes by using different nonces. The conflict resolution only triggers for the EXACT same (voter + nonce) tuple.

This needs to align with whatever decision is made on the schema constraint.

See the db/schema.sql review comment for the proposed fix.

🤖 Prompt for AI Agents
In db/rpc.sql around lines 768–772, the ON CONFLICT target currently includes
client_nonce (ON CONFLICT (round_id, voter_id, client_nonce)), which allows the
same voter to create multiple conflicting votes by changing the nonce; update
the statement to use the conflict key that enforces one vote per voter per round
(e.g., ON CONFLICT (round_id, voter_id)) so upserts deduplicate by voter+round,
and adjust the DO UPDATE clause as needed to persist the desired
client_nonce/fields (for example include client_nonce = EXCLUDED.client_nonce if
you want to record the latest nonce); ensure this matches the agreed schema
unique constraint.
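The behavioral difference between the two conflict keys can be sketched with an in-memory upsert (illustrative JavaScript only; it mirrors the ON CONFLICT semantics, not the actual tables):

```javascript
// Upsert into a Map keyed by the chosen conflict fields.
function upsert(store, keyFields, row) {
  const key = keyFields.map((f) => row[f]).join('|');
  store.set(key, { ...(store.get(key) || {}), ...row });
}

// Current constraint: (round_id, voter_id, client_nonce).
// Two submissions from the same voter with different nonces BOTH persist.
const byNonce = new Map();
upsert(byNonce, ['round_id', 'voter_id', 'client_nonce'],
  { round_id: 'r1', voter_id: 'v1', client_nonce: 'n1', approval: true });
upsert(byNonce, ['round_id', 'voter_id', 'client_nonce'],
  { round_id: 'r1', voter_id: 'v1', client_nonce: 'n2', approval: false });
// byNonce.size === 2 — the voter now holds two conflicting votes.

// Proposed constraint: (round_id, voter_id). The second submission replaces the first.
const byVoter = new Map();
upsert(byVoter, ['round_id', 'voter_id'],
  { round_id: 'r1', voter_id: 'v1', client_nonce: 'n1', approval: true });
upsert(byVoter, ['round_id', 'voter_id'],
  { round_id: 'r1', voter_id: 'v1', client_nonce: 'n2', approval: false });
// byVoter.size === 1 — one vote per voter per round.
```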

Comment on lines +828 to +836
PERFORM admin_audit_log_write(
'update',
'submission', -- conceptually scoring a submission
p_participant_id,
p_judge_id,
NULL,
jsonb_build_object('client_nonce', p_client_nonce),
jsonb_build_object('e', p_e, 'r', p_r, 'c', p_c, 'v', p_v, 'y', p_y)
);

⚠️ Potential issue | 🟡 Minor

Audit log uses 'submission' entity_type for scores, which is semantically incorrect.

Line 830: 'submission' as entity_type, but this is a SCORE, not a submission.

Line 831: p_participant_id as entity_id, but scores have their own id (returned in v_id).

The audit log CHECK constraint at db/schema.sql line 211 allows: 'room','round','submission','vote','participant','flag','system'. There's no 'score' option.

Options:

  1. Add 'score' to the allowed entity_types
  2. Use 'vote' as entity_type (scores are judge votes in a sense)
  3. Use 'system' (awkward)
🔎 Proposed fix: Add 'score' entity type or use proper semantics

Option A - Add 'score' to schema CHECK constraint:

-- In db/schema.sql line 211:
CHECK (entity_type IN (
  'room','round','submission','vote','participant','flag','system','score'
))

Then in db/rpc.sql:

   PERFORM admin_audit_log_write(
-    'update',
-    'submission',
-    p_participant_id,
+    'create',
+    'score',
+    v_id,
     p_judge_id,

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In db/rpc.sql around lines 828–836, the audit call currently logs the
entity_type as 'submission' and uses p_participant_id as the entity_id, but this
should record the SCORE entity and its id (v_id); update the PERFORM
admin_audit_log_write call to use entity_type 'score' and entity_id v_id, and
simultaneously add 'score' to the audit log CHECK constraint in db/schema.sql
(around line 211) so the new entity_type is allowed.


describe('Final Voting (M4)', () => {
let pool;
const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:test@localhost:54329/db8_test';

⚠️ Potential issue | 🟠 Major

Missing DB8_TEST_DATABASE_URL in fallback chain - INCONSISTENT.

Every other test file in this PR uses:

const dbUrl = process.env.DB8_TEST_DATABASE_URL || process.env.DATABASE_URL || 'fallback';

This file skips DB8_TEST_DATABASE_URL:

const dbUrl = process.env.DATABASE_URL || 'fallback';

This will cause CI failures if DB8_TEST_DATABASE_URL is set but DATABASE_URL is not.

🔎 Proposed fix
-  const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:test@localhost:54329/db8_test';
+  const dbUrl =
+    process.env.DB8_TEST_DATABASE_URL ||
+    process.env.DATABASE_URL ||
+    'postgresql://postgres:test@localhost:54329/db8_test';
🤖 Prompt for AI Agents
In server/test/voting.final.test.js around line 8, the dbUrl fallback chain
omits process.env.DB8_TEST_DATABASE_URL which other tests use; update the
assignment to check process.env.DB8_TEST_DATABASE_URL first, then
process.env.DATABASE_URL, then the existing hardcoded fallback so CI uses the
per-suite DB8_TEST_DATABASE_URL when present.

Comment on lines +18 to +20
afterAll(async () => {
await pool.end();
});

🧹 Nitpick | 🔵 Trivial

Missing __setDbPool(null) - fourth instance.

At this point, I'm questioning whether there's a systemic misunderstanding of the cleanup pattern. Fix all instances.

🔎 Proposed fix
   afterAll(async () => {
+    __setDbPool(null);
     await pool.end();
   });
🤖 Prompt for AI Agents
In server/test/voting.final.test.js around lines 18-20, the test teardown calls
await pool.end() but omits calling __setDbPool(null); add a call to
__setDbPool(null) after the pool is closed (e.g., immediately after await
pool.end()) to fully clear the global DB pool reference; also audit and update
the other three test teardown locations across the codebase to follow the same
pattern (close pool, then call __setDbPool(null)).

Comment on lines +59 to +83
async function onContinueVote(choice) {
setActionBusy(true);
try {
const r = await fetch(`${apiBase()}/rpc/vote.continue`, {
method: 'POST',
headers: {
'content-type': 'application/json',
...(jwt ? { authorization: `Bearer ${jwt}` } : {})
},
body: JSON.stringify({
room_id: roomId,
round_id: state.round.round_id,
voter_id: participant,
choice,
client_nonce: window.crypto.randomUUID()
})
});
if (r.ok) setShowContinueVote(false);
else window.alert('Vote failed');
} catch (err) {
window.alert(String(err));
} finally {
setActionBusy(false);
}
}

⚠️ Potential issue | 🟠 Major

Missing AbortController on async fetch violates coding guidelines.

Per coding guidelines: "Web/React components must use AbortController for async operations to prevent setState after unmount."

If the user navigates away while the vote is in flight, setShowContinueVote(false) and setActionBusy(false) will fire on an unmounted component, causing React warnings and potential memory leaks.

Also, line 70: state.round.round_id access is unsafe. If state is null/undefined (before snapshot loads) or state.round is undefined, this throws. The button is only shown when state?.round?.phase === 'published', but there's a race condition if state updates mid-click.

🔎 Proposed fix: Add AbortController and null guard
   async function onContinueVote(choice) {
+    const roundId = state?.round?.round_id;
+    if (!roundId) {
+      window.alert('Round not loaded yet');
+      return;
+    }
+    const controller = new AbortController();
     setActionBusy(true);
     try {
       const r = await fetch(`${apiBase()}/rpc/vote.continue`, {
         method: 'POST',
         headers: {
           'content-type': 'application/json',
           ...(jwt ? { authorization: `Bearer ${jwt}` } : {})
         },
         body: JSON.stringify({
           room_id: roomId,
-          round_id: state.round.round_id,
+          round_id: roundId,
           voter_id: participant,
           choice,
           client_nonce: window.crypto.randomUUID()
-        })
+        }),
+        signal: controller.signal
       });
-      if (r.ok) setShowContinueVote(false);
-      else window.alert('Vote failed');
+      if (!controller.signal.aborted) {
+        if (r.ok) setShowContinueVote(false);
+        else window.alert('Vote failed');
+      }
     } catch (err) {
-      window.alert(String(err));
+      if (err.name !== 'AbortError') window.alert(String(err));
     } finally {
-      setActionBusy(false);
+      if (!controller.signal.aborted) setActionBusy(false);
     }
   }

As per coding guidelines, AbortController is required for async operations in React components.


Comment on lines +85 to +109
async function onFinalVote(approval, ranking = []) {
setActionBusy(true);
try {
const r = await fetch(`${apiBase()}/rpc/vote.final`, {
method: 'POST',
headers: {
'content-type': 'application/json',
...(jwt ? { authorization: `Bearer ${jwt}` } : {})
},
body: JSON.stringify({
round_id: state.round.round_id,
voter_id: participant,
approval,
ranking,
client_nonce: window.crypto.randomUUID()
})
});
if (r.ok) setShowFinalVote(false);
else window.alert('Final vote failed');
} catch (err) {
window.alert(String(err));
} finally {
setActionBusy(false);
}
}

⚠️ Potential issue | 🟠 Major

Same AbortController issue for onFinalVote.

Identical problems as onContinueVote:

  1. No AbortController
  2. Unsafe access to state.round.round_id (line 95)

Apply the same fix pattern.

🤖 Prompt for AI Agents
In web/app/room/[roomId]/page.jsx around lines 85 to 109, onFinalVote lacks an
AbortController and unsafely accesses state.round.round_id; create an
AbortController before the fetch, pass controller.signal into fetch, and handle
AbortError in the catch (silently or no alert); use optional chaining
(state.round?.round_id) and early-return or show an error if round_id is missing
before calling fetch; ensure setActionBusy(true) is set before starting and
setActionBusy(false) in finally, and abort the controller on component cleanup
(same pattern used to fix onContinueVote).

Comment on lines +840 to +894
{showContinueVote && (
<div className="fixed inset-0 bg-black/50 flex items-center justify-center p-4 z-50">
<Card className="w-full max-w-md">
<CardContent className="p-6 space-y-4 text-center">
<h3 className="text-lg font-semibold">Round Complete</h3>
<p>Should the debate continue to the next round?</p>
<div className="flex justify-center gap-4 pt-2">
<Button
variant="secondary"
onClick={() => onContinueVote('end')}
disabled={actionBusy}
>
End Debate
</Button>
<Button onClick={() => onContinueVote('continue')} disabled={actionBusy}>
Continue
</Button>
</div>
<Button variant="ghost" className="w-full" onClick={() => setShowContinueVote(false)}>
Cancel
</Button>
</CardContent>
</Card>
</div>
)}

{showFinalVote && (
<div className="fixed inset-0 bg-black/50 flex items-center justify-center p-4 z-50">
<Card className="w-full max-w-md">
<CardContent className="p-6 space-y-4">
<h3 className="text-lg font-semibold">Final Approval</h3>
<p>Do you approve the results/conclusions of this debate?</p>
<div className="flex justify-center gap-4 pt-2">
<Button
variant="destructive"
onClick={() => onFinalVote(false)}
disabled={actionBusy}
>
Reject
</Button>
<Button
className="bg-green-600 hover:bg-green-700"
onClick={() => onFinalVote(true)}
disabled={actionBusy}
>
Approve
</Button>
</div>
<Button variant="ghost" className="w-full" onClick={() => setShowFinalVote(false)}>
Cancel
</Button>
</CardContent>
</Card>
</div>
)}

⚠️ Potential issue | 🟡 Minor

Modal accessibility: no keyboard trap or focus management.

Both vote modals (lines 840-864, 866-894) lack:

  1. Focus trap: Users can Tab out of the modal into background content
  2. Escape key handling: No way to dismiss via keyboard
  3. Initial focus: Focus doesn't move to the modal when opened
  4. aria-modal: Missing ARIA attributes

This is a minor accessibility issue for a demo, but for production would be a blocker.

🔎 Proposed fix: Add keyboard handling
+      {showContinueVote && (
+        <div 
+          className="fixed inset-0 bg-black/50 flex items-center justify-center p-4 z-50"
+          role="dialog"
+          aria-modal="true"
+          aria-labelledby="continue-vote-title"
+          onKeyDown={(e) => e.key === 'Escape' && setShowContinueVote(false)}
+        >
           <Card className="w-full max-w-md">
             <CardContent className="p-6 space-y-4 text-center">
-              <h3 className="text-lg font-semibold">Round Complete</h3>
+              <h3 id="continue-vote-title" className="text-lg font-semibold">Round Complete</h3>

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In web/app/room/[roomId]/page.jsx around lines 840-894, both vote modal blocks
lack proper keyboard and assistive-focus management; implement a focus trap and
initial focus when the modal opens, add Escape-key handling to close the modal,
set appropriate ARIA attributes (role="dialog", aria-modal="true" and
aria-labelledby pointing to the modal title), and restore focus to the
previously focused element when the modal closes; ensure Tab/Shift+Tab loop
inside the modal (or use a small utility/hook or a library like
focus-trap-react), and mark background content inert or aria-hidden while the
modal is open so keyboard and screen-reader users cannot tab into or access
background controls.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 88

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
db/test/20_submissions_votes.pgtap (1)

1-80: Insufficient test coverage—expand to meet pgTAP best practices.

This test suite only covers happy-path table and unique-constraint existence checks. Based on learnings, pgTAP tests must include:

  • Happy path: Basic insertion (covered)
  • Tables & uniques: Existence checks (covered)
  • Idempotency reuse: Attempting duplicate inserts to verify unique constraints actually work
  • Boundary acceptance: Edge cases (NULL handling, max lengths, invalid FKs)
  • Failure cases: Constraint violations, invalid enum values, RLS denials

The test doesn't actually verify idempotency by attempting a duplicate insert with the same (round_id, author_id, client_nonce). It also doesn't test votes insertion at all despite checking the votes table exists.

🔧 Recommended test additions

Add these test cases to achieve comprehensive coverage:

-- Test actual idempotency: duplicate insert should be blocked by unique constraint
SELECT throws_ok(
  $$ INSERT INTO submissions (round_id, author_id, content, claims, citations, canonical_sha256, client_nonce)
     VALUES ('00000000-0000-0000-0000-0000000000aa'::uuid,
             '00000000-0000-0000-0000-0000000000bb'::uuid,
             'duplicate', '[]'::jsonb, '[]'::jsonb,
             'deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef',
             'nonce-rt-1') $$,
  '23505',  -- unique_violation
  'duplicate submission with same (round_id, author_id, client_nonce) rejected'
);

-- Test FK constraint: invalid round_id should fail
SELECT throws_ok(
  $$ INSERT INTO submissions (round_id, author_id, content, claims, citations, canonical_sha256, client_nonce)
     VALUES ('99999999-9999-9999-9999-999999999999'::uuid,
             '00000000-0000-0000-0000-0000000000bb'::uuid,
             'orphan', '[]'::jsonb, '[]'::jsonb,
             'badc0ffebadc0ffebadc0ffebadc0ffebadc0ffebadc0ffebadc0ffebadc0ffe',
             'nonce-orphan') $$,
  '23503',  -- foreign_key_violation
  'submission with invalid round_id rejected'
);

-- Test votes insertion (table exists but never tested)
SELECT lives_ok(
  $$ INSERT INTO votes (round_id, voter_id, kind, submission_id, value, client_nonce)
     VALUES ('00000000-0000-0000-0000-0000000000aa'::uuid,
             '00000000-0000-0000-0000-0000000000bb'::uuid,
             'some_kind', '00000000-0000-0000-0000-0000000000cc'::uuid,
             5, 'vote-nonce-1') $$,
  'basic vote insertion succeeds'
);

Update the plan count accordingly: SELECT plan(9);

Based on learnings, ...

docs/CLI-Quickstart.md (1)

2-2: Update the stale lastUpdated date.

The lastUpdated field shows 2025-10-02, yet this PR was created on 2025-12-23 and introduces new content (Room creation section). The date is nearly 3 months stale and violates the freshness expectation for documentation frontmatter.

🔎 Proposed fix
 ---
-lastUpdated: 2025-10-02
+lastUpdated: 2025-12-23
 ---
docs/CLI.md (1)

2-2: Update the stale lastUpdated date.

The lastUpdated field shows 2025-10-05, yet this PR was created on 2025-12-23 and introduces new content (db8 room create). The date is over 2 months stale and violates the freshness expectation for documentation frontmatter.

🔎 Proposed fix
 ---
-lastUpdated: 2025-10-05
+lastUpdated: 2025-12-23
 ---
docs/Formal-Design-Spec.md (1)

2-2: Fix frontmatter to comply with spec document requirements.

The frontmatter is structurally incomplete. Per coding guidelines, spec documents require tags: [spec] and milestone: fields. Currently, the frontmatter only contains lastUpdated.

Update:

  1. Add tags: [spec]
  2. Add appropriate milestone: designation (verify which milestone this foundational spec aligns with—or if it predates the milestone structure)
  3. Correct lastUpdated to 2025-12-23
 ---
 lastUpdated: 2025-10-02
+tags: [spec]
+milestone: [VERIFY_MILESTONE]
 ---

Also verify that the design spec content remains accurate relative to the current M5+ architecture, particularly the schema and RPC patterns introduced in this PR.

AGENTS.md (1)

150-223: Remove the Neo4j section or move to a design document with "PLANNED" marker—it's unimplemented vaporware.

Codebase verification confirms: Neo4j is referenced nowhere outside AGENTS.md. Zero integration in package.json, environment files, or any configuration. The 74-line "Neo4j Shared Memory" section (lines ~170–223) is pure aspirational documentation with no backing implementation whatsoever.

Either delete this section entirely, or migrate it to docs/Neo4jMemory.md with a clear PLANNED / FUTURE heading and link it from AGENTS.md—don't embed imaginary systems in agent operational instructions.

Secondary: AGENTS.md structure must follow coding guidelines: session debrief event blocks (Summary, References, Key Decisions, Action Items, Notes) require separation by horizontal rules (---) for readability. The Neo4j section violates this constraint entirely.

db/test/43_flags_rls.pgtap (1)

90-99: Consider adding a RESET ROLE before ROLLBACK for explicitness.

While the transaction rollback will discard the role context, explicitly resetting the role before finish() improves readability and prevents accidental role leakage if the test structure changes later.

Additionally, per learnings, pgTAP tests should cover failure cases. Consider adding a test that verifies db8_reader cannot INSERT into submission_flags:

🔎 Proposed addition
 -- And flag_details contains at least one element
 SELECT ok(
   (SELECT jsonb_array_length(COALESCE(flag_details,'[]'::jsonb))
      FROM submissions_with_flags_view v
     WHERE v.round_id = '40000000-0000-0000-0000-000000000002') >= 1,
   'Post-publish: view includes flag details'
 );

+RESET ROLE;
+
 SELECT finish();
 ROLLBACK;
docs/Architecture.md (1)

1-3: Stale lastUpdated frontmatter.

The frontmatter shows lastUpdated: 2025-10-02, but this PR modifies the file in December 2025. Update to the current date:

---
lastUpdated: 2025-12-23
---

Per coding guidelines: "Markdown files must include YAML frontmatter with lastUpdated (ISO date)."

♻️ Duplicate comments (32)
server/test/sse.db.events.test.js (1)

20-20: Dead initialization - roomId is still immediately overwritten.

Line 20 initializes roomId to a hardcoded UUID '11111111-0000-0000-0000-000000000001', but line 35 unconditionally overwrites it with the result from room_create. This initialization serves no purpose and misleads readers into thinking the hardcoded UUID matters.

This issue was already flagged in a previous review (see past_review_comments), but the fix was not applied. The initialization remains dead code.

🔎 Proposed fix
   let server = null;
   let port;
-  let roomId = '11111111-0000-0000-0000-000000000001';
+  let roomId;
   let roundId;
bin/db8.js (3)

STILL hardcoded. STILL no validation, despite prior review feedback.

Lines 819 and 858: You're STILL using '00000000-0000-0000-0000-000000000002' as a hardcoded round_id. This was flagged in previous reviews and you shipped it anyway.

Line 800-804: choice is pulled from args._[2] with zero validation until after the fact. No Zod schema. Per coding guidelines (bin/**/*.js), CLI commands MUST validate with Zod at entry points.

This will fail 100% of the time in production. Every single vote will go to the wrong round.

🔎 Fix (see previous review for full details)
     case 'vote:continue': {
+      const ContinueVoteArgs = z.object({
+        choice: z.enum(['continue', 'end']),
+        round: z.string().uuid()
+      });
+      const parsed = ContinueVoteArgs.safeParse({
+        choice: args._[2],
+        round: args.round
+      });
+      if (!parsed.success) {
+        printerr('vote continue requires --round <uuid> and choice');
+        return EXIT.VALIDATION;
+      }
+      const { choice, round: roundId } = parsed.data;
-      const choice = args._[2];
-      if (choice !== 'continue' && choice !== 'end') {
-        printerr('vote continue requires "continue" or "end"');
-        return EXIT.VALIDATION;
-      }
       ...
           body: JSON.stringify({
             room_id: room,
-            round_id: '00000000-0000-0000-0000-000000000002',
+            round_id: roundId,

838-877: vote:final is equally broken.

Line 858: Hardcoded round_id again.

Lines 839-844:

  • approval parsing is overly permissive: args.approve !== 'false' means '0', 'no', and 'nope' all become true. Use proper Zod boolean coercion.
  • ranking splits on comma but NEVER validates the UUIDs. Users can pass --rank "foo,bar,baz" and the server will reject it. Fail at the CLI edge per guidelines.
🔎 Proper validation
     case 'vote:final': {
+      const FinalVoteArgs = z.object({
+        round: z.string().uuid(),
+        approve: z.preprocess(
+          (v) => v === undefined ? true : v !== 'false' && v !== '0',
+          z.boolean()
+        ),
+        rank: z.preprocess(
+          (v) => v ? String(v).split(',').map(s => s.trim()).filter(Boolean) : [],
+          z.array(z.string().uuid())
+        ).optional()
+      });
+      const parsed = FinalVoteArgs.safeParse({ round: args.round, approve: args.approve, rank: args.rank });
+      if (!parsed.success) {
+        printerr('vote final: --round required, --rank must be UUIDs');
+        return EXIT.VALIDATION;
+      }
+      const { round: roundId, approve: approval, rank: ranking } = parsed.data;
-      const approval = args.approve !== undefined ? Boolean(args.approve !== 'false') : true;
-      const ranking = args.rank ? String(args.rank).split(',').map((s) => s.trim()) : [];
       ...
           body: JSON.stringify({
-            round_id: '00000000-0000-0000-0000-000000000002',
+            round_id: roundId,

154-177: verify:submit validation is acceptable but inconsistent.

You DID add Zod-style validation for verify:submit (lines 154-172) and verify:summary (lines 173-176), which is correct per guidelines. However:

  1. Inconsistency: Why did you validate verify commands properly but leave vote commands as hardcoded stubs? This leaves the validation approach inconsistent across commands.

  2. Minor issue (Line 162): allowedVerdicts uses a Set for lookup, but you already validated the enum in lines 161-167. The toLowerCase() normalization is good, but the Set is redundant if you're going to manually check again.

Consider moving ALL argument validation into a centralized Zod schema object at the top of validateArgs to avoid this pattern inconsistency.
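The centralization asked for here can be sketched without committing to a particular schema library. Names below (`COMMAND_ARGS`, `validateArgs`, `UUID_RE`) are illustrative, not the repo's actual API; in the real CLI the per-command validators would be Zod schemas:

```javascript
// Hypothetical sketch: one table of per-command validators, checked at the
// CLI edge before any network call. Each validator returns {ok, value|error}.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

const COMMAND_ARGS = {
  'vote:final': (args) => {
    if (!UUID_RE.test(args.round ?? '')) return { ok: false, error: '--round must be a UUID' };
    // Coerce approve: undefined defaults to true; 'false'/'0' mean false.
    const approve =
      args.approve === undefined ? true : !['false', '0'].includes(String(args.approve));
    const rank = args.rank
      ? String(args.rank).split(',').map((s) => s.trim()).filter(Boolean)
      : [];
    if (!rank.every((id) => UUID_RE.test(id))) return { ok: false, error: '--rank must be UUIDs' };
    return { ok: true, value: { round: args.round, approve, rank } };
  }
};

function validateArgs(command, args) {
  const validator = COMMAND_ARGS[command];
  return validator ? validator(args) : { ok: false, error: `unknown command: ${command}` };
}
```

With every command routed through one table, the verify/vote inconsistency cannot reappear silently: an unvalidated command fails loudly at dispatch.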

db/rpc.sql (3)

768-772: ON CONFLICT is STILL wrong.

Line 770: ON CONFLICT (round_id, voter_id, client_nonce) allows a voter to submit MULTIPLE conflicting final votes by changing the nonce.

This was flagged in previous reviews. The conflict key should be (round_id, voter_id) to enforce one vote per voter per round.

🔎 Fix
   INSERT INTO final_votes (round_id, voter_id, approval, ranking, client_nonce)
   VALUES (p_round_id, p_voter_id, p_approval, COALESCE(p_ranking, '[]'::jsonb), COALESCE(p_client_nonce, gen_random_uuid()::text))
-  ON CONFLICT (round_id, voter_id, client_nonce)
+  ON CONFLICT (round_id, voter_id)
   DO UPDATE SET approval = EXCLUDED.approval, ranking = EXCLUDED.ranking, client_nonce = EXCLUDED.client_nonce

828-836: Audit entity_type is STILL semantically wrong.

Line 830: entity_type is 'submission', but this is a SCORE record, not a submission. Line 831: entity_id is p_participant_id, but the actual entity created is v_id (the score row).

This was flagged before. Either:

  1. Add 'score' to the allowed entity_types in db/schema.sql line 211, OR
  2. Use a different entity_type that makes sense (e.g., 'system')

As written, the audit log is LYING about what entity was created.

🔎 Proposed fix (add 'score' entity type)

In db/schema.sql:

 CHECK (entity_type IN (
-  'room','round','submission','vote','participant','flag','system'
+  'room','round','submission','vote','participant','flag','system','score'
 ))

In db/rpc.sql:

   PERFORM admin_audit_log_write(
-    'update',
-    'submission',
-    p_participant_id,
+    'create',
+    'score',
+    v_id,

870-945: Elo calculation has ordering bias (O(n²) and asymmetric).

Lines 892-933: The nested loop updates reputation.elo incrementally during iteration. When participant B is compared against participant A (line 899), A's Elo has ALREADY been modified by previous iterations. This creates ordering sensitivity.

Standard Elo implementations:

  1. Snapshot ALL ratings before any updates
  2. Calculate deltas using the snapshot
  3. Apply all deltas simultaneously

Your implementation violates this. For participants [Alice, Bob, Carol]:

  • Alice vs Bob: Alice's Elo updated
  • Alice vs Carol: Alice's Elo updated again
  • Bob vs Alice: Bob's delta calculated against Alice's already-modified Elo

This is incorrect and makes the results non-deterministic depending on participant ordering.

🔎 Fix: snapshot first, apply later
-- 1. Snapshot current Elos
CREATE TEMP TABLE _elo_snapshot AS
SELECT s.participant_id, COALESCE(r.elo, 1200.0) AS elo
FROM view_score_aggregates s
LEFT JOIN reputation r ON r.participant_id = s.participant_id
WHERE s.round_id = p_round_id;

-- 2. Calculate deltas from the snapshot only (no mid-loop mutation)
CREATE TEMP TABLE _elo_deltas AS
SELECT deltas.participant_id, SUM(deltas.delta) AS total_delta
FROM (
  SELECT
    a.participant_id,
    v_k * (CASE WHEN a.composite_score > b.composite_score THEN 1.0
                WHEN a.composite_score < b.composite_score THEN 0.0
                ELSE 0.5 END
           - (1.0 / (1.0 + pow(10.0, (eb.elo - ea.elo) / 400.0)))) AS delta
  FROM view_score_aggregates a
  JOIN _elo_snapshot ea ON ea.participant_id = a.participant_id
  JOIN view_score_aggregates b
    ON b.round_id = a.round_id AND b.participant_id <> a.participant_id
  JOIN _elo_snapshot eb ON eb.participant_id = b.participant_id
  WHERE a.round_id = p_round_id
) deltas
GROUP BY deltas.participant_id;

-- 3. Apply deltas atomically
INSERT INTO reputation (participant_id, elo)
SELECT participant_id, 1200.0 + total_delta
FROM _elo_deltas
ON CONFLICT (participant_id)
DO UPDATE SET elo = reputation.elo + EXCLUDED.elo - 1200.0;
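The same discipline is easy to check outside SQL. A minimal JavaScript sketch of snapshot-first round-robin Elo (K=32 and the 1200 base are assumptions mirroring the function above; names are illustrative):

```javascript
// Snapshot-first Elo for one round: deltas are computed from a frozen
// snapshot of ratings, then applied together, so participant ordering
// cannot bias the result.
const K = 32;

function expectedScore(ra, rb) {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

// players: [{ id, elo, score }] where score is the round's composite score.
function updateRound(players) {
  const snapshot = new Map(players.map((p) => [p.id, p.elo])); // 1. snapshot
  const deltas = new Map(players.map((p) => [p.id, 0]));
  for (const a of players) {
    for (const b of players) {
      if (a.id === b.id) continue;
      const actual = a.score > b.score ? 1 : a.score < b.score ? 0 : 0.5;
      // 2. expected score reads the snapshot, never a mid-loop value
      const expected = expectedScore(snapshot.get(a.id), snapshot.get(b.id));
      deltas.set(a.id, deltas.get(a.id) + K * (actual - expected));
    }
  }
  // 3. apply all deltas together
  return players.map((p) => ({ ...p, elo: p.elo + deltas.get(p.id) }));
}
```

Reversing the input array yields identical ratings because every expected score reads the frozen snapshot, and the pairwise deltas cancel, so the round is zero-sum regardless of starting ratings.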
server/test/final_tally.test.js (1)

18-20: Incomplete test cleanup - missing __setDbPool(null).

As flagged in a previous review, other tests in this PR consistently reset the global pool reference before ending the pool. This prevents stale references from leaking into subsequent Vitest suites.

🔎 Proposed fix
   afterAll(async () => {
+    __setDbPool(null);
     await pool.end();
   });
server/test/voting.final.test.js (2)

8-8: INCONSISTENT: Missing DB8_TEST_DATABASE_URL in fallback chain.

Every other test file in this PR follows the pattern:

const dbUrl = process.env.DB8_TEST_DATABASE_URL || process.env.DATABASE_URL || 'fallback';

This file skips the first check, which will break CI when DB8_TEST_DATABASE_URL is set but DATABASE_URL is not.

🔎 Proposed fix
-  const dbUrl = process.env.DATABASE_URL || 'postgresql://postgres:test@localhost:54329/db8_test';
+  const dbUrl =
+    process.env.DB8_TEST_DATABASE_URL ||
+    process.env.DATABASE_URL ||
+    'postgresql://postgres:test@localhost:54329/db8_test';

18-20: INCONSISTENT: Missing __setDbPool(null) in cleanup.

The global DB pool reference remains stale after the pool closes. This pattern violation appears across multiple test files.

🔎 Proposed fix
   afterAll(async () => {
+    __setDbPool(null);
     await pool.end();
   });
server/test/lifecycle.test.js (3)

19-21: INCONSISTENT: Missing __setDbPool(null) in cleanup.

The global DB pool reference remains stale after the pool closes. Apply the same fix as the other test files.

🔎 Proposed fix
   afterAll(async () => {
+    __setDbPool(null);
     await pool.end();
   });

33-36: TEST RELIABILITY: Stale state persists across reruns.

The round upsert uses DO NOTHING, which allows a pre-existing phase='final' row to persist. On subsequent test runs, the test will pass trivially without exercising the actual transition logic.

The room upsert (line 29) correctly uses DO UPDATE SET to reset state. Apply the same pattern to the round.

🔎 Proposed fix
     await pool.query(
-      "insert into rounds(id, room_id, idx, phase, continue_vote_close_unix) values ($1, $2, 0, 'published', 100) on conflict (id) do nothing",
+      "insert into rounds(id, room_id, idx, phase, continue_vote_close_unix) values ($1, $2, 0, 'published', 100) on conflict (id) do update set phase = 'published', continue_vote_close_unix = 100",
       [roundId, roomId]
     );

34-46: READABILITY: Undocumented magic values.

continue_vote_close_unix=100 (line 34) and '{"choice": "end"}' (line 44) are unexplained magic values. Future maintainers will ask:

  • Why 100? (Any past timestamp forces closed voting window)
  • What ballot schema? What other choices exist?
🔎 Proposed fix
+    // Use past timestamp (100) to simulate closed voting window
     await pool.query(
       "insert into rounds(id, room_id, idx, phase, continue_vote_close_unix) values ($1, $2, 0, 'published', 100) on conflict (id) do update set phase = 'published', continue_vote_close_unix = 100",
       [roundId, roomId]
     );
     await pool.query(
       'insert into participants(id, room_id, anon_name) values ($1, $2, $3) on conflict (id) do nothing',
       [participantId, roomId, 'voter_unique_1']
     );
 
-    // Tally is No (or equal), so it should transition to final
+    // Vote to END round (ballot choice: "end" vs "continue")
+    // Tally determines transition to final phase
     await pool.query(
       "insert into votes(round_id, voter_id, kind, ballot, client_nonce) values ($1, $2, 'continue', '{\"choice\": \"end\"}', 'nonce-lifecycle-1')",
       [roundId, participantId]
     );
server/test/attribution.test.js (3)

20-22: UNRESOLVED: Missing __setDbPool(null) cleanup - previously flagged.

This is the EXACT issue raised in prior review. The afterAll cleanup must call __setDbPool(null) before pool.end() to properly reset the module-level singleton.

🔎 Required fix (as previously suggested)
   afterAll(async () => {
+    __setDbPool(null);
     await pool.end();
   });

48-48: UNRESOLVED: Inconsistent session variable setting - previously flagged.

Line 48 uses raw SET while line 58 uses set_config(). This inconsistency was flagged before. Use parameterized set_config() consistently to avoid SQL injection risks.

🔎 Required fix (as previously suggested)
-    await pool.query("set db8.participant_id = '00000000-0000-0000-0000-000000000000'");
+    await pool.query("select set_config('db8.participant_id', $1, false)", ['00000000-0000-0000-0000-000000000000']);

Apply the same fix to line 73.


66-79: UNRESOLVED: Test coupling - Test 2 depends on Test 1 state - previously flagged.

Test 2 assumes data seeded by Test 1 exists. If Test 1 fails or tests run in isolation, Test 2 fails mysteriously. This fragility was already raised. Either seed independently or combine into one test.

db/rls.sql (1)

10-14: Missing test-delete policies for new RLS-enabled tables.

RLS is enabled for verification_verdicts, final_votes, scores, reputation, and reputation_tag, but unlike rooms (lines 44-49) and rounds (lines 68-73), these tables lack corresponding test_delete_policy entries. This creates an asymmetric policy landscape that will bite when test teardown attempts direct DELETE statements.

As per coding guidelines: "pgTAP invariants must cover tables, uniques, views, and RPC contracts."

server/test/auth.ssh.test.js (1)

127-169: DB-backed test is well-isolated with proper cleanup.

The try/finally pattern ensures __setDbPool(null) and pool.end() run even on assertion failure. The explicit truncate rooms cascade guarantees a clean slate.

However, per the past review comment, the UUIDs (10000000-*) overlap with the in-memory tests above. While this works because in-memory tests use __setDbPool(null), it's still fragile. Consider using a distinct prefix like 10000001-*.

server/test/scoring.test.js (5)

21-23: Missing __setDbPool(null) in afterAll - STILL NOT FIXED.

Past review flagged this exact issue. The teardown does not reset the module-level pool reference, risking cross-test contamination if tests run in different orders or pools are reused.

🔎 Required fix
  afterAll(async () => {
+   __setDbPool(null);
    await pool.end();
  });

46-60: Test does not verify database persistence - STILL NOT FIXED.

Past review noted: "A 200 response doesn't guarantee the data was persisted correctly." The test only checks HTTP response, not actual DB state.

Add DB verification after line 59:

// Verify DB persistence
const dbRes = await pool.query(
  'select * from scores where round_id = $1 and judge_id = $2 and participant_id = $3',
  [roundId, judgeId, debaterId]
);
expect(dbRes.rows.length).toBe(1);
expect(dbRes.rows[0].e).toBe(80);
expect(dbRes.rows[0].r).toBe(75);
expect(dbRes.rows[0].c).toBe(90);
expect(dbRes.rows[0].v).toBe(70);
expect(dbRes.rows[0].y).toBe(85);

62-69: Weak assertion: composite_score > 0 is COMPLETELY USELESS - STILL NOT FIXED.

Past review noted this proves nothing. With input scores E=80, R=75, C=90, V=70, Y=85, the composite should be approximately 80 (assuming equal weights). The current assertion > 0 would pass even if the calculation returned 0.001.

Replace with meaningful assertion:

const composite = parseFloat(res.body.rows[0].composite_score);
expect(composite).toBeGreaterThan(70);
expect(composite).toBeLessThan(90);
// Or if you know the exact formula, assert with tolerance:
// expect(composite).toBeCloseTo(80, 1);

96-101: Elo verification is LAUGHABLY INADEQUATE - STILL NOT FIXED.

Past review correctly noted: "You claim 'deterministic' Elo updates but only verify the value changed." With debater scoring ~80 avg (winner) and opponent scoring 50 avg (loser):

  1. Debater's Elo MUST increase (> 1200)
  2. Opponent's Elo MUST decrease (< 1200)
  3. Zero-sum property: total Elo change MUST equal zero

Current test only checks not.toBe(1200) which would pass if Elo became 1199 (incorrect direction).

const debaterRep = await supertest(app)
  .get('/rpc/reputation.get')
  .query({ participant_id: debaterId });
const opponentRep = await supertest(app)
  .get('/rpc/reputation.get')
  .query({ participant_id: opponentId });

expect(debaterRep.body.elo).toBeGreaterThan(1200); // Winner gains
expect(opponentRep.body.elo).toBeLessThan(1200);   // Loser loses

// Zero-sum check
const delta1 = debaterRep.body.elo - 1200;
const delta2 = opponentRep.body.elo - 1200;
expect(delta1 + delta2).toBeCloseTo(0, 1);

103-118: Test is COMPLETELY POINTLESS - no tag reputation data seeded - STILL NOT FIXED.

Past review noted: "You create a room with tags but never create a round in that room, submit scores for that round, or run reputation.update for that room."

This test proves NOTHING about tag-based reputation functionality. It only verifies the API returns a default Elo when no data exists.

Either:

Option A: Seed complete tag-based reputation data:

// Create round in tagged room
const tagRoundId = '90000000-0000-0000-0000-000000000020';
await pool.query(
  "insert into rounds(id, room_id, idx, phase) values ($1, $2, 0, 'published')",
  [tagRoundId, roomIdTag]
);

// Add participants and scores for that round
await pool.query(
  'insert into participants(id, room_id, anon_name, role) values ($1, $2, $3, $4)',
  [debaterId, roomIdTag, 'debater_tag', 'debater']
);

// Submit scores and run reputation.update for roomIdTag
await supertest(app).post('/rpc/score.submit').send({
  round_id: tagRoundId,
  judge_id: judgeId,
  participant_id: debaterId,
  e: 90, r: 90, c: 90, v: 90, y: 90,
  client_nonce: 'tag-score'
});

await supertest(app).post('/rpc/reputation.update').send({ room_id: roomIdTag });

// Then verify tag-specific Elo is not default
const res = await supertest(app)
  .get('/rpc/reputation.get')
  .query({ participant_id: debaterId, tag: 'science' });
expect(res.body.elo).not.toBe(1200);

Option B: Rename test to "returns default Elo when no tag reputation exists" and assert default value explicitly.

server/rpc.js (3)

569-570: score.submit still returns score_id without null check.

Line 570 returns { ok: true, score_id } without verifying score_id is truthy. Compare to vote.final (line 533) which properly throws on missing ID.

🔎 Proposed fix
         const score_id = r.rows?.[0]?.id;
+        if (!score_id) throw new Error('score_submit_missing_id');
         return res.json({ ok: true, score_id });

614-627: reputation.update still lacks Zod validation—query can run with NULL parameters.

Lines 617-626: If req.body is {}, both room_id and round_id are undefined. The query at line 623 then runs with $1 = undefined (becomes NULL in SQL), potentially returning the latest round from ANY room or none at all.

This endpoint is the only one in the new batch without Zod validation.

🔎 Proposed fix
+const ReputationUpdateIn = z.object({
+  room_id: z.string().uuid().optional(),
+  round_id: z.string().uuid().optional()
+}).refine(v => v.room_id || v.round_id, {
+  message: 'Either room_id or round_id must be provided'
+});
+
 app.post('/rpc/reputation.update', async (req, res) => {
   try {
-    const { room_id, round_id } = req.body || {};
+    const { room_id, round_id } = ReputationUpdateIn.parse(req.body);

1178-1180: Stale TODO masquerading as documentation persists.

Lines 1179-1180 state that final_vote doesn't carry room_id, yet the filter logic at line 1178 still checks payload.room_id !== roomId. This is a known issue from prior review that remains unaddressed.

Either fix the DB notification to include room_id or fix the filter to handle final_vote channel differently.

web/app/room/[roomId]/page.jsx (3)

59-83: Missing AbortController and unsafe state access (previously flagged).

This issue was flagged in a previous review and remains unresolved:

  1. No AbortController to cancel fetch on unmount
  2. state.round.round_id access on line 70 is unsafe if state/round is undefined

Apply the fix from the previous review comment.

As per coding guidelines, AbortController is required for async operations in React components.


85-109: Same AbortController and unsafe access issues (previously flagged).

Identical to the previous comment on onContinueVote:

  1. Missing AbortController
  2. Unsafe state.round.round_id access on line 95

Apply the same fix pattern.

As per coding guidelines, AbortController is required for async operations in React components.
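The required pattern, sketched outside React so the abort semantics are visible. `fetchJson` and `loadRound` are illustrative stand-ins for the component's real fetch calls against `/rpc`; in the component, `controller.abort()` belongs in the `useEffect` cleanup (or an unmount guard in the handler):

```javascript
// Stand-in for fetch: resolves after a short delay unless aborted.
async function fetchJson(url, { signal } = {}) {
  if (signal?.aborted) throw new DOMException('Aborted', 'AbortError');
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve({ ok: true, url }), 50);
    signal?.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new DOMException('Aborted', 'AbortError'));
    });
  });
}

// Handler-shaped wrapper: an aborted request resolves to null so an
// unmounted component never calls setState on a late response.
async function loadRound(roundId, signal) {
  try {
    return await fetchJson(`/rpc/round.get?id=${roundId}`, { signal });
  } catch (err) {
    if (err.name === 'AbortError') return null; // unmounted: swallow quietly
    throw err;
  }
}
```

Pairing this with optional chaining on the state reads (`state?.round?.round_id`) covers both halves of the flagged issue.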


840-894: Modal accessibility issues (previously flagged).

These vote modals were flagged in a previous review for missing:

  • Focus trap
  • Escape key handling
  • aria-modal attributes

The issue remains unresolved. Apply the fixes from the previous review comment.

server/schemas.js (1)

132-150: Inconsistent client_nonce requirements (previously flagged).

This issue was flagged in a previous review and remains unresolved:

FinalVote (line 137) and ScoreSubmit (line 149) have optional client_nonce, while VerifySubmit (line 129) and ContinueVote (line 39) have required client_nonce.

Optional nonces break idempotency. If a user retries a final vote without a nonce, the DB generates a new one, creating duplicate records. Make them all required.

As per coding guidelines, use Zod schemas at the edges for request validation.
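The idempotency argument can be made concrete with an in-memory model of the upsert key. `makeStore` is illustrative, and `randomUUID` stands in for the DB's `gen_random_uuid()` fallback for a missing nonce:

```javascript
import { randomUUID } from 'node:crypto';

// Model of ON CONFLICT ... DO UPDATE keyed on the given fields.
function makeStore(keyFields) {
  const rows = new Map();
  return {
    upsert(vote) {
      // Server invents a nonce when the client omits one.
      const v = { ...vote, client_nonce: vote.client_nonce ?? randomUUID() };
      const key = keyFields.map((f) => v[f]).join('|');
      rows.set(key, v);
      return rows.size; // row count after the upsert
    }
  };
}

const vote = { round_id: 'r1', voter_id: 'p1', approval: true };

// Key includes client_nonce: each retry without a nonce creates a new row.
const leaky = makeStore(['round_id', 'voter_id', 'client_nonce']);

// Key is (round_id, voter_id): retries collapse to one row per voter.
const strict = makeStore(['round_id', 'voter_id']);
```

This is exactly the duplicate-vote path the db/schema.sql comments below describe; making the nonce required (and keeping it out of the uniqueness key) closes it.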

db/schema.sql (2)

102-114: UNIQUE constraint allows duplicate final votes (previously flagged).

This CRITICAL issue was flagged in a previous review and remains unresolved:

Line 111: UNIQUE (round_id, voter_id, client_nonce)

A voter can submit multiple conflicting final votes for the same round by using different client_nonce values. Each different nonce creates a new row.

The constraint should be:

UNIQUE (round_id, voter_id)

And the RPC vote_final_submit must use ON CONFLICT (round_id, voter_id) instead of ON CONFLICT (round_id, voter_id, client_nonce).

Apply the fix from the previous review comment immediately.

Based on learnings, submission verification must enforce proper constraints.


116-133: UNIQUE constraint allows duplicate scores (previously flagged).

This CRITICAL issue was flagged in a previous review and remains unresolved:

Line 129: UNIQUE (round_id, judge_id, participant_id, client_nonce)

A judge can submit multiple rubric scores for the same participant in the same round by using different client_nonce values. The aggregation view will then AVERAGE them, which is nonsensical.

The constraint should be:

UNIQUE (round_id, judge_id, participant_id)

Apply the fix from the previous review comment immediately.

Based on learnings, submission verification must enforce proper constraints.

#!/usr/bin/env bash
msg_file="$1"
pattern='^(revert: )?(feat|fix|docs|style|refactor|perf|test|build|ci|chore|ops|sec)(\(.+\))?: .+'
merge_pattern='^Merge (branch|pull request|remote-tracking branch)'

⚠️ Potential issue | 🟡 Minor

Incomplete merge pattern: tag merges will be incorrectly rejected.

The pattern covers branch, pull request, and remote-tracking branch merges, but omits tag merges. Git generates messages like Merge tag 'v1.0.0' which will fail the conventional commits check.

🔎 Fix to include tag merges
-merge_pattern='^Merge (branch|pull request|remote-tracking branch)'
+merge_pattern='^Merge (branch|pull request|remote-tracking branch|tags?)'

Note: tags? handles both "Merge tag" (singular) and "Merge tags" (multiple tags).
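The widened alternation can be smoke-tested in any regex engine; the ERE above happens to be valid JavaScript regex syntax as well:

```javascript
// Pattern copied from the proposed hook fix; sample messages are illustrative.
const mergePattern = /^Merge (branch|pull request|remote-tracking branch|tags?)/;

const samples = [
  ["Merge branch 'main'", true],
  ['Merge pull request #148 from flyingrobots/db-tests', true],
  ["Merge tag 'v1.0.0'", true],
  ["Merge tags 'v1.0.0', 'v1.0.1'", true],
  ['feat(scoring): add rubric RPC', false] // not a merge; falls through to the commit check
];

for (const [msg, expected] of samples) {
  console.log(mergePattern.test(msg) === expected ? 'ok' : `FAIL: ${msg}`);
}
```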

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
merge_pattern='^Merge (branch|pull request|remote-tracking branch)'
merge_pattern='^Merge (branch|pull request|remote-tracking branch|tags?)'
🤖 Prompt for AI Agents
.githooks/commit-msg around line 4: the current merge_pattern only matches
branch, pull request, and remote-tracking branch merges and will reject tag
merge messages like "Merge tag 'v1.0.0'"; update the regex to also accept "tag"
or "tags" (e.g., add tags? into the alternation) so merge messages for tags are
allowed by the conventional commit check.

Comment on lines +45 to +56
- name: Detect path changes
id: changes
uses: dorny/paths-filter@v3
with:
filters: |
web:
- 'web/**'
- 'public/**'
- 'eslint.config.js'
- 'web/package.json'
- 'web/package-lock.json'

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find Next.js config files in web/
echo "=== Searching for Next.js config files ==="
fd -e js -e mjs -e ts 'next\.config' web/ 2>/dev/null || echo "No next.config found"

# Find other potential web config files
echo ""
echo "=== Searching for other web config files ==="
fd -e json -e js -e mjs 'tsconfig|jsconfig|postcss|tailwind' web/ 2>/dev/null || echo "No matching files found"

# List root files in web/ directory
echo ""
echo "=== All files in web/ root directory ==="
ls -la web/ | grep -E '^\-' | awk '{print $NF}'

# Check if eslint.config.js exists at repo root
echo ""
echo "=== Checking eslint.config.js location ==="
ls -la eslint.config.js 2>/dev/null || echo "No eslint.config.js at root"

# Find all package.json files in web tree
echo ""
echo "=== All package.json files in web/ ==="
fd 'package.json' web/

Repository: flyingrobots/db8

Length of output: 546


Add missing web configuration files to the path filter.

The filter is critically incomplete. It fails to include several essential web build dependencies:

  • web/next.config.js — changes here bypass the web rebuild entirely
  • web/jsconfig.json — module resolution configuration
  • web/postcss.config.js — CSS processing configuration
  • web/tailwind.config.js — Tailwind CSS compilation configuration

Changes to any of these files will not trigger the web workflow. Add all four to the filter. The eslint.config.js concern remains secondary — verify whether linting config changes warrant a rebuild in your workflow design.

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 45 to 56 the paths-filter for the "web"
group is missing several web configuration files so changes to them don't
trigger the web workflow; add web/next.config.js, web/jsconfig.json,
web/postcss.config.js, and web/tailwind.config.js to the 'web' filter list so
modifications to these config files will cause the web workflow to run, and
verify whether eslint.config.js should remain included based on whether lint
config changes should trigger a rebuild.

Comment on lines +60 to +66
- name: Build Web (Next.js)
if: github.event_name == 'push' || steps.changes.outputs.web == 'true'
env:
NODE_ENV: production
NEXT_TELEMETRY_DISABLED: '1'
run: npm --prefix web run build


🧹 Nitpick | 🔵 Trivial

Asymmetric build condition: always build on push, conditional on PRs.

The Build Web step condition github.event_name == 'push' || steps.changes.outputs.web == 'true' means:

  • On push events (to main branch): always builds web, regardless of which files changed.
  • On pull_request events: only builds if the path filter detects web-related changes.

This asymmetry introduces a risk: if a PR that doesn't touch web files is merged to main, the main build will run even though the PR build didn't. This could mask build failures that only manifest in production builds.

Consider either:

  1. Making both events conditional on the path filter: steps.changes.outputs.web == 'true'
  2. Documenting why main builds must always run (e.g., for deployment purposes).
🔎 Proposed fix to make the condition symmetric
       - name: Build Web (Next.js)
-        if: github.event_name == 'push' || steps.changes.outputs.web == 'true'
+        if: steps.changes.outputs.web == 'true'
         env:
           NODE_ENV: production
           NEXT_TELEMETRY_DISABLED: '1'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Build Web (Next.js)
if: github.event_name == 'push' || steps.changes.outputs.web == 'true'
env:
NODE_ENV: production
NEXT_TELEMETRY_DISABLED: '1'
run: npm --prefix web run build
- name: Build Web (Next.js)
if: steps.changes.outputs.web == 'true'
env:
NODE_ENV: production
NEXT_TELEMETRY_DISABLED: '1'
run: npm --prefix web run build
🤖 Prompt for AI Agents
.github/workflows/ci.yml around lines 60 to 66: the Build Web step condition is
asymmetric (always runs on push but only conditional on PRs); change the if
expression to use the path filter for both events by replacing the current
condition with steps.changes.outputs.web == 'true' so the build only runs when
web files changed, or alternatively add an explicit comment explaining why
builds must run on push (e.g., required for deployment) if you intend to keep
the current behavior.

Comment on lines +49 to +59
- name: Create test database
env:
PGPASSWORD: test
run: |
until pg_isready -h localhost -p 5432 -U postgres; do sleep 2; done
psql -h localhost -U postgres -c 'CREATE DATABASE db8_test;'
- name: Prepare schema/RPC/RLS
env:
DATABASE_URL: postgresql://postgres:test@localhost:5432/db8_test
DB8_TEST_OUTPUT: quiet
run: node scripts/prepare-db.js

🧹 Nitpick | 🔵 Trivial

Hardcoded test credentials are acceptable for ephemeral CI containers.

The Checkov warning (CKV_SECRET_4) on lines 57-58 is a false positive. These credentials exist solely for a throwaway Postgres container that lives only during the CI run. The DATABASE_URL is correctly scoped to the job and not persisted.

However, if you want to silence the static analysis noise and follow the principle of least surprise, you could reference POSTGRES_PASSWORD from the service definition:

🔎 Optional: DRY the password reference
       - name: Prepare schema/RPC/RLS
         env:
-          DATABASE_URL: postgresql://postgres:test@localhost:5432/db8_test
+          DATABASE_URL: postgresql://postgres:${{ env.PGPASSWORD }}@localhost:5432/db8_test
           DB8_TEST_OUTPUT: quiet
         run: node scripts/prepare-db.js

Note: This would require setting PGPASSWORD as a job-level env var. The current approach is fine.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Checkov (3.2.334)

[medium] 57-58: Basic Auth Credentials

(CKV_SECRET_4)

🤖 Prompt for AI Agents
In .github/workflows/db-tests.yml around lines 49 to 59 the CI job hardcodes the
test database password and DATABASE_URL which triggers the Checkov CKV_SECRET_4
false positive; replace the literal password by referencing the service-level
POSTGRES_PASSWORD env var (or set PGPASSWORD as a job-level env var) and use
that variable in the DATABASE_URL so the secret is not inline in the job; ensure
the service definition declares POSTGRES_PASSWORD and that the job env uses it
(and update the psql invocation to rely on PGPASSWORD) so credentials remain
ephemeral but are no longer hardcoded in the workflow.


- Issues: closed/moved — #67, #68, #70, #30, #117, #121, #9, #10 (closed); #11, #12, #29, #7 (→ M3); #31, #15 (→ M6); #32, #13, #14 (→ M7)
- PRs: #144 (CLI SSH verify + docs), #145/#146/#142 (deps alignment), #148 (db‑tests workflow + README milestone focus)
- Files: `server/test/cli.journal.verify.test.js`, `docs/Provenance.md`, `.gitignore`, `server/rpc.js`, `.github/workflows/db-tests.yml`, `README.md`

⚠️ Potential issue | 🟡 Minor

Trivial: the "GitHub" capitalization flag is a false positive here.

LanguageTool wants "GitHub" with a capital H, but the flagged span is the file path `.github/workflows/db-tests.yml`, which must stay lowercase. No change to this line is needed; apply the capitalization only where the word appears in prose.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~1036-~1036: The official name of this software platform is spelled with a capital “H”.
Context: ...nce.md, .gitignore, server/rpc.js, .github/workflows/db-tests.yml, README.md` #...

(GITHUB)

🤖 Prompt for AI Agents
In AGENTS.md around line 1036 the word "GitHub" is incorrectly cased; update the
text to use the correct capitalization "GitHub" (capital G and H) wherever that
instance appears on this line so the repository uses the proper trademark
casing.

Comment on lines +15 to +19
```text
███████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░
| | | | | | | |
0 M1 M2 M3 M4 M5 M6 M7
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch milestone data to verify progress bar alignment
# Try to list milestones and their issue counts

# Check if gh CLI is available
if command -v gh &> /dev/null; then
  echo "Attempting to fetch milestone data via gh CLI..."
  # Note: This may fail due to authentication or repository access in sandbox
  gh issue list --repo flyingrobots/db8 --state all --json milestone,labels --limit 1000 2>&1 || echo "gh CLI access failed"
else
  echo "gh CLI not available in sandbox environment"
  # Alternative: search for README.md to examine the context around the visualization
  echo "Searching for README.md to analyze the progress bar context..."
fi

# List files to understand repository structure
echo -e "\n=== Repository structure ===" 
git ls-files | head -30

Repository: flyingrobots/db8

Length of output: 41825


🏁 Script executed:

#!/bin/bash
# Parse the issue data and compute weighted milestone completion

python3 <<'PYEOF'
import json

# The raw gh issue output (already fetched above)
raw_json = '''[...]'''  # This is too large—let me use a different approach

# Instead, re-fetch with better formatting for analysis
import subprocess
result = subprocess.run([
    'gh', 'issue', 'list',
    '--repo', 'flyingrobots/db8',
    '--state', 'all',
    '--json', 'milestone,labels,state',
    '--jq', 'group_by(.milestone.title) | map({milestone: .[0].milestone.title, issues: .})'
], capture_output=True, text=True)

if result.returncode != 0:
    print(f"Error: {result.stderr}")
    exit(1)

data = json.loads(result.stdout)
weights = {'p0': 8, 'p1': 5, 'p2': 3, 'p3': 1}

print("=== Milestone Completion Analysis ===\n")
cumulative_closed = 0
cumulative_total_weight = 0

for group in data:
    milestone_title = group['milestone']
    if not milestone_title:
        continue
    
    issues = group['issues']
    total_issues = len(issues)
    closed_count = sum(1 for i in issues if i['state'] == 'CLOSED')
    
    total_weight = 0
    closed_weight = 0
    
    for issue in issues:
        # Extract priority from labels
        priority = 'default'
        for label in issue.get('labels', []):
            if label['name'].startswith('priority/p'):
                priority = label['name'].replace('priority/', '')
                break
        
        weight = weights.get(priority, 1)
        total_weight += weight
        
        if issue['state'] == 'CLOSED':
            closed_weight += weight
    
    percent = (closed_weight / total_weight * 100) if total_weight > 0 else 0
    cumulative_closed += closed_weight
    cumulative_total_weight += total_weight
    
    print(f"{milestone_title}: {closed_count}/{total_issues} issues | Weighted: {closed_weight}/{total_weight} | {percent:.1f}%")

if cumulative_total_weight > 0:
    overall = (cumulative_closed / cumulative_total_weight * 100)
    print(f"\n=== Overall Cumulative Completion (M0-M5) ===")
    print(f"Weighted: {cumulative_closed}/{cumulative_total_weight} = {overall:.1f}%")
    print(f"\nExpected bar fill: ~{int(overall * 61 / 100)} filled chars out of 61")

PYEOF

Repository: flyingrobots/db8

Length of output: 653


The progress bar visualization is factually incorrect and significantly misleads about project completion status.

The bar shows approximately 67% completion (41 filled chars of 61 total), but actual weighted completion through M5 is only 50% (41/82 weighted points):

  • M1: MVP Loop = 100% (11/11 weighted)
  • M2: Provenance & Journals = 100% (24/24 weighted)
  • M3: Verification = 0% (0/5 weighted)
  • M4: Votes & Final = 0% (0/9 weighted)
  • M5: Scoring & Elo = 0% (0/16 weighted) ← Should be visually incomplete, not filled

The visualization overstates progress by approximately 17 percentage points and incorrectly implies M5 is complete when zero of its four issues are closed. Correct the bar rendering to accurately represent 50% cumulative weighted completion.

🤖 Prompt for AI Agents
In README.md around lines 15 to 19, the ASCII progress bar hardcodes filled
characters and milestone markers which overstates progress; change the rendering
to compute filled length from weighted completion (use total weighted points =
82 and completed = 41 -> 50%) and set the number of filled characters
proportionally rather than hardcoding 41/61, recalculate milestone marker
positions using cumulative weights so M5 is shown incomplete, and update the
text/bar so the visual accurately reflects 50% cumulative weighted completion.
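The proportional rendering the prompt asks for can be sketched in a few lines; the 41/82 weighted totals and 61-char width come from the analysis above, while the bar characters themselves are illustrative:

```python
def render_bar(closed_weight, total_weight, width=61):
    """Fill the bar proportionally to weighted completion instead of
    hardcoding the filled length."""
    pct = (closed_weight / total_weight * 100) if total_weight else 0.0
    filled = int(pct * width / 100)  # same formula the analysis script prints
    return '#' * filled + '-' * (width - filled), pct

bar, pct = render_bar(41, 82)
print(f'[{bar}] {pct:.1f}%')  # 30 of 61 chars filled at 50.0%
```

Milestone markers would be placed the same way, at each milestone's cumulative-weight fraction of the width.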

Comment on lines +1 to +30
import { describe, it, expect } from 'vitest';
import request from 'supertest';
import app from '../rpc.js';

describe('GET /rpc/participant', () => {
  it('returns role=judge for judge-* IDs (memory fallback)', async () => {
    const res = await request(app)
      .get('/rpc/participant')
      .query({ room_id: '00000000-0000-0000-0000-000000000001', id: 'judge-123' });

    expect(res.status).toBe(200);
    expect(res.body.ok).toBe(true);
    expect(res.body.role).toBe('judge');
  });

  it('returns role=debater for other IDs (memory fallback)', async () => {
    const res = await request(app)
      .get('/rpc/participant')
      .query({ room_id: '00000000-0000-0000-0000-000000000001', id: 'user-456' });

    expect(res.status).toBe(200);
    expect(res.body.ok).toBe(true);
    expect(res.body.role).toBe('debater');
  });

  it('returns 400 if params missing', async () => {
    const res = await request(app).get('/rpc/participant');
    expect(res.status).toBe(400);
  });
});

🧹 Nitpick | 🔵 Trivial

Tests are adequate but superficial.

Lines 6-24: You test the memory fallback for judge-* and other IDs, which is good. However:

  1. Missing DB path test: This only tests memory fallback. Where's the test that exercises the actual DB query when DATABASE_URL is set?

  2. No RLS test: Does /rpc/participant respect RLS policies? Can participant A see participant B's details?

  3. No error cases: What if the participant doesn't exist in the DB?

The tests you have are fine, but they're incomplete.

Add a DB-backed test that seeds participants and verifies RLS:

describe('GET /rpc/participant (DB)', () => {
  it('respects RLS and returns participant from DB', async () => {
    // Seed DB, set DATABASE_URL, test
  });
});
🤖 Prompt for AI Agents
In server/test/rpc.participant.get.test.js around lines 1-30, the tests only
cover the memory fallback and miss DB-backed scenarios, RLS behavior, and
missing-participant errors; add a new describe block "GET /rpc/participant (DB)"
that sets up a test Postgres (or test database) by setting DATABASE_URL for the
process, seeds participants (including two participants in the same room with
different IDs and roles), ensures RLS is enforced by running the request as one
participant and asserting it cannot see another's private data while it can see
its own, asserts the correct role is returned for an existing DB participant,
asserts 404 or appropriate error when the participant does not exist, and cleans
up seeded data and restores DATABASE_URL/env after tests.

Comment on lines +50 to +57
const first = await request(app).post('/rpc/verify.submit').send(payload);
if (first.status !== 200) {
console.error('verify.submit first failed', first.status, first.body);
}
expect(first.status).toBe(200);
const second = await request(app).post('/rpc/verify.submit').send(payload).expect(200);
expect(first.body.ok).toBe(true);
expect(second.body.id).toEqual(first.body.id);

🧹 Nitpick | 🔵 Trivial

Inconsistent assertion style between first and second requests.

Lines 50-54 use a separate expect() call after awaiting, while line 55 chains .expect(200) inline. Pick one style for consistency:

🔎 Proposed fix for consistency
     const first = await request(app).post('/rpc/verify.submit').send(payload);
     if (first.status !== 200) {
       console.error('verify.submit first failed', first.status, first.body);
     }
     expect(first.status).toBe(200);
-    const second = await request(app).post('/rpc/verify.submit').send(payload).expect(200);
+    const second = await request(app).post('/rpc/verify.submit').send(payload);
+    expect(second.status).toBe(200);
     expect(first.body.ok).toBe(true);
     expect(second.body.id).toEqual(first.body.id);
🤖 Prompt for AI Agents
In server/test/rpc.verify.submit.test.js around lines 50 to 57, the two POST
requests use inconsistent assertion styles (the first checks status with a
separate expect after awaiting, the second chains .expect(200)); make them
consistent by choosing one style — either chain .expect(200) on both requests or
await and then use expect(response.status).toBe(200) for both — and update the
test so both requests follow the same pattern and keep subsequent body
assertions unchanged.

Comment on lines +93 to +102
it('rejects non-existent submission_id', async () => {
const res = await request(app).post('/rpc/verify.submit').send({
round_id: ROUND_ID,
reporter_id: REPORTER_ID,
submission_id: '00000000-0000-0000-0000-00000000ffff',
verdict: 'true',
client_nonce: 'ver-missing'
});
expect(res.status).toBeGreaterThanOrEqual(400);
});

🧹 Nitpick | 🔵 Trivial

Non-existent submission test is valuable but assertion could be tighter.

The test correctly expects a 4xx response for non-existent submission_id. Consider asserting for a specific status (likely 400 or 404) and checking the error message to ensure the right error path is triggered:

🔎 Proposed tighter assertion
   it('rejects non-existent submission_id', async () => {
     const res = await request(app).post('/rpc/verify.submit').send({
       round_id: ROUND_ID,
       reporter_id: REPORTER_ID,
       submission_id: '00000000-0000-0000-0000-00000000ffff',
       verdict: 'true',
       client_nonce: 'ver-missing'
     });
-    expect(res.status).toBeGreaterThanOrEqual(400);
+    expect(res.status).toBe(400);
+    expect(res.body.error).toMatch(/submission|not found/i);
   });
🤖 Prompt for AI Agents
In server/test/rpc.verify.submit.test.js around lines 93 to 102, the test
currently only asserts a generic 4xx response for a non-existent submission_id;
tighten it to assert the exact expected status (e.g., 404 or 400 depending on
API contract) and verify the error payload to ensure the correct error path is
hit. Update the test to expect the specific status code (replace
toBeGreaterThanOrEqual(400) with toBe(404) or toBe(400) per the API), and add an
assertion that the response body contains the expected error field/message
(e.g., res.body.error or res.body.message includes "submission" or "not found")
so the test fails if a different error or success path is returned.

 with pub as (
   select r.room_id, r.id as round_id, r.idx, r.phase, r.submit_deadline_unix, r.published_at_unix, r.continue_vote_close_unix
-  from rounds r
+  from rounds_view r

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for rounds_view definition and security_barrier configuration
find db -name "*.sql" -type f | head -20

Repository: flyingrobots/db8

Length of output: 140


🏁 Script executed:

# Look for rounds_view creation and security_barrier settings
rg -n "CREATE.*VIEW rounds_view" db/
rg -n "ALTER VIEW rounds_view.*security_barrier" db/
rg -n "rounds_view" db/ -A 3 -B 1

Repository: flyingrobots/db8

Length of output: 693


🏁 Script executed:

# Inspect watcher.js to see what it queries and expects
head -100 server/watcher.js

Repository: flyingrobots/db8

Length of output: 3569


🏁 Script executed:

# Check for RLS policies on rounds or rounds_view
rg -n "ALTER TABLE.*rounds" db/ | grep -i "policy\|rls\|security"
rg -n "CREATE POLICY" db/ | grep -i "rounds"

Repository: flyingrobots/db8

Length of output: 42


🏁 Script executed:

# Direct inspection of RLS policy definitions
cat db/rls.sql

Repository: flyingrobots/db8

Length of output: 7137


🏁 Script executed:

# Check if there's a service role/user defined for watcher
rg -n "watcher\|service.*account\|watcher.*role" server/ db/ --ignore-case

Repository: flyingrobots/db8

Length of output: 42


🏁 Script executed:

# Check round_publish_due and round_open_next function definitions for RLS behavior
rg -n "CREATE OR REPLACE FUNCTION round_publish_due\|CREATE OR REPLACE FUNCTION round_open_next" db/

Repository: flyingrobots/db8

Length of output: 42


Round phase filtering must be enforced at RLS layer, not application layer.

The column alignment and security_barrier = true are correct, but the rounds_read_policy violates the required RLS architecture: its USING clause is using (true), granting unrestricted SELECT access to all rounds at all phases to any role. This defeats the purpose of RLS and leaves unpublished round data exposed to any database connection.

Per architectural requirement: RLS policies must restrict visibility by role and status. The rounds_read_policy should be replaced with a role and phase-aware policy that gates access to published rounds for public roles, while allowing service/privileged roles access only during appropriate operations. Alternatively, the watcher's service account should use a dedicated connection with explicit grants rather than relying on broken RLS.

🤖 Prompt for AI Agents
In server/watcher.js around line 30, the PR is leaving RLS bypassed by a policy
that uses "using (true)" which grants unrestricted SELECTs; replace that
application-layer filtering with a proper DB-side RLS rule: remove/replace the
rounds_read_policy using(true) and create a role-and-phase-aware USING
expression that only exposes rows where phase = 'published' to public roles and
allows service/privileged roles (the watcher) access when appropriate;
alternatively, ensure the watcher uses a dedicated DB connection role with
explicit GRANTs and no reliance on a permissive RLS policy; keep
security_barrier and column alignment as-is and update DB migrations/policies
accordingly.
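The visibility rule the prompt describes is small enough to model outside SQL; the role names below are assumptions for illustration, not the repo's actual DB roles:

```python
PRIVILEGED_ROLES = {'service', 'watcher'}  # hypothetical privileged role names

def round_visible(phase: str, role: str) -> bool:
    """Mirror of the proposed role-and-phase-aware USING expression:
    public roles see only published rounds; privileged roles see all."""
    return role in PRIVILEGED_ROLES or phase == 'published'
```

The same predicate, written as a SQL USING expression, would replace the permissive `using (true)` policy.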

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +64 to +69
- name: Run DB-gated tests (glob)
  env:
    DB8_TEST_PG: '1'
    DB8_TEST_DATABASE_URL: postgresql://postgres:test@localhost:5432/db8_test
  run: |
    npx vitest run "server/test/**/*.db.*.js" server/test/journal.byidx.test.js --reporter verbose

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Explore server/test directory structure and file naming
echo "=== Directory structure of server/test ==="
find server/test -type f -name "*.js" 2>/dev/null | sort | head -30

echo ""
echo "=== Files matching *.db.*.js pattern ==="
find server/test -type f -name "*.db.*.js" 2>/dev/null | sort

echo ""
echo "=== Checking for journal.byidx.test.js ==="
find server/test -type f -name "*journal.byidx*" 2>/dev/null

echo ""
echo "=== All .test.js files in server/test ==="
find server/test -type f -name "*.test.js" 2>/dev/null | sort | head -30

Repository: flyingrobots/db8

Length of output: 2657


Fix the naming convention violation or justify the explicit inclusion.

The glob "server/test/**/*.db.*.js" correctly matches 6 database test files (rpc.db.integration.test.js, rpc.db.postgres.test.js, rpc.db.verify.test.js, sse.db.events.test.js, sse.db.journal.test.js, watcher.db.flip.test.js). However, journal.byidx.test.js is manually appended outside the glob—it doesn't follow the .db. naming convention. If this is a database test, rename it to follow the pattern (e.g., journal.byidx.db.test.js). If it isn't, remove it from this DB-specific test run.

🤖 Prompt for AI Agents
.github/workflows/db-tests.yml around lines 64 to 69: the workflow runs a glob
matching DB tests ("server/test/**/*.db.*.js") but then explicitly appends
server/test/journal.byidx.test.js which does not follow the .db. naming
convention; either rename the test file to follow the pattern (e.g.,
server/test/journal.byidx.db.test.js) and update any imports/refs so it is
covered by the glob, or remove the explicit filename from this DB-specific step
if that test is not a database test; ensure any other workflow steps or scripts
referencing the old filename are updated accordingly.
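The convention check is easy to reproduce locally; `fnmatch` approximates the vitest glob on basenames:

```python
from fnmatch import fnmatch
import os

def matches_db_glob(path):
    """True when the basename follows the '.db.' DB-test naming convention."""
    return fnmatch(os.path.basename(path), '*.db.*.js')

# journal.byidx.test.js fails the convention; renaming it with '.db.' fixes that.
```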

Comment on lines +16 to +39
-- Seed a room/round/submission and participants (judge + author)
DO $$
DECLARE
  rid uuid := '20000000-0000-0000-0000-000000000001';
  r0 uuid := '20000000-0000-0000-0000-000000000002';
  author uuid := '20000000-0000-0000-0000-000000000003';
  judge uuid := '20000000-0000-0000-0000-000000000004';
  sub uuid;
BEGIN
  INSERT INTO rooms(id,title) VALUES (rid,'Verify Room') ON CONFLICT DO NOTHING;
  INSERT INTO rounds(id,room_id,idx,phase,submit_deadline_unix,published_at_unix)
    VALUES (r0,rid,0,'published',0, extract(epoch from now())::bigint)
    ON CONFLICT DO NOTHING;
  INSERT INTO participants(id,room_id,anon_name,role)
    VALUES (author,rid,'author_v','debater')
    ON CONFLICT DO NOTHING;
  INSERT INTO participants(id,room_id,anon_name,role)
    VALUES (judge,rid,'judge_v','judge')
    ON CONFLICT DO NOTHING;
  INSERT INTO submissions(round_id, author_id, content, canonical_sha256, client_nonce)
    VALUES (r0,author,'Hello','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','n-ver-1')
    RETURNING id INTO sub;
  PERFORM ok(sub IS NOT NULL, 'seeded submission id');
END $$;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find the verify_submit function definition
fd -t f '\.sql' | head -20

Repository: flyingrobots/db8

Length of output: 140


🏁 Script executed:

# Search for verify_submit function
rg "verify_submit" -A 5 --type sql | head -100

Repository: flyingrobots/db8

Length of output: 358


🏁 Script executed:

# Look at the test file in context
fd 'verification.pgtap' --type f

Repository: flyingrobots/db8

Length of output: 89


🏁 Script executed:

# Get the full verify_submit function definition
rg "verify_submit" -A 50 --type sql db/rpc.sql

Repository: flyingrobots/db8

Length of output: 1671


🏁 Script executed:

# Get the full test file to see what's already tested
cat -n db/test/44_verification.pgtap

Repository: flyingrobots/db8

Length of output: 3714


🏁 Script executed:

# Verify the exact test count in plan(9)
rg "PERFORM ok|SELECT ok|SELECT throws_ok|SELECT results_eq" db/test/44_verification.pgtap | wc -l

Repository: flyingrobots/db8

Length of output: 61


🏁 Script executed:

# Check if there are other verification tests elsewhere that might cover host role or final phase
fd '\.pgtap$' --type f | xargs rg "verify_submit|'host'|'final'" --type sql 2>/dev/null | head -20

Repository: flyingrobots/db8

Length of output: 1468


Test coverage is missing the 'final' phase and 'host' role constraints enforced by verify_submit.

The function enforces phase IN ('published','final') and role IN ('judge','host') (confirmed in db/rpc.sql), but the current test only exercises published+judge. Per learning requirements, pgTAP invariants must cover boundary acceptance cases.

Add comprehensive phase×role matrix testing to cover all four combinations:

  • published+judge (happy path—already covered)
  • published+host (missing)
  • final+judge (missing)
  • final+host (missing)

This requires 3 additional test blocks after line 39, each seeding a new round with distinct phase/role and validating verify_submit succeeds or fails appropriately.
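The combinations can be generated rather than written by hand; a sketch of the matrix, with phases and roles taken from the function's checks:

```python
from itertools import product

PHASES = ('published', 'final')  # phases verify_submit accepts
ROLES = ('judge', 'host')        # reporter roles it accepts

# One pgTAP block per pair; every pair should succeed.
matrix = list(product(PHASES, ROLES))
```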

Comment on lines +41 to +68
-- verify_submit idempotency on (round, reporter, submission, claim)
DO $$
DECLARE
  r0 uuid := '20000000-0000-0000-0000-000000000002';
  judge uuid := '20000000-0000-0000-0000-000000000004';
  sub uuid := (SELECT id FROM submissions ORDER BY submitted_at DESC LIMIT 1);
  id1 uuid; id2 uuid; id3 uuid;
BEGIN
  SELECT verify_submit(r0, judge, sub, NULL, 'true', 'looks good', 'n1') INTO id1;
  SELECT verify_submit(r0, judge, sub, NULL, 'true', 'still good', 'n1') INTO id2;
  PERFORM ok(id1 = id2, 'verify_submit idempotent for same tuple (no claim)');
  -- Different claim_id should yield a different row
  SELECT verify_submit(r0, judge, sub, 'c1', 'false', 'nope', 'n2') INTO id3;
  PERFORM ok(id3 <> id1, 'verify_submit separate row for different claim');
END $$;

-- Bounds/role checks
SELECT throws_ok(
  $$SELECT verify_submit('20000000-0000-0000-0000-000000000002','20000000-0000-0000-0000-000000000003',(SELECT id FROM submissions LIMIT 1),NULL,'true',NULL,'n3')$$,
  '42501',
  'reporter_role_denied'
);

SELECT throws_ok(
  $$SELECT verify_submit('20000000-0000-0000-0000-000000000002','20000000-0000-0000-0000-000000000004',(SELECT id FROM submissions LIMIT 1),NULL,'maybe',NULL,'n4')$$,
  '22023',
  'invalid_verdict'
);
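The idempotency contract those assertions exercise reduces to a keyed upsert; a minimal model, assuming the uniqueness tuple is (round, reporter, submission, claim):

```python
import uuid

_rows = {}

def verify_submit(round_id, reporter_id, submission_id, claim_id, verdict, notes, nonce):
    """Return the existing row id for a repeated tuple; create a new
    row only when the tuple (including claim_id) differs."""
    key = (round_id, reporter_id, submission_id, claim_id)
    if key not in _rows:
        _rows[key] = str(uuid.uuid4())
    return _rows[key]
```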

🛠️ Refactor suggestion | 🟠 Major

Insufficient boundary and error case coverage.

Current tests verify happy path, idempotency, role denial, and one invalid verdict. Missing critical boundary cases:

  1. NULL verdict: Should verify_submit reject NULL verdict or is it allowed?
  2. Empty string verdict: Is '' a valid verdict value?
  3. Cross-room violation: Can a judge from Room A verify a submission in Room B?
  4. Non-existent submission_id: Does verify_submit gracefully handle invalid submission UUID?
  5. Non-published phase: Does verify_submit reject rounds in 'submit' or 'open' phase?
  6. Excessively long notes: DoS risk if notes field is unbounded.

Add boundary tests after line 68:

-- Test NULL verdict rejection
SELECT throws_ok(
  $$SELECT verify_submit('20000000-0000-0000-0000-000000000002',
                          '20000000-0000-0000-0000-000000000004',
                          (SELECT id FROM submissions LIMIT 1),
                          NULL, NULL, NULL, 'n-null')$$,
  'Verdict cannot be NULL'
);

-- Test non-existent submission
SELECT throws_ok(
  $$SELECT verify_submit('20000000-0000-0000-0000-000000000002',
                          '20000000-0000-0000-0000-000000000004',
                          '00000000-0000-0000-0000-000000000000'::uuid,
                          NULL, 'true', NULL, 'n-noexist')$$,
  'submission_not_found'
);
🤖 Prompt for AI Agents
In db/test/44_verification.pgtap after line 68, add focused boundary tests
covering NULL verdict, empty-string verdict, cross-room judge/submission
mismatch, non-existent submission_id, verifies during non-published phases, and
excessively long notes: for each case call verify_submit within throws_ok (or ok
where appropriate) asserting the correct SQLSTATE or custom error message (e.g.,
'verdict_required' or 'invalid_verdict' for NULL/empty, 'submission_not_found'
for missing UUID, an authorization error code for cross-room or wrong-phase
attempts, and length/validation error for oversized notes). Ensure each test
uses explicit UUIDs or phase-setting fixtures so they target the intended
condition and label each assertion with a descriptive test name.

Comment on lines +70 to +78
-- Summary aggregates
SELECT results_eq(
  $$ SELECT total::int FROM verify_summary('20000000-0000-0000-0000-000000000002'::uuid) WHERE claim_id IS NULL $$,
  ARRAY[1::int],
  'summary counts one overall-verdict row'
);

SELECT finish();
ROLLBACK;

⚠️ Potential issue | 🟠 Major

Weak summary aggregation test - only verifies row count.

The test only asserts total::int = 1 for the no-claim case. It does not verify:

  1. Claim-specific aggregation: After line 53 creates a verdict with claim_id='c1', verify_summary should return a separate row for that claim.
  2. Aggregate columns: The returned row should contain counts (e.g., true_count, false_count, unclear_count). This test doesn't verify those columns exist or are correct.
  3. Multiple verdicts: Seed 2+ verdicts with different values ('true' and 'false') and verify counts aggregate correctly.

Replace the weak test with comprehensive aggregation verification:

-- Verify summary structure and counts
DO $$
DECLARE
  r0  uuid := '20000000-0000-0000-0000-000000000002';
  summary record;
BEGIN
  -- Query summary for no-claim verdicts
  SELECT * INTO summary FROM verify_summary(r0) WHERE claim_id IS NULL;
  
  PERFORM ok(summary.total = 1, 'total count is 1 for no-claim');
  PERFORM ok(summary.true_count = 1, 'true_count is 1');
  PERFORM ok(summary.false_count = 0, 'false_count is 0');
  
  -- Query summary for claim 'c1'
  SELECT * INTO summary FROM verify_summary(r0) WHERE claim_id = 'c1';
  PERFORM ok(summary.total = 1, 'total count is 1 for claim c1');
  PERFORM ok(summary.false_count = 1, 'claim c1 has 1 false verdict');
END $$;
🤖 Prompt for AI Agents
In db/test/44_verification.pgtap around lines 70 to 78, the current test only
asserts total::int = 1 for the no-claim row; replace it with a comprehensive
aggregation verification: seed at least two verdicts (including one for
claim_id='c1' and one no-claim), call verify_summary and assert rows exist for
both claim_id IS NULL and claim_id='c1', and explicitly check aggregate columns
(total, true_count, false_count, unclear_count as applicable) for correct values
for each row; implement these checks using pgtap ok/assert functions (or SELECT
results_eq with expected arrays) to validate both structure and counts.
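The aggregation the stronger test should pin down is small enough to model directly; the column names here follow the review comment and are assumptions about verify_summary's output shape:

```python
from collections import defaultdict

def summarize(verdicts):
    """Count verdicts per claim_id (None = overall-verdict rows)."""
    rows = defaultdict(lambda: {'total': 0, 'true_count': 0, 'false_count': 0})
    for claim_id, verdict in verdicts:
        rows[claim_id]['total'] += 1
        if verdict == 'true':
            rows[claim_id]['true_count'] += 1
        elif verdict == 'false':
            rows[claim_id]['false_count'] += 1
    return dict(rows)
```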

Comment on lines +46 to +53
-- Reporter can always read their own (regardless of publish)
SET ROLE db8_reader;
SELECT set_config('db8.participant_id','21000000-0000-0000-0000-000000000004', false);
SELECT results_eq(
  $$ SELECT count(*)::int FROM verification_verdicts $$,
  ARRAY[1::int],
  'reporter sees their verdict row'
);

⚠️ Potential issue | 🟡 Minor

Test claim "regardless of publish" is not verified.

Line 46 comment claims "Reporter can always read their own (regardless of publish)", but the seeded round on line 24 is already in 'published' phase. This test only verifies visibility in the published state, not pre-publish. To validate the "regardless of publish" claim, you'd need either:

  1. A separate seed with phase='submit' or phase='final'
  2. Or update the round phase after the first visibility check
🔎 Proposed fix to actually test pre-publish visibility
 -- Seed minimal data
 DO $$
 DECLARE rid uuid := '21000000-0000-0000-0000-000000000001';
         r0  uuid := '21000000-0000-0000-0000-000000000002';
         a1  uuid := '21000000-0000-0000-0000-000000000003';
         j1  uuid := '21000000-0000-0000-0000-000000000004';
         sub uuid;
 BEGIN
   INSERT INTO rooms(id,title) VALUES (rid,'RLS Verify') ON CONFLICT DO NOTHING;
-  INSERT INTO rounds(id,room_id,idx,phase,submit_deadline_unix,published_at_unix)
-    VALUES (r0,rid,0,'published',0,extract(epoch from now())::bigint)
+  INSERT INTO rounds(id,room_id,idx,phase,submit_deadline_unix)
+    VALUES (r0,rid,0,'final',0)
     ON CONFLICT DO NOTHING;
   -- ... rest of seed

Then add a pre-publish check before updating to published for the "others" test.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In db/test/45_verification_rls.pgtap around lines 46 to 53, the test claims
"Reporter can always read their own (regardless of publish)" but the seeded
round is already 'published', so it only verifies post-publish visibility;
update the test to actually assert visibility pre-publish by either seeding a
second round with phase='submit' or 'final' and repeating the reporter
visibility check against that seed, or modify the existing seed within the test
flow to set phase to a non-published state before asserting the reporter can see
their verdict, then restore or publish as needed for subsequent checks.

Comment on lines +37 to +53
await pool.query(
  `insert into rooms (id, title)
   values ('30000000-0000-0000-0000-000000000001', 'Verify Room PG')
   on conflict (id) do nothing`
);
await pool.query(
  `insert into rounds (id, room_id, idx, phase, submit_deadline_unix, published_at_unix)
   values ('30000000-0000-0000-0000-000000000002', '30000000-0000-0000-0000-000000000001', 0, 'published', 0, extract(epoch from now())::bigint)
   on conflict (id) do nothing`
);
await pool.query(
  `insert into participants (id, room_id, anon_name, role)
   values
     ('30000000-0000-0000-0000-000000000003', '30000000-0000-0000-0000-000000000001', 'author', 'debater'),
     ('30000000-0000-0000-0000-000000000004', '30000000-0000-0000-0000-000000000001', 'judge', 'judge')
   on conflict (id) do nothing`
);

🧹 Nitpick | 🔵 Trivial

Seed data uses UUID prefix 30000000-*—verify collision isolation.

The UUIDs here (30000000-0000-0000-0000-00000000000X) are distinct from other test files I've seen (10000000-*, 00000000-*), which is good. However, the ON CONFLICT DO NOTHING clauses mask potential test pollution if rows already exist from a failed prior run.

Consider adding explicit cleanup in beforeAll or using truly random UUIDs to guarantee isolation. If a previous test run crashed mid-flight, stale data could cause silent assertion failures.

🤖 Prompt for AI Agents
In server/test/rpc.db.verify.test.js around lines 37 to 53, the seed uses fixed
UUIDs with ON CONFLICT DO NOTHING which can hide stale rows from previous runs;
modify the test setup to guarantee isolation by either (A) explicitly deleting
any existing rows with those fixed IDs before inserting (run DELETE FROM
participants/rounds/rooms WHERE id IN (...)) in the beforeAll or immediately
before these inserts, or (B) switch to generating fresh UUIDs per test (e.g.,
assign uuid variables in beforeAll and use those variables in the insert
statements and subsequent assertions) so inserts cannot collide; apply one
approach and ensure subsequent test code references the same IDs/variables.
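Approach (B) costs one helper; fresh ids per run make stale rows from a crashed prior run irrelevant:

```python
import uuid

def fresh_ids(*names):
    """One random UUID per named fixture, regenerated on every run."""
    return {name: str(uuid.uuid4()) for name in names}

ids = fresh_ids('room', 'round', 'author', 'judge')
```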

Comment on lines +61 to +73
beforeEach(async () => {
  const tables = ['verification_verdicts', 'submissions'];
  const existing = [];
  for (const table of tables) {
    const res = await pool.query('select to_regclass($1) as reg', [`public.${table}`]);
    if (res.rows[0]?.reg) existing.push(`"public"."${table}"`);
  }
  if (existing.length > 0) {
    await pool.query(`TRUNCATE ${existing.join(', ')} RESTART IDENTITY CASCADE;`);
    // eslint-disable-next-line no-console
    console.log('[truncate]', existing.join(', '));
  }
});

🧹 Nitpick | 🔵 Trivial

TRUNCATE logs to console—remove or guard behind DEBUG flag.

Line 71 unconditionally logs [truncate] to stdout. In an 83-test suite, this noise accumulates. Either remove the console.log or gate it behind process.env.DEBUG.

🔎 Proposed fix
     if (existing.length > 0) {
       await pool.query(`TRUNCATE ${existing.join(', ')} RESTART IDENTITY CASCADE;`);
-      // eslint-disable-next-line no-console
-      console.log('[truncate]', existing.join(', '));
+      if (process.env.DEBUG) {
+        // eslint-disable-next-line no-console
+        console.log('[truncate]', existing.join(', '));
+      }
     }
🤖 Prompt for AI Agents
In server/test/rpc.db.verify.test.js around lines 61 to 73 the beforeEach
truncation always prints a console.log('[truncate]', ...) which pollutes test
output; remove the unconditional console.log or wrap it in a DEBUG check (e.g.,
if (process.env.DEBUG) console.log(...)) so the message only appears when
debugging is enabled, ensuring test runs stay quiet by default.

Comment on lines +75 to +101
it('verify_submit stores and verify_summary aggregates', async () => {
  // Seed a submission
  const sub = await pool.query(
    `insert into submissions (round_id, author_id, content, canonical_sha256, client_nonce)
     values ('30000000-0000-0000-0000-000000000002','30000000-0000-0000-0000-000000000003','Hello','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','nonce')
     returning id`
  );
  const submission_id = sub.rows[0].id;

  const body = {
    round_id: '30000000-0000-0000-0000-000000000002',
    reporter_id: '30000000-0000-0000-0000-000000000004',
    submission_id,
    verdict: 'true',
    client_nonce: 'pg-ver-1'
  };
  const first = await request(app).post('/rpc/verify.submit').send(body).expect(200);
  const second = await request(app).post('/rpc/verify.submit').send(body).expect(200);
  expect(second.body.id).toEqual(first.body.id);

  const summary = await request(app)
    .get('/verify/summary?round_id=30000000-0000-0000-0000-000000000002')
    .expect(200);
  const rows = summary.body.rows || [];
  const overall = rows.find((r) => r.claim_id === null || r.claim_id === undefined);
  expect(overall?.true_count).toBe(1);
});

🧹 Nitpick | 🔵 Trivial

Idempotency test is solid but doesn't validate the returned id or the stored verdict.

You verify that submitting twice returns the same id (line 93), which confirms idempotency. However, you never verify that the returned id is a valid UUID or that the stored verdict matches what was submitted. A malicious or buggy RPC could return {id: null} twice and this test would pass.

Add an assertion like:

expect(first.body.id).toMatch(/^[0-9a-f-]{36}$/);
🤖 Prompt for AI Agents
In server/test/rpc.db.verify.test.js around lines 75 to 101, the idempotency
test only checks duplicate IDs but doesn't validate the id format or that the
stored verdict matches the submitted verdict; add an assertion that
first.body.id matches a UUID v4-ish pattern (36-char hex/dash) and add an
assertion that the stored verdict for that returned id equals the submitted
verdict (either by inspecting the RPC response payload fields or by querying the
database/summary row for that id) so the test fails if a null/invalid id or
wrong verdict is returned.
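Following the prompt above, a stricter check than a 36-char hex/dash match is a grouped UUID pattern. A sketch in plain JS (isUuid is an illustrative helper, not from the codebase):

```javascript
// Grouped UUID shape: 8-4-4-4-12 hex digits. The looser /^[0-9a-f-]{36}$/
// accepts any 36-char mix of hex digits and dashes, including all dashes.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value) {
  return typeof value === 'string' && UUID_RE.test(value);
}
```

In the test this would become expect(first.body.id).toMatch(UUID_RE), plus a follow-up query asserting the stored verdict equals the submitted one.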

Comment on lines +103 to +113
it('rejects invalid submission_id', async () => {
const body = {
round_id: '30000000-0000-0000-0000-000000000002',
reporter_id: '30000000-0000-0000-0000-000000000004',
submission_id: '99999999-9999-9999-9999-999999999999',
verdict: 'true',
client_nonce: 'pg-ver-invalid'
};
const res = await request(app).post('/rpc/verify.submit').send(body);
expect(res.status).toBeGreaterThanOrEqual(400);
});

🧹 Nitpick | 🔵 Trivial

Imprecise status assertion: be explicit about the expected code.

toBeGreaterThanOrEqual(400) would pass for 500 (server error), which is semantically different from 400/404 (client error). If the RPC should return a specific code for "submission not found," assert it exactly.

🔎 Proposed fix
     const res = await request(app).post('/rpc/verify.submit').send(body);
-    expect(res.status).toBeGreaterThanOrEqual(400);
+    // Expect 404 for non-existent submission, or 400 for validation failure
+    expect([400, 404]).toContain(res.status);
+    expect(res.body.ok).toBe(false);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it('rejects invalid submission_id', async () => {
const body = {
round_id: '30000000-0000-0000-0000-000000000002',
reporter_id: '30000000-0000-0000-0000-000000000004',
submission_id: '99999999-9999-9999-9999-999999999999',
verdict: 'true',
client_nonce: 'pg-ver-invalid'
};
const res = await request(app).post('/rpc/verify.submit').send(body);
expect(res.status).toBeGreaterThanOrEqual(400);
});
it('rejects invalid submission_id', async () => {
const body = {
round_id: '30000000-0000-0000-0000-000000000002',
reporter_id: '30000000-0000-0000-0000-000000000004',
submission_id: '99999999-9999-9999-9999-999999999999',
verdict: 'true',
client_nonce: 'pg-ver-invalid'
};
const res = await request(app).post('/rpc/verify.submit').send(body);
// Expect 404 for non-existent submission, or 400 for validation failure
expect([400, 404]).toContain(res.status);
expect(res.body.ok).toBe(false);
});
🤖 Prompt for AI Agents
In server/test/rpc.db.verify.test.js around lines 103 to 113, the test currently
uses a loose assertion (toBeGreaterThanOrEqual(400)) which will accept
500-series errors; change it to assert the exact expected HTTP status for a
missing submission (e.g., expect(res.status).toBe(404) or
expect(res.status).toBe(400) depending on the RPC spec), and optionally assert
the error message/code in the response body to ensure the RPC returns the
correct "submission not found" behavior.

Comment on lines +14 to +65
it('upserts a verdict idempotently by (round, reporter, submission, claim)', async () => {
// Create a submission first
// If server enforces issued nonces, obtain one for the author
const issued = await request(app)
.post('/rpc/nonce.issue')
.send({ round_id: ROUND_ID, author_id: AUTHOR_ID, ttl_sec: 60 })
.then((r) => r.body)
.catch(() => ({ ok: false }));

const sub = {
room_id: ROOM_ID,
round_id: ROUND_ID,
author_id: AUTHOR_ID,
phase: 'submit',
deadline_unix: 0,
content: 'Verification target',
claims: [{ id: 'c1', text: 'Abc', support: [{ kind: 'logic', ref: 'a' }] }],
citations: [{ url: 'https://example.com/a' }, { url: 'https://example.com/b' }],
client_nonce: issued?.ok ? issued.nonce : 'nonce-sub-ver-1'
};
const createRes = await request(app).post('/rpc/submission.create').send(sub);
// Debug if failing in CI/local
if (createRes.status !== 200) {
console.error('submission.create failed', createRes.status, createRes.body);
}
expect(createRes.status).toBe(200);
const submission_id = createRes.body.submission_id;

const payload = {
round_id: ROUND_ID,
reporter_id: REPORTER_ID,
submission_id,
verdict: 'true',
rationale: 'looks good',
client_nonce: 'ver-123456'
};
const first = await request(app).post('/rpc/verify.submit').send(payload);
if (first.status !== 200) {
console.error('verify.submit first failed', first.status, first.body);
}
expect(first.status).toBe(200);
const second = await request(app).post('/rpc/verify.submit').send(payload).expect(200);
expect(first.body.ok).toBe(true);
expect(second.body.id).toEqual(first.body.id);

// Different claim should yield a different id
const third = await request(app)
.post('/rpc/verify.submit')
.send({ ...payload, claim_id: 'c1', client_nonce: 'ver-234567' })
.expect(200);
expect(third.body.id).not.toEqual(first.body.id);
});

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== 1. Find all verify-related test files ==="
fd -e js 'verify' server/test/

echo ""
echo "=== 2. Search for role enforcement patterns in verify flows ==="
rg -n '(judge|host|role)' server/test/*verify* --color=never 2>/dev/null || echo "No role references found"

echo ""
echo "=== 3. Check verify.submit endpoint implementation ==="
fd -e js 'verify' server/ | grep -v test | grep -v node_modules

echo ""
echo "=== 4. Search for judge/host role checks in server code ==="
rg -n '(judge|host)' server/ --type js -i --color=never 2>/dev/null | head -30

Repository: flyingrobots/db8

Length of output: 4053


🏁 Script executed:

#!/bin/bash

echo "=== 5. Examine verify.submit endpoint ==="
fd -e js . server/ | xargs grep -l 'verify.submit\|verify_submit' | grep -v test | grep -v node_modules

echo ""
echo "=== 6. Check for role enforcement in endpoint definitions ==="
rg -n 'verify.*submit|submit.*verify' server/ -A 10 --type js --color=never 2>/dev/null | head -50

echo ""
echo "=== 7. Look for Zod schema validation on endpoints ==="
rg -n 'rpc.*verify\|verify.*rpc' server/ --type js -A 5 --color=never 2>/dev/null | head -40

Repository: flyingrobots/db8

Length of output: 3379


🏁 Script executed:

#!/bin/bash

echo "=== 8. Check if DB constraints are mentioned in learnings context ==="
fd -e sql db/ 2>/dev/null | head -20

echo ""
echo "=== 9. Search for role/judge/host in SQL files ==="
rg -n '(judge|host|role)' db/ --type sql -i --color=never 2>/dev/null | head -20

echo ""
echo "=== 10. List test file structure in server/test ==="
ls -la server/test/*verify* 2>/dev/null || echo "No verify test files found"

Repository: flyingrobots/db8

Length of output: 2593


🏁 Script executed:

#!/bin/bash

echo "=== 11. Check if there are separate unit vs integration test suites ==="
find server/test -name '*.test.js' -o -name '*.spec.js' | head -20

echo ""
echo "=== 12. Look at the test file in question more closely ==="
wc -l server/test/rpc.verify.submit.test.js

Repository: flyingrobots/db8

Length of output: 990


🏁 Script executed:

#!/bin/bash

echo "=== 13. Check describe blocks and test structure ==="
rg -n '^describe|^\s+it\(' server/test/rpc.verify.submit.test.js --color=never

Repository: flyingrobots/db8

Length of output: 450


🏁 Script executed:

#!/bin/bash

echo "=== Examine rpc.db.verify.test.js to see what's tested ==="
head -100 server/test/rpc.db.verify.test.js

echo ""
echo "=== Check if role enforcement tests exist in DB test ==="
rg -n 'role|judge|host' server/test/rpc.db.verify.test.js -C 2

Repository: flyingrobots/db8

Length of output: 4984


🏁 Script executed:

#!/bin/bash

echo "=== Look at the full DB verify test structure ==="
cat server/test/rpc.db.verify.test.js

Repository: flyingrobots/db8

Length of output: 4720


🏁 Script executed:

#!/bin/bash

echo "=== Examine the verify_submit SQL function ==="
rg -n 'CREATE FUNCTION verify_submit' db/rpc.sql -A 80 --color=never

Repository: flyingrobots/db8

Length of output: 106


🏁 Script executed:

#!/bin/bash

echo "=== Search for verify_submit function definition ==="
rg -n 'verify_submit' db/rpc.sql --color=never | head -20

echo ""
echo "=== Get the full function ==="
rg -n 'verify_submit' db/rpc.sql -A 100 --color=never

Repository: flyingrobots/db8

Length of output: 3956


🏁 Script executed:

#!/bin/bash

echo "=== Check what role checks are in verify_submit ==="
cat db/rpc.sql | sed -n '/verify_submit/,/^CREATE FUNCTION/p' | head -150

Repository: flyingrobots/db8

Length of output: 5030


Add a negative test to the DB-backed suite for role rejection.

The memory-path test correctly has no role enforcement (line 11 disables the DB). However, rpc.db.verify.test.js must test that non-judge/non-host participants are rejected with 403. The SQL function enforces this at lines 677–678 (raising reporter_role_denied), but the DB test only covers the happy path with a judge. Add a case where a debater attempts verification and confirm it returns 403.

🤖 Prompt for AI Agents
In server/test/rpc.verify.submit.test.js around lines 14–65, add a negative
DB-backed test case after the happy-path verifies: reuse the created
submission_id but send a verify.submit request where the reporter is a
non-judge/non-host participant (e.g., a debater/author id used elsewhere in the
tests) with a unique client_nonce, and assert the response is HTTP 403 (and/or
contains the reporter_role_denied error). Ensure this case runs only in the
DB-backed suite (do not change the memory-path test) and is executed after the
submission is created so the DB enforces the role-rejection logic.

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +235 to +264
-- M3: Verification verdicts (per-claim/per-submission)
-- Records fact-check style verdicts from reporters (judges/hosts) about a submission
CREATE TABLE IF NOT EXISTS verification_verdicts (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
round_id uuid NOT NULL REFERENCES rounds(id) ON DELETE CASCADE,
submission_id uuid NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
reporter_id uuid NOT NULL REFERENCES participants(id) ON DELETE CASCADE,
claim_id text,
verdict text NOT NULL CHECK (verdict IN ('true','false','unclear','needs_work')),
rationale text,
client_nonce text,
created_at timestamptz NOT NULL DEFAULT now()
);

-- Idempotency: include client_nonce to allow multiple rows for the same tuple when nonce differs
-- Drop legacy unique if present to avoid conflicts
DO $$
BEGIN
IF EXISTS (SELECT 1 FROM pg_indexes WHERE schemaname='public' AND indexname='ux_verification_verdicts_unique') THEN
EXECUTE 'DROP INDEX IF EXISTS ux_verification_verdicts_unique';
END IF;
END $$;

-- New uniqueness covers (round, reporter, submission, claim-coalesced, client_nonce)
CREATE UNIQUE INDEX IF NOT EXISTS ux_verification_verdicts_unique_nonce
ON verification_verdicts (round_id, reporter_id, submission_id, coalesce(claim_id, ''), (COALESCE(NULLIF(client_nonce, ''), '')));

CREATE INDEX IF NOT EXISTS idx_verification_verdicts_round ON verification_verdicts (round_id);
CREATE INDEX IF NOT EXISTS idx_verification_verdicts_submission ON verification_verdicts (submission_id);
CREATE INDEX IF NOT EXISTS idx_verification_verdicts_reporter ON verification_verdicts (reporter_id);

⚠️ Potential issue | 🔴 Critical

UNIQUE index allows duplicate verification verdicts with different nonces.

Lines 259-260: The UNIQUE index includes client_nonce:

CREATE UNIQUE INDEX ... ON verification_verdicts (
  round_id, reporter_id, submission_id, 
  coalesce(claim_id, ''), 
  (COALESCE(NULLIF(client_nonce, ''), ''))
);

This has the SAME CRITICAL BUG as final_votes and scores:

A reporter (judge/host) can submit multiple conflicting verdicts for the same submission+claim by using different client_nonce values. Each different nonce creates a new row.

The verify_summary RPC will then count ALL of them, inflating true_count, false_count, etc., making the verification system gameable.

The constraint should be:

CREATE UNIQUE INDEX IF NOT EXISTS ux_verification_verdicts_unique_no_nonce
  ON verification_verdicts (
    round_id, 
    reporter_id, 
    submission_id, 
    coalesce(claim_id, '')
  );

Remove client_nonce from the uniqueness check. The nonce is for idempotency (resubmitting the SAME verdict), not for allowing multiple verdicts.

🔎 Proposed fix
-CREATE UNIQUE INDEX IF NOT EXISTS ux_verification_verdicts_unique_nonce
-  ON verification_verdicts (round_id, reporter_id, submission_id, coalesce(claim_id, ''), (COALESCE(NULLIF(client_nonce, ''), '')));
+CREATE UNIQUE INDEX IF NOT EXISTS ux_verification_verdicts_unique_no_nonce
+  ON verification_verdicts (round_id, reporter_id, submission_id, coalesce(claim_id, ''));

And update the RPC verify_submit in db/rpc.sql to use:

ON CONFLICT (round_id, reporter_id, submission_id, coalesce(claim_id, ''))

Based on learnings, submission verification must enforce judge/host roles and published/final round phases.
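The effect of including the nonce in the key can be sketched with a plain-JS stand-in for the unique index (hypothetical row shape; ON CONFLICT DO NOTHING is modeled as first-write-wins):

```javascript
// Simulate the unique index as a Map keyed by the index expression.
// With the nonce in the key, two different nonces create two rows for
// the same (round, reporter, submission, claim) tuple.
function insertVerdict(table, row, { includeNonce }) {
  const claim = row.claim_id ?? '';
  const parts = [row.round_id, row.reporter_id, row.submission_id, claim];
  if (includeNonce) parts.push(row.client_nonce ?? '');
  const key = parts.join('|');
  if (!table.has(key)) table.set(key, row); // ON CONFLICT DO NOTHING
  return table.get(key);
}

const base = { round_id: 'r1', reporter_id: 'p1', submission_id: 's1', claim_id: null, verdict: 'true' };

const withNonce = new Map();
insertVerdict(withNonce, { ...base, client_nonce: 'a' }, { includeNonce: true });
insertVerdict(withNonce, { ...base, client_nonce: 'b' }, { includeNonce: true });
// withNonce now holds two conflicting verdicts from the same reporter.

const withoutNonce = new Map();
insertVerdict(withoutNonce, { ...base, client_nonce: 'a' }, { includeNonce: false });
insertVerdict(withoutNonce, { ...base, client_nonce: 'b' }, { includeNonce: false });
// withoutNonce holds exactly one row: the second insert is a no-op.
```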

@@ -0,0 +1,344 @@
# Expert 001: Round 2 Response

⚠️ Potential issue | 🟡 Minor

Missing YAML frontmatter violates coding guidelines.

Per coding guidelines: "Markdown files must include YAML frontmatter with lastUpdated (ISO date)".

This file starts directly with the H1 heading. Add frontmatter at the top with at minimum:

---
lastUpdated: 2025-12-20
---

If this is a spec document (debate materials may qualify), also include tags: [spec] and the milestone string.

🔎 Proposed fix
+---
+lastUpdated: 2025-12-20
+tags: [spec]
+---
+
 # Expert 001: Round 2 Response

As per coding guidelines, all Markdown files require YAML frontmatter with lastUpdated.

🤖 Prompt for AI Agents
In docs/debates/fixed-timestep-vs-event-driven/round-2/expert-001.md around
lines 1 to 1, the file is missing the required YAML frontmatter; add a
frontmatter block at the very top containing at minimum lastUpdated with an ISO
date (e.g. 2025-12-20). If this file is a spec/debate material also include
tags: [spec] and the appropriate milestone string in the frontmatter. Ensure the
block is the first thing in the file and uses valid YAML delimiters (---) so
linting and guidelines pass.

Comment on lines +847 to +849
const roundId = String(req.query.round_id || '');
if (!/^[0-9a-f-]{8,}$/i.test(roundId))
return res.status(400).json({ ok: false, error: 'invalid_round_id' });

⚠️ Potential issue | 🟡 Minor

/verify/summary uses weak regex validation instead of Zod.

Line 848's regex /^[0-9a-f-]{8,}$/i accepts strings like "12345678" or "abc-def-123" as valid round IDs. This is far too permissive for UUID validation.

Other endpoints use proper Zod UUID validation. This should be consistent.

🔎 Proposed fix
+const VerifySummaryIn = z.object({ round_id: z.string().uuid() });
+
 app.get('/verify/summary', async (req, res) => {
   try {
-    const roundId = String(req.query.round_id || '');
-    if (!/^[0-9a-f-]{8,}$/i.test(roundId))
-      return res.status(400).json({ ok: false, error: 'invalid_round_id' });
+    const { round_id: roundId } = VerifySummaryIn.parse(req.query);
🤖 Prompt for AI Agents
In server/rpc.js around lines 847 to 849, the endpoint uses a weak regex to
validate round_id; replace the regex check with the project's Zod UUID
validation (e.g., import { z } from 'zod' or use the shared zod schema/util used
elsewhere), parse/validate req.query.round_id via z.string().uuid().safeParse
(or equivalent), and if validation fails return res.status(400).json({ ok:
false, error: 'invalid_round_id' }); otherwise use the validated value; ensure
imports are added and behavior matches other endpoints.

Comment on lines +167 to +175
export const AuthVerifyIn = z.object({
room_id: z.string().uuid(),
participant_id: z.string().uuid(),
nonce: z.string().min(8),
signature_kind: z.enum(['ed25519', 'ssh']),
sig_b64: z.string().min(1),
public_key_ssh: z.string().optional(),
public_key_b64: z.string().optional()
});

🧹 Nitpick | 🔵 Trivial

AuthVerifyIn: Consider validating public key requirements.

AuthVerifyIn has both public_key_ssh and public_key_b64 optional (lines 173-174).

If the backend doesn't have the public key stored (e.g., for first-time auth), the request MUST include one. If the key is always stored in participants.ssh_fingerprint, then these are truly optional.

Consider adding a refinement similar to ParticipantFingerprintSet (lines 98-108) to enforce:

  • At least one key is provided when needed, OR
  • Document that keys are optional because they're always fetched from DB

Without seeing the RPC handler, I can't confirm this is a bug, but the schema's intention is unclear.

🔎 Possible refinement if keys are required
 export const AuthVerifyIn = z.object({
   room_id: z.string().uuid(),
   participant_id: z.string().uuid(),
   nonce: z.string().min(8),
   signature_kind: z.enum(['ed25519', 'ssh']),
   sig_b64: z.string().min(1),
   public_key_ssh: z.string().optional(),
   public_key_b64: z.string().optional()
-});
+}).refine(
+  v => v.public_key_ssh || v.public_key_b64,
+  { message: 'public_key_ssh_or_public_key_b64_required', path: ['public_key_ssh'] }
+);

As per coding guidelines, use Zod schemas at the edges for request validation.

🤖 Prompt for AI Agents
In server/schemas.js around lines 167-175, AuthVerifyIn currently makes
public_key_ssh and public_key_b64 optional which leaves intent unclear; either
enforce that at least one key is provided when the backend doesn't already store
the participant's key, or document that keys are optional because they are
always retrieved from the DB. Fix by adding a Zod refinement (like
ParticipantFingerprintSet) that checks (public_key_ssh || public_key_b64) is
present when needed, returning a clear error message, or alternatively update
the schema comment and handler to assert the key is always available server-side
so both fields can remain optional.
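A minimal sketch of the guard the refinement would encode (plain JS; hasUsableKey and the storedKey argument are illustrative, not from the codebase):

```javascript
// True when signature verification can proceed: either the server
// already holds a key for the participant, or the request supplies
// one of the two optional fields.
function hasUsableKey(body, storedKey) {
  if (storedKey) return true;
  return Boolean(body.public_key_ssh || body.public_key_b64);
}
```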

Comment on lines +446 to +485
async function onVerifySubmit(e) {
e.preventDefault();
if (!verifying) return;
const form = new window.FormData(e.target);
const verdict = form.get('verdict');
const rationale = form.get('rationale');
const claim_id = form.get('claim_id');
setActionBusy(true);
try {
const clientNonce = lastNonceRef.current || String(Date.now()); // simplified
const payload = {
round_id: '00000000-0000-0000-0000-000000000002', // Ideally from state.round.round_id
reporter_id: participant,
submission_id: verifying.submission_id,
verdict,
rationale,
claim_id: claim_id || undefined,
client_nonce: clientNonce
};
const r = await fetch(`${apiBase()}/rpc/verify.submit`, {
method: 'POST',
headers: {
'content-type': 'application/json',
...(jwt ? { authorization: `Bearer ${jwt}` } : {})
},
body: JSON.stringify(payload)
});
if (r.ok) {
setVerifying(null);
// Trigger verification refresh logic here if possible,
// effectively handled by the polling effect eventually
} else {
window.alert('Verify failed');
}
} catch (err) {
window.alert(String(err));
} finally {
setActionBusy(false);
}
}

⚠️ Potential issue | 🟠 Major

Missing AbortController violates coding guidelines.

Per coding guidelines: "Web/React components must use AbortController for async operations to prevent setState after unmount."

The fetch on line 465 can complete after the component unmounts, causing setVerifying(null) and setActionBusy(false) to fire on an unmounted component.

🔎 Proposed fix
   async function onVerifySubmit(e) {
     e.preventDefault();
     if (!verifying) return;
+    const controller = new AbortController();
     const form = new window.FormData(e.target);
     const verdict = form.get('verdict');
     const rationale = form.get('rationale');
     const claim_id = form.get('claim_id');
     setActionBusy(true);
     try {
       const clientNonce = lastNonceRef.current || String(Date.now());
       const payload = { /* ... */ };
       const r = await fetch(`${apiBase()}/rpc/verify.submit`, {
         method: 'POST',
         headers: { /* ... */ },
-        body: JSON.stringify(payload)
+        body: JSON.stringify(payload),
+        signal: controller.signal
       });
-      if (r.ok) {
+      if (!controller.signal.aborted && r.ok) {
         setVerifying(null);
       } else {
-        window.alert('Verify failed');
+        if (!controller.signal.aborted) window.alert('Verify failed');
       }
     } catch (err) {
-      window.alert(String(err));
+      if (err.name !== 'AbortError') window.alert(String(err));
     } finally {
-      setActionBusy(false);
+      if (!controller.signal.aborted) setActionBusy(false);
     }
   }

As per coding guidelines, AbortController is required for async operations in React components.

🤖 Prompt for AI Agents
In web/app/room/[roomId]/page.jsx around lines 446-485, the fetch in
onVerifySubmit lacks an AbortController which can cause setState after unmount;
create an AbortController (store it on a ref so the component can access it),
pass controller.signal to fetch, and before calling any state setters
(setVerifying, setActionBusy) check that !controller.signal.aborted; also add a
useEffect cleanup that aborts the current controller on unmount (and clear the
ref) to ensure the pending request is cancelled and no state updates run after
unmount.
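The guarded-setter pattern the prompt describes reduces to a small sketch (plain JS, outside React; makeGuard is illustrative):

```javascript
// Wrap state setters so they become no-ops once the controller is
// aborted (the unmount cleanup would call controller.abort()).
function makeGuard() {
  const controller = new AbortController();
  const guard = (fn) => (...args) => {
    if (!controller.signal.aborted) fn(...args);
  };
  return { controller, guard };
}

const calls = [];
const { controller, guard } = makeGuard();
const setActionBusy = guard((v) => calls.push(v));

setActionBusy(true);   // runs: controller still live
controller.abort();    // simulate unmount cleanup
setActionBusy(false);  // skipped: signal is aborted
```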

Comment on lines +487 to +519
async function onFlagSubmit(e) {
e.preventDefault();
if (!flagging) return;
const form = new window.FormData(e.target);
const reason = form.get('reason');
setActionBusy(true);
try {
const payload = {
submission_id: flagging.submission_id,
reporter_id: participant,
reporter_role: role || 'participant',
reason
};
const r = await fetch(`${apiBase()}/rpc/submission.flag`, {
method: 'POST',
headers: {
'content-type': 'application/json',
...(jwt ? { authorization: `Bearer ${jwt}` } : {})
},
body: JSON.stringify(payload)
});
if (r.ok) {
setFlagging(null);
// Ideally trigger state refresh to update flag counts
} else {
window.alert('Flag failed');
}
} catch (err) {
window.alert(String(err));
} finally {
setActionBusy(false);
}
}

⚠️ Potential issue | 🟠 Major

Missing AbortController violates coding guidelines.

Same issue as onVerifySubmit: the fetch on line 500 has no AbortController, so it can cause setState on an unmounted component (setFlagging(null) on line 509, setActionBusy(false) on line 517).

Apply the same AbortController pattern as proposed for onVerifySubmit.

As per coding guidelines, AbortController is required for async operations in React components.

Comment on lines +756 to +805
{verifying && (
<div className="fixed inset-0 bg-black/50 flex items-center justify-center p-4 z-50">
<Card className="w-full max-w-md">
<CardContent className="p-6 space-y-4">
<h3 className="text-lg font-semibold">Verify Submission</h3>
<p className="text-xs font-mono text-muted-foreground break-all">
{verifying.submission_id}
</p>
<form onSubmit={onVerifySubmit} className="space-y-4">
<div>
<label className="text-sm font-medium">Claim (optional)</label>
<select name="claim_id" className="w-full mt-1 border rounded p-2 bg-background">
<option value="">Full Submission</option>
{(verifying.claims || []).map((c) => (
<option key={c.id} value={c.id}>
{c.id}: {c.text.slice(0, 30)}...
</option>
))}
</select>
</div>
<div>
<label className="text-sm font-medium">Verdict</label>
<select name="verdict" className="w-full mt-1 border rounded p-2 bg-background">
<option value="true">True</option>
<option value="false">False</option>
<option value="unclear">Unclear</option>
<option value="needs_work">Needs Work</option>
</select>
</div>
<div>
<label className="text-sm font-medium">Rationale</label>
<textarea
name="rationale"
required
className="w-full mt-1 border rounded p-2 bg-background min-h-[100px]"
placeholder="Explain your verdict..."
/>
</div>
<div className="flex justify-end gap-2">
<Button type="button" variant="ghost" onClick={() => setVerifying(null)}>
Cancel
</Button>
<Button type="submit" disabled={actionBusy}>
{actionBusy ? 'Saving...' : 'Submit Verdict'}
</Button>
</div>
</form>
</CardContent>
</Card>
</div>

⚠️ Potential issue | 🟡 Minor

Modal lacks accessibility attributes and keyboard support.

This verify modal has multiple accessibility issues:

  1. Missing role="dialog" and aria-modal="true"
  2. No Escape key handler to dismiss
  3. No focus trap (users can Tab out)
  4. Labels on lines 766, 777, 786 missing htmlFor attributes (static analysis is correct)

For production, implement:

  • Add role="dialog" and aria-modal="true" to outer div
  • Add aria-labelledby pointing to the h3 on line 760
  • Add onKeyDown={(e) => e.key === 'Escape' && setVerifying(null)}
  • Add htmlFor to all labels or nest inputs inside labels
  • Consider focus trap library (e.g., focus-trap-react)
🔎 Quick fix for labels
-                  <label className="text-sm font-medium">Claim (optional)</label>
-                  <select name="claim_id" className="w-full mt-1 border rounded p-2 bg-background">
+                  <label htmlFor="verify-claim" className="text-sm font-medium">Claim (optional)</label>
+                  <select id="verify-claim" name="claim_id" className="w-full mt-1 border rounded p-2 bg-background">
🧰 Tools
🪛 Biome (2.1.2)

[error] 766-767: A form label must be associated with an input.

Consider adding a for or htmlFor attribute to the label element or moving the input element to inside the label element.

(lint/a11y/noLabelWithoutControl)


[error] 777-778: A form label must be associated with an input.

Consider adding a for or htmlFor attribute to the label element or moving the input element to inside the label element.

(lint/a11y/noLabelWithoutControl)


[error] 786-787: A form label must be associated with an input.

Consider adding a for or htmlFor attribute to the label element or moving the input element to inside the label element.

(lint/a11y/noLabelWithoutControl)

🤖 Prompt for AI Agents
In web/app/room/[roomId]/page.jsx around lines 756 to 805, the verify modal is
missing accessibility and keyboard support: add role="dialog" and
aria-modal="true" to the outer fixed div, give the h3 a unique id and add
aria-labelledby on the dialog pointing to that id, add an onKeyDown handler on
the dialog that closes the modal on Escape (e.g., call setVerifying(null) when
e.key === 'Escape'), ensure every form control has a matching id and update each
label to include htmlFor pointing to that id (or wrap inputs in their labels)
for the Claim/select, Verdict/select, and Rationale/textarea, and optionally
wrap the modal content with a focus trap (e.g., focus-trap-react) to prevent
tabbing out while open.
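The Escape handling and focus restore from the prompt can be sketched independently of React (names are illustrative, not from the codebase):

```javascript
// Minimal dialog controller: Escape closes, and focus returns to the
// element that opened the dialog.
function makeDialogController(close) {
  let previousFocus = null;
  return {
    open(triggerEl) {
      previousFocus = triggerEl; // remember the opener for restore
    },
    onKeyDown(event) {
      if (event.key === 'Escape') {
        close();
        return true; // handled
      }
      return false;
    },
    restoreFocus(focus = (el) => el && el.focus && el.focus()) {
      focus(previousFocus);
    }
  };
}

let closed = false;
const dialog = makeDialogController(() => { closed = true; });
dialog.onKeyDown({ key: 'Enter' });  // ignored
dialog.onKeyDown({ key: 'Escape' }); // closes the dialog
```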

Comment on lines +808 to +838
{flagging && (
<div className="fixed inset-0 bg-black/50 flex items-center justify-center p-4 z-50">
<Card className="w-full max-w-md">
<CardContent className="p-6 space-y-4">
<h3 className="text-lg font-semibold">Flag Submission</h3>
<p className="text-xs font-mono text-muted-foreground break-all">
{flagging.submission_id}
</p>
<form onSubmit={onFlagSubmit} className="space-y-4">
<div>
<label className="text-sm font-medium">Reason</label>
<textarea
name="reason"
required
className="w-full mt-1 border rounded p-2 bg-background min-h-[80px]"
placeholder="Why are you flagging this?"
/>
</div>
<div className="flex justify-end gap-2">
<Button type="button" variant="ghost" onClick={() => setFlagging(null)}>
Cancel
</Button>
<Button type="submit" variant="destructive" disabled={actionBusy}>
{actionBusy ? 'Flagging...' : 'Flag'}
</Button>
</div>
</form>
</CardContent>
</Card>
</div>
)}

⚠️ Potential issue | 🟡 Minor

Flag modal has same accessibility issues.

Identical problems as the verify modal:

  1. Missing role="dialog", aria-modal="true", aria-labelledby
  2. No Escape key handler
  3. No focus trap
  4. Label on line 818 missing htmlFor

Apply the same accessibility fixes as proposed for the verify modal.

🧰 Tools
🪛 Biome (2.1.2)

[error] 818-819: A form label must be associated with an input.

Consider adding a for or htmlFor attribute to the label element or moving the input element to inside the label element.

(lint/a11y/noLabelWithoutControl)

🤖 Prompt for AI Agents
In web/app/room/[roomId]/page.jsx around lines 808 to 838, the flag modal lacks
accessibility features: add role="dialog", aria-modal="true", and
aria-labelledby that points to the modal title id; ensure the title element (h3)
has that id; update the label for the textarea to include htmlFor matching the
textarea id; implement keyboard handling to close the modal on Escape (add a
keydown listener when flagging opens and remove it on close) and add a focus
trap (move initial focus into the first focusable element when opened and
restrict/tabloop focus inside the modal until closed, restoring focus to the
previously focused element on close); ensure Cancel and Flag buttons remain
reachable and that the dialog container has the fixed overlay but the dialog
element carries the ARIA attributes and focus management.


Labels

area/cli CLI area/db Database (schema/RLS/RPC) area/server Worker/API/Watcher area/web Frontend (Next.js) priority/p1 High status/in-review PR open / In review type/docs Docs type/feat Feature
