feat(t3code): SQLite crash resilience — WAL checkpoint, backup, integrity check#96
Proposal for task-010fa0f1 covering extraction of TASK_ID_RE, resolveTaskFile helper, reuse of updateFrontmatterField, and unification of duplicated priorityValue() across dashboard.ts and flow.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers two features: clickable retrospective links for done tasks, and a time-based filter (All/30d/7d/24h) using task file mtime. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, deprecated fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…up, integrity check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ntegrity check
On unclean shutdown, t3code's WAL-mode SQLite DB can be left in an inconsistent state. Add three defensive measures to src/t3code/server.ts:
1. WAL checkpoint on clean stop: before SIGTERM, open the DB via bun:sqlite and run PRAGMA wal_checkpoint(TRUNCATE), falling back to PASSIVE if an exclusive lock is unavailable. Errors are non-fatal.
2. Backup before start: copy state.sqlite (+ WAL/SHM side-files) to state.sqlite.bak, rotating the previous .bak → .bak.1 (keeping the last 2 copies, ~360 MB total for a 180 MB DB).
3. Integrity check with auto-recovery: run PRAGMA integrity_check on startup; if it fails, attempt sqlite3 .recover piped into a fresh DB, verify the result with another integrity_check, then swap it in. If .recover fails or produces a bad DB, fall back to the .bak file. The corrupt DB is preserved as .corrupt for inspection.
All paths are logged to stderr.
Closes task-393bb51c.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
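The checkpoint fallback in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from server.ts: `PragmaRunner`, `checkpointBeforeStop`, and the result labels are hypothetical names, and database access is injected so the sketch does not depend on a particular driver. It relies on the documented behavior that PRAGMA wal_checkpoint returns one row whose `busy` column is 1 when the checkpoint was blocked.

```typescript
// Row returned by PRAGMA wal_checkpoint; busy is 1 when the checkpoint
// was blocked from completing.
type CheckpointRow = { busy: number; log: number; checkpointed: number };

// Hypothetical adapter: runs a PRAGMA and returns its single result row.
// With bun:sqlite this could be (sql) => db.query(sql).get().
type PragmaRunner = (sql: string) => CheckpointRow | null | undefined;

type CheckpointResult = "truncate" | "passive" | "failed";

function checkpointBeforeStop(
  run: PragmaRunner,
  log: (msg: string) => void = (msg) => console.error(msg),
): CheckpointResult {
  try {
    const row = run("PRAGMA wal_checkpoint(TRUNCATE)");
    if (row && row.busy === 0) return "truncate";
    log("wal_checkpoint(TRUNCATE) was blocked; trying PASSIVE");
  } catch (err) {
    log(`wal_checkpoint(TRUNCATE) failed: ${String(err)}`);
  }
  try {
    run("PRAGMA wal_checkpoint(PASSIVE)");
    return "passive";
  } catch (err) {
    // Non-fatal: the caller is expected to proceed with the stop anyway.
    log(`wal_checkpoint(PASSIVE) failed: ${String(err)}`);
    return "failed";
  }
}
```

Returning a label instead of throwing keeps the stop path non-fatal, matching the commit's stated intent.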
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 198f88d25b
src/t3code/server.ts
    backupDb(dbPath);
    checkAndRecoverDb(dbPath);
Run integrity check before taking startup backup
Taking a fresh backup before checkAndRecoverDb can overwrite the only known-good snapshot with a corrupt database. On a startup where state.sqlite is already damaged, backupDb rotates the previous .bak to .bak.1 and writes the corrupt file into .bak; if .recover then fails, the restore path reads .bak and can “recover” to the same broken state instead of the last healthy one. This defeats the intended corruption fallback and can preserve data loss.
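The ordering this comment asks for can be sketched as below. The `StartupHooks` interface, the `prepareDbForStart` name, and the boolean return value of the check are assumptions made for illustration; they stand in for the PR's checkAndRecoverDb() and backupDb(), whose real signatures are not shown here.

```typescript
// Hypothetical hooks standing in for checkAndRecoverDb() and backupDb().
// The boolean result is an assumption of this sketch.
interface StartupHooks {
  // Returns true if the DB is healthy, possibly after a recovery.
  checkAndRecover(dbPath: string): boolean;
  backup(dbPath: string): void;
}

function prepareDbForStart(dbPath: string, hooks: StartupHooks): boolean {
  // Verify first, so a corrupt file is never copied over a good backup.
  const healthy = hooks.checkAndRecover(dbPath);
  if (healthy) hooks.backup(dbPath);
  return healthy;
}
```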
src/t3code/server.ts
    const bakPath = dbPath + ".bak";
    if (existsSync(bakPath)) {
Fall back to .bak.1 during backup restore
The recovery branch only attempts state.sqlite.bak even though backup rotation keeps state.sqlite.bak.1. In the exact corruption scenario this feature targets, .bak may be the newly-copied bad snapshot while .bak.1 is the last good copy, so skipping .bak.1 can cause avoidable restore failure and startup with corrupted state.
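One way to express the suggested fallback is to walk the backups from newest to oldest and take the first usable one. In this sketch `pickRestoreSource` is a hypothetical helper, and both the `exists` and `isHealthy` callbacks are injected assumptions; how health is actually determined (for example with PRAGMA integrity_check) is left to the caller.

```typescript
// Hypothetical helper: returns the newest backup that exists and passes
// the caller-supplied health check, or null if none qualifies.
function pickRestoreSource(
  dbPath: string,
  exists: (path: string) => boolean,
  isHealthy: (path: string) => boolean,
): string | null {
  const candidates = [`${dbPath}.bak`, `${dbPath}.bak.1`];
  for (const candidate of candidates) {
    if (exists(candidate) && isHealthy(candidate)) return candidate;
  }
  return null;
}
```

Returning null leaves the decision about starting without a restore to the caller.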
…ak.1
Addresses review finding: backupDb() was running before checkAndRecoverDb(), so a corrupt DB would be written into .bak. If .recover then failed, the backup-restore fallback would restore the same corrupt state.
Three fixes:
1. Reorder in ensureServer(): checkAndRecoverDb() runs first; backupDb() only runs if the DB is healthy (including after a successful auto-recovery). .bak is now always a copy of a known-good database.
2. checkAndRecoverDb() backup fallback: verify each backup's integrity_check before restoring; skip to .bak.1 if .bak is also corrupt.
3. backupDb() rotation: WAL/SHM side-files (.bak-wal, .bak-shm) are now renamed to (.bak.1-wal, .bak.1-shm) during rotation so that .bak.1 remains self-consistent for restore.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
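The rotation described in fix 3 might look roughly like this. It is a sketch under assumptions, not the PR's backupDb(): `rotateAndBackup` and `SIDE_SUFFIXES` are hypothetical names, and the file naming simply follows the .bak-wal / .bak.1-wal pattern quoted in the commit message.

```typescript
import * as fs from "node:fs";

// The main file plus the WAL and SHM side-files that belong to it.
const SIDE_SUFFIXES = ["", "-wal", "-shm"];

// Hypothetical helper: rotates .bak to .bak.1, then copies the live DB to .bak.
function rotateAndBackup(dbPath: string): void {
  const bak = `${dbPath}.bak`;
  const bak1 = `${dbPath}.bak.1`;

  // Move the whole previous generation, side-files included, so that
  // .bak.1 never pairs a main file with a WAL from another generation.
  for (const suffix of SIDE_SUFFIXES) {
    fs.rmSync(bak1 + suffix, { force: true }); // drop the oldest copy
    if (fs.existsSync(bak + suffix)) fs.renameSync(bak + suffix, bak1 + suffix);
  }

  // Copy the current DB and whichever side-files exist right now.
  for (const suffix of SIDE_SUFFIXES) {
    if (fs.existsSync(dbPath + suffix)) fs.copyFileSync(dbPath + suffix, bak + suffix);
  }
}
```

Removing the old .bak.1 side-files before renaming matters when the newer generation has no WAL: otherwise a stale .bak.1-wal could survive next to a newer .bak.1 main file.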
Summary
Addresses the 2026-03-26 state.sqlite corruption incident. Three defensive measures added to src/t3code/server.ts (ludics-side only, no upstream t3code changes):
- stopServer() opens the DB via bun:sqlite and runs PRAGMA wal_checkpoint(TRUNCATE) before sending SIGTERM, flushing all WAL frames to the main file. Falls back to PASSIVE if an exclusive lock is unavailable. Errors are non-fatal; the stop always proceeds.
- ensureServer() copies state.sqlite (+ WAL/SHM side-files) to .bak, rotating the previous .bak → .bak.1 (last 2 backups kept, ~360 MB for a 180 MB DB).
- PRAGMA integrity_check runs before each startup. On failure: (1) try sqlite3 .recover piped into a fresh DB and verify with another integrity_check, swapping it in if good; (2) fall back to .bak if .recover fails. Corrupt DB is preserved as .corrupt.
Test plan
- ludics t3code stop: verify state.sqlite-wal is truncated to 0 bytes after stop
- state.sqlite.bak and state.sqlite.bak.1 are present, no .bak.2
- .corrupt file is left behind
- bun run typecheck passes ✅ | bun run build produces binary ✅
Closes task-393bb51c.
🤖 Generated with Claude Code