
feat(t3code): SQLite crash resilience — WAL checkpoint, backup, integrity check #96

Merged: lukstafi merged 6 commits into main from ludics/task-393bb51c-s2/root on Mar 28, 2026



@lukstafi
Owner

Summary

Addresses the 2026-03-26 state.sqlite corruption incident. Three defensive measures added to src/t3code/server.ts (ludics-side only, no upstream t3code changes):

  • WAL checkpoint on clean stop: stopServer() opens the DB via bun:sqlite and runs PRAGMA wal_checkpoint(TRUNCATE) before sending SIGTERM, flushing all WAL frames to the main file. Falls back to PASSIVE if an exclusive lock is unavailable. Errors are non-fatal — the stop always proceeds.
  • Backup before start: ensureServer() copies state.sqlite (+ WAL/SHM side-files) to .bak, rotating the previous .bak to .bak.1 (the last 2 backups are kept, about 360 MB for a 180 MB DB).
  • Integrity check + auto-recovery on start: PRAGMA integrity_check runs before each startup. On failure: (1) try sqlite3 .recover piped into a fresh DB and verify with another integrity_check, swapping it in if good; (2) fall back to .bak if .recover fails. Corrupt DB is preserved as .corrupt.
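
The checkpoint fallback in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code in src/t3code/server.ts: `checkpointBeforeStop`, `RunCheckpoint`, and the outcome labels are hypothetical names, and the SQLite call is injected as a function so the sketch does not assume any particular binding's API. The only SQLite fact relied on is that PRAGMA wal_checkpoint returns one row with the columns busy, log, and checkpointed, where a non-zero busy means the checkpoint was blocked.

```typescript
// Row returned by PRAGMA wal_checkpoint(...): busy is non-zero when blocked.
interface CheckpointRow {
  busy: number;
  log: number;
  checkpointed: number;
}

// Hypothetical injection point: runs PRAGMA wal_checkpoint(<mode>) on the DB.
type RunCheckpoint = (mode: "TRUNCATE" | "PASSIVE") => CheckpointRow;

type CheckpointOutcome = "truncated" | "passive" | "skipped";

// Try TRUNCATE first, fall back to PASSIVE, and never throw:
// the caller is expected to go on and stop the server regardless.
function checkpointBeforeStop(
  run: RunCheckpoint,
  log: (msg: string) => void = console.error,
): CheckpointOutcome {
  try {
    const row = run("TRUNCATE");
    if (row.busy === 0) return "truncated";
    log("TRUNCATE checkpoint was blocked; trying PASSIVE");
  } catch (err) {
    log(`TRUNCATE checkpoint failed (${String(err)}); trying PASSIVE`);
  }
  try {
    run("PASSIVE");
    return "passive";
  } catch (err) {
    log(`PASSIVE checkpoint failed (${String(err)}); stopping anyway`);
    return "skipped";
  }
}
```

Returning an outcome instead of throwing keeps the "errors are non-fatal" property explicit at the call site.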

Test plan

  • Normal start/stop cycle works with no visible changes
  • ludics t3code stop — verify state.sqlite-wal is truncated to 0 bytes after stop
  • Start server 3 times — verify state.sqlite.bak and state.sqlite.bak.1 are present, no .bak.2
  • Corrupt a test DB copy, restart — verify auto-recovery kicks in and .corrupt file is left behind
  • bun run typecheck passes ✅ | bun run build produces binary ✅

Closes task-393bb51c.

🤖 Generated with Claude Code

lukstafi and others added 5 commits March 28, 2026 16:20

Proposal for task-010fa0f1 covering extraction of TASK_ID_RE,
resolveTaskFile helper, reuse of updateFrontmatterField, and
unification of duplicated priorityValue() across dashboard.ts
and flow.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Covers two features: clickable retrospective links for done tasks, and
a time-based filter (All/30d/7d/24h) using task file mtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…, deprecated fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…up, integrity check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ntegrity check

On unclean shutdown, t3code's WAL-mode SQLite DB can be left in an
inconsistent state. Add three defensive measures to src/t3code/server.ts:

1. WAL checkpoint on clean stop: before SIGTERM, open the DB via
   bun:sqlite and run PRAGMA wal_checkpoint(TRUNCATE), falling back
   to PASSIVE if an exclusive lock is unavailable. Errors are non-fatal.

2. Backup before start: copy state.sqlite (+ WAL/SHM side-files) to
   state.sqlite.bak with a rotating .bak → .bak.1 (keeping last 2
   copies, ~360 MB total for a 180 MB DB).

3. Integrity check with auto-recovery: run PRAGMA integrity_check on
   startup; if it fails, attempt sqlite3 .recover piped into a fresh
   DB, verify the result with another integrity_check, then swap it in.
   If .recover fails or produces a bad DB, fall back to the .bak file.
   Corrupt DB is preserved as .corrupt for inspection. All paths are
   logged to stderr.

Closes task-393bb51c.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 198f88d25b


Comment on lines +279 to +280
backupDb(dbPath);
checkAndRecoverDb(dbPath);

P1: Run integrity check before taking startup backup

Taking a fresh backup before checkAndRecoverDb can overwrite the only known-good snapshot with a corrupt database. On a startup where state.sqlite is already damaged, backupDb rotates the previous .bak to .bak.1 and writes the corrupt file into .bak; if .recover then fails, the restore path reads .bak and can “recover” to the same broken state instead of the last healthy one. This defeats the intended corruption fallback and can preserve data loss.
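
The ordering concern can be illustrated with a small sketch. It is not the code under review: `prepareDbForStart` and the `StartupSteps` shape are hypothetical, and it assumes the integrity step reports whether the database ended up healthy, which the snippet above does not show.

```typescript
// Hypothetical wrappers around the two startup steps discussed above.
interface StartupSteps {
  // Assumed to return true when the DB is healthy, possibly after recovery.
  checkAndRecover: () => boolean;
  // Rotates the existing backups and copies the current DB into .bak.
  backup: () => void;
}

type PrepareResult = "backed-up" | "backup-skipped";

// Check first, back up second: a database that failed the check is
// never copied over the last known-good snapshot.
function prepareDbForStart(steps: StartupSteps): PrepareResult {
  const healthy = steps.checkAndRecover();
  if (!healthy) return "backup-skipped";
  steps.backup();
  return "backed-up";
}
```

With this order, .bak can only ever receive a database that has just passed the check.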


Comment on lines +870 to +871
const bakPath = dbPath + ".bak";
if (existsSync(bakPath)) {

P2: Fall back to .bak.1 during backup restore

The recovery branch only attempts state.sqlite.bak even though backup rotation keeps state.sqlite.bak.1. In the exact corruption scenario this feature targets, .bak may be the newly-copied bad snapshot while .bak.1 is the last good copy, so skipping .bak.1 can cause avoidable restore failure and startup with corrupted state.
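
One possible shape for the suggested fallback is sketched below. The helper name `pickRestoreSource` and both injected callbacks are hypothetical; in particular, `isHealthy` stands in for whatever integrity check the server runs and is not taken from the PR.

```typescript
// Return the newest backup that exists and passes the health check,
// or null when neither .bak nor .bak.1 is usable.
function pickRestoreSource(
  dbPath: string,
  exists: (path: string) => boolean,
  isHealthy: (path: string) => boolean,
): string | null {
  // Newest first: .bak is preferred, .bak.1 is the older fallback.
  const candidates = [`${dbPath}.bak`, `${dbPath}.bak.1`];
  for (const candidate of candidates) {
    if (exists(candidate) && isHealthy(candidate)) return candidate;
  }
  return null;
}
```

Checking each candidate before restoring avoids swapping one corrupt file for another.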


…ak.1

Addresses review finding: backupDb() was running before checkAndRecoverDb(),
so a corrupt DB would be written into .bak. If .recover then failed, the
backup-restore fallback would restore the same corrupt state.

Three fixes:

1. Reorder in ensureServer(): checkAndRecoverDb() runs first; backupDb()
   only runs if the DB is healthy (including after a successful auto-recovery).
   .bak is now always a copy of a known-good database.

2. checkAndRecoverDb() backup fallback: verify each backup's integrity_check
   before restoring; skip to .bak.1 if .bak is also corrupt.

3. backupDb() rotation: WAL/SHM side-files (.bak-wal, .bak-shm) are now
   renamed to (.bak.1-wal, .bak.1-shm) during rotation so that .bak.1
   remains self-consistent for restore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
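
The side-file rotation described in fix 3 could look roughly like the sketch below. It is an illustration under assumptions, not the merged backupDb(): the `FileOps` interface, `nodeFileOps`, and `rotateAndBackup` are hypothetical names, and the removal of stale side-files is an extra precaution that the commit message does not mention.

```typescript
import { copyFileSync, existsSync, renameSync, rmSync } from "node:fs";

// Minimal file operations, injectable so the logic can run against a fake.
interface FileOps {
  exists(path: string): boolean;
  rename(from: string, to: string): void;
  copy(from: string, to: string): void;
  remove(path: string): void;
}

const nodeFileOps: FileOps = {
  exists: existsSync,
  rename: renameSync,
  copy: copyFileSync,
  remove: (path) => rmSync(path, { force: true }),
};

// Main file plus the WAL and SHM side-files that belong to it.
const SUFFIXES = ["", "-wal", "-shm"];

function rotateAndBackup(dbPath: string, ops: FileOps = nodeFileOps): void {
  const bak = `${dbPath}.bak`;
  const older = `${dbPath}.bak.1`;

  // Step 1: move .bak (and its side-files) to .bak.1 as one set.
  if (ops.exists(bak)) {
    for (const suffix of SUFFIXES) {
      if (ops.exists(bak + suffix)) ops.rename(bak + suffix, older + suffix);
      else ops.remove(older + suffix); // drop a stale side-file (assumption)
    }
  }

  // Step 2: copy the current DB (and its side-files) into .bak.
  for (const suffix of SUFFIXES) {
    if (ops.exists(dbPath + suffix)) ops.copy(dbPath + suffix, bak + suffix);
    else ops.remove(bak + suffix);
  }
}
```

Moving the three files together keeps each backup generation paired with its own WAL and SHM files.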
@lukstafi lukstafi merged commit 628f96f into main Mar 28, 2026
1 check passed