Clawstore's database is Cloudflare D1 (SQLite). The schema lives inside the API Worker at apps/api/src/db/schema.ts and is defined with Drizzle ORM's SQLite dialect — there is no separate packages/db workspace. D1 bindings, migrations, and query code all live with the API because that Worker is the only code path that touches the database. This document sketches every table, its fields, and the integrity rules that matter operationally.
Two classes of tables coexist in the same database:
- Better Auth tables — generated by
npx @better-auth/cli generatefrom the auth config. Clawstore does not hand-write these. They are plural (users,sessions,accounts,verifications) because the auth config setsusePlural: true. - Clawstore tables — hand-written in Drizzle. All application state that is not authentication state.
The rule is: if Better Auth owns it, don't touch it from the Clawstore schema file. Cross-references between the two live in the Clawstore tables (foreign keys to users.id), never in the auth tables.
Generated, not hand-written. The high-level shape:
| Table | Purpose |
|---|---|
users |
One row per authenticated GitHub user. Holds the Clawstore-side identity: id, email, name, image, createdAt. |
sessions |
Active web sessions. Cookie-backed, persisted in D1 so the Worker can validate without reaching an external service. |
accounts |
OAuth provider records. One row per (user_id, provider). Holds the GitHub login and provider-side identifiers. |
verifications |
Email verification tokens and similar short-lived challenges. Mostly unused at MVP since GitHub OAuth is the only sign-in method. |
When Better Auth's schema changes between versions, regenerate: pnpm --filter api auth:generate && pnpm --filter db drizzle:generate. See Backend API § Authentication for how the auth tables are consumed at runtime.
One row per claimed agent ID. A row exists from the moment a user publishes their first version under a given (scope, name) pair, and persists forever after (even if all versions are yanked).
| Field | Type | Description |
|---|---|---|
id |
text (UUID) |
Primary key. |
scope |
text |
Lowercased GitHub username of the owner. Together with name forms the scoped ID @scope/name. |
name |
text |
Agent name within the scope. Kebab-case, 2–64 chars. |
owner_user_id |
text (fk → users.id) |
The user who first published this agent. One owner per agent at MVP. |
latest_version_id |
text (fk → versions.id, nullable) |
Pointer to the current latest version. NULL before the first version is published. Advances on publish unless the new version is a pre-release. |
category |
text |
Category slug from the curated categories table. |
description |
text |
Long-form description (copied from the latest published manifest for indexing convenience). |
tagline |
text |
One-line tagline (same). |
display_name |
text |
Human-readable name (same). |
homepage |
text? |
Optional project URL. |
repository |
text? |
Optional source repo URL. |
license |
text |
SPDX identifier. |
download_count |
integer |
Cached sum of download_count across all versions. Refreshed on tarball download. Best-effort, not transactional. |
avg_rating |
real |
Cached average star rating (1.0–5.0). NULL if no reviews. Refreshed on review insert/update/delete. |
review_count |
integer |
Cached count of reviews. Refreshed alongside avg_rating. |
created_at |
timestamp |
First publish time. |
updated_at |
timestamp |
Last publish or metadata refresh. |
Constraints
UNIQUE (scope, name)- Indexes on
scope,name,category,owner_user_id,updated_at - LIKE-friendly indexes on
display_nameandtaglinefor search - Index on
download_count(forsort=downloads) - Index on
avg_rating(forsort=rating)
The denormalized description, tagline, display_name fields are snapshots from the most recently published version's manifest. They are refreshed on publish and on metadata refresh. This avoids a join to versions for every search query.
The denormalized download_count, avg_rating, and review_count fields are aggregates cached on the agent row for search sorting and listing performance. download_count is refreshed on every tarball download (best-effort). avg_rating and review_count are refreshed on every review insert, update, or delete.
Immutable once written. Every published version of every agent is a row here. Yanking flips yanked_at but never deletes or mutates other fields.
| Field | Type | Description |
|---|---|---|
id |
text (UUID) |
Primary key. |
agent_id |
text (fk → agents.id) |
The agent this version belongs to. |
version |
text |
SemVer string. Must be strictly greater than any existing version row for the same agent_id. |
channel |
text enum |
community | official | beta. Enforced in application code; D1 has no native enum. |
manifest |
text (JSON) |
The full agent.json as uploaded. Stored for audit, rendering, and re-serving via the API. |
tarball_r2_key |
text |
R2 object key for the tarball. Convention: tarballs/<scope>/<name>/<version>.tgz. |
tarball_sha256 |
text |
SHA-256 of the tarball contents. Used by the CLI to verify downloads. |
tarball_size_bytes |
integer |
Total tarball size. Capped at 100 MB. |
download_count |
integer |
Incremented on every tarball download. Best-effort, not transactional. |
uploaded_by_user_id |
text (fk → users.id) |
The user who published this version. |
uploaded_at |
timestamp |
Server wall clock at publish. |
yanked_at |
timestamp? |
If set, this version is yanked. Excluded from resolution but still downloadable by pinned version. |
yanked_by_user_id |
text? (fk → users.id) |
Who yanked it. |
yanked_reason |
text? |
Free-text reason supplied on yank. |
Constraints
UNIQUE (agent_id, version)- Index on
(agent_id, uploaded_at DESC)for version history queries - Index on
yanked_at(for filtering resolvable versions)
Many-to-many between agents and free-form tags. Keeps tag filtering cheap without relying on JSON column queries.
| Field | Type | Description |
|---|---|---|
agent_id |
text (fk → agents.id) |
|
tag |
text |
Lowercased, kebab-case. |
Constraints
PRIMARY KEY (agent_id, tag)- Index on
tag(for "all agents with tag X" queries)
Tags are refreshed on publish from the manifest's tags array: the previous set is replaced atomically.
One row per asset (icon, screenshot, etc.) within a published version. Assets are uploaded separately as part of the publish flow, not extracted from the tarball.
| Field | Type | Description |
|---|---|---|
id |
text (UUID) |
Primary key. |
version_id |
text (fk → versions.id) |
|
kind |
text enum |
icon | screenshot | other. |
r2_key |
text |
R2 object key. Convention: assets/<scope>/<name>/<version>/<sanitized-path>. |
content_type |
text |
Sniffed MIME type. |
size_bytes |
integer |
|
sha256 |
text |
|
ordering |
integer |
Display order (for screenshots). |
Constraints
- Index on
(version_id, kind)for detail-page queries
Moderation queue. Filled by the public POST /v1/reports endpoint, drained by maintainers through GET /v1/reports + POST /v1/reports/:id/resolve.
| Field | Type | Description |
|---|---|---|
id |
text (UUID) |
Primary key. |
agent_id |
text (fk → agents.id) |
|
version_id |
text? (fk → versions.id) |
Nullable — report may target the whole agent. |
reporter_user_id |
text? (fk → users.id) |
Nullable — anonymous reports are allowed. |
reporter_ip_hash |
text |
SHA-256 of (IP + salt). Used for rate limiting and abuse detection, not for identification. |
reason |
text enum |
malicious | trademark | spam | inappropriate | other. |
details |
text |
Free-text. Capped. |
status |
text enum |
open | resolved | dismissed. |
resolved_by_user_id |
text? (fk → users.id) |
|
resolved_at |
timestamp? |
|
resolution_notes |
text? |
|
created_at |
timestamp |
Constraints
- Index on
(status, created_at)for "open reports, oldest first" - Index on
agent_id - Index on
reporter_ip_hash(for rate limiting)
One row per user per agent. An authenticated user may leave one review for an agent they do not own. Reviews carry a 1–5 star rating and an optional text body.
| Field | Type | Description |
|---|---|---|
id |
text (UUID) |
Primary key. |
agent_id |
text (fk → agents.id) |
The agent being reviewed. |
reviewer_user_id |
text (fk → users.id) |
The user who wrote the review. |
rating |
integer |
Star rating, 1–5. Enforced in application code. |
title |
text? |
Optional short summary. Max 120 chars. |
body |
text? |
Optional long-form review text. Max 2000 chars. |
created_at |
timestamp |
When the review was posted. |
updated_at |
timestamp |
Last edit time. Equals created_at if never edited. |
Constraints
UNIQUE (agent_id, reviewer_user_id)— one review per user per agent- Index on
agent_id(for listing reviews on an agent detail page) - Index on
reviewer_user_id(for user profile "my reviews" queries) - Index on
(agent_id, created_at DESC)(for paginated review feeds)
On insert, update, or delete, the API recalculates agents.avg_rating and agents.review_count from the surviving rows. This is a simple AVG(rating) / COUNT(*) over the reviews table filtered by agent_id. At MVP scale this is fast enough inline; at scale it moves to a background job.
Authors cannot review their own agents — the backend rejects with 403 if reviewer_user_id === agents.owner_user_id.
One row per user. Created on first GitHub sign-in, hydrated from the GitHub OAuth payload. Users may update their profile from the web frontend.
| Field | Type | Description |
|---|---|---|
user_id |
text (fk → users.id) |
Primary key. One-to-one with users. |
bio |
text? |
Short bio. Max 280 chars. |
website |
text? |
URL. Validated format. |
location |
text? |
Free-text location. Max 100 chars. |
github_login |
text |
GitHub username, lowercased. Also the user's scope. |
avatar_url |
text? |
GitHub avatar URL, copied from the OAuth payload. |
display_name |
text? |
Human-readable name from GitHub, editable. |
created_at |
timestamp |
Profile creation time (mirrors users.createdAt). |
updated_at |
timestamp |
Last profile edit. |
Constraints
PRIMARY KEY (user_id)UNIQUE (github_login)
On first sign-in, Better Auth creates the users row. A Clawstore post-sign-in hook creates the profiles row, seeding github_login, avatar_url, and display_name from the GitHub OAuth response. If the user later edits their profile on useclawstore.com, only the profiles row is updated — the users row is never touched by Clawstore code.
Seeded list, curated by maintainers. Not a moving part in day-to-day operation.
| Field | Type | Description |
|---|---|---|
id |
text |
Primary key. Slug like "health-fitness". |
name |
text |
Display name. |
icon |
text |
Emoji or icon identifier. |
sort_order |
integer |
Display order in the UI. |
Seeded via a migration at bootstrap; the API exposes read-only access under GET /v1/categories.
Immutability is enforced in application code, not in the database. SQLite has no constraint mechanism that cleanly expresses "rows in this table may be inserted and a specific column may be updated but nothing else." The rules that matter:
- Publishing a version inserts a
versionsrow and onlyINSERTs. No code path in Clawstore updates the core version fields (version,manifest,tarball_r2_key,tarball_sha256,tarball_size_bytes,uploaded_at,uploaded_by_user_id) after insert. - Yanking updates only
yanked_at,yanked_by_user_id,yanked_reasonon the version row. download_countupdates are best-effort and tolerate races. They do not participate in any transactional guarantee. The cacheddownload_countonagentsis refreshed alongside version-level counters.avg_ratingandreview_countonagentsare recalculated fromreviewsrows on every review write. These are derived aggregates, not source-of-truth — thereviewstable is authoritative.- Version rows are never deleted by any code path. A yanked version is still downloadable by operators with a pinned version — this is intentional, matches npm/crates.io semantics, and keeps reproducible installs intact.
To keep accidents unlikely, the Drizzle client wrapper exposes insertVersion() and yankVersion() as the only two mutation paths for the versions table. Direct .update() or .delete() calls on versions are considered bugs and should not pass review.
Channels are a text column on versions with application-level validation.
| Channel | Who can publish | Listed by default? | latest pointer? |
|---|---|---|---|
community |
Any authenticated user | Yes | Yes |
official |
Authenticated user with the official role (curated allowlist) |
Yes | Yes |
beta |
Any authenticated user — requires a pre-release suffix in the version string |
Optional filter | No — latest never advances to a beta |
An agent may have versions in multiple channels. The latest pointer on agents always refers to the most recent stable (non-beta) version, regardless of channel.
The private channel is deferred. When added, it will sit on the versions row with a visibility ACL on the agents row.
MVP search is deliberately boring: Drizzle query builder LIKE queries over denormalized text columns on agents, plus indexed tag joins through agent_tags.
- D1 supports SQLite FTS5 virtual tables. They would give better ranking and relevance.
- FTS5 is a SQLite feature. Postgres FTS is structurally different. Writing against FTS5 means the D1-to-Postgres migration has to port every FTS query by hand, not by swapping a dialect import.
- At MVP scale (10–1000 agents),
LIKEon indexed columns is fast enough that users will not notice the difference. - Upgrading to Postgres FTS + trigram at migration time is additive — it improves search quality — and costs one focused refactor instead of a bespoke port.
Search queries are LIKE '%q%' against:
agents.display_nameagents.taglineagents.scopeagents.name
Filters join:
agent_tags.tag(exact match)agents.category(exact match)agents.channelof thelatest_version(exact match, by joiningversions)
Indexes on every LIKE-ed column. At 10k agents this still returns results in single-digit milliseconds on D1.
These are rules the schema file must follow to keep the migration cost low. Violations are bugs; reviewers should catch them.
- All DB access flows through Drizzle's query builder. No raw SQL unless absolutely necessary. When raw SQL is necessary, it must be flagged in a code comment and re-verified during migration.
- No SQLite-specific types. Use Drizzle's
text,integer,timestamp,jsonhelpers. They normalize across SQLite and Postgres. - No D1 FTS5 virtual tables. See above.
- No SQLite-specific date/time functions in application code. Use
Date.now()in TypeScript and pass numeric timestamps through Drizzle'stimestamphelper. - JSON columns via Drizzle's
jsonhelper only. No string-packed JSON with LIKE queries. - Foreign keys declared on every relation. D1 requires
PRAGMA foreign_keys = ON; Drizzle handles this. FKs both document intent and validate test data. - Primary keys are TEXT UUIDs, not integers. Portable, human-greppable in logs, collision-proof across environments.
- Better Auth's
providerstring flips from"sqlite"to"pg"at migration time — one line in the auth config. See thebetter-auth-cloudflareintegration inapps/api/src/auth.ts.
At 100-agent scale the migration is a half-day of work: re-point Drizzle imports from drizzle-orm/sqlite-core to drizzle-orm/pg-core, regenerate migrations, dump D1 via wrangler d1 export, massage into Postgres COPY statements, import, swap the binding in sst.config.ts. At 10k-agent scale it is roughly a week, mostly data validation.
- Backend API — routes that read and write these tables
- Agent Package — the
agent.jsonmanifest whose fields are denormalized intoagentsand persisted in full onversions - Auth and Ownership — review permissions and profile enrichment
- Trust and Moderation — review moderation policy
- Documentation hub