Data Model

Clawstore's database is Cloudflare D1 (SQLite). The schema lives inside the API Worker at apps/api/src/db/schema.ts and is defined with Drizzle ORM's SQLite dialect — there is no separate packages/db workspace. D1 bindings, migrations, and query code all live with the API because that Worker is the only code path that touches the database. This document sketches every table, its fields, and the integrity rules that matter operationally.

Two classes of tables coexist in the same database:

Better Auth tables — generated by npx @better-auth/cli generate from the auth config. Clawstore does not hand-write these. They are plural (users, sessions, accounts, verifications) because the auth config sets usePlural: true.
Clawstore tables — hand-written in Drizzle. All application state that is not authentication state.

The rule is: if Better Auth owns it, don't touch it from the Clawstore schema file. Cross-references between the two live in the Clawstore tables (foreign keys to users.id), never in the auth tables.

Better Auth tables

Generated, not hand-written. The high-level shape:

Table	Purpose
`users`	One row per authenticated GitHub user. Holds the Clawstore-side identity: id, email, name, image, `createdAt`.
`sessions`	Active web sessions. Cookie-backed, persisted in D1 so the Worker can validate without reaching an external service.
`accounts`	OAuth provider records. One row per `(user_id, provider)`. Holds the GitHub login and provider-side identifiers.
`verifications`	Email verification tokens and similar short-lived challenges. Mostly unused at MVP since GitHub OAuth is the only sign-in method.

When Better Auth's schema changes between versions, regenerate: pnpm --filter api auth:generate && pnpm --filter db drizzle:generate. See Backend API § Authentication for how the auth tables are consumed at runtime.

Clawstore tables

`agents`

One row per claimed agent ID. A row exists from the moment a user publishes their first version under a given (scope, name) pair, and persists forever after (even if all versions are yanked).

Field	Type	Description
`id`	`text` (UUID)	Primary key.
`scope`	`text`	Lowercased GitHub username of the owner. Together with `name` forms the scoped ID `@scope/name`.
`name`	`text`	Agent name within the scope. Kebab-case, 2–64 chars.
`owner_user_id`	`text` (fk → `users.id`)	The user who first published this agent. One owner per agent at MVP.
`latest_version_id`	`text` (fk → `versions.id`, nullable)	Pointer to the current `latest` version. NULL before the first version is published. Advances on publish unless the new version is a pre-release.
`category`	`text`	Category slug from the curated `categories` table.
`description`	`text`	Long-form description (copied from the latest published manifest for indexing convenience).
`tagline`	`text`	One-line tagline (same).
`display_name`	`text`	Human-readable name (same).
`homepage`	`text?`	Optional project URL.
`repository`	`text?`	Optional source repo URL.
`license`	`text`	SPDX identifier.
`download_count`	`integer`	Cached sum of `download_count` across all versions. Refreshed on tarball download. Best-effort, not transactional.
`avg_rating`	`real`	Cached average star rating (1.0–5.0). NULL if no reviews. Refreshed on review insert/update/delete.
`review_count`	`integer`	Cached count of reviews. Refreshed alongside `avg_rating`.
`created_at`	`timestamp`	First publish time.
`updated_at`	`timestamp`	Last publish or metadata refresh.

Constraints

UNIQUE (scope, name)
Indexes on scope, name, category, owner_user_id, updated_at
LIKE-friendly indexes on display_name and tagline for search
Index on download_count (for sort=downloads)
Index on avg_rating (for sort=rating)

The denormalized description, tagline, display_name fields are snapshots from the most recently published version's manifest. They are refreshed on publish and on metadata refresh. This avoids a join to versions for every search query.

The denormalized download_count, avg_rating, and review_count fields are aggregates cached on the agent row for search sorting and listing performance. download_count is refreshed on every tarball download (best-effort). avg_rating and review_count are refreshed on every review insert, update, or delete.

`versions`

Immutable once written. Every published version of every agent is a row here. Yanking flips yanked_at but never deletes or mutates other fields.

Field	Type	Description
`id`	`text` (UUID)	Primary key.
`agent_id`	`text` (fk → `agents.id`)	The agent this version belongs to.
`version`	`text`	SemVer string. Must be strictly greater than any existing version row for the same `agent_id`.
`channel`	`text` enum	`community` \| `official` \| `beta`. Enforced in application code; D1 has no native enum.
`manifest`	`text` (JSON)	The full `agent.json` as uploaded. Stored for audit, rendering, and re-serving via the API.
`tarball_r2_key`	`text`	R2 object key for the tarball. Convention: `tarballs/<scope>/<name>/<version>.tgz`.
`tarball_sha256`	`text`	SHA-256 of the tarball contents. Used by the CLI to verify downloads.
`tarball_size_bytes`	`integer`	Total tarball size. Capped at 100 MB.
`download_count`	`integer`	Incremented on every tarball download. Best-effort, not transactional.
`uploaded_by_user_id`	`text` (fk → `users.id`)	The user who published this version.
`uploaded_at`	`timestamp`	Server wall clock at publish.
`yanked_at`	`timestamp?`	If set, this version is yanked. Excluded from resolution but still downloadable by pinned version.
`yanked_by_user_id`	`text?` (fk → `users.id`)	Who yanked it.
`yanked_reason`	`text?`	Free-text reason supplied on yank.

Constraints

UNIQUE (agent_id, version)
Index on (agent_id, uploaded_at DESC) for version history queries
Index on yanked_at (for filtering resolvable versions)

`agent_tags`

Many-to-many between agents and free-form tags. Keeps tag filtering cheap without relying on JSON column queries.

Field	Type	Description
`agent_id`	`text` (fk → `agents.id`)
`tag`	`text`	Lowercased, kebab-case.

Constraints

PRIMARY KEY (agent_id, tag)
Index on tag (for "all agents with tag X" queries)

Tags are refreshed on publish from the manifest's tags array: the previous set is replaced atomically.

`version_assets`

One row per asset (icon, screenshot, etc.) within a published version. Assets are uploaded separately as part of the publish flow, not extracted from the tarball.

Field	Type	Description
`id`	`text` (UUID)	Primary key.
`version_id`	`text` (fk → `versions.id`)
`kind`	`text` enum	`icon` \| `screenshot` \| `other`.
`r2_key`	`text`	R2 object key. Convention: `assets/<scope>/<name>/<version>/<sanitized-path>`.
`content_type`	`text`	Sniffed MIME type.
`size_bytes`	`integer`
`sha256`	`text`
`ordering`	`integer`	Display order (for screenshots).

Constraints

Index on (version_id, kind) for detail-page queries

`reports`

Moderation queue. Filled by the public POST /v1/reports endpoint, drained by maintainers through GET /v1/reports + POST /v1/reports/:id/resolve.

Field	Type	Description
`id`	`text` (UUID)	Primary key.
`agent_id`	`text` (fk → `agents.id`)
`version_id`	`text?` (fk → `versions.id`)	Nullable — report may target the whole agent.
`reporter_user_id`	`text?` (fk → `users.id`)	Nullable — anonymous reports are allowed.
`reporter_ip_hash`	`text`	SHA-256 of (IP + salt). Used for rate limiting and abuse detection, not for identification.
`reason`	`text` enum	`malicious` \| `trademark` \| `spam` \| `inappropriate` \| `other`.
`details`	`text`	Free-text. Capped.
`status`	`text` enum	`open` \| `resolved` \| `dismissed`.
`resolved_by_user_id`	`text?` (fk → `users.id`)
`resolved_at`	`timestamp?`
`resolution_notes`	`text?`
`created_at`	`timestamp`

Constraints

Index on (status, created_at) for "open reports, oldest first"
Index on agent_id
Index on reporter_ip_hash (for rate limiting)

`reviews`

One row per user per agent. An authenticated user may leave one review for an agent they do not own. Reviews carry a 1–5 star rating and an optional text body.

Field	Type	Description
`id`	`text` (UUID)	Primary key.
`agent_id`	`text` (fk → `agents.id`)	The agent being reviewed.
`reviewer_user_id`	`text` (fk → `users.id`)	The user who wrote the review.
`rating`	`integer`	Star rating, 1–5. Enforced in application code.
`title`	`text?`	Optional short summary. Max 120 chars.
`body`	`text?`	Optional long-form review text. Max 2000 chars.
`created_at`	`timestamp`	When the review was posted.
`updated_at`	`timestamp`	Last edit time. Equals `created_at` if never edited.

Constraints

UNIQUE (agent_id, reviewer_user_id) — one review per user per agent
Index on agent_id (for listing reviews on an agent detail page)
Index on reviewer_user_id (for user profile "my reviews" queries)
Index on (agent_id, created_at DESC) (for paginated review feeds)

On insert, update, or delete, the API recalculates agents.avg_rating and agents.review_count from the surviving rows. This is a simple AVG(rating) / COUNT(*) over the reviews table filtered by agent_id. At MVP scale this is fast enough inline; at scale it moves to a background job.

Authors cannot review their own agents — the backend rejects with 403 if reviewer_user_id === agents.owner_user_id.

`profiles`

One row per user. Created on first GitHub sign-in, hydrated from the GitHub OAuth payload. Users may update their profile from the web frontend.

Field	Type	Description
`user_id`	`text` (fk → `users.id`)	Primary key. One-to-one with `users`.
`bio`	`text?`	Short bio. Max 280 chars.
`website`	`text?`	URL. Validated format.
`location`	`text?`	Free-text location. Max 100 chars.
`github_login`	`text`	GitHub username, lowercased. Also the user's scope.
`avatar_url`	`text?`	GitHub avatar URL, copied from the OAuth payload.
`display_name`	`text?`	Human-readable name from GitHub, editable.
`created_at`	`timestamp`	Profile creation time (mirrors `users.createdAt`).
`updated_at`	`timestamp`	Last profile edit.

Constraints

PRIMARY KEY (user_id)
UNIQUE (github_login)

On first sign-in, Better Auth creates the users row. A Clawstore post-sign-in hook creates the profiles row, seeding github_login, avatar_url, and display_name from the GitHub OAuth response. If the user later edits their profile on useclawstore.com, only the profiles row is updated — the users row is never touched by Clawstore code.

`categories`

Seeded list, curated by maintainers. Not a moving part in day-to-day operation.

Field	Type	Description
`id`	`text`	Primary key. Slug like `"health-fitness"`.
`name`	`text`	Display name.
`icon`	`text`	Emoji or icon identifier.
`sort_order`	`integer`	Display order in the UI.

Seeded via a migration at bootstrap; the API exposes read-only access under GET /v1/categories.

Immutability rules

Immutability is enforced in application code, not in the database. SQLite has no constraint mechanism that cleanly expresses "rows in this table may be inserted and a specific column may be updated but nothing else." The rules that matter:

Publishing a version inserts a versions row and only INSERTs. No code path in Clawstore updates the core version fields (version, manifest, tarball_r2_key, tarball_sha256, tarball_size_bytes, uploaded_at, uploaded_by_user_id) after insert.
Yanking updates only yanked_at, yanked_by_user_id, yanked_reason on the version row.
download_count updates are best-effort and tolerate races. They do not participate in any transactional guarantee. The cached download_count on agents is refreshed alongside version-level counters.
avg_rating and review_count on agents are recalculated from reviews rows on every review write. These are derived aggregates, not source-of-truth — the reviews table is authoritative.
Version rows are never deleted by any code path. A yanked version is still downloadable by operators with a pinned version — this is intentional, matches npm/crates.io semantics, and keeps reproducible installs intact.

To keep accidents unlikely, the Drizzle client wrapper exposes insertVersion() and yankVersion() as the only two mutation paths for the versions table. Direct .update() or .delete() calls on versions are considered bugs and should not pass review.

Channels

Channels are a text column on versions with application-level validation.

Channel	Who can publish	Listed by default?	`latest` pointer?
`community`	Any authenticated user	Yes	Yes
`official`	Authenticated user with the `official` role (curated allowlist)	Yes	Yes
`beta`	Any authenticated user — requires a pre-release suffix in the `version` string	Optional filter	No — `latest` never advances to a beta

An agent may have versions in multiple channels. The latest pointer on agents always refers to the most recent stable (non-beta) version, regardless of channel.

The private channel is deferred. When added, it will sit on the versions row with a visibility ACL on the agents row.

Search strategy

MVP search is deliberately boring: Drizzle query builder LIKE queries over denormalized text columns on agents, plus indexed tag joins through agent_tags.

Why LIKE, not FTS5

D1 supports SQLite FTS5 virtual tables. They would give better ranking and relevance.
FTS5 is a SQLite feature. Postgres FTS is structurally different. Writing against FTS5 means the D1-to-Postgres migration has to port every FTS query by hand, not by swapping a dialect import.
At MVP scale (10–1000 agents), LIKE on indexed columns is fast enough that users will not notice the difference.
Upgrading to Postgres FTS + trigram at migration time is additive — it improves search quality — and costs one focused refactor instead of a bespoke port.

Which columns, which indexes

Search queries are LIKE '%q%' against:

agents.display_name
agents.tagline
agents.scope
agents.name

Filters join:

agent_tags.tag (exact match)
agents.category (exact match)
agents.channel of the latest_version (exact match, by joining versions)

Indexes on every LIKE-ed column. At 10k agents this still returns results in single-digit milliseconds on D1.

D1 → Postgres migration conventions

These are rules the schema file must follow to keep the migration cost low. Violations are bugs; reviewers should catch them.

All DB access flows through Drizzle's query builder. No raw SQL unless absolutely necessary. When raw SQL is necessary, it must be flagged in a code comment and re-verified during migration.
No SQLite-specific types. Use Drizzle's text, integer, timestamp, json helpers. They normalize across SQLite and Postgres.
No D1 FTS5 virtual tables. See above.
No SQLite-specific date/time functions in application code. Use Date.now() in TypeScript and pass numeric timestamps through Drizzle's timestamp helper.
JSON columns via Drizzle's json helper only. No string-packed JSON with LIKE queries.
Foreign keys declared on every relation. D1 requires PRAGMA foreign_keys = ON; Drizzle handles this. FKs both document intent and validate test data.
Primary keys are TEXT UUIDs, not integers. Portable, human-greppable in logs, collision-proof across environments.
Better Auth's provider string flips from "sqlite" to "pg" at migration time — one line in the auth config. See the better-auth-cloudflare integration in apps/api/src/auth.ts.

At 100-agent scale the migration is a half-day of work: re-point Drizzle imports from drizzle-orm/sqlite-core to drizzle-orm/pg-core, regenerate migrations, dump D1 via wrangler d1 export, massage into Postgres COPY statements, import, swap the binding in sst.config.ts. At 10k-agent scale it is roughly a week, mostly data validation.

Cross-references

Backend API — routes that read and write these tables
Agent Package — the agent.json manifest whose fields are denormalized into agents and persisted in full on versions
Auth and Ownership — review permissions and profile enrichment
Trust and Moderation — review moderation policy
Documentation hub

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Model

Better Auth tables

Clawstore tables

`agents`

`versions`

`agent_tags`

`version_assets`

`reports`

`reviews`

`profiles`

`categories`

Immutability rules

Channels

Search strategy

Why LIKE, not FTS5

Which columns, which indexes

D1 → Postgres migration conventions

Cross-references

FilesExpand file tree

data-model.md

Latest commit

History

data-model.md

File metadata and controls

Data Model

Better Auth tables

Clawstore tables

agents

versions

agent_tags

version_assets

reports

reviews

profiles

categories

Immutability rules

Channels

Search strategy

Why LIKE, not FTS5

Which columns, which indexes

D1 → Postgres migration conventions

Cross-references

`agents`

`versions`

`agent_tags`

`version_assets`

`reports`

`reviews`

`profiles`

`categories`