Skip to content

Latest commit

 

History

History
278 lines (208 loc) · 17.2 KB

File metadata and controls

278 lines (208 loc) · 17.2 KB

Data Model

Clawstore's database is Cloudflare D1 (SQLite). The schema lives inside the API Worker at apps/api/src/db/schema.ts and is defined with Drizzle ORM's SQLite dialect — there is no separate packages/db workspace. D1 bindings, migrations, and query code all live with the API because that Worker is the only code path that touches the database. This document sketches every table, its fields, and the integrity rules that matter operationally.

Two classes of tables coexist in the same database:

  • Better Auth tables — generated by npx @better-auth/cli generate from the auth config. Clawstore does not hand-write these. They are plural (users, sessions, accounts, verifications) because the auth config sets usePlural: true.
  • Clawstore tables — hand-written in Drizzle. All application state that is not authentication state.

The rule is: if Better Auth owns it, don't touch it from the Clawstore schema file. Cross-references between the two live in the Clawstore tables (foreign keys to users.id), never in the auth tables.

Better Auth tables

Generated, not hand-written. The high-level shape:

Table Purpose
users One row per authenticated GitHub user. Holds the Clawstore-side identity: id, email, name, image, createdAt.
sessions Active web sessions. Cookie-backed, persisted in D1 so the Worker can validate without reaching an external service.
accounts OAuth provider records. One row per (user_id, provider). Holds the GitHub login and provider-side identifiers.
verifications Email verification tokens and similar short-lived challenges. Mostly unused at MVP since GitHub OAuth is the only sign-in method.

When Better Auth's schema changes between versions, regenerate: pnpm --filter api auth:generate && pnpm --filter db drizzle:generate. See Backend API § Authentication for how the auth tables are consumed at runtime.

Clawstore tables

agents

One row per claimed agent ID. A row exists from the moment a user publishes their first version under a given (scope, name) pair, and persists forever after (even if all versions are yanked).

Field Type Description
id text (UUID) Primary key.
scope text Lowercased GitHub username of the owner. Together with name forms the scoped ID @scope/name.
name text Agent name within the scope. Kebab-case, 2–64 chars.
owner_user_id text (fk → users.id) The user who first published this agent. One owner per agent at MVP.
latest_version_id text (fk → versions.id, nullable) Pointer to the current latest version. NULL before the first version is published. Advances on publish unless the new version is a pre-release.
category text Category slug from the curated categories table.
description text Long-form description (copied from the latest published manifest for indexing convenience).
tagline text One-line tagline (same).
display_name text Human-readable name (same).
homepage text? Optional project URL.
repository text? Optional source repo URL.
license text SPDX identifier.
download_count integer Cached sum of download_count across all versions. Refreshed on tarball download. Best-effort, not transactional.
avg_rating real Cached average star rating (1.0–5.0). NULL if no reviews. Refreshed on review insert/update/delete.
review_count integer Cached count of reviews. Refreshed alongside avg_rating.
created_at timestamp First publish time.
updated_at timestamp Last publish or metadata refresh.

Constraints

  • UNIQUE (scope, name)
  • Indexes on scope, name, category, owner_user_id, updated_at
  • LIKE-friendly indexes on display_name and tagline for search
  • Index on download_count (for sort=downloads)
  • Index on avg_rating (for sort=rating)

The denormalized description, tagline, display_name fields are snapshots from the most recently published version's manifest. They are refreshed on publish and on metadata refresh. This avoids a join to versions for every search query.

The denormalized download_count, avg_rating, and review_count fields are aggregates cached on the agent row for search sorting and listing performance. download_count is refreshed on every tarball download (best-effort). avg_rating and review_count are refreshed on every review insert, update, or delete.

versions

Immutable once written. Every published version of every agent is a row here. Yanking flips yanked_at but never deletes or mutates other fields.

Field Type Description
id text (UUID) Primary key.
agent_id text (fk → agents.id) The agent this version belongs to.
version text SemVer string. Must be strictly greater than any existing version row for the same agent_id.
channel text enum community | official | beta. Enforced in application code; D1 has no native enum.
manifest text (JSON) The full agent.json as uploaded. Stored for audit, rendering, and re-serving via the API.
tarball_r2_key text R2 object key for the tarball. Convention: tarballs/<scope>/<name>/<version>.tgz.
tarball_sha256 text SHA-256 of the tarball contents. Used by the CLI to verify downloads.
tarball_size_bytes integer Total tarball size. Capped at 100 MB.
download_count integer Incremented on every tarball download. Best-effort, not transactional.
uploaded_by_user_id text (fk → users.id) The user who published this version.
uploaded_at timestamp Server wall clock at publish.
yanked_at timestamp? If set, this version is yanked. Excluded from resolution but still downloadable by pinned version.
yanked_by_user_id text? (fk → users.id) Who yanked it.
yanked_reason text? Free-text reason supplied on yank.

Constraints

  • UNIQUE (agent_id, version)
  • Index on (agent_id, uploaded_at DESC) for version history queries
  • Index on yanked_at (for filtering resolvable versions)

agent_tags

Many-to-many between agents and free-form tags. Keeps tag filtering cheap without relying on JSON column queries.

Field Type Description
agent_id text (fk → agents.id)
tag text Lowercased, kebab-case.

Constraints

  • PRIMARY KEY (agent_id, tag)
  • Index on tag (for "all agents with tag X" queries)

Tags are refreshed on publish from the manifest's tags array: the previous set is replaced atomically.

version_assets

One row per asset (icon, screenshot, etc.) within a published version. Assets are uploaded separately as part of the publish flow, not extracted from the tarball.

Field Type Description
id text (UUID) Primary key.
version_id text (fk → versions.id)
kind text enum icon | screenshot | other.
r2_key text R2 object key. Convention: assets/<scope>/<name>/<version>/<sanitized-path>.
content_type text Sniffed MIME type.
size_bytes integer
sha256 text
ordering integer Display order (for screenshots).

Constraints

  • Index on (version_id, kind) for detail-page queries

reports

Moderation queue. Filled by the public POST /v1/reports endpoint, drained by maintainers through GET /v1/reports + POST /v1/reports/:id/resolve.

Field Type Description
id text (UUID) Primary key.
agent_id text (fk → agents.id)
version_id text? (fk → versions.id) Nullable — report may target the whole agent.
reporter_user_id text? (fk → users.id) Nullable — anonymous reports are allowed.
reporter_ip_hash text SHA-256 of (IP + salt). Used for rate limiting and abuse detection, not for identification.
reason text enum malicious | trademark | spam | inappropriate | other.
details text Free-text. Capped.
status text enum open | resolved | dismissed.
resolved_by_user_id text? (fk → users.id)
resolved_at timestamp?
resolution_notes text?
created_at timestamp

Constraints

  • Index on (status, created_at) for "open reports, oldest first"
  • Index on agent_id
  • Index on reporter_ip_hash (for rate limiting)

reviews

One row per user per agent. An authenticated user may leave one review for an agent they do not own. Reviews carry a 1–5 star rating and an optional text body.

Field Type Description
id text (UUID) Primary key.
agent_id text (fk → agents.id) The agent being reviewed.
reviewer_user_id text (fk → users.id) The user who wrote the review.
rating integer Star rating, 1–5. Enforced in application code.
title text? Optional short summary. Max 120 chars.
body text? Optional long-form review text. Max 2000 chars.
created_at timestamp When the review was posted.
updated_at timestamp Last edit time. Equals created_at if never edited.

Constraints

  • UNIQUE (agent_id, reviewer_user_id) — one review per user per agent
  • Index on agent_id (for listing reviews on an agent detail page)
  • Index on reviewer_user_id (for user profile "my reviews" queries)
  • Index on (agent_id, created_at DESC) (for paginated review feeds)

On insert, update, or delete, the API recalculates agents.avg_rating and agents.review_count from the surviving rows. This is a simple AVG(rating) / COUNT(*) over the reviews table filtered by agent_id. At MVP scale this is fast enough inline; at scale it moves to a background job.

Authors cannot review their own agents — the backend rejects with 403 if reviewer_user_id === agents.owner_user_id.

profiles

One row per user. Created on first GitHub sign-in, hydrated from the GitHub OAuth payload. Users may update their profile from the web frontend.

Field Type Description
user_id text (fk → users.id) Primary key. One-to-one with users.
bio text? Short bio. Max 280 chars.
website text? URL. Validated format.
location text? Free-text location. Max 100 chars.
github_login text GitHub username, lowercased. Also the user's scope.
avatar_url text? GitHub avatar URL, copied from the OAuth payload.
display_name text? Human-readable name from GitHub, editable.
created_at timestamp Profile creation time (mirrors users.createdAt).
updated_at timestamp Last profile edit.

Constraints

  • PRIMARY KEY (user_id)
  • UNIQUE (github_login)

On first sign-in, Better Auth creates the users row. A Clawstore post-sign-in hook creates the profiles row, seeding github_login, avatar_url, and display_name from the GitHub OAuth response. If the user later edits their profile on useclawstore.com, only the profiles row is updated — the users row is never touched by Clawstore code.

categories

Seeded list, curated by maintainers. Not a moving part in day-to-day operation.

Field Type Description
id text Primary key. Slug like "health-fitness".
name text Display name.
icon text Emoji or icon identifier.
sort_order integer Display order in the UI.

Seeded via a migration at bootstrap; the API exposes read-only access under GET /v1/categories.

Immutability rules

Immutability is enforced in application code, not in the database. SQLite has no constraint mechanism that cleanly expresses "rows in this table may be inserted and a specific column may be updated but nothing else." The rules that matter:

  • Publishing a version inserts a versions row and only INSERTs. No code path in Clawstore updates the core version fields (version, manifest, tarball_r2_key, tarball_sha256, tarball_size_bytes, uploaded_at, uploaded_by_user_id) after insert.
  • Yanking updates only yanked_at, yanked_by_user_id, yanked_reason on the version row.
  • download_count updates are best-effort and tolerate races. They do not participate in any transactional guarantee. The cached download_count on agents is refreshed alongside version-level counters.
  • avg_rating and review_count on agents are recalculated from reviews rows on every review write. These are derived aggregates, not source-of-truth — the reviews table is authoritative.
  • Version rows are never deleted by any code path. A yanked version is still downloadable by operators with a pinned version — this is intentional, matches npm/crates.io semantics, and keeps reproducible installs intact.

To keep accidents unlikely, the Drizzle client wrapper exposes insertVersion() and yankVersion() as the only two mutation paths for the versions table. Direct .update() or .delete() calls on versions are considered bugs and should not pass review.

Channels

Channels are a text column on versions with application-level validation.

Channel Who can publish Listed by default? latest pointer?
community Any authenticated user Yes Yes
official Authenticated user with the official role (curated allowlist) Yes Yes
beta Any authenticated user — requires a pre-release suffix in the version string Optional filter No — latest never advances to a beta

An agent may have versions in multiple channels. The latest pointer on agents always refers to the most recent stable (non-beta) version, regardless of channel.

The private channel is deferred. When added, it will sit on the versions row with a visibility ACL on the agents row.

Search strategy

MVP search is deliberately boring: Drizzle query builder LIKE queries over denormalized text columns on agents, plus indexed tag joins through agent_tags.

Why LIKE, not FTS5

  • D1 supports SQLite FTS5 virtual tables. They would give better ranking and relevance.
  • FTS5 is a SQLite feature. Postgres FTS is structurally different. Writing against FTS5 means the D1-to-Postgres migration has to port every FTS query by hand, not by swapping a dialect import.
  • At MVP scale (10–1000 agents), LIKE on indexed columns is fast enough that users will not notice the difference.
  • Upgrading to Postgres FTS + trigram at migration time is additive — it improves search quality — and costs one focused refactor instead of a bespoke port.

Which columns, which indexes

Search queries are LIKE '%q%' against:

  • agents.display_name
  • agents.tagline
  • agents.scope
  • agents.name

Filters join:

  • agent_tags.tag (exact match)
  • agents.category (exact match)
  • agents.channel of the latest_version (exact match, by joining versions)

Indexes on every LIKE-ed column. At 10k agents this still returns results in single-digit milliseconds on D1.

D1 → Postgres migration conventions

These are rules the schema file must follow to keep the migration cost low. Violations are bugs; reviewers should catch them.

  1. All DB access flows through Drizzle's query builder. No raw SQL unless absolutely necessary. When raw SQL is necessary, it must be flagged in a code comment and re-verified during migration.
  2. No SQLite-specific types. Use Drizzle's text, integer, timestamp, json helpers. They normalize across SQLite and Postgres.
  3. No D1 FTS5 virtual tables. See above.
  4. No SQLite-specific date/time functions in application code. Use Date.now() in TypeScript and pass numeric timestamps through Drizzle's timestamp helper.
  5. JSON columns via Drizzle's json helper only. No string-packed JSON with LIKE queries.
  6. Foreign keys declared on every relation. D1 requires PRAGMA foreign_keys = ON; Drizzle handles this. FKs both document intent and validate test data.
  7. Primary keys are TEXT UUIDs, not integers. Portable, human-greppable in logs, collision-proof across environments.
  8. Better Auth's provider string flips from "sqlite" to "pg" at migration time — one line in the auth config. See the better-auth-cloudflare integration in apps/api/src/auth.ts.

At 100-agent scale the migration is a half-day of work: re-point Drizzle imports from drizzle-orm/sqlite-core to drizzle-orm/pg-core, regenerate migrations, dump D1 via wrangler d1 export, massage into Postgres COPY statements, import, swap the binding in sst.config.ts. At 10k-agent scale it is roughly a week, mostly data validation.

Cross-references