Skip to content

perf: add missing database indexes for high-frequency query columns #1284

@lancy

Description

@lancy

Summary

Research has identified several database tables missing indexes on frequently queried columns, leading to full table scans instead of index lookups.

Problem

Critical Missing Indexes

1. agent_runs table (Most severe)

Column Query Frequency Current Status
userId 35+ query locations, every listing operation ❌ No index
status Every cron cleanup cycle ❌ No index
createdAt Every paginated listing (ORDER BY) ❌ No index
scheduleId Schedule run history lookups ❌ No index

Impact: Full table scans on a table expected to grow to 100,000+ rows.

Key query patterns affected:

  • GET /v1/runs - Lists user's runs with pagination
  • /api/cron/cleanup-sandboxes - Finds running sandboxes by status
  • Schedule service - Finds runs by scheduleId

2. agent_sessions table

Column Query Frequency Current Status
userId Every session list operation ❌ No index
Composite (userId, agentComposeId, artifactName) findOrCreate() critical path ❌ No index

3. cli_tokens table

Column Query Frequency Current Status
expiresAt Every authenticated API request ❌ No index

Note: token has a unique index, but auth queries filter by both token AND expiresAt > now().

Evidence

Query Locations

agent_runs.userId queries (production code):

  • /app/api/webhooks/agent/*/route.ts - Multiple webhook handlers
  • /app/v1/runs/route.ts - Run listing endpoint
  • /app/api/storages/*/route.ts - Storage operations

agent_runs.status queries:

  • /app/api/cron/cleanup-sandboxes/route.ts:72 - eq(agentRuns.status, "running")

cli_tokens auth queries (critical path):

  • /src/lib/auth/get-user-id.ts:40-41
  • /src/lib/auth/runner-auth.ts:111-112

Comparison with Well-Indexed Tables

Tables in this project that do have proper indexes:

  • agent_schedules: 3 indexes including partial index for cron polling
  • runner_job_queue: Partial index for unclaimed job polling
  • storages: Composite unique index for user lookups

Proposed Indexes

-- agent_runs indexes
CREATE INDEX idx_agent_runs_user_id ON agent_runs(user_id);
CREATE INDEX idx_agent_runs_user_created ON agent_runs(user_id, created_at DESC);
CREATE INDEX idx_agent_runs_status_heartbeat ON agent_runs(status, last_heartbeat_at) 
  WHERE status = 'running';
CREATE INDEX idx_agent_runs_schedule_id ON agent_runs(schedule_id) 
  WHERE schedule_id IS NOT NULL;

-- agent_sessions indexes
CREATE INDEX idx_agent_sessions_user_id ON agent_sessions(user_id);
CREATE INDEX idx_agent_sessions_user_compose_artifact ON agent_sessions(user_id, agent_compose_id, artifact_name);

-- cli_tokens index
CREATE INDEX idx_cli_tokens_token_expires ON cli_tokens(token, expires_at);

Performance Impact

Query Without Index With Index
List user's runs (100 in 100K) ~100K row scan ~100 row index lookup
Find running sandboxes (~100 in 100K) ~100K row scan ~100 row partial index scan
User session list (10 in 10K) ~10K row scan ~10 row index lookup
Token auth check Token index + filter Composite index lookup

Research Document

Full analysis available at: /tmp/deep-dive/missing-indexes/research.md

Acceptance Criteria

  • Add migration for agent_runs indexes
  • Add migration for agent_sessions indexes
  • Add migration for cli_tokens composite index
  • Update Drizzle schema files with index definitions
  • Verify indexes are created in development database
  • Run existing tests to ensure no regressions

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions