All data in mcp-awareness is stored in a single entries table using a common envelope pattern. Every record — whether it's a system status report, an alert, a piece of knowledge, or a preference — shares the same columns. The type field determines the semantics, and the data column holds type-specific fields.
| Column | Type | Nullable | Description |
|---|---|---|---|
id |
TEXT | No | Primary key. Owner identifier (e.g., "cmeans", "alice"). |
email |
TEXT | Yes | Email address as provided by the user (for display, contact). |
canonical_email |
TEXT | Yes | Normalized email for uniqueness checks. UNIQUE constraint. See normalization rules below. |
email_verified |
BOOLEAN | No | Whether the email has been verified. Default: FALSE. |
phone |
TEXT | Yes | Phone number in E.164 format (e.g., "+14155551234"). NOT unique — shared lines exist. |
phone_verified |
BOOLEAN | No | Whether the phone has been verified. Default: FALSE. |
password_hash |
TEXT | Yes | argon2id hash. Nullable — OAuth-only users skip this. |
display_name |
TEXT | Yes | Human-readable name. |
timezone |
TEXT | Yes | IANA timezone (e.g., "America/Chicago"). Default: "UTC". Used for notification scheduling. |
preferences |
JSONB | No | Extensible user settings (notification prefs, etc.). Default: {}. |
oauth_subject |
TEXT | Yes | OAuth sub claim from the identity provider. Used for provider-agnostic user lookup. |
oauth_issuer |
TEXT | Yes | OAuth issuer URL (e.g., https://your-domain.authkit.app). Identifies which provider issued the token. |
created |
TIMESTAMPTZ | No | When the user was created. Default: now(). |
updated |
TIMESTAMPTZ | Yes | Last update timestamp. NULL until first update. |
deleted |
TIMESTAMPTZ | Yes | Soft deletion timestamp. NULL means active. |
| Index | Columns | Type | Purpose |
|---|---|---|---|
ix_users_oauth_identity |
oauth_issuer, oauth_subject |
Unique (partial) | One OAuth identity per provider. WHERE oauth_issuer IS NOT NULL. |
ix_users_oauth_subject |
oauth_subject |
B-tree (partial) | Fast lookup on every authenticated request. WHERE oauth_subject IS NOT NULL. |
The canonical_email column is computed on write to prevent trivial multi-account abuse:
- Lowercase the entire address
- Strip
+tagfrom the local part (e.g.,user+test@gmail.com→user@gmail.com) — universal, de facto standard - Strip dots from the local part for
gmail.com/googlemail.comonly (e.g.,u.s.e.r@gmail.com→user@gmail.com) - Normalize
googlemail.com→gmail.com
The UNIQUE constraint is on canonical_email, not email. Users see and use their original email; normalization is invisible.
| Column | Type | Nullable | Description |
|---|---|---|---|
id |
TEXT | No | Primary key. UUID v4, generated via uuid.uuid4(). |
owner_id |
TEXT | No | Owner identifier. References the user who owns this entry. All queries are scoped by owner_id. |
type |
TEXT | No | Entry type. One of: status, alert, pattern, suppression, context, preference, note, intention, schema, record. |
source |
TEXT | No | Origin identifier. Describes the subject, not the owner (e.g., "personal", "synology-nas", "mcp-awareness-project"). |
created |
TIMESTAMPTZ | No | UTC timestamp. Set once when the entry is first created. |
updated |
TIMESTAMPTZ | Yes | UTC timestamp. Updated on every upsert or update_entry call. NULL until first update. |
expires |
TIMESTAMPTZ | Yes | When set, the entry is eligible for cleanup after this time. NULL means permanent (until explicitly deleted). |
deleted |
TIMESTAMPTZ | Yes | Timestamp of soft deletion. NULL means active. Non-null means trashed — recoverable for 30 days, then auto-purged. |
tags |
JSONB | No | Array of strings (e.g., ["infra", "nas", "docker"]). Used for filtering, suppression matching, and knowledge retrieval. Default: []. |
data |
JSONB | No | Object with type-specific fields. Structure depends on type — see below. Default: {}. |
logical_key |
TEXT | Yes | Optional key for upsert deduplication. Unique within a source. |
| Index | Columns | Type | Purpose |
|---|---|---|---|
idx_entries_owner |
owner_id |
B-tree | Filter by owner |
idx_entries_owner_type |
owner_id, type |
B-tree | Filter by owner + entry type |
idx_entries_owner_source |
owner_id, source |
B-tree | Filter by owner + source |
idx_entries_owner_type_source |
owner_id, type, source |
B-tree | Combined filter (e.g., all alerts for an owner's source) |
idx_entries_tags_gin |
tags |
GIN | Fast tag containment queries |
idx_entries_source_logical_key |
owner_id, source, logical_key |
Unique (partial) | Upsert deduplication (WHERE logical_key IS NOT NULL AND deleted IS NULL) |
Written by edge processes via report_status. One active entry per source (upserted). If ttl_sec elapses without a refresh, the source is marked stale in the briefing.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
metrics |
object | Yes | Nested metric groups. Structure is source-defined (e.g., {"cpu": {"usage_pct": 34}, "memory": {"usage_pct": 71}}). |
inventory |
object | No | Current state of managed resources (e.g., {"docker": {"running": ["plex"], "stopped": []}}). |
ttl_sec |
integer | Yes | Time-to-live in seconds. If no update arrives within this window, the source is considered stale. Default: 120. |
Written by edge processes via report_alert. Keyed by source + alert_id (upserted). Set resolved: true to mark an alert as resolved.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
alert_id |
string | Yes | Unique identifier within the source (e.g., "struct-pihole-stopped"). Used for upsert matching. |
level |
string | Yes | Severity: "warning" or "critical". Critical alerts break through warning-level suppressions. |
alert_type |
string | Yes | Detection method: "threshold", "structural", or "baseline". |
message |
string | Yes | Human-readable alert description. Used in briefing summaries and suppression matching. |
resolved |
boolean | Yes | false = active, true = resolved. Resolved alerts are excluded from the briefing. |
details |
object | No | Additional structured context (e.g., affected resources, thresholds). |
diagnostics |
object | No | Evidence captured at detection time. Should be recorded when the alert fires — the evidence may be transient. |
Written by agents via learn_pattern. Use ONLY for knowledge with conditions and/or effects relevant to the alert collator. For general-purpose knowledge, use note instead.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
description |
string | Yes | Human-readable description of the operational pattern. |
conditions |
object | No | When this pattern applies. Temporal conditions: {"day_of_week": "friday"}, {"hour_range": [2, 6]}. Default: {}. |
effect |
string | No | What this pattern implies for alerting (e.g., "suppress qbittorrent_stopped"). Used by the collator for pattern-based suppression. Default: "". |
learned_from |
string | No | Platform that recorded this (e.g., "claude-code", "claude-ai"). Default: "conversation". |
Written by agents via remember. Permanent unless explicitly deleted. The default choice for storing knowledge — personal facts, project notes, skill backups, config snapshots, anything that doesn't need conditions/effects for alert matching.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
description |
string | Yes | Short summary of what this note contains. |
content |
string | No | Optional payload — the actual data. Can be plain text, JSON, markdown, etc. |
content_type |
string | No | MIME type of the content (e.g., "text/plain", "application/json", "text/markdown"). Default: "text/plain". |
learned_from |
string | No | Platform that recorded this. Default: "conversation". |
changelog |
array | No | Change history. Populated automatically by update_entry. Each element: {"updated": "<timestamp>", "changed": {"<field>": "<old_value>"}}. |
Written by agents via remind. Intentions have a lifecycle: they start pending, fire when conditions are met (currently time-based via deliver_at), and complete when the user acts on them. The collator evaluates pending intentions during briefing generation.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
goal |
string | Yes | What outcome is desired (e.g., "pick up milk", "tell Mom about insurance"). |
state |
string | Yes | Lifecycle state: pending, active, fired, completed, snoozed, cancelled. |
deliver_at |
string | No | ISO 8601 timestamp — when to surface this intention. Required for time-based triggers. |
constraints |
string | No | Preferences or requirements (e.g., "organic, budget-conscious"). |
urgency |
string | No | "low", "normal", or "high". Default: "normal". |
recurrence |
string | No | Reserved for future use. Currently only one-shot (null) is supported. |
state_reason |
string | No | Explanation for the current state (e.g., "completed at Mariano's", "not today"). |
learned_from |
string | No | Platform that created this. Default: "conversation". |
changelog |
array | No | State transition history. Each element: {"updated": "<timestamp>", "changed": {"state": "<old_state>"}}. |
Written by agents via suppress_alert. Time-limited — always has an expires timestamp. Suppressions filter alerts out of the briefing. Critical alerts can break through warning-level suppressions via escalation override.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
metric |
string | No | Specific metric to suppress (e.g., "cpu_pct"). null means match by tags/source. |
suppress_level |
string | Yes | Maximum alert level to suppress: "warning" or "critical". Default: "warning". |
escalation_override |
boolean | Yes | If true, critical alerts break through even when the suppression matches. Default: true. |
reason |
string | No | Why the suppression was created. Default: "". |
Written by agents via add_context. Always has an expires timestamp (default: 30 days). Use for events, temporary situations, or facts that lose relevance over time.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
description |
string | Yes | Human-readable description of the context. |
Written by agents via set_preference. Keyed by key + scope (upserted). Portable across agent platforms.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
key |
string | Yes | Preference name (e.g., "alert_verbosity", "check_frequency"). |
value |
string | Yes | Preference value (e.g., "one_sentence_warnings", "first_turn_only"). |
scope |
string | Yes | Scope of the preference. Default: "global". |
Written by operators or agents via register_schema. Immutable after registration. Schema body lives in data.schema; family + version in data.family + data.version; logical_key derived as {family}:{version}. Used by record entries for typed validation.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
family |
string | Yes | Schema family identifier (e.g., schema:edge-manifest, schema:config). Used as the reference key. |
version |
string | Yes | Schema version (user-chosen semantic or sequential, e.g., "1.0.0", "1"). |
schema |
object | Yes | JSON Schema Draft 2020-12 body. Defines the validation rules and structure. |
description |
string | No | Human-readable description of what this schema validates. |
learned_from |
string | No | Platform that registered the schema (e.g., "claude-code", "operator"). Default: "conversation". |
Written by agents via create_record. Content in data.content; pinned schema reference in data.schema_ref + data.schema_version (exact version, no "latest" aliasing). Re-validated on content update via update_entry.
data fields:
| Field | Type | Required | Description |
|---|---|---|---|
schema_ref |
string | Yes | Target schema family (e.g., schema:edge-manifest). Used to look up the schema definition. |
schema_version |
string | Yes | Target schema version (exact pin, e.g., "1.0.0"). Pinned at write time; determines which schema is used for validation on updates. |
content |
any | Yes | Validated payload — any JSON value (object, array, string, number, boolean, null). Must conform to the pinned schema. |
description |
string | No | Human-readable description of what this record represents. |
learned_from |
string | No | Platform that created the record (e.g., "claude-code", edge provider name). Default: "conversation". |
- Upsert behavior:
statusentries are upserted bysource.alertentries bysource+alert_id.preferenceentries bykey+scope. Other types always insert new rows. - Soft delete:
delete_entrysets thedeletedtimestamp. Entry remains in the database for 30 days, recoverable viarestore_entry. Bulk deletes requireconfirm=True(dry-run by default). - Auto-purge: Expired entries (
expires < now) and old soft-deleted entries are cleaned up by_cleanup_expired, which runs on a background thread triggered by write operations, debounced to at most every 10 seconds. Cleanup never blocks the request. Note: auto-purge performs a hardDELETE. Expired entries bypass the trash entirely. - Staleness: Status entries with
ttl_secare marked stale in the briefing if no update arrives within the TTL window. The entry itself is not deleted. - Change tracking:
update_entryappends previous field values to thechangelogarray indata. Useget_knowledge(include_history="true")to see changes, orinclude_history="only"to find entries that have been modified. - Hard deletes: The API only performs soft deletes. Manual SQL
DELETEstatements bypass the trash — that data is gone permanently. Back up regularly. - Read/action cleanup: The
readsandactionstables useON DELETE CASCADEonentry_id. This means read and action records are automatically removed when an entry is hard deleted (auto-purge or manual SQL). Soft delete (delete_entry) does not cascade — reads and actions persist for trashed entries until the 30-day purge.
Auto-populated when entries are accessed via get_knowledge and get_alerts. Fire-and-forget — read log failures never block tool responses.
| Column | Type | Nullable | Description |
|---|---|---|---|
id |
SERIAL | No | Auto-incrementing primary key. |
owner_id |
TEXT | No | Owner identifier. Denormalized from the entry for direct query scoping and RLS. |
entry_id |
TEXT | No | References entries(id) with ON DELETE CASCADE. |
timestamp |
TIMESTAMPTZ | No | When the read occurred. Default: now(). |
platform |
TEXT | Yes | Which platform performed the read (e.g., "claude-code"). |
tool_used |
TEXT | Yes | Which tool triggered the read (e.g., "get_knowledge"). |
| Index | Columns | Type | Purpose |
|---|---|---|---|
idx_reads_owner |
owner_id |
B-tree | Filter by owner |
idx_reads_entry |
entry_id |
B-tree | Look up reads for a specific entry |
idx_reads_timestamp |
timestamp |
B-tree | Time-range queries |
Agent-reported records of concrete actions taken because of an entry. Permanent audit trail.
| Column | Type | Nullable | Description |
|---|---|---|---|
id |
SERIAL | No | Auto-incrementing primary key. |
owner_id |
TEXT | No | Owner identifier. Denormalized from the entry for direct query scoping and RLS. |
entry_id |
TEXT | No | References entries(id) with ON DELETE CASCADE. |
timestamp |
TIMESTAMPTZ | No | When the action was recorded. Default: now(). |
platform |
TEXT | Yes | Which platform reported the action (e.g., "claude-code"). |
action |
TEXT | No | What was done (e.g., "created GitHub issue #42"). |
detail |
TEXT | Yes | Optional structured reference (PR URL, issue number, etc.). |
tags |
JSONB | No | Tags for filtered queries. Default: copied from referenced entry. |
| Index | Columns | Type | Purpose |
|---|---|---|---|
idx_actions_owner |
owner_id |
B-tree | Filter by owner |
idx_actions_entry |
entry_id |
B-tree | Look up actions for a specific entry |
idx_actions_timestamp |
timestamp |
B-tree | Time-range queries |
idx_actions_tags_gin |
tags |
GIN | Fast tag containment queries |
Stores vector embeddings for semantic search. One embedding per entry per model.
| Column | Type | Nullable | Default | Description |
|---|---|---|---|---|
id |
SERIAL |
NO | auto | Row ID |
owner_id |
TEXT |
NO | — | Owner identifier. Denormalized for direct query scoping and RLS. |
entry_id |
TEXT |
NO | — | FK → entries.id (ON DELETE CASCADE) |
model |
TEXT |
NO | — | Embedding model name (e.g., granite-embedding:278m) |
dimensions |
INTEGER |
NO | — | Vector dimension count (e.g., 768) |
text_hash |
TEXT |
NO | — | SHA-256 of embedded text (staleness detection) |
embedding |
VECTOR(N) |
NO | — | pgvector embedding (N = AWARENESS_EMBEDDING_DIMENSIONS, default 768) |
created |
TIMESTAMPTZ |
NO | now() |
When this embedding was generated |
Constraints: UNIQUE (entry_id, model) — one embedding per entry per model, upsert on conflict.
| Index | Columns | Type | Purpose |
|---|---|---|---|
idx_embeddings_owner |
owner_id |
B-tree | Filter by owner |
idx_embeddings_entry |
entry_id |
B-tree | Look up embeddings for a specific entry |
idx_embeddings_vector_hnsw |
embedding |
HNSW (vector_cosine_ops) |
Fast approximate nearest neighbor search |
- Embeddings are generated fire-and-forget on write (never blocks the response)
- Suppression entries are not embedded (short-lived, not worth searching)
- The
text_hashcolumn enables detection of stale embeddings after entry updates. The hash is a SHA-256 digest of the string produced bycompose_embedding_text()inembeddings.py. On each write, a fresh hash is computed and compared to the stored value; a mismatch marks the embedding as stale, and the background worker re-embeds the entry on its next cycle. Because the hash is derived from the composed text, any change tocompose_embedding_text()(new fields, reordered fields, different separators) invalidates all stored hashes and triggers a one-time mass re-embedding. This is by design — embeddings always reflect the current composition logic ON DELETE CASCADEensures embeddings are cleaned up when entries are deleted- Requires
AWARENESS_EMBEDDING_PROVIDER=ollamato activate (optional) - Dimension constraint: The
VECTORcolumn dimension is configured viaAWARENESS_EMBEDDING_DIMENSIONS(default: 768, matchinggranite-embedding:278m). The inline DDL uses this value at table creation time. The initial Alembic migration hardcodes 768 — existing deployments that need a different dimension require a new migration. The dimension must match the embedding model's output size.
Stores active MCP sessions for cross-node persistence and recovery. Feature-gated by AWARENESS_SESSION_DATABASE_URL. When the session database is unreachable, the server degrades gracefully to in-memory session state.
| Column | Type | Nullable | Description |
|---|---|---|---|
session_id |
TEXT | No | Primary key. MCP session identifier (uuid4 hex). |
owner_id |
TEXT | No | Owner from JWT sub claim (or default owner for unauthenticated connections). |
node |
TEXT | Yes | Node name that created the session (AWARENESS_SESSION_NODE_NAME, defaults to hostname). |
protocol_version |
TEXT | Yes | MCP protocol version negotiated during initialize. |
capabilities |
JSONB | No | Client capabilities from the initialize request. Default: {}. |
client_info |
JSONB | No | Client info (name, version) from the initialize request. Default: {}. |
created_at |
TIMESTAMPTZ | No | When the session was first registered. Default: now(). |
last_seen |
TIMESTAMPTZ | No | Last request time. Debounced — updated at most once per 30 seconds. Default: now(). |
expires_at |
TIMESTAMPTZ | No | Sliding expiry timestamp. Extended by AWARENESS_SESSION_TTL seconds on each touch. |
- Sessions are registered on MCP
initializeand touched on every subsequent request. - Touch is debounced to avoid write amplification on high-frequency clients.
- Expired sessions (
expires_at < now()) are cleaned up by the background_cleanup_expiredthread. - When a request arrives with an unknown session ID, the middleware checks
session_redirectsfor a cross-node recovery mapping before re-initializing.
Short-lived redirect table used during rolling deploys and cross-node session recovery. When a node re-initializes a session originally created on another node, it records a mapping from the old session ID to the new one.
| Column | Type | Nullable | Description |
|---|---|---|---|
old_session_id |
TEXT | No | Primary key. The session ID the client is presenting. |
new_session_id |
TEXT | No | The replacement session ID assigned by the recovering node. |
created_at |
TIMESTAMPTZ | No | When the redirect was created. Default: now(). |
expires_at |
TIMESTAMPTZ | No | Grace period expiry (default: 5 minutes after creation). Expired redirects are auto-purged. |
- Redirects are consumed by the ASGI middleware when a client presents a session ID not in
session_registry. - The client is transparently re-initialized with the new session ID; the redirect is recorded for audit purposes.
- Short expiry (5 min) limits storage overhead. Clients that reconnect after the grace period are re-initialized fresh.
Entries can cross-reference each other via an optional related_ids field in the data JSONB column:
{
"description": "Decided to go Postgres-only",
"related_ids": ["abc-123", "def-456"]
}This is a convention, not a schema constraint — no migration needed. The get_related tool traverses relationships bidirectionally:
- Forward: entries this entry references (IDs in
related_ids) - Reverse: entries that reference this entry (via
data->'related_ids' @> '["entry-id"]'::jsonb)
Use cases: decision → context, intention → action, note → note ("see also").
- Version: PostgreSQL 17 recommended (matches RDS support, pgvector 0.8.1)
- Driver: psycopg (sync) — matches the synchronous Store protocol. Auto-healing connection property with 30s health check debounce — dead connections reconnect transparently after Postgres restarts.
- Tags/data stored as: JSONB columns, queried via
jsonb_array_elements_text()and GIN-indexed@>containment - GIN index on
tagscolumn for fast tag containment queries - pgvector extension: Installed via
pgvector/pgvector:pg17Docker image. Used by theembeddingstable for HNSW vector similarity search. - WAL level:
wal_level=logicalconfigured for Debezium CDC readiness and logical replication - Replication slots:
max_replication_slots=4for future replication/CDC - Background cleanup: Daemon thread with its own psycopg connection, debounced (10s), with alive-check guard to prevent thread accumulation
- Connection string: Configured via
AWARENESS_DATABASE_URL. Accepts URL format (postgresql://user:pass@host:5432/db) or psycopg DSN format (host=X dbname=Y user=Z password=W). If using DSN format in an env file sourced by the shell, the value must be quoted to prevent space-splitting (e.g.,AWARENESS_DATABASE_URL="host=db dbname=awareness user=u password=p") - Docker image:
pgvector/pgvector:pg17(PostgreSQL 17 with pgvector pre-installed) - Schema migrations: Managed by Alembic (raw SQL, no ORM). Migration files in
alembic/versions/. Runmcp-awareness-migrateoralembic upgrade head. Version tracked inalembic_versiontable.
The PostgreSQL backend is designed for a clean migration path to AWS RDS:
- All extensions used are RDS-compatible: pgvector, pg_trgm, btree_gin, pg_cron
- TimescaleDB intentionally avoided (not available on RDS due to license restrictions)
- Logical replication supported via
wal_level=logical(RDS parameter group setting) - Migration:
pg_dump/pg_restorewithCREATE EXTENSIONstatements - No unlogged tables, no large objects — replication-safe by design
Part of the Awareness ecosystem. © 2026 Chris Means