diff --git a/.gitignore b/.gitignore
index a4bac46..5da5599 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,3 +13,6 @@ pip-wheel-metadata/
 dist/
 build/
 dev_docs/
+agents.md
+.claude
+CLAUDE.md
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..e70311a
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,201 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to the Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by the Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding any notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. Please also get an approval
+      from the project maintainers before applying the license to your
+      contributions.
+
+   Copyright 2025-2026 The Sentinel Contributors
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/README.md b/README.md
index 57813ec..cc77599 100644
--- a/README.md
+++ b/README.md
@@ -1,71 +1,90 @@
 # Sentinel
 
-Sentinel is a multilingual moderation API for election-risk and civic discourse safety.
-It is designed for products that need deterministic moderation decisions with audit evidence, especially in code-switched East African contexts.
+Open-source multilingual moderation API built to protect Kenya's 2027 general election from ethnic incitement and election disinformation.
 
-## Who this is for
+Sentinel handles code-switched text across English, Swahili, and Sheng. It returns deterministic moderation decisions (`ALLOW`, `REVIEW`, or `BLOCK`) with full audit evidence, so every action can be explained and appealed.
 
-- Community forums
-- News platforms
-- Civil society reporting tools
-- Fact-check and trust-and-safety teams
+## Who is this for?
 
-## What Sentinel returns
+**Platform integrators** — You run a forum, news platform, or civic tech tool and need a moderation API. You send text, Sentinel returns a decision. Start with the [Integration Guide](docs/integration-guide.md).
 
-For each text input, Sentinel returns:
+**Self-host operators** — You want to deploy and manage a Sentinel instance for your organization. Start with the [Deployment Guide](docs/deployment.md).
 
-- `action`: `ALLOW`, `REVIEW`, or `BLOCK`
-- `labels` and `reason_codes`
-- `evidence` used for the decision
-- provenance fields (`model_version`, `lexicon_version`, `policy_version`)
+Both audiences should begin with the [Quickstart](docs/quickstart.md).
 
-## Quickstart
+## What Sentinel returns
 
-```bash
-python -m venv .venv
-source .venv/bin/activate
-python -m pip install --upgrade pip
-python -m pip install -e .[dev,ops]
-
-# optional ML extras
-python -m pip install -e .[ml]
-
-docker compose up -d --build
-export SENTINEL_API_KEY='replace-with-strong-key'
-python scripts/apply_migrations.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel
-python scripts/sync_lexicon_seed.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel --activate-if-none
-uvicorn sentinel_api.main:app --host 0.0.0.0 --port 8000
+```jsonc
+{
+  "toxicity": 0.92,
+  "labels": ["INCITEMENT_VIOLENCE"],
+  "action": "BLOCK",
+  "reason_codes": ["R_INCITE_CALL_TO_HARM"],
+  "evidence": [
+    {
+      "type": "lexicon",
+      "match": "kill",
+      "severity": 3,
+      "lang": "en"
+    }
+  ],
+  "language_spans": [
+    {"start": 0, "end": 26, "lang": "en"}
+  ],
+  "model_version": "sentinel-multi-v2",
+  "lexicon_version": "hatelex-v2.1",
+  "pack_versions": {"en": "pack-en-0.1", "sw": "pack-sw-0.1", "sh": "pack-sh-0.1"},
+  "policy_version": "policy-2026.11",
+  "latency_ms": 12
+}
 ```
 
-Test a request:
+Every response includes the evidence that drove the decision, the versions of all artifacts involved, and the latency of the call. This is the audit trail for appeals and transparency reporting.
+
+## Key concepts
+
+**6 labels**: `ETHNIC_CONTEMPT`, `INCITEMENT_VIOLENCE`, `HARASSMENT_THREAT`, `DOGWHISTLE_WATCH`, `DISINFO_RISK`, `BENIGN_POLITICAL_SPEECH`
+
+**3 actions**: `ALLOW` (publish), `REVIEW` (hold for human moderator), `BLOCK` (reject)
+
+**3 deployment stages**: `SHADOW` (log-only, no enforcement) -> `ADVISORY` (blocks downgraded to review) -> `SUPERVISED` (full enforcement). Roll out safely with progressive stages.
+
+**5 electoral phases**: `PRE_CAMPAIGN` -> `CAMPAIGN` -> `SILENCE_PERIOD` -> `VOTING_DAY` -> `RESULTS_PERIOD`. Sensitivity thresholds tighten automatically as election day approaches.
+
+## Quickstart
 
 ```bash
-curl -sS -X POST http://localhost:8000/v1/moderate \
-  -H 'Content-Type: application/json' \
-  -H "X-API-Key: ${SENTINEL_API_KEY}" \
-  -d '{"text":"They should kill them now."}'
+git clone https://github.com/Thelastpoet/sentinel.git && cd sentinel
+pip install -e '.[dev,ops]'
+docker compose up -d --build postgres redis
+make apply-migrations && make seed-lexicon
+export SENTINEL_API_KEY='your-key-here' && make run
 ```
 
-## Integration model
+See the full [Quickstart guide](docs/quickstart.md) for detailed instructions.
+
+## Project maturity
 
-Your backend calls Sentinel before publish:
+Sentinel ships with a **7-term demonstration seed lexicon**. This is enough to validate that the system works end-to-end, but production deployment requires building out your own lexicon with domain-expert annotation.
 
-1. Send user text to `POST /v1/moderate`
-2. Apply action:
-   - `ALLOW` -> publish
-   - `REVIEW` -> moderation queue
-   - `BLOCK` -> reject
-3. Store decision metadata for audit and appeals
+The multi-label classifier currently runs in **shadow mode** (observability only). Claim-likeness scoring is active but **REVIEW-only** (it cannot produce `BLOCK`). Deterministic lexicon matches remain the only direct path to `BLOCK`.
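The integration flow this README describes (send text to `POST /v1/moderate`, act on `ALLOW`/`REVIEW`/`BLOCK`, keep the versioned fields for audit and appeals) can be sketched with the Python standard library. This is a minimal sketch, not an official Sentinel client: the URL assumes a local instance on port 8000 (the port the previous README's `uvicorn` command used), and the helper names and outcome strings are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

# Assumption: a local Sentinel instance listening on port 8000.
SENTINEL_URL = "http://localhost:8000/v1/moderate"

# Documented action semantics: ALLOW publishes, REVIEW holds the post for a
# human moderator, BLOCK rejects it. The outcome strings are hypothetical.
ACTION_TO_OUTCOME = {
    "ALLOW": "publish",
    "REVIEW": "queue_for_review",
    "BLOCK": "reject",
}

# Response fields worth persisting so a decision can be appealed later.
AUDIT_FIELDS = (
    "action",
    "reason_codes",
    "model_version",
    "lexicon_version",
    "policy_version",
    "pack_versions",
)


def build_moderation_request(
    text: str, api_key: str, request_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/moderate request."""
    payload = {"text": text}
    if request_id is not None:
        payload["request_id"] = request_id
    return urllib.request.Request(
        SENTINEL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def decide(response: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Map a moderation response to a publishing outcome plus an audit record."""
    outcome = ACTION_TO_OUTCOME[response["action"]]
    audit = {field: response.get(field) for field in AUDIT_FIELDS}
    return outcome, audit


def moderate(text: str, api_key: str, timeout: float = 5.0) -> Tuple[str, Dict[str, Any]]:
    """Send text to Sentinel and return (outcome, audit record).

    Non-2xx responses (for example 401 or 429) raise urllib.error.HTTPError.
    """
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return decide(body)
```

Keeping `decide()` separate from the network call makes the mapping testable without a running server, and storing the returned audit record next to the post gives appeal reviewers the artifact versions behind each decision.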
 
 ## Documentation
 
-- [Docs index](docs/README.md)
-- [Quickstart](docs/quickstart.md)
-- [Integration Guide](docs/integration-guide.md)
-- [Deployment Guide](docs/deployment.md)
-- [API Reference](docs/api-reference.md)
-- [Security Notes](docs/security.md)
-- [FAQ](docs/faq.md)
+| Document | Audience | Description |
+|----------|----------|-------------|
+| [Quickstart](docs/quickstart.md) | Both | Get running in 5 minutes |
+| [Integration Guide](docs/integration-guide.md) | Integrators | Full request/response reference and enforcement patterns |
+| [Deployment Guide](docs/deployment.md) | Operators | Infrastructure, configuration, and operations |
+| [API Reference](docs/api-reference.md) | Both | All 13 endpoints documented |
+| [Security](docs/security.md) | Operators | Authentication, authorization, and safety architecture |
+| [FAQ](docs/faq.md) | Both | Common questions for integrators and operators |
+
+Machine-readable public moderation contract: [`contracts/api/openapi.yaml`](contracts/api/openapi.yaml) and [`contracts/schemas/`](contracts/schemas/)
+
+## Contributing
+
+Contributions are welcome. Please open an issue to discuss significant changes before submitting a pull request.
 
 ## License
diff --git a/docs/README.md b/docs/README.md
index fe8fb07..0206273 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,17 +1,44 @@
 # Sentinel Documentation
 
-Use this documentation if you want to evaluate, deploy, and integrate Sentinel in a real product.
+Sentinel is an open-source multilingual moderation API for election safety. These docs cover integration, deployment, and operation.
 
-## Start here
+## Reading paths
 
-1. [Quickstart](quickstart.md)
-2. [Integration Guide](integration-guide.md)
-3. [Deployment Guide](deployment.md)
-4. [API Reference](api-reference.md)
-5. [Security Notes](security.md)
-6. [FAQ](faq.md)
+### Platform integrators
 
-## Contracts and templates
+You want to call the Sentinel API from your application.
 
-- API/schema contracts: `contracts/`
-- Go-live template bundle: `templates/go-live/`
+1. [Quickstart](quickstart.md) — Path A: send your first moderation request
+2. [Integration Guide](integration-guide.md) — Authentication, request/response schemas, enforcement patterns, error handling
+3. [API Reference](api-reference.md) — Public and moderation endpoints
+
+### Self-host operators
+
+You want to deploy and manage a Sentinel instance.
+
+1. [Quickstart](quickstart.md) — Path B: stand up a local instance
+2. [Deployment Guide](deployment.md) — Infrastructure, configuration, migrations, lexicon lifecycle, electoral phases
+3. [Security](security.md) — Authentication, authorization, safety architecture
+4. [API Reference](api-reference.md) — Admin and internal endpoints
+
+## Document index
+
+| Document | Audience | Description |
+|----------|----------|-------------|
+| [Quickstart](quickstart.md) | Both | Two paths: integrator (4 steps) and operator (full local setup) |
+| [Integration Guide](integration-guide.md) | Integrators | Complete request/response reference, enforcement mapping, rate limiting, errors |
+| [Deployment Guide](deployment.md) | Operators | Architecture, env vars, Docker, migrations, lexicon, phases, stages, OAuth, monitoring |
+| [API Reference](api-reference.md) | Both | All 13 endpoints: public, moderation, admin appeals, transparency, release proposals |
+| [Security](security.md) | Operators | Auth, scopes, input validation, safety constraints, data handling |
+| [FAQ](faq.md) | Both | Common questions split by audience |
+
+## Related resources
+
+| Path | Contents |
+|------|----------|
+| `contracts/api/openapi.yaml` | Machine-readable contract for `/health`, `/metrics`, and `/v1/moderate` |
+| `contracts/schemas/` | JSON Schema definitions for request/response models |
+| `templates/go-live/` | Go-live readiness gate template bundle |
+| `config/policy/default.json` | Default policy configuration (thresholds, phases, hints) |
+| `data/lexicon_seed.json` | 7-term demonstration seed lexicon |
+| `migrations/` | Database migration files (0001-0012) |
diff --git a/docs/api-reference.md b/docs/api-reference.md
index d5618e8..3d72d34 100644
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -1,33 +1,78 @@
 # API Reference
 
-## `GET /health`
+Sentinel exposes 13 endpoints across four groups: public, moderation, admin, and internal.
 
-Returns service status.
+## Authentication summary
 
-Example response:
+| Endpoint group | Auth method | Header |
+|---------------|-------------|--------|
+| Public (`/health`, `/metrics`, `/metrics/prometheus`) | None | — |
+| Moderation (`/v1/moderate`) | API key | `X-API-Key` |
+| Admin (`/admin/*`) | OAuth bearer token | `Authorization: Bearer <token>` |
+| Internal (`/internal/*`) | OAuth bearer token | `Authorization: Bearer <token>` |
+
+## Public endpoints
+
+### `GET /health`
+
+Returns API health status.
+
+**Response** `200 OK`
 
 ```json
-{"status":"ok"}
+{"status": "ok"}
 ```
 
-## `GET /metrics`
+### `GET /metrics`
+
+Returns runtime counters in JSON.
 
-Returns runtime counters and latency buckets.
+**Response** `200 OK`
 
-## `POST /v1/moderate`
+```json
+{
+  "action_counts": {"ALLOW": 150, "REVIEW": 23, "BLOCK": 7},
+  "http_status_counts": {"200": 180, "400": 2, "429": 1},
+  "latency_ms_buckets": {"le_50ms": 100, "le_100ms": 50, "le_150ms": 20},
+  "validation_error_count": 2
+}
+```
 
-Moderates input text.
+### `GET /metrics/prometheus`
 
-### Request
+Returns Prometheus exposition text (`text/plain`).
+
+## Moderation endpoint
+
+### `POST /v1/moderate`
+
+Primary moderation endpoint.
+
+**Authentication**: `X-API-Key` header
+
+**Request body**
 
 ```json
 {
-  "text": "They should kill them now.",
-  "request_id": "optional-client-id"
+  "text": "Content to moderate",
+  "context": {
+    "source": "forum-post",
+    "locale": "ke",
+    "channel": "politics"
+  },
+  "request_id": "client-correlation-id"
 }
 ```
 
-### Response (shape)
+| Field | Type | Required | Constraints |
+|-------|------|----------|-------------|
+| `text` | string | Yes | 1-5000 chars |
+| `context.source` | string | No | max 100 |
+| `context.locale` | string | No | max 20 |
+| `context.channel` | string | No | max 50 |
+| `request_id` | string | No | max 128 |
+
+**Response** `200 OK`
 
 ```json
 {
@@ -35,14 +80,352 @@ Moderates input text.
   "labels": ["INCITEMENT_VIOLENCE"],
   "action": "BLOCK",
   "reason_codes": ["R_INCITE_CALL_TO_HARM"],
-  "evidence": [],
-  "language_spans": [],
-  "model_version": "...",
-  "lexicon_version": "...",
-  "pack_versions": {},
-  "policy_version": "...",
-  "latency_ms": 42
+  "evidence": [
+    {
+      "type": "lexicon",
+      "match": "kill",
+      "severity": 3,
+      "lang": "en",
+      "match_id": null,
+      "similarity": null,
+      "span": null,
+      "confidence": null
+    }
+  ],
+  "language_spans": [{"start": 0, "end": 26, "lang": "en"}],
+  "model_version": "sentinel-multi-v2",
+  "lexicon_version": "hatelex-v2.1",
+  "pack_versions": {"en": "pack-en-0.1", "sw": "pack-sw-0.1", "sh": "pack-sh-0.1"},
+  "policy_version": "policy-2026.11",
+  "latency_ms": 12
+}
+```
+
+| Field | Type |
+|-------|------|
+| `toxicity` | float (0..1) |
+| `labels` | enum[] (`ETHNIC_CONTEMPT`, `INCITEMENT_VIOLENCE`, `HARASSMENT_THREAT`, `DOGWHISTLE_WATCH`, `DISINFO_RISK`, `BENIGN_POLITICAL_SPEECH`) |
+| `action` | `ALLOW` \| `REVIEW` \| `BLOCK` |
+| `reason_codes` | string[] (`R_[A-Z0-9_]+`) |
+| `evidence` | `EvidenceItem[]` |
+| `language_spans` | `LanguageSpan[]` |
+| `model_version` | string |
+| `lexicon_version` | string |
+| `pack_versions` | object |
+| `policy_version` | string |
+| `latency_ms` | integer |
+
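The request and response tables for `POST /v1/moderate` can also be checked on the client side, for example in an integration test. This sketch hard-codes the documented limits and enums; the function names are hypothetical, and it is not a substitute for validating against the JSON Schemas in `contracts/schemas/`, which remain the machine-readable contract.

```python
import re
from typing import Any, Dict, List

# Limits and enums copied from the request and response tables; the JSON
# Schemas under contracts/schemas/ remain the authoritative definitions.
MAX_TEXT_CHARS = 5000
MAX_REQUEST_ID_CHARS = 128
CONTEXT_LIMITS = {"source": 100, "locale": 20, "channel": 50}
LABELS = {
    "ETHNIC_CONTEMPT",
    "INCITEMENT_VIOLENCE",
    "HARASSMENT_THREAT",
    "DOGWHISTLE_WATCH",
    "DISINFO_RISK",
    "BENIGN_POLITICAL_SPEECH",
}
ACTIONS = {"ALLOW", "REVIEW", "BLOCK"}
REASON_CODE = re.compile(r"R_[A-Z0-9_]+")


def request_errors(payload: Dict[str, Any]) -> List[str]:
    """List constraint violations in a POST /v1/moderate request body."""
    errors = []
    text = payload.get("text")
    if not isinstance(text, str) or not 1 <= len(text) <= MAX_TEXT_CHARS:
        errors.append("text must be a string of 1-5000 chars")
    context = payload.get("context") or {}
    for key, limit in CONTEXT_LIMITS.items():
        value = context.get(key)
        if value is not None and len(value) > limit:
            errors.append(f"context.{key} exceeds {limit} chars")
    request_id = payload.get("request_id")
    if request_id is not None and len(request_id) > MAX_REQUEST_ID_CHARS:
        errors.append("request_id exceeds 128 chars")
    return errors


def response_errors(body: Dict[str, Any]) -> List[str]:
    """List violations of the documented 200 OK response field types."""
    errors = []
    toxicity = body.get("toxicity")
    # Short-circuit keeps the range check from comparing a non-number.
    if not isinstance(toxicity, (int, float)) or not 0.0 <= toxicity <= 1.0:
        errors.append("toxicity must be a number in [0, 1]")
    if body.get("action") not in ACTIONS:
        errors.append("action must be ALLOW, REVIEW, or BLOCK")
    unknown = set(body.get("labels", [])) - LABELS
    if unknown:
        errors.append(f"unknown labels: {sorted(unknown)}")
    for code in body.get("reason_codes", []):
        if not REASON_CODE.fullmatch(code):
            errors.append(f"malformed reason code: {code}")
    latency = body.get("latency_ms")
    if not isinstance(latency, int) or isinstance(latency, bool):
        errors.append("latency_ms must be an integer")
    return errors
```

Returning a list instead of raising on the first problem reports every violation at once, which is convenient when checking recorded response fixtures.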
+**EvidenceItem fields**
+
+| Field | Type | Notes |
+|-------|------|-------|
+| `type` | `lexicon` \| `vector_match` \| `model_span` | required |
+| `match` | string or null | optional |
+| `severity` | int (1..3) or null | optional |
+| `lang` | string or null | optional |
+| `match_id` | string or null | optional |
+| `similarity` | float (0..1) or null | vector matches |
+| `span` | string or null | model-derived evidence |
+| `confidence` | float (0..1) or null | model-derived evidence |
+
+**Moderation error responses**
+
+| Status | `error_code` | Meaning |
+|--------|--------------|---------|
+| 400 | `HTTP_400` | Invalid request payload |
+| 401 | `HTTP_401` | Missing or invalid API key |
+| 429 | `HTTP_429` | Rate limited |
+| 500 | `HTTP_500` | Internal server error |
+| 503 | `HTTP_503` | API key auth not configured on server |
+
+## Admin: Appeals
+
+### Appeal state machine
+
+```text
+submitted -> triaged -> in_review -> resolved_upheld
+                                  -> resolved_reversed
+                                  -> resolved_modified
+                                  -> rejected_invalid
+```
+
+### `POST /admin/appeals`
+
+Create an appeal.
+
+**OAuth scope**: `admin:appeal:write`
+
+**Request body**
+
+```json
+{
+  "original_decision_id": "decision-uuid",
+  "request_id": "request-uuid",
+  "original_action": "BLOCK",
+  "original_reason_codes": ["R_INCITE_CALL_TO_HARM"],
+  "original_model_version": "sentinel-multi-v2",
+  "original_lexicon_version": "hatelex-v2.1",
+  "original_policy_version": "policy-2026.11",
+  "original_pack_versions": {"en": "pack-en-0.1"},
+  "rationale": "User disputed the decision"
+}
+```
+
+**Response** `200 OK`: `AdminAppealRecord`
+
+### `GET /admin/appeals`
+
+List appeals.
+
+**OAuth scope**: `admin:appeal:read`
+
+**Query params**: `status`, `request_id`, `limit` (1..200, default 50)
+
+**Response** `200 OK`
+
+```json
+{
+  "total_count": 1,
+  "items": [
+    {
+      "id": 1,
+      "status": "submitted",
+      "request_id": "request-uuid",
+      "original_decision_id": "decision-uuid",
+      "original_action": "BLOCK",
+      "original_reason_codes": ["R_INCITE_CALL_TO_HARM"],
+      "original_model_version": "sentinel-multi-v2",
+      "original_lexicon_version": "hatelex-v2.1",
+      "original_policy_version": "policy-2026.11",
+      "original_pack_versions": {"en": "pack-en-0.1"},
+      "submitted_by": "admin-dashboard",
+      "reviewer_actor": null,
+      "resolution_code": null,
+      "resolution_reason_codes": null,
+      "created_at": "2026-01-15T10:30:00Z",
+      "updated_at": "2026-01-15T10:30:00Z",
+      "resolved_at": null
+    }
+  ]
+}
+```
+
+### `POST /admin/appeals/{appeal_id}/transition`
+
+Transition an appeal.
+
+**OAuth scope**: `admin:appeal:write`
+
+**Path param**: `appeal_id` (integer >= 1)
+
+**Request body**
+
+```json
+{
+  "to_status": "in_review",
+  "rationale": "Escalating",
+  "resolution_code": null,
+  "resolution_reason_codes": null
+}
+```
+
+**Response** `200 OK`: updated `AdminAppealRecord`
+
+### `GET /admin/appeals/{appeal_id}/reconstruct`
+
+Get full reconstruction for one appeal.
+
+**OAuth scope**: `admin:appeal:read`
+
+**Path param**: `appeal_id` (integer >= 1)
+
+**Response** `200 OK`
+
+```json
+{
+  "appeal": {"id": 1, "status": "in_review"},
+  "timeline": [{"id": 2, "appeal_id": 1, "from_status": "submitted", "to_status": "triaged", "actor": "admin-dashboard", "rationale": "valid", "created_at": "2026-01-15T11:00:00Z"}],
+  "artifact_versions": {
+    "model": "sentinel-multi-v2",
+    "lexicon": "hatelex-v2.1",
+    "policy": "policy-2026.11",
+    "pack": {"en": "pack-en-0.1"}
+  },
+  "original_reason_codes": ["R_INCITE_CALL_TO_HARM"],
+  "resolution": {
+    "status": null,
+    "resolution_code": null,
+    "resolution_reason_codes": null,
+    "reviewer_actor": null,
+    "resolved_at": null
+  }
+}
+```
+
+## Admin: Transparency
+
+### `GET /admin/transparency/reports/appeals`
+
+Aggregate appeals report.
+
+**OAuth scope**: `admin:transparency:read`
+
+**Query params**: `created_from`, `created_to` (ISO-8601 datetime)
+
+**Response** `200 OK`
+
+```json
+{
+  "generated_at": "2026-02-13T12:00:00Z",
+  "total_appeals": 42,
+  "open_appeals": 5,
+  "resolved_appeals": 37,
+  "backlog_over_72h": 2,
+  "reversal_rate": 0.15,
+  "mean_resolution_hours": 18.5,
+  "status_counts": {
+    "submitted": 3,
+    "triaged": 1,
+    "in_review": 1,
+    "resolved_upheld": 20,
+    "resolved_reversed": 5,
+    "resolved_modified": 7,
+    "rejected_invalid": 5
+  },
+  "resolution_counts": {
+    "resolved_upheld": 20,
+    "resolved_reversed": 5,
+    "resolved_modified": 7
+  }
+}
+```
+
+### `GET /admin/transparency/exports/appeals`
+
+Raw appeals export.
+
+**OAuth scope**: `admin:transparency:export`
+
+**Extra scope when `include_identifiers=true`**: `admin:transparency:identifiers`
+
+**Query params**: `created_from`, `created_to`, `include_identifiers` (default `false`), `limit` (1..5000, default `200`)
+
+**Response** `200 OK`
+
+```json
+{
+  "generated_at": "2026-02-13T12:00:00Z",
+  "include_identifiers": false,
+  "total_count": 1,
+  "records": [
+    {
+      "appeal_id": 1,
+      "status": "resolved_upheld",
+      "original_action": "BLOCK",
+      "original_reason_codes": ["R_INCITE_CALL_TO_HARM"],
+      "resolution_status": "resolved_upheld",
+      "resolution_code": "decision_correct",
+      "resolution_reason_codes": ["R_INCITE_CALL_TO_HARM"],
+      "artifact_versions": {
+        "model": "sentinel-multi-v2",
+        "lexicon": "hatelex-v2.1",
+        "policy": "policy-2026.11",
+        "pack": {"en": "pack-en-0.1"}
+      },
+      "request_id": null,
+      "original_decision_id": null,
+      "transition_count": 3,
+      "created_at": "2026-01-15T10:30:00Z",
+      "resolved_at": "2026-01-16T14:00:00Z"
+    }
+  ]
+}
+```
+
+## Admin: Release proposals
+
+### `GET /admin/release-proposals/permissions`
+
+Returns actor identity and scopes.
+
+**OAuth scope**: `admin:proposal:read`
+
+**Response** `200 OK`
+
+```json
+{
+  "status": "ok",
+  "actor_client_id": "admin-dashboard",
+  "scopes": ["admin:proposal:read", "admin:proposal:review"]
+}
+```
+
+### `POST /admin/release-proposals/{proposal_id}/review`
+
+Submit a review action.
+
+**OAuth scope**: `admin:proposal:review`
+
+**Path param**: `proposal_id` (integer >= 1)
+
+**Request body**
+
+```json
+{
+  "action": "approve",
+  "rationale": "Reviewed and accepted"
 }
 ```
 
-For strict machine contract files, see `contracts/api/openapi.yaml` and `contracts/schemas/`.
+`action` values: `submit_review`, `approve`, `reject`, `request_changes`, `promote`
+
+**Response** `200 OK`
+
+```json
+{
+  "proposal_id": 12,
+  "action": "approve",
+  "actor": "admin-dashboard",
+  "status": "accepted",
+  "rationale": "Reviewed and accepted"
+}
+```
+
+## Internal monitoring
+
+### `GET /internal/monitoring/queue/metrics`
+
+Queue metrics snapshot.
+
+**OAuth scope**: `internal:queue:read`
+
+**Response** `200 OK`
+
+```json
+{
+  "queue_depth_by_priority": {"critical": 0, "urgent": 1, "standard": 3, "batch": 9},
+  "sla_breach_count_by_priority": {"critical": 0, "urgent": 0, "standard": 0, "batch": 1},
+  "actor_client_id": "ops-service"
+}
+```
+
+## Error response format
+
+All API errors use this shape:
+
+```json
+{
+  "error_code": "HTTP_400",
+  "message": "Invalid request payload (1 validation error(s))",
+  "request_id": "abc-123"
+}
+```
+
+## Rate-limiting headers (`POST /v1/moderate`)
+
+| Header | Description |
+|--------|-------------|
+| `X-RateLimit-Limit` | Max requests per window |
+| `X-RateLimit-Remaining` | Remaining requests in current window |
+| `X-RateLimit-Reset` | Seconds until window resets |
+| `Retry-After` | Seconds to wait (`429` only) |
diff --git a/docs/deployment.md b/docs/deployment.md
index e0cfed4..9c40e79 100644
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -1,42 +1,361 @@
 # Deployment Guide
 
-## Runtime components
+This guide is for operators deploying and managing a Sentinel instance. It covers infrastructure, configuration, database setup, lexicon management, electoral phases, deployment stages, OAuth, and monitoring.
 
-- Sentinel API (FastAPI)
-- PostgreSQL (required for governed lifecycle features)
-- Redis (hot trigger caching)
+## Architecture overview
 
-## Required environment variables
+Sentinel has three runtime components:
 
-- `SENTINEL_API_KEY`
-- `SENTINEL_DATABASE_URL`
+```
+┌─────────────┐     ┌──────────────────┐     ┌──────────┐
+│   Your App  │────>│   Sentinel API   │────>│ Postgres │
+│   (client)  │<────│    (FastAPI)     │<────│ pgvector │
+└─────────────┘     │                  │     └──────────┘
+                    │                  │────>┌──────────┐
+                    └──────────────────┘<────│  Redis   │
+                                             └──────────┘
+```
+
+- **Sentinel API** — FastAPI application serving moderation, admin, and monitoring endpoints
+- **PostgreSQL with pgvector** — Stores lexicon entries, releases, embeddings, appeals, transparency data, model artifacts. Required for production.
+- **Redis** — Distributed rate limiting and hot-trigger caching. Optional; Sentinel degrades gracefully if unavailable.
+
+Without Postgres, Sentinel runs in file-based fallback mode (lexicon loaded from `data/lexicon_seed.json`, appeals stored in memory). This is suitable for development but not production.
+
+## Environment variables
+
+This section focuses on API runtime and operator-facing variables. A few script-only actor variables are intentionally omitted.
+
+### Core
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_API_KEY` | Yes | — | API key for authenticating `POST /v1/moderate` requests |
+| `SENTINEL_DATABASE_URL` | No | — | Postgres connection string (e.g., `postgresql://user:pass@host:5432/sentinel`). Enables lexicon DB, vector search, appeals, transparency. |
+| `SENTINEL_REDIS_URL` | No | — | Redis connection string. Enables distributed rate limiting and hot-trigger caching. |
+| `SENTINEL_POLICY_CONFIG_PATH` | No | auto-detected | Path to policy configuration file (`config/policy/default.json` when present) |
+
+### Electoral and deployment
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_ELECTORAL_PHASE` | No | `null` (from config) | Override electoral phase: `pre_campaign`, `campaign`, `silence_period`, `voting_day`, `results_period` |
+| `SENTINEL_DEPLOYMENT_STAGE` | No | `supervised` | Override deployment stage: `shadow`, `advisory`, `supervised` |
+
+### Rate limiting
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_RATE_LIMIT_PER_MINUTE` | No | `120` | Max requests per API key per minute |
+| `SENTINEL_RATE_LIMIT_STORAGE_URI` | No | — | Rate limit storage URI (alternative to `SENTINEL_REDIS_URL`) |
+
+### Redis hot triggers
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_REDIS_HOT_TRIGGER_KEY_PREFIX` | No | `sentinel:hot-triggers` | Redis key prefix for cached hot-trigger terms |
+| `SENTINEL_REDIS_HOT_TRIGGER_TTL_SECONDS` | No | — | Optional TTL for hot-trigger cache keys |
+| `SENTINEL_REDIS_SOCKET_TIMEOUT_SECONDS` | No | `0.05` | Redis socket timeout in seconds |
+
+### OAuth (admin and internal endpoints)
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_OAUTH_TOKENS_JSON` | No | — | JSON object mapping static tokens to `{client_id, scopes}` |
+| `SENTINEL_OAUTH_JWT_SECRET` | No | — | JWT signing secret. If set, enables JWT bearer auth instead of static tokens. |
+| `SENTINEL_OAUTH_JWT_ALGORITHM` | No | `HS256` | JWT algorithm |
+| `SENTINEL_OAUTH_JWT_AUDIENCE` | No | — | Expected JWT audience claim (optional verification) |
+| `SENTINEL_OAUTH_JWT_ISSUER` | No | — | Expected JWT issuer claim (optional verification) |
+
+### ML and vector matching
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_VECTOR_MATCH_ENABLED` | No | `true` | Enable/disable vector similarity search |
+| `SENTINEL_VECTOR_MATCH_THRESHOLD` | No | `0.82` | Minimum cosine similarity for vector matches |
+| `SENTINEL_VECTOR_STATEMENT_TIMEOUT_MS` | No | `60` | Postgres statement timeout for vector queries, in milliseconds |
+| `SENTINEL_LID_MODEL_PATH` | No | — | Path to FastText language identification model |
+| `SENTINEL_LID_CONFIDENCE_THRESHOLD` | No | `0.80` | Minimum confidence for language detection |
+| `SENTINEL_EMBEDDING_PROVIDER` | No | `hash-bow-v1` | Embedding provider ID |
+| `SENTINEL_CLASSIFIER_PROVIDER` | No | `none-v1` | Multi-label classifier provider ID |
+| `SENTINEL_CLAIM_SCORER_PROVIDER` | No | `claim-heuristic-v1` | Claim scorer provider ID |
+
+### Classifier shadow mode
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `SENTINEL_CLASSIFIER_SHADOW_ENABLED` | No | `0` | Enable shadow classifier predictions (only in SHADOW/ADVISORY stages) |
+| `SENTINEL_SHADOW_PREDICTIONS_PATH` | No | — | File path for shadow prediction logs |
+| `SENTINEL_CLASSIFIER_TIMEOUT_MS` | No | `40` | Classifier timeout in milliseconds |
+| `SENTINEL_CLASSIFIER_MIN_SCORE` | No | `0.55` | Minimum score to emit a shadow prediction |
+| `SENTINEL_CLASSIFIER_CIRCUIT_FAILURE_THRESHOLD` | No | `3` | Consecutive failures before circuit breaker opens |
+| `SENTINEL_CLASSIFIER_CIRCUIT_RESET_SECONDS` | No | `120` | Seconds before circuit breaker resets |
+
+## Docker Compose setup (development/staging)
+
+```bash
+# Start Postgres + Redis
+docker compose up -d --build
postgres redis + +# Run migrations +make apply-migrations + +# Load seed lexicon +make seed-lexicon + +# Start API +export SENTINEL_API_KEY='your-key' +make run +``` + +The included `docker-compose.yml` defines API, PostgreSQL 16 (pgvector), and Redis services. The command above starts only Postgres and Redis so you can run `make run` locally with hot reload. + +## Production deployment guidance -Common optional variables: +For production: -- `SENTINEL_REDIS_URL` -- `SENTINEL_POLICY_CONFIG_PATH` -- `SENTINEL_ELECTORAL_PHASE` -- `SENTINEL_DEPLOYMENT_STAGE` +- **Reverse proxy**: Place Sentinel behind nginx, Caddy, or a cloud load balancer. Terminate TLS at the proxy. +- **Managed database**: Use a managed PostgreSQL service with pgvector support. Ensure `pg_trgm` and `vector` extensions are available. +- **Replicas**: Sentinel is stateless (all state lives in Postgres/Redis). Run multiple API replicas behind a load balancer. +- **TLS**: All traffic between clients and the API should use HTTPS. Database connections should use SSL. +- **Admin endpoint isolation**: Consider running admin endpoints on a separate internal network or port, not exposed to the public internet. +- **Secrets**: Use a secrets manager for `SENTINEL_API_KEY`, `SENTINEL_DATABASE_URL`, and OAuth credentials. Do not pass secrets as command-line arguments. -## Docker Compose deployment +## Database migrations + +Sentinel includes 12 migration files in `migrations/`. 
Run them with: ```bash -docker compose up -d --build -python scripts/apply_migrations.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel -python scripts/sync_lexicon_seed.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel --activate-if-none +make apply-migrations ``` -## Readiness checks +Or directly: ```bash -curl -sS http://localhost:8000/health; echo -curl -sS http://localhost:8000/metrics; echo +python scripts/apply_migrations.py --database-url "$SENTINEL_DATABASE_URL" ``` -## Go-live governance gate +| Migration | Description | +|-----------|-------------| +| `0001_lexicon_entries.sql` | Core lexicon entries table | +| `0002_lexicon_releases.sql` | Lexicon release lifecycle (draft/active/deprecated) | +| `0003_lexicon_release_audit.sql` | Audit trail for release state transitions | +| `0004_async_monitoring_core.sql` | Async monitoring queue tables | +| `0005_lexicon_release_audit_proposal_promote.sql` | Proposal promotion audit support | +| `0006_retention_legal_hold_primitives.sql` | Data retention and legal hold primitives | +| `0007_lexicon_entry_embeddings.sql` | pgvector embeddings for lexicon entries | +| `0008_appeals_core.sql` | Appeals state machine tables | +| `0009_appeals_original_decision_id_backfill.sql` | Backfill original decision IDs on appeals | +| `0010_monitoring_queue_event_uniqueness.sql` | Queue event deduplication | +| `0011_lexicon_entry_metadata_hardening.sql` | Metadata validation constraints | +| `0012_model_artifact_lifecycle.sql` | Model artifact version tracking | + +Migrations are ordered and tracked via Alembic revision history. Running `make apply-migrations` repeatedly is safe. + +## Lexicon lifecycle + +The lexicon is Sentinel's primary enforcement mechanism. Terms are organized into versioned releases with a governed lifecycle. 
+ +### Release states + +``` +Draft ──> Active ──> Deprecated +``` -Release approval is enforced with: +Only one release can be active at a time (enforced by a database unique index). The active release is what the moderation endpoint uses. + +### Seed lexicon + +Sentinel ships with a 7-term demonstration seed (`data/lexicon_seed.json`): + +| Term | Action | Label | Language | +|------|--------|-------|----------| +| kill | BLOCK | INCITEMENT_VIOLENCE | en | +| burn them | BLOCK | INCITEMENT_VIOLENCE | en | +| mchome | BLOCK | ETHNIC_CONTEMPT | sw | +| hunt you down | BLOCK | HARASSMENT_THREAT | en | +| deal with them | REVIEW | DOGWHISTLE_WATCH | en | +| wataona | REVIEW | DOGWHISTLE_WATCH | sw | +| rigged | REVIEW | DISINFO_RISK | en | + +This seed is a **demonstration dataset only**. Production deployment requires building a comprehensive lexicon with domain-expert annotation covering the specific hate speech, incitement, and disinformation patterns relevant to your context. + +### Lifecycle commands + +```bash +# Load seed and activate (first-time setup) +make seed-lexicon + +# Create a new release +make release-create VERSION=hatelex-v2.2 + +# Ingest terms into a draft release +make release-ingest VERSION=hatelex-v2.2 INPUT=data/lexicon_seed.json + +# Validate a release against quality gates +make release-validate VERSION=hatelex-v2.2 + +# Activate a release (deactivates the previous active release) +make release-activate VERSION=hatelex-v2.2 + +# Deprecate an old release +make release-deprecate VERSION=hatelex-v2.1 +``` + +## Electoral phase configuration + +Sentinel adjusts moderation sensitivity based on the electoral cycle. 
Five phases are supported: + +| Phase | Vector threshold | No-match action | Behavior | +|-------|-----------------|-----------------|----------| +| `pre_campaign` | 0.82 | ALLOW | Baseline sensitivity | +| `campaign` | 0.85 | ALLOW | Slightly tightened | +| `silence_period` | 0.88 | REVIEW | Unmatched content goes to review | +| `voting_day` | 0.90 | REVIEW | Maximum sensitivity | +| `results_period` | 0.88 | REVIEW | Maintained high sensitivity | + +During `silence_period`, `voting_day`, and `results_period`, the `no_match_action` changes to `REVIEW`, meaning content that doesn't match any lexicon entry still gets flagged for human review. + +### Setting the phase + +Set the electoral phase via environment variable: + +```bash +export SENTINEL_ELECTORAL_PHASE=campaign +``` + +Or in the policy config file (`config/policy/default.json`): + +```json +{ + "electoral_phase": "campaign" +} +``` + +The environment variable takes precedence over the config file. If neither is set, no phase-specific overrides are applied. + +### Phase safety constraint + +Phase overrides cannot lower the BLOCK toxicity threshold below the baseline value. This prevents accidental weakening of the most critical moderation threshold during heightened periods. + +## Deployment stages + +Deployment stages control enforcement behavior. Use them to roll out Sentinel safely. + +| Stage | BLOCK behavior | REVIEW behavior | ALLOW behavior | Use case | +|-------|---------------|-----------------|----------------|----------| +| `shadow` | Downgraded to ALLOW | Downgraded to ALLOW | No change | Observe decisions without enforcement; log-only mode | +| `advisory` | Downgraded to REVIEW | No change | No change | Enforcement active but no content is blocked; human review required for all blocks | +| `supervised` | Full enforcement | No change | No change | Production mode; all actions enforced as-is | + +### Recommended rollout path + +1. **SHADOW** — Deploy Sentinel and route traffic. 
Log all decisions but enforce nothing. Analyze decision quality. +2. **ADVISORY** — Enable enforcement but cap at REVIEW. Human moderators see what Sentinel would block. Build confidence. +3. **SUPERVISED** — Full enforcement. Sentinel blocks content autonomously based on lexicon matches. + +Set the stage: + +```bash +export SENTINEL_DEPLOYMENT_STAGE=advisory +``` + +Default is `supervised`. The policy version string encodes the active stage (e.g., `policy-2026.11@campaign#advisory`). + +## OAuth setup for admin endpoints + +Admin endpoints (appeals, transparency, release proposals, internal monitoring) require OAuth bearer token authentication. + +### Option 1: Static token registry (simple) + +Create a JSON file mapping tokens to client identities and scopes: + +```json +{ + "token-abc-123": { + "client_id": "admin-dashboard", + "scopes": ["admin:appeal:read", "admin:appeal:write", "admin:transparency:read"] + }, + "token-def-456": { + "client_id": "ci-pipeline", + "scopes": ["admin:proposal:read", "admin:proposal:review"] + } +} +``` + +Set the environment variable: + +```bash +export SENTINEL_OAUTH_TOKENS_JSON='{"token-abc-123": {"client_id": "admin-dashboard", "scopes": ["admin:appeal:read", "admin:appeal:write"]}}' +``` + +### Option 2: JWT bearer tokens + +For production, configure JWT validation: + +```bash +export SENTINEL_OAUTH_JWT_SECRET='your-jwt-signing-secret' +export SENTINEL_OAUTH_JWT_ALGORITHM='HS256' # optional, default HS256 +export SENTINEL_OAUTH_JWT_AUDIENCE='sentinel-admin' # optional +export SENTINEL_OAUTH_JWT_ISSUER='your-auth-server' # optional +``` + +JWTs must include `client_id` (or `sub`) and `scopes` (or `scope` as space-delimited string) claims. 
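The claim rules for JWTs (`client_id` or `sub`; `scopes` or a space-delimited `scope`) can be illustrated with a small normalization function. This sketch makes three assumptions. First, it receives claims that have already been signature-verified, for example by your JWT library. Second, `client_id` takes precedence over `sub`. Third, a `scopes` list takes precedence over `scope`. Neither precedence rule is documented. `normalize_claims` and `require_scope` are hypothetical names.

```python
from typing import Any


def normalize_claims(claims: dict[str, Any]) -> tuple[str, frozenset[str]]:
    """Return (client_id, scopes) from already-verified JWT claims."""
    # Assumption: client_id wins over sub when both are present.
    client_id = claims.get("client_id") or claims.get("sub")
    if not isinstance(client_id, str) or not client_id:
        raise ValueError("token has neither client_id nor sub")

    # Assumption: a scopes list wins over a space-delimited scope string.
    if "scopes" in claims:
        raw = claims["scopes"]
        if not isinstance(raw, list) or not all(isinstance(s, str) for s in raw):
            raise ValueError("scopes claim must be a list of strings")
        scopes = frozenset(raw)
    elif isinstance(claims.get("scope"), str):
        scopes = frozenset(claims["scope"].split())
    else:
        scopes = frozenset()
    return client_id, scopes


def require_scope(claims: dict[str, Any], needed: str) -> str:
    """Return the client_id if the token carries `needed`, else raise."""
    client_id, scopes = normalize_claims(claims)
    if needed not in scopes:
        raise PermissionError(f"{client_id} lacks scope {needed}")
    return client_id
```

For example, a token with `"sub": "ops-service"` and `"scope": "internal:queue:read"` passes `require_scope(claims, "internal:queue:read")`.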
+ +### OAuth scopes + +| Scope | Grants access to | +|-------|-----------------| +| `admin:appeal:read` | List appeals, reconstruct appeal audit trail | +| `admin:appeal:write` | Create appeals, transition appeal states | +| `admin:transparency:read` | Aggregate transparency reports | +| `admin:transparency:export` | Raw appeals data export | +| `admin:transparency:identifiers` | Include identifier fields (`request_id`, `original_decision_id`) in exports | +| `admin:proposal:read` | View release proposal permissions | +| `admin:proposal:review` | Submit, approve, reject, promote release proposals | +| `internal:queue:read` | Internal monitoring queue metrics | + +## Monitoring + +### Health check + +```bash +curl http://localhost:8000/health +# {"status": "ok"} +``` + +### Metrics + +```bash +# JSON format +curl http://localhost:8000/metrics + +# Prometheus text format +curl http://localhost:8000/metrics/prometheus +``` + +The metrics endpoint returns action counts, HTTP status counts, latency histogram buckets, and validation error counts. + +### Structured logging + +Sentinel propagates `X-Request-ID` headers through all requests. If the client provides one, Sentinel uses it; otherwise one is generated. Use this ID to correlate logs across your infrastructure. + +### Internal monitoring + +With the `internal:queue:read` OAuth scope: + +```bash +curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/internal/monitoring/queue/metrics +``` + +Returns queue depth and SLA breach snapshot data. + +## Go-live readiness gate + +Before production rollout, run the readiness gate: ```bash python scripts/check_go_live_readiness.py --bundle-dir releases/go-live/ ``` + +This validates that all required artifacts (lexicon release, policy config, migration state, launch profile) are present and consistent. See `templates/go-live/` for the template bundle structure. 
diff --git a/docs/faq.md b/docs/faq.md index 78a1d10..01d39a6 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -1,17 +1,146 @@ # FAQ -## Is Sentinel production-ready? +## General -Core implementation tasks are complete. Production rollout should still follow the go-live readiness gate and sign-off process. +### What is Sentinel? -## Can Sentinel directly auto-block from ML predictions? +Sentinel is an open-source multilingual moderation API built to protect Kenya's 2027 general election from ethnic incitement and election disinformation. It handles code-switched text across English, Swahili, and Sheng, and returns deterministic moderation decisions (`ALLOW`, `REVIEW`, `BLOCK`) with full audit evidence. -Initial ML paths are safety-constrained. Governance and policy controls determine whether model signals can enforce beyond advisory/shadow behavior. +### Is Sentinel production-ready? -## Do I need ML dependencies to use Sentinel? +The moderation pipeline, appeals system, transparency reporting, and safety controls are fully implemented. However: -No. Base deterministic moderation works without `.[ml]`. Install `.[ml]` only when you need optional ML runtime paths. +- The **seed lexicon contains only 7 demonstration terms**. Production use requires building a comprehensive lexicon with domain-expert annotation covering the specific hate speech, incitement, and disinformation patterns relevant to your context. +- The **multi-label classifier runs in shadow mode** (observability only). +- The **claim-likeness scorer is active for REVIEW-only disinformation signals** (cannot produce `BLOCK`). +- Deterministic lexicon matches remain the only direct path to `BLOCK`. -## Can I use this for a small forum? +Production rollout should follow the go-live readiness gate (`python scripts/check_go_live_readiness.py`) and the SHADOW -> ADVISORY -> SUPERVISED deployment stage progression. -Yes. 
Integrate server-to-server with `POST /v1/moderate` and map `ALLOW/REVIEW/BLOCK` to your moderation workflow. +### What languages does Sentinel support? + +Sentinel currently supports three language codes in the moderation pipeline: + +- **English (en)** — baseline deterministic support +- **Swahili (sw)** — baseline deterministic support with hint-word detection +- **Sheng (sh)** — baseline deterministic support with hint-word detection + +Language routing is token-informed and returns span-level `language_spans`, so code-switched text (e.g., English + Sheng in one sentence) is handled natively. Additional languages can be added via the language pack system. + +### What are the six labels? + +| Label | Meaning | +|-------|---------| +| `ETHNIC_CONTEMPT` | Ethnic slurs, dehumanizing language targeting ethnic groups | +| `INCITEMENT_VIOLENCE` | Direct calls to harm, kill, or attack | +| `HARASSMENT_THREAT` | Targeted personal threats | +| `DOGWHISTLE_WATCH` | Ambiguous language that may carry coded meaning — flagged for human review | +| `DISINFO_RISK` | Content resembling known disinformation narratives | +| `BENIGN_POLITICAL_SPEECH` | Normal political discourse, no policy concern detected | + +--- + +## Integrators + +### Can I call Sentinel from the browser? + +No. Sentinel is a server-side API. Your backend should call Sentinel and apply the enforcement decision. Never expose the API key to frontend code. + +### What should I do if Sentinel is unavailable? + +Default to `REVIEW` for safety-critical content. Never default to `ALLOW` for content that hasn't been moderated. Set a request timeout of 500-1000ms and implement a circuit breaker pattern. See the [Integration Guide](integration-guide.md) for detailed failure handling recommendations. + +### What do the labels mean for my moderation workflow? 
+ +The `action` field is your primary enforcement signal: +- `ALLOW` — publish the content +- `REVIEW` — hold for human moderator review +- `BLOCK` — reject publication + +Labels and reason codes provide detail for moderators reviewing flagged content and for users who want to understand why their content was actioned. + +### Can ML predictions auto-block content? + +No. This is an intentional safety constraint. Only deterministic lexicon matches (exact normalized regex against known terms) can produce a `BLOCK` action. Vector similarity and claim-likeness paths are REVIEW-only. Multi-label classifier predictions operate in shadow mode and do not change enforcement. See [Security](security.md) for the full safety architecture. + +### What is the text size limit? + +1 to 5,000 characters. Empty strings are rejected (400 error). Text exceeding 5,000 characters is rejected. + +### Can I use Sentinel for a small forum? + +Yes. Integrate server-to-server with `POST /v1/moderate` and map the three actions to your moderation workflow. Sentinel runs without Postgres or Redis in development mode (file-based lexicon, in-memory rate limiting), but production use should include Postgres for the full feature set. + +--- + +## Operators + +### How do I switch electoral phases? + +Set the `SENTINEL_ELECTORAL_PHASE` environment variable and restart the API: + +```bash +export SENTINEL_ELECTORAL_PHASE=voting_day +``` + +The five phases are `pre_campaign`, `campaign`, `silence_period`, `voting_day`, and `results_period`. Each phase adjusts vector match thresholds and no-match behavior. During `silence_period`, `voting_day`, and `results_period`, unmatched content defaults to `REVIEW` instead of `ALLOW`. See the [Deployment Guide](deployment.md) for the full phase table. + +### How do I manage the lexicon? + +The lexicon follows a governed lifecycle: Draft -> Active -> Deprecated. 
Use the make targets (shown with the same example versions as the Deployment Guide): + +```bash +make release-create VERSION=hatelex-v2.2 # Create a new draft release +make release-ingest VERSION=hatelex-v2.2 INPUT=data/lexicon_seed.json # Add terms to the draft +make release-validate VERSION=hatelex-v2.2 # Validate against quality gates +make release-activate VERSION=hatelex-v2.2 # Activate (deactivates previous active) +make release-deprecate VERSION=hatelex-v2.1 # Deprecate old releases +``` + +Only one release can be active at a time. The active release is what the moderation endpoint uses for matching. See the [Deployment Guide](deployment.md) for the full lexicon lifecycle. + +### What is the recommended rollout path? + +1. **SHADOW** — Deploy and route traffic. BLOCK and REVIEW decisions are downgraded to ALLOW. Log everything, analyze decision quality, tune the lexicon. +2. **ADVISORY** — BLOCK decisions are downgraded to REVIEW. Human moderators see what Sentinel would block. Build confidence in the lexicon. +3. **SUPERVISED** — Full enforcement. BLOCK actions are applied automatically. + +Set the stage with `SENTINEL_DEPLOYMENT_STAGE=shadow|advisory|supervised`. + +### How do I set up OAuth for admin endpoints? + +For development, use a static token registry: + +```bash +export SENTINEL_OAUTH_TOKENS_JSON='{"my-token": {"client_id": "admin", "scopes": ["admin:appeal:read", "admin:appeal:write"]}}' +``` + +For production, configure JWT validation: + +```bash +export SENTINEL_OAUTH_JWT_SECRET='your-secret' +``` + +See the [Deployment Guide](deployment.md) for the full OAuth setup including all scopes. + +### How do I set up monitoring? + +Sentinel exposes three monitoring endpoints: + +- `GET /health` — basic health check (no auth) +- `GET /metrics` — JSON metrics: action counts, HTTP status counts, latency buckets (no auth) +- `GET /metrics/prometheus` — Prometheus text format for scraping (no auth) + +For internal queue monitoring, use `GET /internal/monitoring/queue/metrics` with the `internal:queue:read` OAuth scope. + +All requests carry an `X-Request-ID` header for log correlation. + +### What is the go-live readiness gate?
+ +The go-live gate validates that all required artifacts are present and consistent before production deployment: + +```bash +python scripts/check_go_live_readiness.py --bundle-dir releases/go-live/ +``` + +It checks for a valid lexicon release, policy config, migration state, and launch profile. Use the template bundle in `templates/go-live/` as a starting point. diff --git a/docs/integration-guide.md b/docs/integration-guide.md index 2ecfa40..7d3044e 100644 --- a/docs/integration-guide.md +++ b/docs/integration-guide.md @@ -1,39 +1,230 @@ # Integration Guide -Sentinel is a server-side moderation API. Your app sends text, Sentinel returns `ALLOW`, `REVIEW`, or `BLOCK` with audit evidence. +This guide is for platform integrators calling the Sentinel API from a backend application. It covers authentication, the full request/response schema, enforcement patterns, rate limiting, and error handling. -## Recommended flow +Sentinel is a server-side API. Never call it directly from a browser or mobile client — always proxy through your backend. -1. User submits content in your forum. -2. Your backend calls `POST /v1/moderate`. -3. Your backend applies enforcement: - - `ALLOW`: publish - - `REVIEW`: hold for moderator - - `BLOCK`: reject publish -4. Store decision metadata for audit and appeals. +## Authentication -## Request example +All moderation requests require an API key passed in the `X-API-Key` header: -```bash -curl -sS -X POST http://localhost:8000/v1/moderate \ - -H 'Content-Type: application/json' \ - -H "X-API-Key: ${SENTINEL_API_KEY}" \ - -d '{"text":"Sample post"}' +``` +X-API-Key: your-api-key +``` + +Keep this key server-side. If it leaks to a client, rotate it immediately. 
+ +## Request + +### `POST /v1/moderate` + +```json +{ + "text": "The content to moderate", + "context": { + "source": "forum-post", + "locale": "ke", + "channel": "politics" + }, + "request_id": "your-unique-id-123" +} +``` + +| Field | Type | Required | Constraints | Description | +|-------|------|----------|-------------|-------------| +| `text` | string | Yes | 1-5000 characters | The text to moderate | +| `context` | object | No | — | Optional metadata about the content source | +| `context.source` | string | No | Max 100 chars | Where the content came from (e.g., "forum-post", "comment") | +| `context.locale` | string | No | Max 20 chars | Geographic locale (e.g., "ke" for Kenya) | +| `context.channel` | string | No | Max 50 chars | Content channel or category | +| `request_id` | string | No | Max 128 chars | Client-provided idempotency/correlation ID | + +If you don't provide `request_id`, Sentinel generates one and returns it in the `X-Request-ID` response header. + +## Response + +### Full response schema + +```jsonc +{ + // Toxicity score (0.0-1.0). Higher = more toxic. + "toxicity": 0.92, + + // Labels detected in the text (one or more). + "labels": ["INCITEMENT_VIOLENCE"], + + // Enforcement decision. + "action": "BLOCK", + + // Machine-readable codes explaining the decision. + "reason_codes": ["R_INCITE_CALL_TO_HARM"], + + // Evidence items that drove the decision. + "evidence": [ + { + "type": "lexicon", + "match": "kill", + "severity": 3, + "lang": "en" + } + ], + + // Detected language spans with character offsets. + "language_spans": [ + {"start": 0, "end": 26, "lang": "en"} + ], + + // Artifact versions (for audit trail). + "model_version": "sentinel-multi-v2", + "lexicon_version": "hatelex-v2.1", + "pack_versions": {"en": "pack-en-0.1", "sw": "pack-sw-0.1", "sh": "pack-sh-0.1"}, + "policy_version": "policy-2026.11", + + // Server-side latency in milliseconds. 
+ "latency_ms": 12 +} +``` + +### Labels + +| Label | Description | +|-------|-------------| +| `ETHNIC_CONTEMPT` | Ethnic slurs, dehumanizing language targeting ethnic groups | +| `INCITEMENT_VIOLENCE` | Direct calls to harm, kill, or attack | +| `HARASSMENT_THREAT` | Targeted personal threats | +| `DOGWHISTLE_WATCH` | Ambiguous language that may carry coded meaning — requires human review | +| `DISINFO_RISK` | Content resembling known disinformation narratives (claim likeness match) | +| `BENIGN_POLITICAL_SPEECH` | No policy match — normal political discourse | + +A response may contain multiple labels if multiple concerns are detected. + +### Actions + +| Action | Meaning | Recommended handling | +|--------|---------|---------------------| +| `ALLOW` | No policy concern detected | Publish the content | +| `REVIEW` | Potential concern detected | Hold for human moderator review | +| `BLOCK` | High-confidence policy violation | Reject publication | + +During heightened electoral phases (`SILENCE_PERIOD`, `VOTING_DAY`, `RESULTS_PERIOD`), Sentinel tightens thresholds. Content that would receive `ALLOW` during `PRE_CAMPAIGN` may receive `REVIEW` during `VOTING_DAY`. Your application should handle this gracefully. + +### Evidence types + +Each evidence item has a `type` field indicating how the match was produced: + +| Type | Description | Can produce BLOCK? | +|------|-------------|-------------------| +| `lexicon` | Deterministic match against a known term in the lexicon | Yes | +| `vector_match` | Semantic similarity match via pgvector | No (REVIEW only) | +| `model_span` | Heuristic/model-derived span evidence (e.g., claim-likeness or no-match context) | No | + +The safety constraint that vector matches and model spans cannot produce `BLOCK` is intentional. Only deterministic lexicon matches can block content. 
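Integrators who want to check this constraint defensively can do so on their side. The function below is a hypothetical guard, not part of Sentinel. If a response says `BLOCK` but carries no `lexicon` evidence item, the guard routes the content to review instead. You would likely also want to alert on that case.

```python
from typing import Any


def has_lexicon_evidence(response: dict[str, Any]) -> bool:
    """True if any evidence item came from a deterministic lexicon match."""
    return any(item.get("type") == "lexicon" for item in response.get("evidence", []))


def checked_action(response: dict[str, Any]) -> str:
    """Return the action to apply, downgrading an unsupported BLOCK to REVIEW."""
    action = response["action"]
    # Only lexicon evidence may justify BLOCK; anything else goes to a human.
    if action == "BLOCK" and not has_lexicon_evidence(response):
        return "REVIEW"
    return action
```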
+ +### Evidence item fields + +| Field | Type | Description | +|-------|------|-------------| +| `type` | string | `lexicon`, `vector_match`, or `model_span` | +| `match` | string or null | Matched text/term (often null for `model_span`) | +| `severity` | integer or null | 1 (low), 2 (medium), 3 (high) when available | +| `lang` | string or null | Language code when available | +| `match_id` | string | Unique identifier for the matched entry (optional) | +| `similarity` | float | Cosine similarity score, 0.0-1.0 (vector matches only) | +| `span` | string | Text span context (model spans only, optional) | +| `confidence` | float | Confidence score, 0.0-1.0 (model spans only, optional) | + +## Applying enforcement decisions + +Basic integration pattern (`SENTINEL_URL`, `API_KEY`, `publish`, `enqueue_for_moderation`, and `reject` stand in for your own configuration and helpers): + +```python +import requests + +response = requests.post( + f"{SENTINEL_URL}/v1/moderate", + json={"text": user_text, "request_id": post_id}, + headers={"X-API-Key": API_KEY}, + timeout=1.0, +) +# Raise on 4xx/5xx: error bodies use the ErrorResponse shape and have no "action". +response.raise_for_status() +result = response.json() + +if result["action"] == "ALLOW": + publish(post) +elif result["action"] == "REVIEW": + enqueue_for_moderation(post, sentinel_response=result) +elif result["action"] == "BLOCK": + reject(post, reason=result["reason_codes"]) +``` + +Always persist the full Sentinel response alongside the content. You'll need it for appeals and audit.
+ +## What to persist + +Store these fields in your database alongside the moderated content: + +| Field | Why | +|-------|-----| +| `action` | The enforcement decision applied | +| `labels` | What was detected | +| `reason_codes` | Machine-readable explanation | +| `evidence` | Full match details for appeal reconstruction | +| `model_version` | Which model version was used | +| `lexicon_version` | Which lexicon version was used | +| `policy_version` | Which policy config was active | +| `pack_versions` | Which language packs were active | +| `X-Request-ID` header | Correlation ID linking your records to Sentinel's | + +This data enables reconstructing exactly why a decision was made, even if the lexicon or policy has since changed. The appeals system uses these fields to rebuild the decision context. + +## Rate limiting + +Sentinel enforces per-key rate limits (default: 120 requests/minute). + +### Response headers + +Successful moderation responses include rate limit headers. `429` responses include the same headers plus `Retry-After`. + +| Header | Description | +|--------|-------------| +| `X-RateLimit-Limit` | Maximum requests allowed per window | +| `X-RateLimit-Remaining` | Requests remaining in current window | +| `X-RateLimit-Reset` | Seconds until the window resets | + +### 429 Too Many Requests + +When rate limited, Sentinel returns HTTP 429 with: + +- A `Retry-After` header indicating seconds to wait +- An `ErrorResponse` body + +Implement exponential backoff or honor `Retry-After`. Do not retry immediately. 
+ +## Error handling + +### Error response format + +All errors return a consistent structure: + +```json +{ + "error_code": "HTTP_400", + "message": "Invalid request payload (1 validation error(s))", + "request_id": "abc-123" +} ``` -## What to persist in your DB +### HTTP status codes -- `action` -- `labels` -- `reason_codes` -- `evidence` -- `model_version` -- `lexicon_version` -- `policy_version` -- request ID (`X-Request-ID`) +| Status | Meaning | Action | +|--------|---------|--------| +| 200 | Success | Process the moderation response | +| 400 | Validation error (bad request body) | Fix the request — check `text` length (1-5000) and field types | +| 401 | Missing or invalid API key | Check `X-API-Key` header | +| 503 | API authentication not configured on the server | Operator must set `SENTINEL_API_KEY` | +| 429 | Rate limited | Wait for `Retry-After` seconds, then retry | +| 500 | Internal server error | Retry with backoff; default to REVIEW if persistent | -## Failure handling +### Failure handling recommendations -- Use request timeout (for example 500-1000ms). -- If Sentinel is unavailable, default to `REVIEW` for safety-critical contexts. -- Never call Sentinel directly from the browser; call from your backend only. +- **Timeout**: Set a request timeout of 500-1000ms. Sentinel targets P95 latency under 150ms. +- **Unavailability**: If Sentinel is unreachable, default to `REVIEW` for safety-critical content. Never default to `ALLOW` for content that hasn't been checked. +- **Retries**: Retry on 500 and network errors with exponential backoff. Do not retry on 400 or 401. +- **Circuit breaker**: If Sentinel returns errors persistently, open a circuit breaker and route all content to your human moderation queue. 
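The fail-closed behavior described in these recommendations can be sketched as a small circuit breaker. This is a hypothetical, single-threaded illustration. `FailClosedBreaker`, the default of 5 failures, and the 30-second reset are assumptions. While the breaker is open, it skips Sentinel entirely. Every failure path returns `REVIEW`, never `ALLOW`.

```python
import time
from typing import Any, Callable


class FailClosedBreaker:
    """Circuit breaker around Sentinel calls that never falls back to ALLOW."""

    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    def moderate(self, call: Callable[[], dict[str, Any]]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                return "REVIEW"  # open: skip Sentinel, send straight to human review
            self.opened_at = None  # half-open: let one trial call through
        try:
            action = call()["action"]
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            return "REVIEW"  # unchecked content is held, never published
        self.failures = 0
        return action
```

Wrap each call as `breaker.moderate(lambda: send_to_sentinel(text))`. Here `send_to_sentinel` stands for your own request function, which should raise on HTTP errors and timeouts.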
diff --git a/docs/quickstart.md b/docs/quickstart.md index f5a7e81..ca877f6 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -1,60 +1,152 @@ # Quickstart -This quickstart gets Sentinel running locally and validates one moderation request. +Choose your path: -## Prerequisites +- **Path A** — You're an integrator calling an existing Sentinel instance +- **Path B** — You're an operator setting up Sentinel from scratch -- Python 3.12+ -- Docker + Docker Compose +## Path A: Integrator -## 1. Install and activate +You have access to a running Sentinel instance and an API key. Four steps to your first moderation call. + +### 1. Check health + +```bash +curl -sS https://your-sentinel-host/health +``` + +Expected: `{"status":"ok"}` + +### 2. Set your API key + +```bash +export SENTINEL_API_KEY='your-api-key' +``` + +### 3. Send a moderation request + +```bash +curl -sS -X POST https://your-sentinel-host/v1/moderate \ + -H 'Content-Type: application/json' \ + -H "X-API-Key: ${SENTINEL_API_KEY}" \ + -d '{"text": "They should kill them now."}' +``` + +### 4. Read the response + +```jsonc +{ + "toxicity": 0.92, + "labels": ["INCITEMENT_VIOLENCE"], + "action": "BLOCK", + "reason_codes": ["R_INCITE_CALL_TO_HARM"], + "evidence": [ + { + "type": "lexicon", + "match": "kill", + "severity": 3, + "lang": "en" + } + ], + "language_spans": [{"start": 0, "end": 26, "lang": "en"}], + "model_version": "sentinel-multi-v2", + "lexicon_version": "hatelex-v2.1", + "pack_versions": {"en": "pack-en-0.1", "sw": "pack-sw-0.1", "sh": "pack-sh-0.1"}, + "policy_version": "policy-2026.11", + "latency_ms": 12 +} +``` + +The `action` field is your enforcement decision. Map it in your application: `ALLOW` -> publish, `REVIEW` -> hold for moderator, `BLOCK` -> reject. See the [Integration Guide](integration-guide.md) for the full request/response schema and enforcement patterns. + +--- + +## Path B: Operator + +Set up a local Sentinel instance from source. 
+You'll need Python 3.12+ and Docker with Docker Compose.
+
+### 1. Clone and install
 ```bash
+git clone https://github.com/Thelastpoet/sentinel.git
+cd sentinel
 python -m venv .venv
 source .venv/bin/activate
-python -m pip install --upgrade pip
-python -m pip install -e .[dev,ops]
+pip install -e .[dev,ops]
 ```
-Optional ML dependencies:
+This installs the Sentinel API server and all development/operations tooling. The `.[ml]` extra is optional and only needed if you want to experiment with ML classifier scaffolding.
+
+### 2. Start infrastructure
 ```bash
-python -m pip install -e .[ml]
+docker compose up -d --build postgres redis
 ```
-## 2. Start infrastructure
+This starts PostgreSQL (with pgvector) and Redis. Sentinel uses Postgres for lexicon storage, vector similarity search, appeals, and transparency exports. Redis is used for distributed rate limiting and hot-trigger caching.
+
+### 3. Run database migrations
 ```bash
-docker compose up -d --build
+make apply-migrations
 ```
-## 3. Configure and seed
+This runs all 12 migration files against the local Postgres instance, creating tables for lexicon entries, releases, embeddings, appeals, monitoring, and model artifacts.
+
+### 4. Load the seed lexicon
 ```bash
-export SENTINEL_API_KEY='replace-with-strong-key'
-python scripts/apply_migrations.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel
-python scripts/sync_lexicon_seed.py --database-url postgresql://sentinel:sentinel@localhost:5432/sentinel --activate-if-none
+make seed-lexicon
 ```
-## 4. Run API
+This loads the 7-term demonstration lexicon (`data/lexicon_seed.json`) and activates it. The seed contains example terms for `INCITEMENT_VIOLENCE`, `ETHNIC_CONTEMPT`, `HARASSMENT_THREAT`, `DOGWHISTLE_WATCH`, and `DISINFO_RISK`.
+
+### 5. Start the API
 ```bash
-uvicorn sentinel_api.main:app --host 0.0.0.0 --port 8000
+export SENTINEL_API_KEY='replace-with-a-strong-key'
+make run
 ```
-## 5. Verify health and moderation
+The API starts on `http://localhost:8000` with hot reload enabled.
+
+### 6. Verify
 ```bash
-curl -sS http://localhost:8000/health; echo
+# Health check
+curl -sS http://localhost:8000/health
+
+# Moderation request
 curl -sS -X POST http://localhost:8000/v1/moderate \
   -H 'Content-Type: application/json' \
   -H "X-API-Key: ${SENTINEL_API_KEY}" \
-  -d '{"text":"They should kill them now."}'; echo
+  -d '{"text": "They should kill them now."}'
 ```
-## 6. Validate installation
+You should see a `BLOCK` response with label `INCITEMENT_VIOLENCE` and evidence pointing to the lexicon match on "kill".
+
+### 7. Run tests (optional)
 ```bash
-python -m pytest -q
-python scripts/check_contract.py
+make test
+make contract
 ```
+
+## What success looks like
+
+A successful moderation response always contains:
+
+- `action` — the enforcement decision (`ALLOW`, `REVIEW`, or `BLOCK`)
+- `labels` — what was detected (e.g., `INCITEMENT_VIOLENCE`)
+- `reason_codes` — machine-readable codes explaining why (e.g., `R_INCITE_CALL_TO_HARM`)
+- `evidence` — the specific matches that drove the decision
+- Provenance fields (`model_version`, `lexicon_version`, `policy_version`, `pack_versions`) — for audit
+
+## Seed lexicon caveat
+
+The 7-term seed lexicon is a **demonstration dataset**. It covers basic examples across five labels but is not sufficient for production moderation. Production deployment requires building a comprehensive lexicon through domain-expert annotation. See the [Deployment Guide](deployment.md) for lexicon lifecycle management.
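The response contract under "What success looks like" and the Path A action mapping can be combined into a single backend check. This is a minimal sketch, assuming the response has already been parsed into a Python `dict`. `enforce`, `ENFORCEMENT`, and step names such as `hold_for_moderator` are hypothetical. Holding malformed responses for a moderator follows the integration guide's advice never to default unchecked content to `ALLOW`.

```python
from typing import Any

# Fields listed under "What success looks like".
REQUIRED_FIELDS = ("action", "labels", "reason_codes", "evidence")
PROVENANCE_FIELDS = ("model_version", "lexicon_version", "policy_version", "pack_versions")

# Path A mapping: ALLOW -> publish, REVIEW -> hold for moderator, BLOCK -> reject.
ENFORCEMENT = {"ALLOW": "publish", "REVIEW": "hold_for_moderator", "BLOCK": "reject"}


def enforce(response: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (enforcement step, audit record) for one moderation response."""
    missing = [f for f in REQUIRED_FIELDS + PROVENANCE_FIELDS if f not in response]
    if missing or response["action"] not in ENFORCEMENT:
        # Malformed or unexpected payload: fail safe by holding for a moderator.
        return "hold_for_moderator", {"error": "malformed_response", "missing": missing}
    # Keep the decision plus provenance so it can be audited or appealed later.
    audit = {f: response[f] for f in ("action", "labels", "reason_codes") + PROVENANCE_FIELDS}
    return ENFORCEMENT[response["action"]], audit
```

Storing the provenance fields alongside your own record makes it possible to tell later which lexicon and policy versions produced a given decision.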
+
+## Next steps
+
+- **Integrators**: [Integration Guide](integration-guide.md) — full schema reference, enforcement patterns, error handling
+- **Operators**: [Deployment Guide](deployment.md) — production setup, electoral phases, deployment stages
diff --git a/docs/security.md b/docs/security.md
index fe8ea13..fb5af45 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -1,21 +1,120 @@
-# Security Notes
+# Security
-## API access
+This document covers Sentinel's security architecture: authentication, authorization, input validation, safety constraints, and data handling. It is primarily for operators deploying Sentinel in production.
-- Set a strong `SENTINEL_API_KEY`.
-- Do not expose the API key to frontend clients.
+## Authentication
-## Admin/internal access
+### API key (moderation endpoint)
-Use OAuth/JWT configuration for internal and admin endpoints.
+The `POST /v1/moderate` endpoint requires an `X-API-Key` header. The key is compared with constant-time `compare_digest` logic to reduce timing-attack risk.
+
+- Set a strong, random API key via `SENTINEL_API_KEY`
+- Never expose the key to frontend clients — Sentinel should only be called server-to-server
+- Rotate the key by updating the environment variable and restarting the API
+
+### OAuth bearer tokens (admin endpoints)
+
+All admin and internal endpoints require an `Authorization: Bearer ` header. Sentinel supports two authentication backends:
+
+**Static token registry** — For development and simple deployments. Configure via `SENTINEL_OAUTH_TOKENS_JSON` with a JSON mapping of tokens to client identities and scopes.
+
+**JWT bearer tokens** — For production. Configure via `SENTINEL_OAUTH_JWT_SECRET`. Supports audience and issuer verification via `SENTINEL_OAUTH_JWT_AUDIENCE` and `SENTINEL_OAUTH_JWT_ISSUER`. JWTs must include `client_id` (or `sub`) and `scopes` (or `scope` as space-delimited string) claims.
+
+If neither is configured, admin endpoints return 401 for all requests.
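The JWT claim rules can be illustrated with a small normalizer. This is a sketch under stated assumptions: it takes claims that a JWT library has already verified (signature, audience, issuer), the name `principal_from_claims` is hypothetical, and accepting `scopes` as a list is a guess, since the text only says that `scope` is a space-delimited string.

```python
from typing import Any


def principal_from_claims(claims: dict[str, Any]) -> tuple[str, frozenset[str]]:
    """Extract (client_id, scopes) from already-verified JWT claims."""
    # client_id takes precedence; sub is the fallback identity claim.
    client_id = claims.get("client_id") or claims.get("sub")
    if not client_id:
        raise PermissionError("token has neither client_id nor sub")
    raw = claims.get("scopes")
    if raw is None:
        raw = claims.get("scope", "")
    if isinstance(raw, str):
        scopes = frozenset(raw.split())  # space-delimited string form
    else:
        scopes = frozenset(raw)  # list form (assumption)
    return str(client_id), scopes
```

Normalizing both claim shapes into one `frozenset` keeps the later scope checks a simple membership test.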
+
+## Authorization (OAuth scopes)
+
+Each admin endpoint requires a specific OAuth scope. Requests without the required scope receive 403 Forbidden.
+
+| Scope | Endpoints |
+|-------|-----------|
+| `admin:appeal:read` | List appeals, reconstruct audit trail |
+| `admin:appeal:write` | Create appeals, transition appeal states |
+| `admin:transparency:read` | Aggregate transparency reports |
+| `admin:transparency:export` | Raw appeals data export |
+| `admin:transparency:identifiers` | Include PII fields in exports (additive — also requires `admin:transparency:export`) |
+| `admin:proposal:read` | View release proposal permissions |
+| `admin:proposal:review` | Submit, approve, reject, promote release proposals |
+| `internal:queue:read` | Internal monitoring queue metrics |
+
+Follow the principle of least privilege: grant each client only the scopes it needs.
+
+## Input validation
+
+Sentinel uses Pydantic v2 models with `extra="forbid"`:
+
+- **Text length**: 1-5000 characters (rejects empty strings and oversized input)
+- **Context fields**: `source` (max 100), `locale` (max 20), `channel` (max 50)
+- **Request ID**: Max 128 characters
+- **Extra fields rejected**: Any field not in the schema causes a 400 error
+- **Reason code format**: Enforced pattern `R_[A-Z0-9_]+`
+
+Validation failures return structured `ErrorResponse` payloads with `error_code: "HTTP_400"`.
+
+## Rate limiting
+
+Per-key sliding window rate limiting (default: 120 requests/minute):
+
+- In-memory by default; distributed via Redis when `SENTINEL_REDIS_URL` or `SENTINEL_RATE_LIMIT_STORAGE_URI` is set
+- Falls back to in-memory rate limiting if Redis is unavailable
+- Returns `429 Too Many Requests` with `Retry-After` header when exceeded
+
+## Safety architecture
+
+Sentinel's moderation pipeline includes several intentional safety constraints that prevent the system from causing harm even if misconfigured.
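Before the individual constraints below, here is one way the deployment-stage downgrades and the BLOCK cap on non-lexicon paths could compose. It is a sketch, not Sentinel's pipeline: `Stage`, `cap_decision`, and the `source` values (`"lexicon"`, `"vector"`, `"model"`) are hypothetical, and electoral-phase threshold changes are left out.

```python
from enum import Enum


class Stage(Enum):
    SHADOW = "SHADOW"
    ADVISORY = "ADVISORY"
    SUPERVISED = "SUPERVISED"


def cap_decision(action: str, source: str, stage: Stage) -> str:
    """Apply the path cap first, then the deployment-stage downgrade."""
    # Only deterministic lexicon matches may BLOCK; vector and model paths cap at REVIEW.
    if action == "BLOCK" and source != "lexicon":
        action = "REVIEW"
    if stage is Stage.SHADOW:
        return "ALLOW"  # enforce nothing; the caller can still log the pre-stage action
    if stage is Stage.ADVISORY and action == "BLOCK":
        return "REVIEW"  # no content is auto-blocked in ADVISORY
    return action
```

Applying the path cap before the stage means a vector match stays at REVIEW in SUPERVISED and still becomes ALLOW in SHADOW.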
+
+### Deployment stages as safety control
+
+Deployment stages gate what enforcement actions Sentinel can take:
+
+| Stage | Effect |
+|-------|--------|
+| `SHADOW` | All decisions downgraded to ALLOW. Sentinel logs what it *would* do but enforces nothing. |
+| `ADVISORY` | BLOCK decisions downgraded to REVIEW. No content is auto-blocked. |
+| `SUPERVISED` | Full enforcement. This is the only stage where BLOCK is applied. |
+
+New deployments should start in SHADOW, progress to ADVISORY after validating decision quality, and only move to SUPERVISED with confidence in the lexicon and policy configuration.
+
+### Electoral phases as safety control
+
+Electoral phases automatically tighten sensitivity thresholds as election day approaches:
+
+- During `SILENCE_PERIOD`, `VOTING_DAY`, and `RESULTS_PERIOD`, unmatched content defaults to REVIEW instead of ALLOW
+- Phase overrides **cannot lower** the BLOCK toxicity threshold below baseline — this is enforced in code to prevent accidental weakening of the most critical safety gate
+- Vector match thresholds increase (0.82 -> 0.90) to reduce false positives during high-stakes periods
+
+### Vector match cannot BLOCK
+
+This is a hard safety constraint: vector similarity matches can only produce REVIEW, never BLOCK. Only deterministic lexicon matches (exact normalized regex) can block content. This ensures that no fuzzy/probabilistic matching can autonomously suppress speech.
+
+### Model-derived paths are safety-capped
+
+The multi-label classifier runs in shadow mode and does not affect enforcement actions. Claim-likeness scoring can flag content to `REVIEW`, but it cannot produce `BLOCK`. The deterministic lexicon path remains the only direct path to `BLOCK`.
 ## Data handling
-- Store moderation decisions and evidence for auditability.
-- Avoid storing unnecessary personal data in logs.
+### Transparency exports and identifier masking
+
+The transparency export endpoint (`GET /admin/transparency/exports/appeals`) masks PII fields (`request_id`, `original_decision_id`) by default. Including these fields requires both the `admin:transparency:export` scope and the additional `admin:transparency:identifiers` scope. This two-scope design prevents accidental PII exposure in public transparency reports.
+
+### Legal hold primitives
+
+Database migration `0006_retention_legal_hold_primitives.sql` adds data retention and legal hold capabilities. These primitives support compliance with data retention requirements and legal proceedings.
+
+### Audit trail
+
+Every moderation response includes full provenance (model version, lexicon version, policy version, pack versions). The appeals system reconstructs the complete decision context at the time of the original moderation call, enabling fair review even after artifacts have been updated.
+
+## Secrets management
+
+- **Never pass secrets as command-line arguments** (they appear in process listings)
+- Use environment variables or a secrets manager for `SENTINEL_API_KEY`, `SENTINEL_DATABASE_URL`, `SENTINEL_OAUTH_JWT_SECRET`, and `SENTINEL_OAUTH_TOKENS_JSON`
+- The API key is compared using constant-time comparison to prevent timing side channels
+- Database connection strings should use SSL in production
-## Operational controls
+## Network security
-- Run release gates before production rollout.
-- Keep backups for Postgres.
-- Monitor health and error rates continuously.
+- **TLS**: Terminate TLS at a reverse proxy (nginx, Caddy, cloud load balancer). All client-to-API traffic should use HTTPS.
+- **Admin endpoint isolation**: Admin endpoints (`/admin/*`, `/internal/*`) should not be exposed to the public internet. Use network-level access controls or run admin endpoints on a separate port/service.
+- **Database**: Use SSL for Postgres connections. Restrict database access to the API server's network.
+- **Redis**: If using Redis for distributed rate limiting, ensure the Redis instance is not publicly accessible.