diff --git a/compliance-practices/sprint_compliance_log.md b/compliance-practices/sprint_compliance_log.md index 94fcc85..99c8659 100644 --- a/compliance-practices/sprint_compliance_log.md +++ b/compliance-practices/sprint_compliance_log.md @@ -1,6 +1,6 @@ # Sprint-by-Sprint Compliance Log -**Last Updated**: 2026-03-18 +**Last Updated**: 2026-03-22 **Purpose**: Track what compliance-relevant measures were implemented in each sprint --- @@ -112,17 +112,40 @@ Sprint 6 was the **compliance hardening sprint**. The CI pipeline gained three c --- -## Sprint 7: Intelligence Layer (Apr 1-14, 2026) — PLANNED +## Sprint 7: Intelligence Layer (Mar 14-18, 2026) — COMPLETE + +### Compliance Measures Implemented + +| Measure | Regulation | Description | +|---------|-----------|-------------| +| Compliance practices documentation | All | 7 documents in `compliance-practices/` covering DSGVO, CRA, EU AI Act | +| Recursive completeness audit | CRA Art. 13 | Task 7.29 PASSED — all code tested, documented, gap-free | +| Vision alignment check | CRA Art. 10(6) | Sprints 1-7 verified on-mission — no drift detected | +| Updated execution plan | CRA Art. 13 | Sprint 6 marked complete, Sprint 7 current, architecture updated | +| CHANGELOG update | CRA Art. 13 | v2026.03.1 release notes with all Sprint 7 features | +| 62 new tests (204 total) | CRA Annex I | Event clustering, burst detection, risk scoring, events API tests | +| Risk scoring interpretability | EU AI Act Art. 50 | 4-signal decomposition; analysts see which signals drive each risk score | + +### Key Decision +Sprint 7 is the first sprint where all compliance measures were **documented before implementation** via this compliance-practices folder. This folder now serves as a reusable reference for any formal regulatory assessment. + +--- + +## Sprint 8: Topic Mode & Supply Chain Security (Apr 15-28, 2026) — PLANNED ### Planned Compliance Measures | Measure | Regulation | Description | |---------|-----------|-------------| -| Compliance practices documentation | All | This folder — documenting all practices for reuse | -| Recursive completeness audit | CRA Art. 13 | Final sprint task verifies all code tested and documented | -| Vision alignment check | CRA Art. 10(6) | Verify project hasn't drifted from stated purpose | -| Updated execution plan | CRA Art. 13 | Documentation reflects current state | -| CHANGELOG update | CRA Art. 13 | Release notes for v1.7.0 | +| **SBOM generation in CI** | CRA Art. 13(15) | CycloneDX `sbom.json` generated on every push to main/development; formally satisfies SBOM requirement | +| **Dependency vulnerability scanning** | CRA Art. 10(4) | pip-audit in CI; blocks merge on HIGH/CRITICAL CVEs; exceptions require documented review | +| **User-facing AI disclosure panel** | EU AI Act Art. 50 | Persistent panel on every briefing view: names Gemini as AI model, instructs analyst review | +| **Topic Mode transparency** | EU AI Act Art. 50 | Divergence scores are deterministic and inspectable; methodology documented in DEVELOPER.md | +| **Compliance log update** | CRA Art. 13 | This document updated at sprint close | +| **≥18 new tests** | CRA Annex I | Topic pipeline, topic API, NetworkGraph coverage | + +### Key Decision +Sprint 8 completes the CRA Art. 13(15) SBOM requirement that has been "SBOM-ready" since Sprint 6 (pinned requirements.txt). Generating the actual artifact closes the gap between readiness and compliance. --- @@ -141,9 +164,11 @@ Sprint 5 ─── Operational Maturity (metrics, export, documentation) │ Sprint 6 ─── CI Compliance Pipeline (secret scan, docs drift, branch policy, SECURITY.md) │ -Sprint 7 ─── Documentation & Audit (compliance practices, recursive verification) +Sprint 7 ─── Documentation & Audit (compliance practices, recursive verification) ✓ COMPLETE + │ +Sprint 8 ─── SBOM artifact, vulnerability scanning, AI disclosure UI ← CURRENT │ -Sprint 8+ ── SBOM, auth, vulnerability scanning, formal assessment +Sprint 9+ ── User auth, formal assessment, CDDBS-Edge governance artifacts ``` --- @@ -152,10 +177,10 @@ Sprint 8+ ── SBOM, auth, vulnerability scanning, formal assessment | Metric | Value | |--------|-------| -| Sprints with compliance measures | 7/7 (100%) | -| Automated CI compliance checks | 4 (secret scan, docs drift, branch policy, linting) | -| Test count | ~197 (and growing) | -| Documentation pages | 10+ production docs, 12+ sprint docs, 5 blog posts, 7 compliance docs | +| Sprints with compliance measures | 8/8 (100%) | +| Automated CI compliance checks | 4 now, 6 planned (+ SBOM, pip-audit in Sprint 8) | +| Test count | 204 (Sprint 7 complete) | +| Documentation pages | 10+ production docs, 14+ sprint docs, 5 blog posts, 7 compliance docs | | Security-specific files | SECURITY.md, CODEOWNERS, detect_secrets.py, secret-scan.yml | | DSGVO measures | 6 (BYOK, minimization, purpose limitation, no tracking, secret protection, webhook signing) | | CRA measures | 8 (secret scan, docs drift, branch policy, SBOM-ready, SECURITY.md, documentation, version tags, change control) | diff --git a/docs/cddbs_execution_plan.md b/docs/cddbs_execution_plan.md index 200ec64..b794fa0 100644 --- a/docs/cddbs_execution_plan.md +++ b/docs/cddbs_execution_plan.md @@ -3,7 +3,7 @@ **Project**: Cyber Disinformation Detection Briefing System (CDDBS) **Start Date**: February 3, 2026 **Delivery Model**: 2-week sprints -**Last Updated**: 2026-03-18 +**Last Updated**: 2026-03-22 --- @@ -90,35 +90,41 @@ CDDBS is a system for analyzing media outlets and social media accounts for pote - **Compliance**: Major compliance sprint — secret scanning CI, docs drift detection, branch policy, SECURITY.md, CODEOWNERS - See [docs/sprint_6_backlog.md](sprint_6_backlog.md) for details -### Sprint 7: Intelligence Layer & Compliance Hardening (Apr 1-14, 2026) — CURRENT -**Target**: v1.7.0 | **Status**: Planning +### Sprint 7: Intelligence Layer & Compliance Hardening (Mar 14-18, 2026) — COMPLETE +**Target**: v1.7.0 | **Status**: Done -- TF-IDF event clustering pipeline (agglomerative clustering) -- Z-score burst detection on keyword frequency +- TF-IDF event clustering pipeline (agglomerative clustering, distance_threshold=0.6) +- Z-score burst detection (24h baseline, 1h window, threshold=3.0) - Narrative risk scoring (4-signal composite: source concentration, burst magnitude, timing sync, narrative match) - `/events` API endpoints (list, detail, map, bursts) -- Frontend: EventClusterPanel, BurstTimeline, EventDetailDialog -- Enhanced GlobalMap with event cluster markers -- Compliance practices documentation (DSGVO, CRA, EU AI Act) -- Recursive completeness audit (verify all Sprint 7 work implemented, tested, documented) -- Vision alignment check (Sprints 1-7 against project mission) +- Frontend: EventClusterPanel, BurstTimeline, EventDetailDialog, enhanced GlobalMap +- Compliance practices documentation (7 documents: DSGVO, CRA, EU AI Act) +- Recursive completeness audit PASSED — 204 tests, all CI green - **Compliance**: Full compliance documentation folder, recursive audit, vision alignment verification -- See [docs/sprint_7_backlog.md](sprint_7_backlog.md) for details +- See [docs/sprint_7_backlog.md](sprint_7_backlog.md) | [retrospectives/sprint_7.md](../retrospectives/sprint_7.md) -### Sprint 8: Collaborative Features & SBOM (Apr-May 2026) -- User authentication and authorization +### Sprint 8: Topic Mode & Supply Chain Security (Apr 15-28, 2026) — CURRENT +**Target**: v1.8.0 | **Status**: Planning + +- **Topic Mode**: Topic-centric multi-outlet comparative analysis (divergence scoring, amplification detection, outlet ranking) +- **NetworkGraph.tsx**: Outlet relationship graph — carried from Sprint 5→6→7 +- **SBOM generation in CI**: CycloneDX format on every release build +- **Dependency vulnerability scanning**: pip-audit in CI, blocks on HIGH/CRITICAL CVEs +- **User-facing AI disclosure panel**: EU AI Act Art. 50 compliance at UI layer +- **Compliance**: SBOM artifact, vulnerability scanning, AI disclosure, compliance log update +- See [docs/sprint_8_backlog.md](sprint_8_backlog.md) for details + +### Sprint 9: User Authentication & Collaboration (May-Jun 2026) +- User authentication and authorization (JWT, role model) - Shared analysis workspaces - Analyst annotations and comments on briefings -- Formal SBOM generation in CI (CycloneDX/SPDX) -- Automated dependency vulnerability scanning -- User-facing AI disclosure in frontend UI +- CDDBS-Edge Phase 0: Swap Gemini → Ollama, benchmark briefing quality -### Sprints 9-12: Advanced Features (May-Jul 2026) +### Sprints 10-12: Advanced Features (Jun-Aug 2026) - Machine learning model fine-tuning - Automated monitoring schedules - API for third-party integration - Multi-language support -- NetworkGraph.tsx production implementation - Currents API collector integration --- @@ -167,12 +173,17 @@ Demonstrates resilience, digital sovereignty, access equity, and privacy-preserv - Batch analysis and export (JSON/CSV/PDF) - Operational metrics and trend endpoints -### Target Architecture (v1.7.0+) -- Event clustering and burst detection (Sprint 7) -- Narrative risk scoring composite (Sprint 7) -- Events API and frontend visualization (Sprint 7) -- User authentication (Sprint 8) -- SBOM and vulnerability scanning (Sprint 8) +### Achieved Architecture (v1.7.0) +- Event clustering and burst detection (TF-IDF agglomerative + z-score) +- Narrative risk scoring composite (4-signal: source_concentration, burst_magnitude, timing_sync, narrative_match) +- Events API and frontend visualization (EventClusterPanel, BurstTimeline, GlobalMap overlay) +- 204 tests, 3 CI workflows, 7 compliance documents + +### Target Architecture (v1.8.0+) +- Topic Mode: topic-centric outlet discovery and divergence scoring (Sprint 8) +- Outlet relationship NetworkGraph (Sprint 8) +- SBOM generation and dependency vulnerability scanning in CI (Sprint 8) +- User authentication and shared workspaces (Sprint 9) --- @@ -198,7 +209,7 @@ Production code flows through the `development` branch as a staging/integration --- -## Vision Alignment Check (as of Sprint 7 Planning) +## Vision Alignment Check (as of Sprint 8 Planning) | Sprint | Contribution to Vision | On Track? | |--------|----------------------|-----------| @@ -208,13 +219,14 @@ Production code flows through the `development` branch as a staging/integration | 4 | Production integration — making research usable | Yes | | 5 | Operational maturity — production-grade features | Yes | | 6 | Event intelligence — proactive monitoring capability | Yes | -| 7 | Intelligence layer — automated event detection | Yes | +| 7 | Intelligence layer — automated event detection | Yes ✓ | +| 8 | Topic Mode — proactive outlet discovery by narrative divergence | Yes | -**Drift assessment**: No significant drift from project vision. All sprints serve the core mission of "analyzing media outlets and social media accounts for potential disinformation activity." The addition of event intelligence (Sprints 6-7) expands the system from reactive (analyst-initiated analysis) to proactive (automated event detection), which is a natural evolution of the core mission. +**Drift assessment**: No significant drift from project vision. All sprints serve the core mission of "analyzing media outlets and social media accounts for potential disinformation activity." Sprint 8's Topic Mode is a direct expression of the mission: given a topic, automatically discover which outlets diverge from neutral coverage — operationally more powerful than waiting for an analyst to know which outlet to analyze. **Potential drift risks**: - CDDBS-Edge is a parallel track that could divert focus — mitigated by keeping it separate and experiment-phase only -- Collaborative features (Sprint 8) could drift toward general-purpose workspace — must stay focused on analyst collaboration for disinformation analysis +- Collaborative features (Sprint 9) could drift toward general-purpose workspace — must stay focused on analyst collaboration for disinformation analysis - Compliance documentation is valuable but must not become the primary focus — it supports engineering quality, not the other way around --- diff --git a/docs/sprint_8_backlog.md b/docs/sprint_8_backlog.md new file mode 100644 index 0000000..d9d7d77 --- /dev/null +++ b/docs/sprint_8_backlog.md @@ -0,0 +1,209 @@ +# Sprint 8 Backlog — Topic Mode & Supply Chain Security + +**Sprint**: 8 (Apr 15 – Apr 28, 2026) +**Target**: v1.8.0 +**Status**: Planning +**Related**: [Sprint 7 Retrospective](../retrospectives/sprint_7.md) | [Execution Plan](cddbs_execution_plan.md) +**Branch Policy**: Production work branches from `development`, not `main` + +--- + +## Sprint Goals + +1. **Topic Mode** — Topic-centric comparative analysis: given a topic, discover which outlets cover it, score divergence from a neutral baseline, and rank by amplification signal +2. **NetworkGraph.tsx** — Ship the outlet relationship graph that has been carried since Sprint 5 +3. **SBOM in CI** — Formal Software Bill of Materials (CycloneDX) generated on every release build +4. **Dependency Vulnerability Scanning** — pip-audit integrated into CI; blocks merge on known CVEs +5. **User-facing AI Disclosure** — Visible panel in frontend communicating AI-generated content to end users per EU AI Act Art. 50 +6. **Recursive Completeness Check** — Final sprint step verifying all tasks implemented, tested, documented, and gap-free + +--- + +## P0 — Topic Mode Backend + +Architecture fully designed in `TOPIC_MODE_PLAN.md`. Implementation follows the step-by-step order defined there. + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.1 | `TopicRun` + `TopicOutletResult` ORM models | S | — | Models added to `models.py`; `init_db()` creates tables automatically; `topic_run_id` FK with cascade delete | +| 8.2 | `topic_prompt_templates.py` | S | — | `get_baseline_prompt()` → JSON `{baseline_summary, key_facts, neutral_framing}`; `get_comparative_prompt()` → JSON `{divergence_score, amplification_signal, propaganda_techniques, framing_summary, divergence_explanation}`; includes STRICT RULES block | +| 8.3 | `pipeline/topic_pipeline.py` | L | — | 5-step pipeline: (1) baseline fetch from 4 reference outlets via SerpAPI, (2) Gemini baseline call, (3) broad discovery (top-N domains by frequency), (4) per-outlet comparative analysis with incremental DB commits, (5) finalize status | +| 8.4 | `POST /topic-runs` API endpoint | S | — | Creates TopicRun, fires background task, returns `{id, status}`; validates: topic non-empty, 1 ≤ num_outlets ≤ 20 | +| 8.5 | `GET /topic-runs` API endpoint | S | — | Returns list ordered by created_at DESC; includes outlet_results count per run | +| 8.6 | `GET /topic-runs/{id}` API endpoint | S | — | Returns TopicRun + outlet_results ordered by divergence_score DESC | + +--- + +## P0 — Topic Mode Frontend + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.7 | `api.ts` additions | S | — | `CreateTopicRunPayload`, `TopicRunStatus`, `TopicOutletResult`, `TopicRunDetail` interfaces; `createTopicRun()`, `fetchTopicRuns()`, `fetchTopicRun()` functions | +| 8.8 | `NewAnalysisDialog.tsx` mode toggle | M | — | `ToggleButtonGroup` at top ("Outlet" / "Topic"); Topic form: Topic text field + num_outlets + time period; calls correct API based on mode; `onCreated` routes to correct list | +| 8.9 | `TopicRunsTable.tsx` | M | — | Mirrors `RunsTable.tsx`; columns: Topic, Outlets Found, Status, Created, Actions; links to detail view | +| 8.10 | `TopicRunDetail.tsx` | M | — | Baseline reference box; outlet cards ranked by divergence_score; score bar (0-100), amplification chip (low=green, medium=yellow, high=red), propaganda technique chips, framing summary, article links; auto-polls while status="running" | +| 8.11 | `App.tsx` integration | M | — | `"topic-runs"` added to `ViewType`; sidebar nav item (`TravelExploreIcon`); topic runs query; routing logic; `NewAnalysisDialog` receives `onCreated` that refreshes correct list | + +--- + +## P1 — NetworkGraph.tsx (Carried from Sprint 5→6→7) + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.12 | `NetworkGraph.tsx` production implementation | M | — | Renders outlet relationship graph from existing data; nodes = outlets, edges = shared narrative matches or cross-references; uses D3 or existing MUI charting; integrated into MonitoringDashboard or dedicated view | + +--- + +## P1 — Supply Chain Security (CI) + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.13 | SBOM generation — `sbom.yml` CI workflow | M | — | Runs `cyclonedx-py requirements` on `requirements.txt`; uploads `sbom.json` as workflow artifact on every push to `main` and `development`; artifact retained 90 days | +| 8.14 | Dependency vulnerability scanning — pip-audit in CI | S | — | `pip-audit -r requirements.txt` runs in `ci.yml` after install; workflow fails on any known CVE with severity ≥ HIGH; exceptions documented in `vulnerability-exceptions.txt` | +| 8.15 | Add `cyclonedx-bom` and `pip-audit` to dev-dependencies | S | — | Added to `requirements.txt` under `# dev/CI tools`; versions pinned | + +--- + +## P1 — User-Facing AI Disclosure + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.16 | `AIDisclosurePanel.tsx` | S | — | Persistent info panel (collapsible) visible to all users: states that briefings are AI-generated by Gemini, reviewed by human analyst, and may contain errors; links to methodology documentation | +| 8.17 | Wire disclosure into analysis report view | S | — | Panel appears above every briefing/report; cannot be permanently dismissed (reappears per session); fulfils EU AI Act Art. 50 transparency requirement at the UI layer | + +--- + +## P1 — Testing + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.18 | Topic pipeline tests | M | — | ≥10 tests: baseline fetch mock, Gemini call mock, discovery domain extraction, per-outlet analysis, incremental commit, error handling (failed outlet), finalize on partial success | +| 8.19 | Topic API endpoint tests | M | — | ≥8 tests: create run, list runs, get detail, detail with outlet_results, empty state, 404, validation errors | +| 8.20 | Frontend type-check | S | — | `npm run build` passes with all new components; no TypeScript errors | + +--- + +## P1 — Documentation + +| # | Task | Effort | Owner | Acceptance Criteria | +|---|------|--------|-------|---------------------| +| 8.21 | Update DEVELOPER.md with Sprint 8 features | M | — | New sections: Topic Mode architecture, topic pipeline flow, /topic-runs API, SBOM generation, vulnerability scanning | +| 8.22 | Update CHANGELOG.md | S | — | v1.8.0 release notes with all Sprint 8 features | +| 8.23 | Update execution plan | S | — | Mark Sprint 7 complete, Sprint 8 current; update architecture section | +| 8.24 | Update compliance log | S | — | Sprint 7 compliance marked complete; Sprint 8 SBOM and disclosure measures documented | + +--- + +## P2 — Deferred / Carried Items + +| # | Task | Effort | Notes | +|---|------|--------|-------| +| 8.25 | User authentication and authorization | XL | Pushed to Sprint 9 — foundational but large; auth requires session management, JWT, role model, UI changes | +| 8.26 | Shared analysis workspaces | XL | Depends on auth; Sprint 10 | +| 8.27 | Analyst annotations on briefings | L | Depends on auth; Sprint 10 | +| 8.28 | Currents API collector | S | Low priority; RSS + GDELT coverage sufficient | + +--- + +## FINAL STEP — Recursive Completeness Check (Task 8.29) + +**This task must be executed last, after all other Sprint 8 tasks are marked done.** + +### 8.29 Sprint 8 Recursive Completeness Audit + +#### 8.29.1 Implementation Completeness +- [ ] Every P0 task (8.1–8.11) has corresponding code committed +- [ ] Every P1 task (8.12–8.24) has corresponding code/docs committed +- [ ] No TODO/FIXME/HACK comments left in Sprint 8 code +- [ ] All new files imported/registered where needed + +#### 8.29.2 Test Coverage +- [ ] `pytest tests/ -v` passes — ≥222 total tests (≥18 new Sprint 8 tests) +- [ ] `npm run build` succeeds (frontend type-check) +- [ ] All new API endpoints return expected responses +- [ ] Edge cases tested: failed Gemini call, partial pipeline completion, zero outlets discovered + +#### 8.29.3 Documentation Completeness +- [ ] DEVELOPER.md updated with all Sprint 8 features +- [ ] CHANGELOG.md has v1.8.0 entry +- [ ] SBOM artifact produced and downloadable +- [ ] Sprint 8 retrospective — deferred to sprint close +- [ ] Compliance log updated + +#### 8.29.4 CI/Compliance Verification +- [ ] Lint passes (ruff check clean) +- [ ] pip-audit passes (no HIGH/CRITICAL CVEs unexcepted) +- [ ] SBOM workflow runs and uploads artifact +- [ ] No secrets in committed code +- [ ] Branch policy: PR targets development branch + +#### 8.29.5 Vision Alignment Check (Sprints 1-8) +- [ ] Topic Mode serves core mission: comparative outlet analysis for disinformation detection ✓ +- [ ] SBOM/vulnerability scanning supports CRA compliance ✓ +- [ ] AI disclosure serves EU AI Act Art. 50 ✓ +- [ ] Auth deferral is deliberate — not drift +- [ ] No feature creep away from counter-disinformation mission + +#### 8.29.6 Gap Identification +- [ ] Document any gaps found +- [ ] Confirm Sprint 9 candidates: user auth, shared workspaces +- [ ] Confirm CDDBS-Edge Phase 0 readiness + +--- + +## Acceptance Criteria (Sprint-Level) + +### Topic Mode +- [ ] Analyst can enter a topic ("NATO expansion eastward"), select 5 outlets, and receive ranked divergence scores within 90 seconds +- [ ] Discovery correctly excludes reference outlets (reuters.com, bbc.com, apnews.com, afp.com) +- [ ] Partial results visible in UI before pipeline completes (incremental commits) +- [ ] Each outlet result shows: divergence score (0-100), amplification signal (low/medium/high), propaganda techniques, framing summary, article links + +### Supply Chain Security +- [ ] `sbom.json` artifact downloadable from every CI run on main/development +- [ ] CI fails on merge if `pip-audit` finds HIGH or CRITICAL CVEs without documented exception + +### UI Compliance +- [ ] AI disclosure panel visible on every briefing view without user opt-in +- [ ] Panel text explicitly names Gemini as the AI model and instructs analyst review + +### Quality +- [ ] ≥18 new tests (≥222 total passing) +- [ ] All CI workflows green +- [ ] No documentation drift + +--- + +## Risk Assessment + +| Risk | Mitigation | +|------|-----------| +| Gemini API latency for 5+ outlet pipeline (10+ calls) | Per-outlet incremental commits + UI auto-poll; user sees progress; pipeline doesn't block | +| SerpAPI rate limits during broad discovery (40-result query) | Single broad query per topic run; per-outlet queries = num_outlets; well within free tier | +| pip-audit false positives blocking CI | `vulnerability-exceptions.txt` allows documented, reviewed exceptions; review process in CONTRIBUTING.md | +| SBOM size / CI artifact bloat | CycloneDX JSON for ~30 deps is ~50KB; 90-day retention; no issue | +| NetworkGraph.tsx scope creep | Scope to existing data (outlet narrative matches); no new backend endpoints required | + +--- + +## Tech Stack (Minimal New Dependencies) + +| Package | Purpose | Tier | +|---------|---------|------| +| `cyclonedx-bom` | SBOM generation | Dev/CI only | +| `pip-audit` | Vulnerability scanning | Dev/CI only | + +No new runtime dependencies. Topic Mode uses existing SerpAPI + google-genai SDK + SQLAlchemy. + +--- + +## Definition of Done + +- All P0 and P1 tasks completed and tested +- Recursive completeness check (8.29) executed and all items checked +- CI green on all workflows (ci.yml, branch-policy.yml, secret-scan.yml, sbom.yml) +- DEVELOPER.md and CHANGELOG.md updated +- Sprint 8 retrospective written +- Compliance log updated +- No regression in Sprint 1-7 functionality +- Production patch exported to `patches/sprint8_production_changes.patch` diff --git a/retrospectives/sprint_7.md b/retrospectives/sprint_7.md new file mode 100644 index 0000000..3c2f82c --- /dev/null +++ b/retrospectives/sprint_7.md @@ -0,0 +1,151 @@ +# Sprint 7 Retrospective + +**Sprint**: 7 — Intelligence Layer & Compliance Hardening +**Duration**: March 14–18, 2026 (accelerated — completed ahead of planned Apr 1-14 window) +**Version**: v1.7.0 +**Status**: Complete + +--- + +## Sprint Goal + +Build the intelligence layer on top of Sprint 6's raw article ingestion: TF-IDF event clustering, z-score burst detection, narrative risk scoring, a full `/events` API, and frontend visualization components. Complete and document all DSGVO/CRA/EU AI Act compliance practices. Verify the full sprint 1-7 delivery with a recursive completeness audit. + +--- + +## Delivery Summary + +### Intelligence Pipeline (Backend) + +| Task | Status | Notes | +|------|--------|-------| +| 7.1 TF-IDF event clustering pipeline | Done | `pipeline/event_clustering.py` — agglomerative clustering (distance_threshold=0.6), 6 event type classifiers | +| 7.2 Cluster metadata extraction | Done | Representative title, top-5 keywords, country list, event_type via keyword heuristics | +| 7.3 Z-score burst detection | Done | `pipeline/burst_detection.py` — 24h baseline, 1h window, z-score threshold 3.0, writes NarrativeBurst rows | +| 7.4 Narrative risk scoring | Done | `pipeline/narrative_risk.py` — 4-signal composite (source_concentration, burst_magnitude, timing_sync, narrative_match) | +| 7.5 Background scheduler integration | Done | Runs automatically after each collector cycle via CollectorManager | +| 7.6 known_narratives.json integration | Done | narrative_match signal wired into existing narrative matcher | + +### Events API Endpoints + +| Task | Status | Notes | +|------|--------|-------| +| 7.7 `GET /events` | Done | Filters: type, country, status, min_risk, limit, offset | +| 7.8 `GET /events/{id}` | Done | Full detail with article list, keyword breakdown, source diversity, timeline | +| 7.9 `GET /events/map` | Done | Country-grouped events with risk scores for map visualization | +| 7.10 `GET /events/bursts` | Done | Active NarrativeBurst records with linked cluster info, min_zscore filter | + +### Frontend Intelligence Components + +| Task | Status | Notes | +|------|--------|-------| +| 7.11 `EventClusterPanel.tsx` | Done | Ranked clusters by narrative_risk_score with title, type chip, countries, article count, risk bar | +| 7.12 `BurstTimeline.tsx` | Done | Keyword frequency spikes ranked by z-score, threshold indicator, frequency bars | +| 7.13 `EventDetailDialog.tsx` | Done | Full event detail with articles, source breakdown, publication timeline, 4-signal risk breakdown | +| 7.14 Enhanced `GlobalMap.tsx` | Done | Countries with active events get highlighted borders, tooltips with event count and risk score | +| 7.15 Updated `MonitoringDashboard.tsx` | Done | 4-panel bottom row: Event Clusters, Burst Timeline, Active Narratives, Country Risk Index | +| 7.16 `NarrativeTrendPanel.tsx` burst integration | Done | Connected to burst detection data | + +### Testing + +| Test File | Tests | Coverage | +|-----------|-------|---------| +| `test_event_clustering.py` | 12 | Classification, cluster creation, edge cases, metadata extraction | +| `test_burst_detection.py` | 14 | Z-score, hourly frequencies, spike detection, threshold boundary, edge cases | +| `test_narrative_risk.py` | 23 | All 4 risk signals independently, composite score, edge cases (single source, zero articles) | +| `test_events_api.py` | 13 | List with filters, detail, map grouping, bursts list, pagination, empty states | + +**Sprint 7 new tests**: 62 +**Total passing**: 204 + +### Documentation & Compliance + +| Task | Status | Notes | +|------|--------|-------| +| 7.22 DEVELOPER.md update | Done | New sections for event clustering, burst detection, risk scoring, /events endpoints | +| 7.23 CHANGELOG.md | Done | v2026.03.1 release notes | +| 7.24 Sprint 7 integration log | Done | Covered in CHANGELOG.md | +| 7.25 Compliance practices documentation | Done | 7 documents in `compliance-practices/` | +| 7.26 Execution plan update | Done | Sprint 6 marked complete, Sprint 7 current | +| 7.29 Recursive completeness audit | Done | **PASSED** 2026-03-18 | + +### Deferred Items + +| Task | Notes | +|------|-------| +| 7.27 NetworkGraph.tsx production implementation | Carried Sprint 5→6→7→8; will be Sprint 8 task | +| 7.28 Currents API collector | Low priority; RSS + GDELT coverage sufficient | +| Sprint 7 retrospective | This document — deferred to sprint close | + +--- + +## Key Metrics + +- **New API endpoints**: 4 (`/events`, `/events/{id}`, `/events/map`, `/events/bursts`) +- **New pipeline modules**: 3 (`event_clustering.py`, `burst_detection.py`, `narrative_risk.py`) +- **New frontend components**: 3 (`EventClusterPanel.tsx`, `BurstTimeline.tsx`, `EventDetailDialog.tsx`) +- **Frontend components enhanced**: 2 (`GlobalMap.tsx`, `MonitoringDashboard.tsx`) +- **New tests**: 62 across 4 files +- **Total test suite**: 204 tests passing +- **Compliance documents produced**: 7 (`compliance-practices/` folder) +- **New dependencies**: 0 — scikit-learn and scipy already added in Sprint 6 +- **Audit status**: PASSED (all 6 audit categories clear) + +--- + +## What Went Well + +1. **Zero new dependencies** — All Sprint 7 ML capabilities (TF-IDF clustering, z-score, cosine similarity) used scikit-learn and scipy added in Sprint 6. Docker image size unchanged, no new attack surface. +2. **Thorough risk scoring decomposition** — 4-signal composite (source_concentration, burst_magnitude, timing_sync, narrative_match) is independently interpretable. Analysts can see which signals drove a risk score, not just the final number. +3. **Recursive completeness audit as a discipline** — Task 7.29 made it operationally impossible to close the sprint with undocumented work. The audit caught the retrospective gap and deferred it explicitly rather than silently skipping it. +4. **Test depth on narrative risk** — 23 tests for narrative_risk.py covering every signal independently, composite score, and edge cases (zero articles, single source). This is the most analytically critical module and got the most coverage. +5. **Compliance documentation now reusable** — The 7 compliance practice documents in `compliance-practices/` are not sprint-specific; they document the architecture decisions that can be referenced in a formal assessment (CRA enforcement summer 2026). + +--- + +## What Could Be Improved + +- [ ] **NetworkGraph.tsx** — Carried for three consecutive sprints (5→6→7→8). Needs a dedicated, unambiguous Sprint 8 slot or a decision to drop it permanently. +- [ ] **EventClusterPanel and BurstTimeline not end-to-end tested with real cluster data** — Unit tests use mock DB state. Integration testing requires collectors to have run 24-48h to populate meaningful clusters. This should be validated in a staging environment before v1.7.0 is considered production-ready. +- [ ] **Z-score threshold discrepancy** — Backlog specified default 2.5, implementation used 3.0. Configurable via environment variable, but the discrepancy between spec and implementation should be documented. +- [ ] **Sprint patches not tracked as formal artifacts** — `patches/sprint7_production_changes.patch` was planned but not confirmed. Patch export should be automated in CI rather than a manual post-sprint step. + +--- + +## Architecture Decisions + +| Decision | Choice | Rationale | +|----------|--------|-----------| +| Clustering algorithm | Agglomerative (distance_threshold=0.6) | No need to pre-specify cluster count; threshold approach handles variable article volumes | +| Burst detection window | 24h baseline / 1h current | Long baseline smooths weekend/timezone effects; 1h window catches emerging spikes early | +| Risk score design | 0-1 composite, 4 additive signals | Interpretable components; no black-box score | +| Pipeline trigger | After each collector cycle | Avoids a separate scheduler; uses existing CollectorManager lifecycle | +| Event type classification | Keyword heuristics (6 types) | Lightweight, inspectable, no additional model call; suitable for MVP intelligence layer | + +--- + +## Vision Alignment + +Sprint 7 completes the transition from **reactive** (analyst-initiated analysis) to **proactive** (automated event detection): + +- Sprints 1-5: Analyst submits a URL → system produces a briefing +- Sprint 6: System ingests articles continuously from 15+ feeds +- Sprint 7: System clusters articles into events, detects narrative bursts, scores risk — without analyst input + +This evolution is on-mission. The core purpose — "analyzing media outlets and social media accounts for potential disinformation activity" — is served by surfacing emerging narrative clusters before an analyst knows to ask about them. + +**No feature creep detected.** All Sprint 7 components serve intelligence analysis for disinformation detection. + +--- + +## Action Items for Sprint 8 + +| Action | Priority | +|--------|----------| +| Implement Topic Mode (topic-centric multi-outlet comparative analysis) | Critical | +| Implement NetworkGraph.tsx production visualization | High | +| Generate SBOM in CI (CycloneDX format) | High | +| Automated dependency vulnerability scanning (pip-audit in CI) | High | +| User-facing AI disclosure panel in frontend | Medium | +| Validate event clustering and burst detection with 48h of live collector data | Medium | +| Currents API collector | Low |