Project: Cyber Disinformation Detection Briefing System (CDDBS) Start Date: February 3, 2026 Delivery Model: 2-week sprints Last Updated: 2026-03-28
CDDBS is a system for analyzing media outlets and social media accounts for potential disinformation activity. It uses LLM-based analysis (Gemini) to produce structured intelligence briefings assessing source credibility, narrative alignment, and behavioral indicators across multiple platforms (news outlets, Twitter/X, Telegram).
Target: v1.1.0 | Status: Done
- Researched 10 professional intelligence briefing formats
- Designed CDDBS briefing template with 7 mandatory sections
- Created JSON schema (draft-07) for structured output
- System prompt v1.1 with confidence framework and attribution standards
- Frontend mockup with sample RT analysis
- Compliance: BYOK architecture, confidence framework, AI labeling, .gitignore for secrets
Target: v1.2.0 | Status: Done
- Automated quality scorer (7 dimensions, 70 points)
- Known narratives reference dataset (7 categories, 16 narratives)
- Source verification framework for 5 evidence types
- 41 tests (schema validation + quality scoring)
- System prompt v1.2 with narrative detection + self-validation
- Compliance: Deterministic quality rubric (independent of AI), automated testing
Target: v1.3.0 | Status: Done
- Telegram platform analysis and behavioral indicators
- Cross-platform identity correlation framework
- Network analysis enhancement (graph model, community detection)
- Schema v1.2.0 with multi-platform fields and network graph
- Platform adapters (Twitter + Telegram)
- System prompt v1.3 (multi-platform aware)
- API rate limiting design (Twitter v2 + Telegram MTProto)
- 80 tests total (39 new)
- Compliance: Data normalization via adapters, rate limiting respect
Target: v1.4.0 | Status: Done
- Integrated Sprints 1-3 research into live
cddbs-prodapplication - Quality scorer wired into analysis pipeline (7 dimensions, 70 points)
- Narrative matcher running against 18 known narratives post-analysis
- 3 new API endpoints (quality, narratives, narratives DB)
- 3 new database tables (briefings, narrative_matches, feedback)
- Frontend: QualityBadge, QualityRadarChart, NarrativeTags components
- Dashboard metrics: Avg Quality + Narratives Detected
- Unplanned: Feedback system, keyboard shortcuts, cold start handling, skeleton loading
- 56 new tests in production (quality: 23, adapters: 22, narratives: 11)
- Compliance: Controlled research→prod transfer, analyst feedback loop
Target: v0.5.0 | Status: Done
- Twitter API v2 integration (direct account analysis via platform adapter)
- Batch analysis support (multiple outlets in single request)
- Export formats (PDF, JSON, CSV)
- Operational metrics endpoint (
GET /metrics) - Developer documentation (812-line DEVELOPER.md)
- Platform routing in orchestrator (news/twitter with fallback)
- 169 tests total (35 new)
- Compliance: Export for auditing, operational metrics, comprehensive documentation
- See docs/sprint_5_backlog.md for details
Target: v0.6.0 | Status: Done
- Event Intelligence Pipeline: RSS (15 feeds) + GDELT Doc API v2 collectors
- BaseCollector ABC + CollectorManager with async scheduling
- URL deduplication (SHA-256) + Title deduplication (TF-IDF cosine similarity)
- Telegram Bot API integration (wired into pipeline)
- Quality and narrative trend endpoints
- Webhook alerting (HMAC-SHA256 signing, auto-disable)
- CI compliance pipeline: secret scanning, documentation drift detection, branch policy enforcement
- Open-source hardening: CODEOWNERS, SECURITY.md, CONTRIBUTING.md, LICENSE, TROUBLESHOOTING.md
- ~197 tests total (25 new)
- Compliance: Major compliance sprint — secret scanning CI, docs drift detection, branch policy, SECURITY.md, CODEOWNERS
- See docs/sprint_6_backlog.md for details
Target: v0.7.0 | Status: Done
- TF-IDF event clustering pipeline (agglomerative clustering, distance_threshold=0.6)
- Z-score burst detection (24h baseline, 1h window, threshold=3.0)
- Narrative risk scoring (4-signal composite: source concentration, burst magnitude, timing sync, narrative match)
/eventsAPI endpoints (list, detail, map, bursts)- Frontend: EventClusterPanel, BurstTimeline, EventDetailDialog, enhanced GlobalMap
- Compliance practices documentation (7 documents: DSGVO, CRA, EU AI Act)
- Recursive completeness audit PASSED — 204 tests, all CI green
- Compliance: Full compliance documentation folder, recursive audit, vision alignment verification
- See docs/sprint_7_backlog.md | retrospectives/sprint_7.md
Target: v0.8.0 | Status: Done
- Topic Mode: 5-step pipeline (baseline → discovery → per-outlet comparative analysis) with coordination signal detection, key claims/omissions extraction
- OutletNetworkGraph.tsx: Force-directed outlet relationship graph in MonitoringDashboard
- AIProvenanceCard.tsx: Tiered AI disclosure (EU AI Act Art. 50) — model ID, prompt version, quality score, legal text
- SBOM generation in CI: CycloneDX
sbom.ymlon every push to main/development, 90-day artifact retention - Dependency vulnerability scanning: pip-audit in CI, fails on actionable HIGH/CRITICAL CVEs
- GitHub Actions pinned to commit SHAs (GhostAction supply chain mitigation)
- 10 new tests (coordination logic, key claims, API schema, ai_metadata)
- Migration fixes: startup column migrations for Sprint 8 DB schema
- Infrastructure: Cloudflare Workers (frontend + GDELT proxy), Fly.io/Koyeb exploration, keep-alive workflow
- Compliance: SBOM artifact (CRA Art. 13(15)), pip-audit (CRA Art. 10(4)), AI provenance (EU AI Act Art. 50)
- See docs/sprint_8_backlog.md | retrospectives/sprint_8.md
Target: v0.9.0 | Status: Done
- AI Trust Framework: LLM output validation (
output_validator.py), grounding score (TF-IDF claim verification), confidence calibration - Information Security Hardening: CORS fix, rate limiting (slowapi), prompt injection prevention (
input_sanitizer.py), security headers, error sanitization, API key hygiene - Compliance Automation: Machine-readable
/compliance/evidenceendpoint, custom dependency scanner (replaces Dependabot) - OWASP LLM Top 10: LLM01, LLM02, LLM04, LLM06, LLM09 mitigated
- 35 new tests, 249 total
- Compliance: OWASP LLM Top 10 coverage, EU AI Act Art. 9/12/14, CRA security hardening
- Versioning: Adopted semver
0.x.y— retaggedv2026.03→v0.5.0 - See docs/sprint_9_backlog.md for details
- User authentication and authorization (JWT, role model, session management)
- CDDBS-Edge Phase 0: Swap Gemini → Ollama, benchmark briefing quality
- Analyst annotations and comments on briefings
- Shared analysis workspaces (depends on Sprint 10 auth)
- Automated monitoring schedules
- API for third-party integration
- Machine learning model fine-tuning
- Multi-language support
- Currents API collector integration
Status: Concept — Experiment Phase 0 in planning Scope: Separate hardware prototype track, runs parallel to cloud sprints Design doc: research/cddbs_edge_concept.md
"What happens when the cloud goes down, the API gets blocked, or you're a journalist in a country that restricts internet access?"
A portable, offline-capable version of CDDBS that runs entirely on a Raspberry Pi 5 with a local quantized LLM (Phi-3 Mini 3.8B via Ollama), replacing all external API calls. Output delivered via MQTT broker to a connected display (e-ink HAT or external screen — approach TBD by experiment).
Experiment Phases:
- Phase 0 (no hardware): Swap Gemini → Ollama on laptop, benchmark briefing quality vs cloud baseline
- Phase 1: Deploy pipeline on Raspberry Pi 5 (8GB), benchmark speed/RAM/thermal
- Phase 2: Wire MQTT output, prototype display options (e-ink HAT vs MQTT subscriber)
- Phase 3: Design offline data ingestion (USB-based article import or minimal RSS fetch)
Why it matters for AI trust & governance: Demonstrates resilience, digital sovereignty, access equity, and privacy-preserving AI deployment — concrete artifacts for governance discussions that most researchers only address theoretically.
- Backend: FastAPI + uvicorn + slowapi on Render (Docker)
- Frontend: React 18 + TypeScript + MUI 6 + Vite on Cloudflare Workers + Render
- Database: PostgreSQL 15 (Neon managed, 12 tables)
- LLM: Google Gemini 2.5 Flash via google-genai SDK
- Data Sources: SerpAPI Google News, Twitter API v2, GDELT Doc API v2 (Cloudflare proxy), RSS (15 feeds)
- CI: GitHub Actions (7 workflows)
- Source Code: GitHub (cddbs-prod + cddbs-research)
- Structured briefing output validated against JSON Schema v1.2
- 7-dimension quality scoring pipeline (70-point rubric)
- Narrative detection against 50+ known disinformation narratives
- Platform adapters for Twitter + Telegram (both wired into pipeline)
- Multi-source event intelligence pipeline (RSS + GDELT)
- URL + title deduplication (SHA-256 + TF-IDF cosine)
- Webhook alerting with HMAC-SHA256 signing
- CI compliance pipeline (secret scan, docs drift, branch policy)
- Background task processing with auto-polling frontend
- Batch analysis and export (JSON/CSV/PDF)
- Operational metrics and trend endpoints
- Event clustering and burst detection (TF-IDF agglomerative + z-score)
- Narrative risk scoring composite (4-signal: source_concentration, burst_magnitude, timing_sync, narrative_match)
- Events API and frontend visualization (EventClusterPanel, BurstTimeline, GlobalMap overlay)
- 204 tests, 3 CI workflows, 7 compliance documents
- Topic Mode: 5-step pipeline — baseline fetch, Gemini baseline, broad discovery, per-outlet comparative analysis, coordination signal detection
- OutletNetworkGraph: force-directed outlet relationship visualization
- AIProvenanceCard: tiered AI disclosure (model ID, prompt version, quality score, legal text)
- SBOM generation (CycloneDX) and pip-audit vulnerability scanning in CI
- GitHub Actions pinned to commit SHAs (supply chain hardening)
- Infrastructure: Cloudflare Workers (frontend + GDELT proxy), keep-alive workflow
- AI trust framework: output validation, grounding score (TF-IDF claim verification), confidence calibration
- Information security: CORS hardening, rate limiting (slowapi), input sanitization, security headers, error sanitization, API key hygiene
- Compliance automation:
/compliance/evidenceendpoint, custom dependency scanner (replaces Dependabot) - OWASP LLM Top 10: LLM01, LLM02, LLM04, LLM06, LLM09 mitigated
- 249 tests, 7 CI workflows
- User authentication and authorization (JWT, RBAC)
- CDDBS-Edge Phase 0 (Gemini → Ollama swap, benchmark)
- Shared analysis workspaces
- Evidence over speed - Every claim must be traceable to evidence
- Confidence transparency - Always communicate uncertainty honestly
- Reproducibility - Analyses should be reproducible with the same inputs
- Professional standards - Output should meet intelligence community standards
- Cost discipline - Stay within free/low-cost tier limits
- Compliance by design - EU regulatory requirements (DSGVO, CRA, EU AI Act) addressed through engineering practices, not afterthought
| Repository | Branch Policy |
|---|---|
cddbs-prod |
Feature branches from development → merge to development → merge to main |
cddbs-research |
Feature branches from main → merge to main |
Production code flows through the development branch as a staging/integration area before reaching main. This is enforced by CI (branch-policy.yml).
| Sprint | Contribution to Vision | On Track? |
|---|---|---|
| 1 | Briefing format — core intelligence output | Yes |
| 2 | Quality scoring — reliability of AI analysis | Yes |
| 3 | Multi-platform — broader disinformation coverage | Yes |
| 4 | Production integration — making research usable | Yes |
| 5 | Operational maturity — production-grade features | Yes |
| 6 | Event intelligence — proactive monitoring capability | Yes |
| 7 | Intelligence layer — automated event detection | Yes ✓ |
| 8 | Topic Mode, supply chain security, AI provenance — proactive discovery + compliance | Yes ✓ |
| 9 | AI trust, information security, compliance automation — output integrity + platform hardening | Yes ✓ |
Drift assessment: No significant drift from project vision. All sprints serve the core mission of "analyzing media outlets and social media accounts for potential disinformation activity."
Sprint 9 reprioritization note: The original plan placed user authentication in Sprint 9. The Sprint 8 security audit revealed critical gaps (prompt injection, no rate limiting, CORS misconfiguration) that must be resolved before adding auth. Additionally, for a disinformation detection system, AI output trustworthiness (grounding scores, hallucination detection) is more mission-critical than access control. Auth is now Sprint 10 — this is a deliberate sequencing decision, not scope drift. The core features (auth, workspaces, annotations, CDDBS-Edge) remain on the roadmap with unchanged priority.
Potential drift risks:
- CDDBS-Edge is a parallel track that could divert focus — mitigated by keeping it separate and experiment-phase only
- Collaborative features (Sprint 9) could drift toward general-purpose workspace — must stay focused on analyst collaboration for disinformation analysis
- Compliance documentation is valuable but must not become the primary focus — it supports engineering quality, not the other way around
See compliance-practices/ for comprehensive documentation of all DSGVO, CRA, and EU AI Act measures implemented across Sprints 1-7.