From d94baa07ec67192c2920b4ec6feb62e2e3bf6ff7 Mon Sep 17 00:00:00 2001 From: Claude Date: Sun, 15 Mar 2026 10:09:22 +0000 Subject: [PATCH 1/2] Correct blog series accuracy against Sprint 1-6 implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Post 1 (Architecture): - Update sprint count: 5 → 6, test count: 169 → 142 - Update DB table count: 7 → 12 (add RawArticle, EventCluster, NarrativeBurst, WebhookConfig added in Sprint 6) - Update API endpoint count: 17 → 34 - Add Sprint 6 tables to schema diagram - Add Part 6 to "What's Coming" series outline Post 2 (Analysis Pipeline): - Fix date filter: tbs=qdr:{period} silently fails on google_news engine; correct implementation uses when:{X}d in query string - Fix async model: threading.Thread → FastAPI BackgroundTasks - Update test count reference: 169 → 142 Post 4 (Multi-Platform): - Fix Telegram status: "interface-only, planned for Sprint 6" → Sprint 6 shipped POST /analysis-runs/telegram with live Telegram Bot API routing in the orchestrator Post 5 (Operational Maturity): - Update DB tables: 7 → 12, API endpoints: 17 → 34, tests: 169 → 142 - Fix batch execution model: threading.Thread → BackgroundTasks - Update "What's Next": Sprint 6 is complete; roadmap now starts at Sprint 7 (event clustering, burst detection) - Update series recap to reference Part 6 (Sprint 6 post) https://claude.ai/code/session_01TX2LYcCvHMHa3JM1wx3R6u --- .../01-architecture-and-threat-model.md | 148 +++++ blog-series/02-the-analysis-pipeline.md | 377 +++++++++++++ .../03-quality-scoring-and-narratives.md | 406 ++++++++++++++ blog-series/04-multi-platform-analysis.md | 365 ++++++++++++ blog-series/05-operational-maturity.md | 523 ++++++++++++++++++ blog-series/README.md | 29 + 6 files changed, 1848 insertions(+) create mode 100644 blog-series/01-architecture-and-threat-model.md create mode 100644 blog-series/02-the-analysis-pipeline.md create mode 100644 
blog-series/03-quality-scoring-and-narratives.md create mode 100644 blog-series/04-multi-platform-analysis.md create mode 100644 blog-series/05-operational-maturity.md create mode 100644 blog-series/README.md diff --git a/blog-series/01-architecture-and-threat-model.md b/blog-series/01-architecture-and-threat-model.md new file mode 100644 index 0000000..a6d363e --- /dev/null +++ b/blog-series/01-architecture-and-threat-model.md @@ -0,0 +1,148 @@ +--- +title: "Building CDDBS: An LLM-Powered Disinformation Analysis System — Part 1: Architecture & Threat Model" +published: false +description: "How we designed a system that uses Gemini, SerpAPI, and structured intelligence tradecraft to detect disinformation narratives at scale." +tags: ai, security, python, webdev +series: "Building CDDBS" +--- + +## What is CDDBS? + +CDDBS — the Cyber Disinformation Detection Briefing System — is an open-source intelligence analysis platform that detects disinformation narratives in media outlets and social media accounts. It ingests articles from the web, runs them through a structured LLM analysis pipeline, scores the output for quality, and matches it against a database of 18 known disinformation narratives. + +The result is a professional intelligence briefing — the kind an analyst at a think tank or government agency would write — produced in under a minute. + +This is the first post in a series where I'll walk through the technical architecture, the pipeline internals, the quality assurance system, and the operational infrastructure behind it. This isn't a weekend project write-up. CDDBS has been through six development sprints, 142 tests, and a production deployment on Render. The goal of this series is to show how the pieces fit together — and why we made the decisions we did. + +## The Problem We're Solving + +Disinformation analysis is labor-intensive. 
A trained analyst reviewing a single media outlet for narrative alignment might spend hours reading articles, cross-referencing known campaigns, and writing up findings with proper attribution. Scale that to dozens of outlets across multiple platforms, and you have a staffing problem that no newsroom or research lab can afford. + +The core question CDDBS answers: **Can an LLM produce analyst-grade intelligence briefings if you give it the right structure, the right evidence, and the right constraints?** + +The answer is yes — with caveats. The LLM (Google Gemini) is powerful at synthesis, but terrible at self-assessment. It will hallucinate confidence levels, fabricate URLs, and present speculation as fact unless you engineer around those failure modes explicitly. That engineering is the subject of this series. + +## Threat Model: What We're Looking For + +CDDBS tracks 18 disinformation narratives organized into 8 categories. These aren't hypothetical — they're drawn from documented campaigns catalogued by organizations like EUvsDisinfo, the Atlantic Council's DFRLab, and Stanford's Internet Observatory. + +Here's the taxonomy: + +| Category | Narratives | Example Keywords | +|----------|-----------|------------------| +| Anti-NATO / Western Alliance | 3 | encirclement, broken promises, Cold War relic | +| Anti-EU / European Instability | 3 | EU collapse, sanctions backfire, Islamization | +| Ukraine Conflict Revisionism | 4 | denazification, Azov, Maidan coup, biolabs | +| Western Hypocrisy | 3 | Western propaganda, Guantanamo, election fraud | +| Global South Appeals | 1 | BRICS, multipolar world, anti-colonial | +| Health Disinformation | 1 | bioweapon, Big Pharma | +| Election Interference Denial | 1 | Russiagate hoax, Steele dossier | +| Telegram Amplification | 2 | forwarding chains, censorship refugee | + +Each narrative has a unique ID (e.g., `ukraine_001`), a set of detection keywords, and metadata about propagation patterns. 
The system doesn't just flag keywords — it counts hits, calculates a confidence level (high/moderate/low based on match density), and deduplicates across the full report text and individual articles. + +This is deliberately a **signature-based** approach, not ML-based. Keyword matching is deterministic, auditable, and runs offline without a model. That matters when your users are analysts who need to explain *why* a match was flagged. + +## Architecture Overview + +CDDBS is a three-tier application: + +``` +┌─────────────────────────────┐ +│ React 18 + TypeScript │ +│ MUI 6 / TanStack Query │ +│ Vite (dev) / Nginx (prod) │ +└──────────────┬──────────────┘ + │ HTTP +┌──────────────▼──────────────┐ +│ FastAPI + SQLAlchemy │ +│ Background task pipeline │ +│ Quality scorer + Narratives│ +└──────────────┬──────────────┘ + │ +┌──────────────▼──────────────┐ +│ PostgreSQL (12 tables) │ +│ Neon managed (production) │ +└─────────────────────────────┘ + │ + ┌──────────┴──────────┐ + ▼ ▼ + SerpAPI Google Gemini + (article fetch) (LLM analysis) +``` + +The backend is a FastAPI application with 34 endpoints. The frontend is a React SPA with Redux Toolkit for state and TanStack React Query for data fetching. PostgreSQL stores everything — reports, articles, quality scores, narrative matches, and tester feedback. + +### The BYOK Model + +CDDBS uses a "Bring Your Own Key" authentication model. Users supply their own SerpAPI and Google Gemini API keys, stored in the browser's `localStorage`. Keys are sent with each analysis request and never persisted server-side. + +This is a deliberate architectural choice: + +- **Cost**: We don't pay for API calls. Users operate within their own quotas. +- **Privacy**: No key management, rotation, or breach surface on our end. +- **Simplicity**: No auth layer, no user accounts, no billing. + +The trade-off is onboarding friction — users need their own API keys before they can run analyses. 
For a tool aimed at researchers and analysts, that's an acceptable gate. + +## The Pipeline at 30,000 Feet + +When a user clicks "New Analysis" and submits an outlet, the system executes a 6-stage pipeline: + +``` +1. Article Fetch → SerpAPI Google News (or Twitter API v2) +2. LLM Analysis → Gemini 2.5 Flash with structured system prompt +3. Persistence → Report + Articles stored in PostgreSQL +4. Quality Scoring → 7-dimension rubric, 70-point scale +5. Narrative Match → Keyword detection against 18 known narratives +6. Result Assembly → Briefing + scorecard + matches committed to DB +``` + +Critically, this pipeline runs **asynchronously**. The `POST /analysis-runs` endpoint returns immediately with a report ID and `"status": "queued"`. The actual pipeline runs in a background thread. The frontend polls every 3 seconds until the report is ready. + +Stages 4 and 5 — quality scoring and narrative matching — are wrapped in `try/except` blocks. If they fail, the briefing is still delivered. This is a deliberate design choice: a briefing without a quality score is infinitely more useful than no briefing at all. + +## Database Schema + +Twelve tables store the full lifecycle of an analysis — from article ingestion to briefing delivery and webhook alerting: + +``` +outlets ──< articles >── reports ──< narrative_matches + │ + ├── briefings (1:1) + │ +batches ─────────────────── ┘ (via report_ids JSON) + +topic_runs ──< topic_outlet_results + +raw_articles (multi-source ingestion) +event_clusters (Sprint 6+) +narrative_bursts (Sprint 6+) +webhook_configs + +feedback (standalone) +``` + +The key design decision here is the `Report` ↔ `Briefing` relationship. A `Report` stores the raw LLM response and the final briefing text. A `Briefing` stores the structured quality scorecard. They're 1:1 linked by `report_id`. + +Narrative matches are stored as individual rows — one per detected narrative per report. This makes it trivial to query "which reports matched `ukraine_001`?" 
across the entire database. + +The `Batch` model (added in Sprint 5) groups multiple reports under a single analysis request. It tracks progress with `completed_count` and `failed_count` fields, and stores linked report IDs in a JSON column. + +Sprint 6 added four more tables: `raw_articles` (multi-source feed ingestion from RSS and GDELT), `event_clusters` and `narrative_bursts` (for event intelligence, populated in Sprint 7+), and `webhook_configs` (for outbound alerting via HMAC-signed webhooks). + +## What's Coming in This Series + +This post covered the *what* and *why*. The next posts go deep on the *how*: + +- **Part 2**: The analysis pipeline in detail — how we fetch articles, construct prompts, parse LLM output, and handle failures. +- **Part 3**: The 7-dimension quality rubric — how we score LLM output without using another LLM. +- **Part 4**: Multi-platform analysis — how Twitter API v2 and Telegram adapters normalize heterogeneous data into a common format. +- **Part 5**: Operational maturity — batch analysis, export formats, metrics, and what it takes to go from "it works on my machine" to "it works in production." +- **Part 6**: Event intelligence at scale — the Sprint 6 multi-source ingestion pipeline (RSS + GDELT), TF-IDF deduplication, and webhook alerting. + +Each post will include real code, real data flows, and real architectural trade-offs. If you're building LLM-powered analysis tools — or any system where LLM output quality matters — the patterns here apply well beyond disinformation detection. 
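Before closing, one concrete payoff of the row-per-match schema described above: the "which reports matched `ukraine_001`?" question is a single query with no JSON unpacking. Here is a minimal sketch against an in-memory SQLite stand-in; the table and column names follow the diagram, but the production system uses PostgreSQL through SQLAlchemy, so treat the details as illustrative:

```python
import sqlite3

# Stand-in for the narrative_matches table: one row per detected
# narrative per report, as described in the schema section above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE narrative_matches (
        id INTEGER PRIMARY KEY,
        report_id INTEGER NOT NULL,
        narrative_id TEXT NOT NULL,
        confidence TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO narrative_matches (report_id, narrative_id, confidence)"
    " VALUES (?, ?, ?)",
    [
        (1, "ukraine_001", "high"),
        (1, "nato_002", "low"),
        (2, "ukraine_001", "moderate"),
        (3, "health_001", "low"),
    ],
)

# "Which reports matched ukraine_001?" is one indexed lookup.
rows = conn.execute(
    "SELECT report_id, confidence FROM narrative_matches"
    " WHERE narrative_id = ? ORDER BY report_id",
    ("ukraine_001",),
).fetchall()
print(rows)  # → [(1, 'high'), (2, 'moderate')]
```

Storing matches as rows rather than a JSON blob on the report is what makes this query (and the cross-report narrative statistics in later posts) cheap.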
+ +--- + +*CDDBS is open source: production repository is at [github.com/Be11aMer/cddbs-prod](https://github.com/Be11aMer/cddbs-prod) and the research repository is at [github.com/Be11aMer/cddbs-research](https://github.com/Be11aMer/cddbs-research).* diff --git a/blog-series/02-the-analysis-pipeline.md b/blog-series/02-the-analysis-pipeline.md new file mode 100644 index 0000000..c9a4858 --- /dev/null +++ b/blog-series/02-the-analysis-pipeline.md @@ -0,0 +1,377 @@ +--- +title: "Building CDDBS — Part 2: Inside the Analysis Pipeline" +published: false +description: "A deep dive into how CDDBS fetches articles, constructs prompts, calls Gemini, and parses structured intelligence briefings from LLM output." +tags: ai, python, llm, backend +series: "Building CDDBS" +--- + +## The Pipeline Problem + +Most LLM tutorials show you how to call an API and print the response. Real systems need more. You need to fetch data from external sources, construct prompts that constrain the output format, parse responses that don't always follow your instructions, persist results to a database, and handle every failure mode gracefully — all without blocking the user. + +CDDBS solves this with a 6-stage background pipeline. This post walks through every stage with actual code from the production system. + +## Stage 1: Article Fetch + +When a user requests an analysis of a media outlet, the first thing we need is content to analyze. CDDBS uses SerpAPI's Google News engine to fetch recent articles. 
+ +```python +# src/cddbs/pipeline/fetch.py (simplified) + +# Map short date_filter codes to Google News 'when:' query values +_WHEN_MAP = { + "h": "1h", + "d": "1d", + "w": "7d", + "m": "30d", + "y": "1y", +} + +def fetch_articles(outlet, country, num_articles=3, url=None, + api_key=None, time_period=None): + if not api_key: + return generate_mock_articles(outlet) + + query = f'"{outlet}"' + if url: + clean_url = url.replace("https://", "").replace("http://", "").split("/")[0] + query = f'"{outlet}" site:{clean_url}' + + # google_news engine does NOT support the tbs parameter. + # Date filtering must be done via 'when:' operator in the query string. + if time_period: + when_value = _WHEN_MAP.get(time_period, time_period) + query = f"{query} when:{when_value}" + + params = { + "engine": "google_news", + "q": query, + "gl": normalize_country(country), + "api_key": api_key, + } + + search = GoogleSearch(params) + results = search.get_dict() + return results.get("news_results", []) +``` + +A few things to note: + +**Country normalization.** SerpAPI expects ISO country codes (`us`, `ru`, `gb`), but users type "Russia" or "United States." The `normalize_country()` function maps natural language country names to their codes. Small detail, large UX impact. + +**Date filtering.** The SerpAPI `google_news` engine does **not** support the `tbs` parameter. Passing `tbs=qdr:d` silently fails — articles come back unfiltered. The correct approach is the `when:` query operator embedded directly in the search string: `when:1d` for last 24 hours, `when:7d` for last week. We discovered this through silent failures in early testing and patched it in production. This matters because disinformation campaigns often intensify around specific events — an analyst tracking a narrative spike needs yesterday's articles, not last month's. + +**Mock fallback.** When no API key is configured, the system generates mock articles rather than crashing. 
This is critical for local development and testing — you don't want your 142 tests to require a live SerpAPI key. + +## Stage 2: Prompt Construction + +This is where most of the engineering lives. A raw LLM call with "analyze these articles for disinformation" produces vague, unstructured prose. CDDBS uses a 263-line system prompt that transforms Gemini into a structured intelligence analyst. + +### The System Prompt (v1.3) + +The system prompt is loaded from a versioned text file: + +```python +# src/cddbs/utils/system_prompt.py +_cached_prompt = None + +def load_system_prompt(): + global _cached_prompt + if _cached_prompt: + return _cached_prompt + + prompt_path = Path(__file__).parent.parent / "data" / "system_prompt_v1.3.txt" + _cached_prompt = prompt_path.read_text() + return _cached_prompt +``` + +Caching matters. This function is called once per analysis run, but in a batch of 5 runs, reading the file 5 times is wasteful. The module-level cache ensures a single read. + +### What the System Prompt Enforces + +The prompt defines an analyst persona and constrains output across several dimensions: + +**7 mandatory sections.** Every briefing must contain: Executive Summary, Key Findings, Subject Profile, Narrative Analysis, Confidence Assessment, Limitations & Caveats, and Methodology. If Gemini omits a section, the quality scorer penalizes it. + +**Evidence typing.** Every claim must be attributed using a typed evidence system: + +``` +[POST] — Specific social media post with URL +[PATTERN] — Behavioral pattern with specific metrics +[NETWORK] — Relationship data with named accounts +[METADATA] — Account metadata (creation date, bio) +[EXTERNAL] — Third-party source with organization name +[FORWARD] — Telegram forwarding chain with source/delay +[CHANNEL_META] — Telegram channel metadata +``` + +This isn't decorative. The evidence type system makes hallucination *auditable*. When the LLM tags something as `[POST]`, an analyst can verify whether that post exists. 
When it says `[PATTERN]`, the metric must be present — "75% retweet ratio" not "high retweet activity." + +**Attribution language rules.** The prompt explicitly defines what language is permitted: + +``` +"The account posted..." — observed facts +"This is consistent with..." — pattern matching +"This suggests..." — inferences +"We assess with [level]..." — analytical judgments + +FORBIDDEN: "It is clear that", "Obviously", "definitely" +``` + +This eliminates the LLM's natural tendency to express false certainty — which is the single most dangerous failure mode for an intelligence product. + +**Known narrative patterns.** All 18 narrative IDs and their keywords are embedded directly in the system prompt. This gives Gemini a reference frame: it's not guessing what "Ukraine conflict revisionism" looks like; it has specific patterns to match against. + +### User Prompt Construction + +The user prompt is built from the fetched articles: + +```python +# src/cddbs/pipeline/prompt_templates.py (simplified) +def get_consolidated_prompt(outlet, country, articles, url=None): + article_text = "" + for i, article in enumerate(articles, 1): + article_text += f"\n--- Article {i} ---\n" + article_text += f"Title: {article.get('title', 'N/A')}\n" + article_text += f"Source: {article.get('link', 'N/A')}\n" + article_text += f"Snippet: {article.get('snippet', 'N/A')}\n" + if article.get('full_text'): + article_text += f"Full Text: {article['full_text']}\n" + + return f"""Analyze the following media outlet for potential +disinformation patterns: + +Outlet: {outlet} +Country: {country} +URL: {url or 'N/A'} + +Articles collected: +{article_text} + +Produce a structured intelligence briefing following the format +specified in your system instructions.""" +``` + +The prompt is deliberately minimal. All the structural constraints live in the system prompt, which is stable across runs. The user prompt just provides the data. 
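Because the evidence tags are fixed strings, "auditable" also means "machine-checkable": a briefing can be linted for typed attributions with no NLP at all. A minimal sketch, where the tag list is the one from the system prompt but the counting helper and the sample excerpt are invented for illustration:

```python
# Evidence types defined in the CDDBS system prompt (v1.3).
EVIDENCE_TYPES = ["[POST]", "[PATTERN]", "[NETWORK]", "[METADATA]",
                  "[EXTERNAL]", "[FORWARD]", "[CHANNEL_META]"]

def count_typed_evidence(briefing_text: str) -> dict:
    """Count occurrences of each typed evidence tag in a briefing."""
    return {etype: briefing_text.count(etype) for etype in EVIDENCE_TYPES}

# Invented briefing excerpt for illustration.
sample = (
    "[POST] Tweet at https://example.com/status/1 repeats the claim. "
    "[PATTERN] 78% of posts in the window are retweets of state media. "
    "[PATTERN] Posting volume tripled within 48 hours of the event. "
    "[EXTERNAL] EUvsDisinfo catalogued the same framing in 2023."
)

counts = count_typed_evidence(sample)
print(counts["[PATTERN]"])  # → 2
```

This is exactly the property the quality scorer in Part 3 exploits: the tags turn attribution from a prose style into a countable signal.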
+ +## Stage 3: LLM Call and Response Parsing + +The Gemini call is straightforward. The response parsing is not. + +```python +# src/cddbs/pipeline/orchestrator.py (simplified) +def call_gemini(prompt, system_prompt, api_key, model="gemini-2.5-flash"): + client = genai.Client(api_key=api_key) + response = client.models.generate_content( + model=model, + contents=prompt, + config=types.GenerateContentConfig( + system_instruction=system_prompt, + temperature=0.1 + ) + ) + return response.text +``` + +**Temperature 0.1.** We want deterministic output. A temperature of 0 gives identical output for identical input; 0.1 adds just enough variation to avoid repetitive phrasing while keeping the structure stable. + +### The Parsing Problem + +Gemini is asked to return JSON, but it often wraps the response in markdown code blocks, or returns a mix of JSON and prose. The parser handles this with a fallback chain: + +```python +def parse_response(raw_text): + # Try 1: Direct JSON parse + try: + return json.loads(raw_text) + except json.JSONDecodeError: + pass + + # Try 2: Extract from markdown code block + match = re.search(r'```(?:json)?\s*([\s\S]*?)```', raw_text) + if match: + try: + return json.loads(match.group(1)) + except json.JSONDecodeError: + pass + + # Try 3: Find first { ... } block + match = re.search(r'\{[\s\S]*\}', raw_text) + if match: + try: + return json.loads(match.group(0)) + except json.JSONDecodeError: + pass + + # Fallback: return raw text as unstructured briefing + return {"final_briefing": raw_text, "parse_failed": True} +``` + +Three things to note here: + +1. **Never crash on bad output.** The worst case is an unstructured briefing. Still useful — just without individual article analyses. +2. **Markdown stripping.** Gemini loves wrapping JSON in ` ```json ``` ` blocks. The regex handles this transparently. +3. **Greedy JSON extraction.** If the response has prose before and after the JSON, the `\{[\s\S]*\}` regex pulls out the largest JSON object. 
This handles cases where Gemini adds "Here's the analysis:" before the actual output. + +## Stage 4: Database Persistence + +After parsing, results are persisted in a single transaction: + +```python +# Simplified from orchestrator.py +def persist_results(db, report_id, parsed, articles, outlet): + report = db.query(Report).get(report_id) + report.final_report = parsed.get("final_briefing", "") + report.raw_response = raw_text + report.data = { + "status": "completed", + "articles_analyzed": len(articles), + "analysis_date": datetime.now(UTC).isoformat() + } + + # Persist articles + for article_data in articles: + article = Article( + report_id=report_id, + outlet_id=outlet_record.id, + title=article_data.get("title"), + link=article_data.get("link"), + snippet=article_data.get("snippet") + ) + db.add(article) + + db.commit() +``` + +The raw Gemini response is stored alongside the parsed briefing. This is an audit trail — if the quality scorer flags something unexpected, you can go back and see exactly what the LLM returned before parsing. + +## Stage 5: Quality Scoring + +This stage deserves its own post (Part 3), but the integration point matters here: + +```python +# In the pipeline, after persistence +try: + score_briefing(db, report_id) +except Exception as e: + print(f"Quality scoring failed: {e}") + # Non-fatal — briefing is still delivered +``` + +The `try/except` is load-bearing. Quality scoring is a post-processing step that adds value but isn't required for the core product. If the briefing text is malformed or the scorer has a bug, the analyst still gets their report. + +## Stage 6: Narrative Matching + +Same pattern — non-fatal enrichment: + +```python +try: + match_narratives_from_report(db, report_id) +except Exception as e: + print(f"Narrative matching failed: {e}") +``` + +The narrative matcher reads the full report text, scans for keyword hits across all 18 narratives, deduplicates, and creates `NarrativeMatch` rows. Details in Part 3. 
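Stages 5 and 6 share one shape: run a post-processing step, log on failure, keep going. That shape can be factored into a small driver loop. This is a sketch of the pattern rather than the production orchestrator, and the two stage functions are stand-ins:

```python
def run_enrichment_stages(report_id, stages):
    """Run each (name, fn) stage; one failure never blocks later stages."""
    outcomes = {}
    for name, fn in stages:
        try:
            fn(report_id)
            outcomes[name] = "ok"
        except Exception as e:
            print(f"{name} failed for report {report_id}: {e}")
            outcomes[name] = "skipped"
    return outcomes

# Stand-in stage functions for illustration.
def score_briefing(report_id):
    pass  # would compute the 70-point scorecard

def match_narratives(report_id):
    raise RuntimeError("simulated matcher bug")

outcomes = run_enrichment_stages(42, [
    ("quality_scoring", score_briefing),
    ("narrative_matching", match_narratives),
])
print(outcomes)  # → {'quality_scoring': 'ok', 'narrative_matching': 'skipped'}
```

The point of the loop is the same as the point of the two `try/except` blocks above: enrichment failures degrade the product, they never destroy it.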
+ +## The Async Execution Model + +The entire pipeline runs as a FastAPI background task: + +```python +@app.post("/analysis-runs") +def create_analysis_run( + request: RunCreateRequest, + background_tasks: BackgroundTasks, + db=Depends(get_db), +): + # Create placeholder report + report = Report(outlet=request.outlet, country=request.country) + report.data = {"status": "queued"} + db.add(report) + db.commit() + + # Schedule pipeline as a background task + background_tasks.add_task( + _run_analysis_job, + report_id=report.id, + outlet=request.outlet, + country=request.country, + # ... other params + ) + + return {"id": report.id, "status": "queued"} +``` + +The frontend polls for completion: + +```typescript +// Frontend: TanStack React Query with conditional polling +const { data: run } = useQuery({ + queryKey: ["run", runId], + queryFn: () => fetchRun(runId), + refetchInterval: (query) => { + const status = query.state.data?.data?.status; + return status === "completed" || status === "failed" + ? false // stop polling + : 3000; // poll every 3 seconds + } +}); +``` + +Why FastAPI `BackgroundTasks` instead of Celery? Cost discipline. Celery requires a message broker (Redis or RabbitMQ), which means another service to deploy and pay for. FastAPI's built-in `BackgroundTasks` runs the job in the same process after the HTTP response is sent — zero extra infrastructure. For our throughput (single-digit concurrent analyses on Render free tier), this is entirely adequate. If CDDBS needed to handle hundreds of concurrent analyses, we'd switch to a proper task queue. Until then, the simplest solution that works is the right one. + +## Platform Routing + +Sprint 5 added a layer on top of Stage 1 — platform routing. 
Instead of always fetching from SerpAPI, the pipeline can now route to different data sources: + +```python +def _fetch_for_platform(platform, outlet, country, num_articles, + url, serpapi_key, twitter_bearer_token, + date_filter): + if platform == "twitter": + try: + from src.cddbs.pipeline.twitter_client import ( + fetch_twitter_data, briefing_input_to_articles + ) + briefing_input = fetch_twitter_data( + handle=outlet, + num_posts=num_articles or 10, + bearer_token=twitter_bearer_token + ) + if briefing_input and briefing_input.posts: + return briefing_input_to_articles(briefing_input) + except Exception as e: + print(f"Twitter fetch failed ({e}), falling back") + + # Default: SerpAPI news search + return fetch_articles(outlet, country, num_articles=num_articles, + url=url, api_key=serpapi_key, + time_period=date_filter) +``` + +The fallback is the key pattern. If the Twitter API is down, rate-limited, or misconfigured, the pipeline silently falls back to SerpAPI news search. The analyst gets articles either way — possibly from a different source than requested, but never an empty result. + +## Error Handling Philosophy + +CDDBS follows a consistent error philosophy: **degrade gracefully, never crash.** + +| Stage | Failure Mode | Behavior | +|-------|-------------|----------| +| Article fetch | No API key | Return mock articles | +| Article fetch | API error | Return empty list | +| LLM call | Timeout / error | Report marked "failed" | +| Response parsing | Invalid JSON | Raw text used as briefing | +| Quality scoring | Scorer bug | Skipped, briefing still delivered | +| Narrative matching | Matcher bug | Skipped, briefing still delivered | +| Twitter fetch | Rate limited | Fall back to SerpAPI | + +The only stage that can mark a report as "failed" is the LLM call itself. Everything else degrades. This means analysts almost always get something useful, even when things go wrong. 
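Expressed as code, the first two fetch rows of the table reduce to a guard clause plus a broad `except`. A sketch with both external calls stubbed out; the function names echo the ones in Stage 1 but are reimplemented here purely for illustration:

```python
def generate_mock_articles(outlet):
    # Stand-in for the development-mode mock generator from Stage 1.
    return [{"title": f"Mock article about {outlet}",
             "link": "https://example.com"}]

def serpapi_search(outlet, api_key):
    # Stand-in for the real SerpAPI call; simulates an outage.
    raise ConnectionError("simulated API outage")

def fetch_articles_safely(outlet, api_key=None):
    if not api_key:          # table row 1: no API key -> mock articles
        return generate_mock_articles(outlet)
    try:
        return serpapi_search(outlet, api_key)
    except Exception:        # table row 2: API error -> empty list
        return []

no_key = fetch_articles_safely("RT")
outage = fetch_articles_safely("RT", api_key="sk-test")
print(no_key[0]["title"])  # → Mock article about RT
print(outage)              # → []
```

Every other row in the table is a variation on these two moves: substitute a safe default, or swallow the error and return something smaller than promised.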
+ +## What's Next + +This post covered the data flow — from article fetch to database persistence. The next post goes deep on the quality scoring system: how we evaluate LLM output across 7 dimensions without using another LLM, and how the narrative matcher detects known disinformation patterns using deterministic keyword analysis. + +--- + +*The full pipeline implementation is in [orchestrator.py](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/pipeline/orchestrator.py). The system prompt is versioned at [system_prompt_v1.3.txt](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/data/system_prompt_v1.3.txt).* diff --git a/blog-series/03-quality-scoring-and-narratives.md b/blog-series/03-quality-scoring-and-narratives.md new file mode 100644 index 0000000..48c3f63 --- /dev/null +++ b/blog-series/03-quality-scoring-and-narratives.md @@ -0,0 +1,406 @@ +--- +title: "Building CDDBS — Part 3: Scoring LLM Output Without Another LLM" +published: false +description: "How we built a 7-dimension, 70-point quality rubric and a deterministic narrative matcher to evaluate AI-generated intelligence briefings." +tags: ai, python, nlp, security +series: "Building CDDBS" +--- + +## The Quality Problem + +Here's a dirty secret about LLM-powered applications: the hardest part isn't generating output. It's knowing whether the output is good. + +You could use a second LLM to evaluate the first one. Some systems do this — "LLM-as-judge" is a popular pattern. But it has a fundamental flaw for intelligence work: LLMs are confidently wrong in correlated ways. If Gemini hallucinates a claim, GPT-4 reviewing that claim might accept it as plausible because it lacks the same context Gemini lacked. You've just automated the rubber stamp. + +CDDBS takes a different approach: **structural quality scoring**. We don't ask "is this briefing accurate?" (that requires ground truth we don't have). 
We ask "does this briefing follow the structural rules that make intelligence products trustworthy?" That's a question we can answer deterministically, with zero LLM calls. + +## The 7-Dimension Rubric + +The quality scorer evaluates every briefing across 7 dimensions, each worth 10 points: + +| Dimension | What It Measures | Why It Matters | +|-----------|-----------------|----------------| +| Structural Completeness | All 7 required sections present | Missing sections = incomplete analysis | +| Attribution Quality | Claims linked to typed evidence | Unattributed claims are unverifiable | +| Confidence Signaling | Uncertainty expressed explicitly | False certainty is the #1 failure mode | +| Evidence Presentation | Evidence structured and specific | Vague evidence is useless evidence | +| Analytical Rigor | Sound reasoning, limitations noted | Prevents overreach and tunnel vision | +| Actionability | Findings are useful to an analyst | A briefing nobody can act on has no value | +| Readability | Clear, professional prose | Technical accuracy means nothing if it's unreadable | + +Total: **70 points**. Ratings map to bands: + +``` +60-70 → Excellent +50-59 → Good +40-49 → Acceptable +30-39 → Poor + 0-29 → Failing +``` + +### Why These Dimensions? + +This rubric came from Sprint 1 research. We analyzed briefing formats from 10 professional intelligence organizations: EUvsDisinfo, DFRLab (Atlantic Council), Bellingcat, NATO StratCom COE, Stanford Internet Observatory, Graphika, RAND Corporation, UK DCMS, the Global Engagement Center, and the Oxford Internet Institute. + +Key finding: **only 3 of 10 organizations use explicit confidence signaling in their public outputs.** Per-finding confidence levels — where each claim has its own confidence score — is a CDDBS innovation. The rubric is designed to reward this practice because it's the single most important quality signal for an analyst consuming the briefing. 
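The score-to-band mapping is a plain threshold lookup. A sketch, where the function name is mine but the cut-offs are the ones from the table above:

```python
def rating_band(total_score: int) -> str:
    """Map a 0-70 quality score to its rating band."""
    if total_score >= 60:
        return "Excellent"
    if total_score >= 50:
        return "Good"
    if total_score >= 40:
        return "Acceptable"
    if total_score >= 30:
        return "Poor"
    return "Failing"

print(rating_band(63), rating_band(45), rating_band(12))
# → Excellent Acceptable Failing
```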
+ +## Scoring Implementation + +Let's walk through how each dimension is scored in practice. + +### Structural Completeness (10 points) + +The simplest dimension: does the briefing contain the sections we asked for? + +```python +def score_structural_completeness(briefing_text): + score = 0 + issues = [] + required_sections = [ + "executive summary", "key findings", "subject profile", + "narrative analysis", "confidence assessment", + "limitations", "methodology" + ] + + text_lower = briefing_text.lower() + for section in required_sections: + if section in text_lower: + score += 1 + else: + issues.append(f"Missing section: {section}") + + # Bonus points for structured formatting + if "##" in briefing_text or "**" in briefing_text: + score += min(3, 10 - score) # up to 3 bonus for formatting + + return min(score, 10), issues +``` + +This catches the most common LLM failure: omitting sections. Gemini reliably produces Executive Summary and Key Findings but sometimes drops Limitations or Methodology — the sections that constrain analyst overconfidence. 
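To make the dimension concrete, here is the scorer run on a minimal briefing skeleton. The scoring logic is repeated from the listing above so the snippet runs standalone; the skeleton text is invented:

```python
def score_structural_completeness(briefing_text):
    # Same logic as the listing above, repeated for a runnable example.
    score = 0
    issues = []
    required_sections = [
        "executive summary", "key findings", "subject profile",
        "narrative analysis", "confidence assessment",
        "limitations", "methodology"
    ]
    text_lower = briefing_text.lower()
    for section in required_sections:
        if section in text_lower:
            score += 1
        else:
            issues.append(f"Missing section: {section}")
    if "##" in briefing_text or "**" in briefing_text:
        score += min(3, 10 - score)
    return min(score, 10), issues

skeleton = """## Executive Summary
## Key Findings
## Subject Profile
## Narrative Analysis
## Confidence Assessment
## Limitations & Caveats
## Methodology"""

score, issues = score_structural_completeness(skeleton)
print(score, issues)  # → 10 []

# Dropping a section costs exactly one point and leaves an audit trail.
score2, issues2 = score_structural_completeness(
    skeleton.replace("## Methodology", ""))
print(score2, issues2)  # → 9 ['Missing section: methodology']
```

Note that the `issues` list is as important as the number: it tells the analyst *which* section the LLM skipped, not just that something is off.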
+ +### Attribution Quality (10 points) + +This is where the evidence typing system pays off: + +```python +EVIDENCE_TYPES = ["[POST]", "[PATTERN]", "[NETWORK]", "[METADATA]", + "[EXTERNAL]", "[FORWARD]", "[CHANNEL_META]"] + +def score_attribution_quality(briefing_text): + score = 0 + issues = [] + + # Count evidence-typed attributions + evidence_count = sum( + briefing_text.count(etype) for etype in EVIDENCE_TYPES + ) + + if evidence_count >= 8: + score += 4 + elif evidence_count >= 4: + score += 2 + else: + issues.append(f"Only {evidence_count} typed evidence items") + + # Check that findings have evidence + findings = re.findall( + r'(?:finding|key finding)[:\s]*(.*?)(?=\n\n|\n#|$)', + briefing_text, re.IGNORECASE | re.DOTALL + ) + findings_with_evidence = sum( + 1 for f in findings + if any(et in f for et in EVIDENCE_TYPES) + ) + + if findings and findings_with_evidence / len(findings) >= 0.8: + score += 3 + elif findings and findings_with_evidence / len(findings) >= 0.5: + score += 1 + else: + issues.append("Most findings lack typed evidence") + + # Check evidence specificity + if re.search(r'\[PATTERN\].*\d+%', briefing_text): + score += 2 # PATTERN has specific metrics + if re.search(r'\[NETWORK\].*@\w+', briefing_text): + score += 1 # NETWORK names specific accounts + + return min(score, 10), issues +``` + +The rubric rewards *specific* evidence. A `[PATTERN]` tag alone is worth something, but `[PATTERN] 78% of tweets are retweets from state media` is worth more. The regex checks for numbers after PATTERN tags and account names after NETWORK tags. 
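The specificity checks encode a concrete editorial rule: a metric claim has to contain a number, and a network claim has to name an account. A quick illustration with the two patterns from the listing (sample sentences invented):

```python
import re

specific = "[PATTERN] 78% of tweets are retweets from state media"
vague = "[PATTERN] high retweet activity from state media"

# A [PATTERN] claim only earns the bonus if a percentage follows it.
print(bool(re.search(r'\[PATTERN\].*\d+%', specific)))  # → True
print(bool(re.search(r'\[PATTERN\].*\d+%', vague)))     # → False

# A [NETWORK] claim must name at least one specific account.
named = "[NETWORK] Amplified by @example_account within minutes"
print(bool(re.search(r'\[NETWORK\].*@\w+', named)))     # → True
```

These are crude heuristics by design: cheap, deterministic, and biased toward rewarding exactly the phrasing the system prompt demands.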
+ +### Confidence Signaling (10 points) + +The most important dimension for intelligence work: + +```python +CONFIDENCE_LEVELS = ["high confidence", "moderate confidence", + "low confidence"] + +def score_confidence_signaling(briefing_text): + score = 0 + issues = [] + text_lower = briefing_text.lower() + + # Overall confidence stated + has_overall = any(level in text_lower for level in CONFIDENCE_LEVELS) + if has_overall: + score += 3 + else: + issues.append("No overall confidence level stated") + + # Per-finding confidence + findings_section = extract_section(briefing_text, "key findings") + if findings_section: + confidence_mentions = sum( + findings_section.lower().count(level) + for level in CONFIDENCE_LEVELS + ) + if confidence_mentions >= 3: + score += 3 + elif confidence_mentions >= 1: + score += 1 + + # Confidence factors documented + if "confidence" in text_lower and "factor" in text_lower: + score += 2 + + # No forbidden certainty language + forbidden = ["obviously", "it is clear that", "definitely", + "without a doubt", "undeniably"] + violations = [f for f in forbidden if f in text_lower] + if not violations: + score += 2 + else: + issues.append(f"Forbidden certainty language: {violations}") + + return min(score, 10), issues +``` + +This dimension has a dual mechanism: it rewards explicit uncertainty (confidence levels, factors, caveats) and penalizes false certainty (forbidden phrases). An LLM that says "we assess with moderate confidence" gets full marks. One that says "it is clear that" gets docked. + +## Narrative Matching: The Other Evaluation Layer + +Quality scoring tells you whether the briefing is structurally sound. Narrative matching tells you what it found. 
+ +### The Narrative Database + +CDDBS maintains a JSON file of 18 known disinformation narratives: + +```json +{ + "narratives": [ + { + "id": "ukraine_001", + "name": "Ukraine as Nazi/Fascist State", + "category": "Ukraine Conflict Revisionism", + "keywords": [ + "nazi", "fascist", "azov", "denazification", + "bandera", "neo-nazi", "ultranationalist", + "right sector" + ], + "description": "Claims that Ukraine is controlled by Nazi or fascist elements..." + } + ] +} +``` + +Each narrative has a unique ID, a category, and a keyword list. The keywords are chosen to be specific enough to avoid false positives on general political discussion. + +### The Matching Algorithm + +```python +def match_narratives(text, threshold=2): + narratives = load_known_narratives() + matches = [] + text_lower = text.lower() + + for narrative in narratives: + matched_keywords = [ + kw for kw in narrative["keywords"] + if kw.lower() in text_lower + ] + + if len(matched_keywords) >= threshold: + confidence = ( + "high" if len(matched_keywords) >= 5 + else "moderate" if len(matched_keywords) >= 3 + else "low" + ) + matches.append({ + "narrative_id": narrative["id"], + "narrative_name": narrative["name"], + "category": narrative["category"], + "confidence": confidence, + "matched_keywords": matched_keywords, + "match_count": len(matched_keywords) + }) + + # Deduplicate: keep strongest match per narrative + seen = {} + for match in matches: + nid = match["narrative_id"] + if nid not in seen or match["match_count"] > seen[nid]["match_count"]: + seen[nid] = match + + return list(seen.values()) +``` + +### Why Keyword Matching, Not ML? + +This is a deliberate design choice with real trade-offs: + +**Advantages of keyword matching:** +- Deterministic. Same input always produces same output. +- Auditable. You can see exactly which keywords triggered the match. +- Fast. No model loading, no inference time. Runs in <10ms. +- Offline. No external service dependency. +- Explainable. 
An analyst can evaluate whether the match is a true positive by reading the keywords. + +**Disadvantages:** +- No contextual understanding. "NATO expansion" in a factual news report about a summit and "NATO expansion" in a conspiracy theory about Russian encirclement produce the same match. +- Keyword coverage. If a narrative evolves to use new language, the keywords need manual updating. +- No semantic similarity. Paraphrases of known narratives won't match. + +For CDDBS's use case, the advantages win. The system is a *tool for analysts*, not a replacement for them. A false positive with an explanation ("matched on: nazi, azov, denazification") is more useful than an ML prediction with a probability score that can't be interrogated. + +### Confidence Calibration + +The threshold system maps keyword density to confidence: + +``` +5+ keywords matched → High confidence +3-4 keywords matched → Moderate confidence +2 keywords matched → Low confidence +1 keyword matched → Below threshold (not reported) +``` + +The minimum threshold of 2 is critical. A single keyword like "NATO" could appear in any geopolitical article. Two keywords from the same narrative — "NATO" + "encirclement" — is a much stronger signal. 
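The calibration is small enough to sketch as a standalone helper. This version is self-contained; the inline narrative entry and sample text are illustrative, not taken from the real database:

```python
def calibrate_confidence(match_count):
    """Map keyword match count to a confidence label (None = below threshold)."""
    if match_count >= 5:
        return "high"
    if match_count >= 3:
        return "moderate"
    if match_count >= 2:
        return "low"
    return None  # a single keyword is not reported

# Illustrative narrative entry, not from known_narratives.json
narrative = {
    "id": "anti_nato_001",
    "keywords": ["nato expansion", "encirclement", "broken promises",
                 "buffer zone", "eastward expansion"],
}

text = "Critics cite NATO expansion, encirclement, and broken promises."
matched = [kw for kw in narrative["keywords"] if kw in text.lower()]
print(len(matched), calibrate_confidence(len(matched)))
# 3 moderate
```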
+ +## Putting It Together: A Scored Report + +Here's what the quality + narrative pipeline produces for a hypothetical RT analysis: + +```json +{ + "quality": { + "total_score": 52, + "rating": "Good", + "dimensions": { + "structural_completeness": {"score": 8, "max": 10}, + "attribution_quality": {"score": 7, "max": 10}, + "confidence_signaling": {"score": 8, "max": 10}, + "evidence_presentation": {"score": 7, "max": 10}, + "analytical_rigor": {"score": 8, "max": 10}, + "actionability": {"score": 7, "max": 10}, + "readability": {"score": 7, "max": 10} + } + }, + "narratives": [ + { + "narrative_id": "ukraine_003", + "name": "Western Provocation Caused Conflict", + "category": "Ukraine Conflict Revisionism", + "confidence": "high", + "matched_keywords": ["NATO", "Maidan", "provocation", "coup", "Western"], + "match_count": 5 + }, + { + "narrative_id": "anti_nato_001", + "name": "NATO Expansion Threatens Russia", + "confidence": "moderate", + "matched_keywords": ["NATO expansion", "encirclement", "broken promises"], + "match_count": 3 + } + ] +} +``` + +The analyst sees a 52/70 "Good" rating, knows which dimensions are weak (actionability and readability, both 7/10), and sees two narrative matches with the specific keywords that triggered them. They can then verify: did the articles actually discuss Maidan as a Western-backed coup, or was the keyword match coincidental? + +## Frontend: Making Scores Useful + +Quality scores are only valuable if analysts can interpret them. The frontend renders two key visualizations: + +**Quality Radar Chart.** A custom SVG heptagon (7-sided) showing all dimensions simultaneously. 
No charting library dependency — just computed SVG paths:
+
+```
+         Structural (8)
+        ╱            ╲
+  Read (7)        Attrib (7)
+     │                 │
+  Action (7)      Confid (8)
+     │                 │
+  Rigor (8)      Evidence (7)
+```
+
+**Narrative Tags.** Color-coded pills that expand to show matched keywords:
+
+```
+[Ukraine Conflict Revisionism] ● High
+  → NATO, Maidan, provocation, coup, Western
+
+[Anti-NATO] ● Moderate
+  → NATO expansion, encirclement, broken promises
+```
+
+## The Testing Strategy
+
+Quality scoring has the highest test density in the codebase: 23 tests covering all 7 dimensions plus edge cases.
+
+```python
+# test_quality.py examples
+def test_high_quality_briefing():
+    """A well-structured briefing should score 50+."""
+    score = score_briefing_text(HIGH_QUALITY_FIXTURE)
+    assert score["total_score"] >= 50
+    assert score["rating"] in ("Good", "Excellent")
+
+def test_minimal_briefing():
+    """A briefing with only basic structure should score 25-40."""
+    score = score_briefing_text(MINIMAL_FIXTURE)
+    assert 25 <= score["total_score"] <= 40
+    assert score["rating"] in ("Poor", "Acceptable")
+
+def test_forbidden_language_penalty():
+    """Forbidden certainty language should reduce the confidence score."""
+    text = "It is clear that this account is spreading propaganda."
+    score, issues = score_confidence_signaling(text)  # returns (score, issues)
+    assert score < 8  # penalty applied
+```
+
+The tests use fixture files — real briefing examples at different quality levels (high, medium, low, minimal, telegram, cross-platform). This ensures the scorer produces sensible results across the full range of inputs.
+ +Narrative matching has 11 tests covering keyword thresholds, confidence calibration, deduplication, and edge cases: + +```python +def test_below_threshold(): + """A single keyword should not trigger a match.""" + matches = match_narratives("NATO held a summit today.") + assert len(matches) == 0 + +def test_moderate_confidence(): + """3-4 keywords should produce moderate confidence.""" + text = "NATO expansion and encirclement, with broken promises" + matches = match_narratives(text) + assert matches[0]["confidence"] == "moderate" +``` + +## Design Principle: Evaluate Structure, Not Truth + +The core insight behind CDDBS's quality system is that you can evaluate *process quality* without evaluating *factual accuracy*. A briefing that: + +- Includes all 7 sections +- Attributes every claim to typed evidence +- States confidence levels explicitly +- Acknowledges limitations +- Uses professional language + +...is more likely to be accurate than one that doesn't. Not because structure guarantees truth, but because the structural requirements force the LLM to do the work that produces accurate output. You can't write `[POST] https://twitter.com/...` without having a specific post to reference. You can't write "We assess with moderate confidence" without implicitly acknowledging uncertainty. + +The quality scorer doesn't grade the LLM's homework. It checks whether the LLM showed its work. + +--- + +*Quality scorer implementation: [quality.py](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/quality.py). 
Narrative database: [known_narratives.json](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/data/known_narratives.json).* diff --git a/blog-series/04-multi-platform-analysis.md b/blog-series/04-multi-platform-analysis.md new file mode 100644 index 0000000..c1ad9c5 --- /dev/null +++ b/blog-series/04-multi-platform-analysis.md @@ -0,0 +1,365 @@ +--- +title: "Building CDDBS — Part 4: Multi-Platform Disinformation Detection" +published: false +description: "How we built platform adapters for Twitter and Telegram that normalize heterogeneous social media data into a common analysis format." +tags: ai, python, security, api +series: "Building CDDBS" +--- + +## Why Multiple Platforms Matter + +Disinformation doesn't live on one platform. A narrative might originate on a Telegram channel, get amplified through Twitter retweet networks, and eventually surface in fringe news outlets that look legitimate enough to fool casual readers. If your detection system only watches one platform, you're seeing one act of a three-act play. + +CDDBS was initially built around SerpAPI — a news search engine. That covers the news outlet angle: you give it "RT" and it finds recent RT articles to analyze. But analyzing the articles themselves doesn't tell you about the amplification network *around* those articles. For that, you need platform data. + +Sprint 3 added platform adapter interfaces for Twitter and Telegram. Sprint 5 wired the Twitter adapter into the live pipeline with real API v2 calls. This post covers both: the adapter architecture and the Twitter integration. + +## The Adapter Pattern + +The core challenge is data heterogeneity. A Twitter API v2 response looks nothing like a Telegram Bot API response. Both look nothing like a SerpAPI news result. But the analysis pipeline doesn't care about platform-specific fields — it needs a common format to feed into the LLM prompt. 
+
+CDDBS solves this with platform adapters that normalize data into a `BriefingInput` dataclass:
+
+```python
+# src/cddbs/adapters.py
+from dataclasses import dataclass
+
+@dataclass
+class PostData:
+    id: str
+    text: str
+    timestamp: str
+    engagement: dict      # likes, retweets, replies, etc.
+    media_type: str       # text, image, video, poll
+    urls: list
+    mentions: list
+    is_repost: bool       # retweet or forward
+    original_source: str  # who it was reposted from
+    raw_data: dict        # platform-specific fields preserved
+
+@dataclass
+class BriefingInput:
+    profile: dict             # name, handle, followers, etc.
+    posts: list               # list of PostData
+    platform: str             # "twitter", "telegram"
+    collection_period: dict
+    data_source: str          # "api_v2", "bot_api", etc.
+```
+
+Every adapter implements a `normalize()` method that takes raw API data and returns a `BriefingInput`. The pipeline operates exclusively on `BriefingInput` objects — it never touches platform-specific data structures.
+
+## The Twitter Adapter
+
+Twitter API v2 returns rich user and tweet data.
The adapter extracts what matters for disinformation analysis: + +```python +class TwitterAdapter: + def normalize(self, raw_data): + profile = raw_data.get("profile", {}) + posts = raw_data.get("posts", []) + + normalized_profile = { + "name": profile.get("name"), + "handle": profile.get("username"), + "followers": profile.get("public_metrics", {}).get("followers_count", 0), + "following": profile.get("public_metrics", {}).get("following_count", 0), + "tweet_count": profile.get("public_metrics", {}).get("tweet_count", 0), + "verified": profile.get("verified", False), + "created_at": profile.get("created_at"), + "bio": profile.get("description", "") + } + + normalized_posts = [] + for tweet in posts: + is_repost = bool(tweet.get("referenced_tweets")) + original_source = "" + if is_repost: + ref = tweet["referenced_tweets"][0] + original_source = ref.get("author_username", ref.get("id", "")) + + normalized_posts.append(PostData( + id=tweet.get("id", ""), + text=tweet.get("text", ""), + timestamp=tweet.get("created_at", ""), + engagement={ + "likes": tweet.get("public_metrics", {}).get("like_count", 0), + "retweets": tweet.get("public_metrics", {}).get("retweet_count", 0), + "replies": tweet.get("public_metrics", {}).get("reply_count", 0), + "quotes": tweet.get("public_metrics", {}).get("quote_count", 0), + "impressions": tweet.get("public_metrics", {}).get("impression_count", 0) + }, + media_type=detect_media_type(tweet), + urls=extract_urls(tweet), + mentions=extract_mentions(tweet), + is_repost=is_repost, + original_source=original_source, + raw_data=tweet + )) + + return BriefingInput( + profile=normalized_profile, + posts=normalized_posts, + platform="twitter", + collection_period=raw_data.get("collection_period", {}), + data_source="api_v2" + ) +``` + +Three things the adapter specifically captures for disinformation analysis: + +1. **Retweet detection.** The `referenced_tweets` field tells us if a tweet is original content or amplification. 
A high retweet ratio (e.g., 80%+ of an account's activity is retweets) is a behavioral indicator of coordinated amplification. + +2. **Engagement ratios.** Impressions vs. likes vs. retweets creates a profile. Accounts with high impressions but very low engagement may be boosted algorithmically or part of a botnet. + +3. **Account metadata.** Creation date, follower/following ratio, bio content, and verification status are all indicators. An unverified account created last month with 50K followers and a bio full of political keywords has a different risk profile than a 10-year-old verified journalist account. + +## The Telegram Adapter + +Telegram presents fundamentally different challenges: + +```python +class TelegramAdapter: + def normalize(self, raw_data): + channel = raw_data.get("channel", {}) + + normalized_profile = { + "name": channel.get("title"), + "handle": channel.get("username"), + "subscribers": channel.get("participants_count", 0), + "channel_type": channel.get("type", "channel"), + "created_at": channel.get("date"), + "description": channel.get("about", ""), + "is_verified": channel.get("verified", False), + "is_scam": channel.get("scam", False) + } + + normalized_posts = [] + for message in raw_data.get("messages", []): + is_forward = "fwd_from" in message + original_source = "" + if is_forward: + fwd = message["fwd_from"] + original_source = ( + fwd.get("from_name") or + fwd.get("channel_post", {}).get("title", "") or + str(fwd.get("from_id", "")) + ) + + normalized_posts.append(PostData( + id=str(message.get("id", "")), + text=message.get("message", ""), + timestamp=message.get("date", ""), + engagement={ + "views": message.get("views", 0), + "forwards": message.get("forwards", 0), + "replies": message.get("replies", {}).get("replies", 0) + }, + media_type=detect_telegram_media(message), + urls=extract_telegram_urls(message), + mentions=extract_telegram_mentions(message), + is_repost=is_forward, + original_source=original_source, + 
raw_data=message + )) + + return BriefingInput( + profile=normalized_profile, + posts=normalized_posts, + platform="telegram", + collection_period=raw_data.get("collection_period", {}), + data_source="bot_api" + ) +``` + +### Twitter vs. Telegram: Key Differences for Analysis + +| Signal | Twitter | Telegram | +|--------|---------|----------| +| Amplification | Retweets (source hidden from casual view) | Forwards (source channel preserved) | +| Reach metric | Impressions + followers | Views + subscriber count | +| Attribution | Account is always visible | Channel admins can be anonymous | +| Bot detection | Follower/following ratio, creation date | View-to-subscriber ratio, posting frequency | +| Content persistence | Tweets can be deleted retroactively | Messages can be edited/deleted silently | +| Network visibility | Follow graph is partially public | Subscriber lists are private | + +Telegram is actually *better* for attribution in one specific way: forwarded messages preserve the source channel. On Twitter, a retweet chain can obscure the original source. On Telegram, you can trace a forwarding chain back to the originating channel — which is why our threat model includes a "forwarding chain laundering" narrative (`tg_amp_001`). 
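Because both adapters populate `is_repost` the same way (retweet on Twitter, forward on Telegram), the amplification heuristics in the table can be computed without branching on platform. A minimal sketch over `PostData`-shaped objects — the `Post` stub and the 80% threshold are illustrative, not from the codebase:

```python
from dataclasses import dataclass, field

# Stub mirroring the PostData fields used below (illustrative).
@dataclass
class Post:
    is_repost: bool
    engagement: dict = field(default_factory=dict)

def amplification_ratio(posts):
    """Fraction of activity that is reposts (retweets or forwards)."""
    if not posts:
        return 0.0
    return sum(p.is_repost for p in posts) / len(posts)

def looks_like_amplifier(posts, threshold=0.8):
    """Flag accounts whose activity is overwhelmingly repost-based,
    regardless of whether the repost is a retweet or a forward."""
    return amplification_ratio(posts) >= threshold

posts = [Post(is_repost=True) for _ in range(9)] + [Post(is_repost=False)]
print(amplification_ratio(posts), looks_like_amplifier(posts))
# 0.9 True
```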
+ +## The Twitter API v2 Client + +Sprint 5 added a dedicated Twitter client that calls the API v2 endpoints: + +```python +# src/cddbs/pipeline/twitter_client.py +def fetch_twitter_data(handle, num_posts=10, bearer_token=None): + token = _get_bearer_token(bearer_token) + if not token: + return None + + # Step 1: Resolve handle to user ID + user_data = lookup_user(handle, token) + if not user_data: + return None + + # Step 2: Fetch recent tweets + tweets = fetch_user_tweets(user_data["id"], num_posts, token) + + # Step 3: Normalize via adapter + adapter = TwitterAdapter() + return adapter.normalize({ + "profile": user_data, + "posts": tweets, + "collection_period": { + "start": datetime.now(UTC).isoformat(), + "method": "api_v2_recent" + } + }) +``` + +### Rate Limiting + +Twitter API v2 has aggressive rate limits, especially on the Basic tier (10K tweets/month read). The client implements exponential backoff: + +```python +def _make_request(url, headers, params=None, max_retries=3): + for attempt in range(max_retries + 1): + response = requests.get(url, headers=headers, params=params) + + if response.status_code == 200: + return response.json() + + if response.status_code == 429: # Rate limited + reset_time = int(response.headers.get("x-rate-limit-reset", 0)) + wait = max(reset_time - time.time(), 2 ** attempt) + time.sleep(min(wait, 60)) # cap at 60 seconds + continue + + if response.status_code >= 500: # Server error, retry + time.sleep(2 ** attempt) + continue + + return None # Client error, don't retry + + return None +``` + +The key detail: the `x-rate-limit-reset` header tells you exactly when the rate limit window resets. We use that when available, falling back to exponential backoff (`2^attempt` seconds) when it's not. The 60-second cap prevents absurdly long waits. + +### Bridging to the Pipeline + +The pipeline expects a list of article-like dicts (with `title`, `link`, `snippet` fields). 
The Twitter client bridges this gap: + +```python +def briefing_input_to_articles(briefing_input): + articles = [] + for post in briefing_input.posts: + articles.append({ + "title": f"Tweet by @{briefing_input.profile.get('handle', 'unknown')}", + "link": f"https://twitter.com/{briefing_input.profile.get('handle')}/status/{post.id}", + "snippet": post.text[:200], + "full_text": post.text, + "date": post.timestamp, + "meta": { + "platform": "twitter", + "engagement": post.engagement, + "is_repost": post.is_repost, + "original_source": post.original_source + } + }) + return articles +``` + +This is an impedance mismatch adapter. The pipeline was originally built for news articles with titles, links, and snippets. Tweets don't have titles. The bridge creates synthetic titles (`"Tweet by @handle"`), constructs URLs from the tweet ID, and truncates the text to a snippet while preserving the full text. + +The `meta` field carries platform-specific data (engagement, repost status) through to the LLM prompt, where the system prompt knows how to interpret Twitter-specific indicators. + +## Platform Routing in the Pipeline + +The orchestrator routes data fetch based on the `platform` parameter: + +```python +def _fetch_for_platform(platform, outlet, country, num_articles, + url, serpapi_key, twitter_bearer_token, + date_filter): + if platform == "twitter": + try: + briefing_input = fetch_twitter_data( + handle=outlet, + num_posts=num_articles or 10, + bearer_token=twitter_bearer_token + ) + if briefing_input and briefing_input.posts: + return briefing_input_to_articles(briefing_input) + except Exception: + pass # Fall through to SerpAPI + + return fetch_articles(outlet, country, + num_articles=num_articles, + url=url, api_key=serpapi_key, + time_period=date_filter) +``` + +The fallback is silent. If the Twitter API returns nothing — bad token, rate limited, account doesn't exist — the pipeline falls back to SerpAPI news search for the same outlet name. 
This means an analyst who types `@rt_com` with a bad Twitter token still gets an analysis, just from news articles instead of tweets.
+
+## Use Cases This Enables
+
+With multi-platform support, CDDBS can address several analysis patterns:
+
+**Single-outlet deep dive.** Analyze RT's Twitter presence and their news output separately, then compare narrative alignment. Do their tweets push harder on certain narratives than their articles?
+
+**Cross-platform correlation.** If the same narrative appears in a Telegram channel and a Twitter account within a short time window, that's a signal of coordinated messaging — especially if the Telegram channel is the earlier source.
+
+**Amplification network mapping.** By analyzing multiple accounts that share content from the same sources, you can identify amplification networks. A batch analysis of 5 Twitter accounts that all retweet the same state media content is more informative than analyzing each one in isolation.
+
+**Narrative velocity tracking.** How quickly does a narrative move from Telegram (where it might originate) to Twitter (where it gets amplified) to news outlets (where it gains legitimacy)? Multi-platform data makes this measurable.
+
+## What's Not Built Yet
+
+Transparency about limitations, plus one item that recently graduated off this list:
+
+**Telegram live integration: shipped in Sprint 6.** What started as an interface-only adapter is now wired into the live pipeline via `POST /analysis-runs/telegram`. The endpoint accepts a Telegram channel handle and routes it through `TelegramAdapter` in the orchestrator using the Telegram Bot API. The adapter tests (22 tests) cover normalization and forwarding chain attribution; the live endpoint handles channel lookups and message retrieval.
+
+**Cross-platform identity linking is manual.** The research framework defines 8 signals for linking accounts across platforms (shared URLs, similar bios, posting timing, content overlap, etc.), but automated correlation isn't implemented.
An analyst has to manually run analyses on suspected linked accounts and compare the results. + +**No real-time streaming.** Both Twitter and Telegram offer streaming APIs for real-time data. CDDBS currently operates in batch mode — you request an analysis, it fetches recent data, and gives you a report. The Sprint 6 RSS/GDELT ingestion pipeline runs on a schedule (every 3–5 minutes), which is the closest thing to near-real-time monitoring available today. Full streaming is a future capability. + +## The Adapter Test Suite + +Platform adapters have 22 tests — the second-highest coverage area after quality scoring: + +```python +def test_twitter_retweet_detection(): + """Retweets should be detected from referenced_tweets field.""" + tweet = {"referenced_tweets": [{"type": "retweeted", "id": "123"}]} + adapter = TwitterAdapter() + result = adapter.normalize({"profile": {}, "posts": [tweet]}) + assert result.posts[0].is_repost is True + +def test_telegram_forward_attribution(): + """Forwarded messages should preserve source channel.""" + message = { + "fwd_from": {"from_name": "StateMediaChannel"}, + "message": "Breaking news..." 
+ } + adapter = TelegramAdapter() + result = adapter.normalize({"channel": {}, "messages": [message]}) + assert result.posts[0].original_source == "StateMediaChannel" + +def test_cross_platform_normalization(): + """Both adapters should produce compatible BriefingInput objects.""" + twitter_input = TwitterAdapter().normalize(TWITTER_FIXTURE) + telegram_input = TelegramAdapter().normalize(TELEGRAM_FIXTURE) + + # Both should have the same interface + assert hasattr(twitter_input, "profile") + assert hasattr(telegram_input, "profile") + assert isinstance(twitter_input.posts[0], PostData) + assert isinstance(telegram_input.posts[0], PostData) +``` + +The cross-platform normalization test is particularly important: it verifies that downstream code (the pipeline, the quality scorer, the narrative matcher) can process data from any platform without knowing which platform it came from. + +## Next Up + +This post covered the data ingestion layer — how CDDBS gets data from different platforms into a common format for analysis. The final post in this series covers operational maturity: batch analysis, export formats, metrics, and the engineering work that turns a working prototype into a production system. + +--- + +*Platform adapters: [adapters.py](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/adapters.py). Twitter client: [twitter_client.py](https://github.com/Be11aMer/cddbs-prod/blob/main/src/cddbs/pipeline/twitter_client.py).* diff --git a/blog-series/05-operational-maturity.md b/blog-series/05-operational-maturity.md new file mode 100644 index 0000000..a894f09 --- /dev/null +++ b/blog-series/05-operational-maturity.md @@ -0,0 +1,523 @@ +--- +title: "Building CDDBS — Part 5: From Prototype to Production" +published: false +description: "Batch analysis, export pipelines, operational metrics, and the unglamorous engineering that makes an LLM-powered system actually usable." 
+tags: ai, python, devops, webdev +series: "Building CDDBS" +--- + +## The Production Gap + +There's a moment in every project where the core feature works but the system isn't usable. The LLM generates good briefings. The quality scorer catches structural issues. The narrative matcher flags known patterns. An analyst can run a single analysis, wait a minute, and get results. + +But then they want to analyze 5 outlets and compare them. Or email a briefing to a colleague as a PDF. Or check whether the system's been producing more failures than usual this week. + +Sprint 5 of CDDBS was about closing these gaps — the features that separate "works on my machine" from "works for the team." This post covers batch analysis, export formats, operational metrics, and the frontend changes that tie them together. + +## Batch Analysis + +### The Problem + +A single CDDBS analysis takes 30-60 seconds (mostly Gemini API latency). An analyst comparing 5 outlets would need to submit 5 separate requests, track 5 separate report IDs, and manually correlate the results. That's a workflow problem. + +### The Design + +We added a `Batch` model that groups multiple analysis runs under a single request: + +```python +class Batch(Base): + __tablename__ = "batches" + + id = Column(Integer, primary_key=True, index=True) + name = Column(String, nullable=True) + status = Column(String, default="queued") + target_count = Column(Integer, default=0) + completed_count = Column(Integer, default=0) + failed_count = Column(Integer, default=0) + report_ids = Column(JSON, default=list) + created_at = Column(DateTime, default=lambda: datetime.now(UTC)) +``` + +The `report_ids` column is a JSON array of Report IDs. Each target in the batch creates its own independent Report record — the Batch just tracks which reports belong together. + +### Why Not a Foreign Key? + +We considered adding `batch_id` as a foreign key on `Report`. 
The JSON array approach is simpler: + +- **No schema migration** on the existing reports table. +- **Reports are independent.** A report created through a batch is identical to a report created individually. The same API endpoint (`GET /analysis-runs/{id}`) retrieves it. There's no "batch-only" report type. +- **Batch is a view, not a relationship.** The batch tracks progress; it doesn't own the reports. + +The trade-off is that querying "all reports in batch X" requires a JSON contains check instead of a simple FK join. At our scale (batches of 1-5), this is irrelevant. + +### Execution Model + +Each target in a batch gets its own FastAPI `BackgroundTask`: + +```python +@app.post("/analysis-runs/batch") +def create_batch( + request: BatchCreateRequest, + background_tasks: BackgroundTasks, + db=Depends(get_db), +): + batch = Batch( + name=request.name, + status="running", + target_count=len(request.targets) + ) + db.add(batch) + db.commit() + + for target in request.targets: + report = Report(outlet=target.outlet, country=target.country) + report.data = {"status": "queued", "batch_id": batch.id} + db.add(report) + db.commit() + + batch.report_ids = batch.report_ids + [report.id] + db.commit() + + background_tasks.add_task( + _run_analysis_job, + report_id=report.id, + outlet=target.outlet, + country=target.country, + batch_id=batch.id, + ) + + return {"batch_id": batch.id, "target_count": len(request.targets)} +``` + +When each pipeline job completes, it updates the batch counters: + +```python +def _update_batch_progress(batch_id, success, db): + batch = db.query(Batch).get(batch_id) + if not batch: + return + + if success: + batch.completed_count += 1 + else: + batch.failed_count += 1 + + if batch.completed_count + batch.failed_count >= batch.target_count: + batch.status = "completed" if batch.failed_count == 0 else "partial" + + db.commit() +``` + +The batch status transitions: `queued → running → completed` (or `partial` if any target failed). 
An analyst checking batch progress sees a clear picture: + +```json +GET /analysis-runs/batch/7 +{ + "id": 7, + "name": "Russian state media comparison", + "status": "running", + "target_count": 4, + "completed_count": 2, + "failed_count": 0, + "report_ids": [42, 43, 44, 45] +} +``` + +### Why BackgroundTasks, Not a Task Queue? + +Same reasoning as the single-analysis pipeline: cost discipline. A proper task queue (Celery + Redis) requires two additional services. FastAPI's `BackgroundTasks` runs jobs in-process after the response is returned — zero extra infrastructure. For batches capped at 5 targets on Render's free tier, this is entirely adequate. The `BATCH_MAX_SIZE` config (default 5) prevents resource exhaustion. + +## Export Pipeline + +### Three Formats, One Endpoint + +``` +GET /analysis-runs/{id}/export?format=json +GET /analysis-runs/{id}/export?format=csv +GET /analysis-runs/{id}/export?format=pdf +``` + +Each format serves a different workflow: + +**JSON** — Machine-readable. For analysts who want to feed CDDBS output into their own tools, scripts, or databases. Contains the full briefing, quality scorecard, narrative matches, and article metadata. + +**CSV** — Spreadsheet-compatible. For analysts who work in Excel or Google Sheets. Flattened tabular format with section headers. + +**PDF** — Shareable. For briefings that need to be emailed, printed, or included in a presentation. 
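Internally, "three formats, one endpoint" comes down to a dispatch table keyed on the `format` query parameter, pairing each serializer with its media type. A simplified, framework-free sketch — the handler bodies here are stubs, and the real route wraps the returned bytes in an HTTP response with the matching `Content-Type`:

```python
import csv
import io
import json

def export_report(report, fmt="json"):
    """Dispatch a report dict to one of three serializers.
    Returns (body_bytes, media_type). Handlers are illustrative stubs."""
    exporters = {
        "json": (lambda r: json.dumps(r, indent=2).encode(), "application/json"),
        "csv":  (_to_csv, "text/csv"),
        "pdf":  (lambda r: b"%PDF-1.4 ...", "application/pdf"),  # placeholder
    }
    if fmt not in exporters:
        raise ValueError(f"Unknown export format: {fmt}")
    handler, media_type = exporters[fmt]
    return handler(report), media_type

def _to_csv(report):
    # Flatten a flat dict into key,value rows (the real CSV export is sectioned)
    out = io.StringIO()
    writer = csv.writer(out)
    for key, value in report.items():
        writer.writerow([key, value])
    return out.getvalue().encode()

body, media_type = export_report({"report_id": 42, "outlet": "RT"}, fmt="csv")
# media_type == "text/csv"; body starts with b"report_id,42"
```

Keeping the table in one place means adding a fourth format is a one-line change plus a serializer, and unknown formats fail fast with a clear error.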
+ +### JSON Export + +The simplest format — a structured dump of everything we know about a report: + +```python +def export_json(report, briefing=None, narratives=None, articles=None): + output = { + "metadata": { + "report_id": report.id, + "outlet": report.outlet, + "country": report.country, + "created_at": report.created_at.isoformat(), + "export_format": "json", + "export_version": "1.0" + }, + "briefing": report.final_report, + "articles": [ + { + "title": a.title, + "link": a.link, + "snippet": a.snippet, + "date": str(a.date) if a.date else None + } + for a in (articles or []) + ] + } + + if briefing: + output["quality"] = { + "score": briefing.quality_score, + "rating": briefing.quality_rating, + "details": briefing.quality_details + } + + if narratives: + output["narratives"] = [ + { + "id": n.narrative_id, + "name": n.narrative_name, + "category": n.category, + "confidence": n.confidence, + "keywords": n.matched_keywords, + "match_count": n.match_count + } + for n in narratives + ] + + return json.dumps(output, indent=2, default=str) +``` + +### CSV Export + +CSV is harder because the data is relational, not tabular. 
The export flattens it into sections: + +```python +def export_csv(report, briefing=None, narratives=None, articles=None): + output = io.StringIO() + writer = csv.writer(output) + + # Metadata section + writer.writerow(["=== METADATA ==="]) + writer.writerow(["Report ID", report.id]) + writer.writerow(["Outlet", report.outlet]) + writer.writerow(["Country", report.country]) + writer.writerow(["Date", report.created_at.isoformat()]) + + if briefing: + writer.writerow([]) + writer.writerow(["=== QUALITY ==="]) + writer.writerow(["Score", f"{briefing.quality_score}/70"]) + writer.writerow(["Rating", briefing.quality_rating]) + + if narratives: + writer.writerow([]) + writer.writerow(["=== NARRATIVES ==="]) + writer.writerow(["ID", "Name", "Category", "Confidence", "Keywords"]) + for n in narratives: + writer.writerow([ + n.narrative_id, n.narrative_name, n.category, + n.confidence, ", ".join(n.matched_keywords or []) + ]) + + if articles: + writer.writerow([]) + writer.writerow(["=== ARTICLES ==="]) + writer.writerow(["Title", "Link", "Date", "Snippet"]) + for a in articles: + writer.writerow([a.title, a.link, a.date, a.snippet]) + + return output.getvalue() +``` + +The section headers (`=== METADATA ===`) make the CSV human-scannable when opened in a spreadsheet. Each section has its own column structure, which means this isn't a "pure" CSV — but it's more useful than forcing all data into a single column layout. 
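Because each section has its own columns, consumers need a small amount of custom parsing to round-trip the file. A sketch of reading the sectioned CSV back, assuming only the `=== NAME ===` convention shown above (the sample rows are made up):

```python
import csv
import io

def split_sections(csv_text: str) -> dict:
    """Group rows of a sectioned CSV under their `=== NAME ===` headers (sketch)."""
    sections, current = {}, None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank separator line between sections
        if row[0].startswith("===") and row[0].endswith("==="):
            current = row[0].strip("= ")  # "=== METADATA ===" -> "METADATA"
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections

sample = "=== METADATA ===\r\nReport ID,42\r\n\r\n=== NARRATIVES ===\r\nID,Name\r\nN-03,Biolabs\r\n"
print(split_sections(sample)["METADATA"])  # [['Report ID', '42']]
```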
+
+### PDF Export
+
+PDF is the only format that requires an optional dependency — `reportlab`:
+
+```python
+def export_pdf(report, briefing=None, narratives=None, articles=None):
+    try:
+        from reportlab.lib.pagesizes import letter
+        from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
+        from reportlab.lib.styles import getSampleStyleSheet
+    except ImportError:
+        return None  # Graceful degradation
+
+    buffer = io.BytesIO()
+    doc = SimpleDocTemplate(buffer, pagesize=letter)
+    styles = getSampleStyleSheet()
+    story = []
+
+    # Title
+    story.append(Paragraph(
+        f"CDDBS Intelligence Briefing: {report.outlet}",
+        styles["Title"]
+    ))
+
+    # Quality badge
+    if briefing:
+        story.append(Paragraph(
+            f"Quality: {briefing.quality_score}/70 ({briefing.quality_rating})",
+            styles["Heading2"]
+        ))
+
+    # Briefing content
+    if report.final_report:
+        for paragraph in report.final_report.split("\n\n"):
+            story.append(Paragraph(paragraph, styles["BodyText"]))
+            story.append(Spacer(1, 6))
+
+    # ... narratives and articles sections
+
+    doc.build(story)
+    return buffer.getvalue()
+```
+
+`reportlab` is declared as an optional dependency. If it's not installed, `export_pdf()` returns `None`, and the API returns a 501 Not Implemented for PDF requests. The JSON and CSV exports work with nothing beyond the Python standard library.
+
+### Frontend Integration
+
+The report viewer adds export buttons that link directly to the export endpoint:
+
+```typescript
+// ReportViewDialog.tsx (simplified)
+{run?.data?.status === "completed" && (
+  <>
+    <a href={`/analysis-runs/${run.id}/export?format=json`} download>JSON</a>
+    <a href={`/analysis-runs/${run.id}/export?format=csv`} download>CSV</a>
+    <a href={`/analysis-runs/${run.id}/export?format=pdf`} download>PDF</a>
+  </>
+)}
+```
+
+The buttons are plain `<a>` tags with `href` pointing to the export endpoint. This triggers a browser download without any JavaScript fetch/blob handling. Simple, and it works across all browsers.
+
+## Operational Metrics
+
+### Why Metrics Matter
+
+When you're running 10+ analyses a day, you need to know:
+- Are analyses succeeding or failing?
+- Is output quality trending up or down?
+- What's breaking, and how often?
+ +CDDBS computes metrics on-demand from the database: + +```python +def compute_metrics(db): + reports = db.query(Report).all() + + if not reports: + return { + "total_runs": 0, "completed": 0, "failed": 0, + "running": 0, "success_rate": 0, + "avg_quality_score": 0, + "quality_distribution": {}, + "failure_reasons": [], + "recent_24h": {"total": 0, "completed": 0, + "failed": 0, "success_rate": 0} + } + + completed = [r for r in reports if r.data and r.data.get("status") == "completed"] + failed = [r for r in reports if r.data and r.data.get("status") == "failed"] + running = [r for r in reports if r.data and r.data.get("status") in ("queued", "running")] + + # Quality distribution from briefings + briefings = db.query(Briefing).all() + quality_dist = {"excellent": 0, "good": 0, "acceptable": 0, "poor": 0, "failing": 0} + scores = [] + for b in briefings: + if b.quality_rating: + quality_dist[b.quality_rating.lower()] = quality_dist.get(b.quality_rating.lower(), 0) + 1 + if b.quality_score: + scores.append(b.quality_score) + + # Recent 24h breakdown + cutoff = datetime.now(UTC) - timedelta(hours=24) + recent = [r for r in reports if r.created_at and r.created_at >= cutoff] + recent_completed = [r for r in recent if r.data and r.data.get("status") == "completed"] + recent_failed = [r for r in recent if r.data and r.data.get("status") == "failed"] + + return { + "total_runs": len(reports), + "completed": len(completed), + "failed": len(failed), + "running": len(running), + "success_rate": round(len(completed) / len(reports) * 100, 1) if reports else 0, + "avg_quality_score": round(sum(scores) / len(scores), 1) if scores else 0, + "quality_distribution": quality_dist, + "failure_reasons": [ + r.data.get("errors", ["Unknown"])[0] + for r in failed[-10:] + ], + "recent_24h": { + "total": len(recent), + "completed": len(recent_completed), + "failed": len(recent_failed), + "success_rate": round( + len(recent_completed) / len(recent) * 100, 1 + ) if recent else 0 + } + } 
+``` + +### Why On-Demand, Not Pre-Aggregated? + +At our scale (low hundreds of reports), computing metrics from raw data on every request is fast enough — under 100ms. Pre-aggregated metrics (materialized views, counter tables) would add complexity: you'd need triggers or background jobs to keep them in sync, and stale aggregates are worse than slightly slow fresh data. + +If CDDBS grew to thousands of reports, we'd add a materialized view refreshed on a schedule. Until then, the query-on-demand approach is correct. + +### What the Metrics Tell You + +A sample metrics response: + +```json +{ + "total_runs": 42, + "completed": 38, + "failed": 2, + "running": 2, + "success_rate": 90.5, + "avg_quality_score": 52.1, + "quality_distribution": { + "excellent": 8, + "good": 15, + "acceptable": 10, + "poor": 4, + "failing": 1 + }, + "failure_reasons": [ + "Gemini API timeout", + "Invalid outlet name" + ], + "recent_24h": { + "total": 8, + "completed": 7, + "failed": 1, + "success_rate": 87.5 + } +} +``` + +The `failure_reasons` array shows the last 10 failure error messages. This is quick diagnostics: if you see "Gemini API timeout" appearing repeatedly, you know to check API quotas. If you see "Invalid outlet name", there's a user input validation gap. + +The `quality_distribution` tells you whether your system prompt needs tuning. If "failing" and "poor" are growing, the LLM is producing structurally deficient output and the system prompt may need revision. + +## The Extended API Status + +The `/api-status` endpoint now reports on all configured services: + +```json +{ + "serpapi_configured": true, + "google_api_configured": true, + "twitter_configured": false, + "database_connected": true, + "version": "1.5.0" +} +``` + +This is operational hygiene. Before an analyst starts an analysis, they can check whether the required API keys are configured. The frontend's `StatusIndicator` component uses this to show green/amber/red status for each service. 
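That payload maps naturally onto a traffic-light summary. A hypothetical sketch of the roll-up logic (in Python for brevity — the real `StatusIndicator` is a React component, and which services count as "required" here is an assumption):

```python
def overall_light(status: dict) -> str:
    """Roll /api-status flags up into green/amber/red (illustrative logic only)."""
    if not status.get("database_connected"):
        return "red"  # nothing works without the database
    required_keys = ("serpapi_configured", "google_api_configured")
    if all(status.get(k) for k in required_keys):
        return "green"  # every required service ready
    return "amber"  # degraded: database up, but some API keys missing

print(overall_light({
    "serpapi_configured": True,
    "google_api_configured": True,
    "twitter_configured": False,
    "database_connected": True,
}))  # green
```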
+
+## Testing the Operational Layer
+
+The operational features added 35 new tests:
+
+| Test Suite | Count | What It Tests |
+|-----------|-------|---------------|
+| `test_twitter_client.py` | 14 | User lookup, tweet fetch, rate limiting, adapter bridge |
+| `test_batch.py` | 7 | Batch CRUD, validation, progress tracking |
+| `test_export.py` | 7 | JSON/CSV/PDF export, missing report handling |
+| `test_metrics.py` | 7 | Empty DB, completed/failed states, quality distribution |
+
+The batch tests mock the pipeline execution to avoid real API calls:
+
+```python
+def test_batch_progress_tracking(client, db):
+    """Batch counters should update as targets complete."""
+    batch = Batch(name="test", target_count=3, status="running")
+    db.add(batch)
+    db.commit()
+
+    _update_batch_progress(batch.id, success=True, db=db)
+    _update_batch_progress(batch.id, success=True, db=db)
+    _update_batch_progress(batch.id, success=False, db=db)
+
+    db.refresh(batch)
+    assert batch.completed_count == 2
+    assert batch.failed_count == 1
+    assert batch.status == "partial"  # not all succeeded
+```
+
+The export tests verify that each format handles edge cases — missing quality data, missing narratives, empty articles:
+
+```python
+def test_export_json_without_quality(db, report):
+    """JSON export should work even without quality scores."""
+    result = export_json(report, briefing=None, narratives=None)
+    data = json.loads(result)
+    assert "quality" not in data
+    assert data["metadata"]["report_id"] == report.id
+```
+
+## The Full Picture
+
+After six development sprints, here's where CDDBS stands:
+
+| Metric | Value |
+|--------|-------|
+| Database tables | 12 |
+| API endpoints | 34 |
+| Tests passing | 142 |
+| External dependencies added (Sprints 4-6) | feedparser, httpx, scikit-learn, scipy (+ optional: reportlab) |
+| Lines of backend code | ~4,000 |
+| Frontend components | 15 |
+
+The system handles the full lifecycle:
ingest data from news, social media, RSS, and GDELT; analyze with a constrained LLM; score output for structural quality; match against known disinformation narratives; export results in three formats; track operational health over time; and fire webhook alerts to external subscribers. + +## What's Next for CDDBS + +Sprint 6 delivered the event intelligence pipeline — multi-source ingestion (RSS + GDELT), TF-IDF deduplication, and webhook alerting (covered in Part 6 of this series). The immediate roadmap ahead: + +- **Sprint 7**: Event clustering (TF-IDF agglomerative), Z-score burst detection for narrative spikes, `EventClusterPanel` and `BurstTimeline` frontend components. +- **Sprint 8**: User authentication, shared analysis workspaces, automated monitoring schedules. +- **Sprint 9+**: ML-based narrative matching (to complement keyword matching), multi-language support, sentence-transformer upgrade for semantic deduplication. + +The long-term vision is a system where an analyst can set up continuous monitoring of 20+ outlets and social media accounts, get alerted when narrative patterns shift, and produce briefings that meet professional intelligence community standards — all powered by LLMs constrained to be honest about what they know and don't know. + +## Series Recap + +This series has covered: + +1. **Architecture & Threat Model** — What CDDBS is, the 18 narratives it tracks, and the three-tier architecture. +2. **The Analysis Pipeline** — Article fetch, prompt construction, LLM call, response parsing, and the async execution model. +3. **Quality Scoring & Narrative Detection** — The 7-dimension rubric, keyword-based narrative matching, and why we evaluate structure instead of truth. +4. **Multi-Platform Analysis** — Twitter and Telegram adapters, platform routing, and the common `BriefingInput` format. +5. **Operational Maturity** — Batch analysis, export formats, metrics, and production engineering (this post). +6. 
**Event Intelligence at Scale** — Sprint 6: RSS + GDELT ingestion, TF-IDF deduplication, webhook alerting.
+
+The common thread: **constrain the LLM, verify the output, degrade gracefully.** LLMs are powerful synthesis engines, but they need guardrails — structured prompts, typed evidence, quality rubrics, and narrative databases — to produce output that analysts can trust. Building those guardrails is the actual engineering challenge. The LLM call itself is one line of code.
+
+---
+
+*CDDBS is open source. Production: [github.com/Be11aMer/cddbs-prod](https://github.com/Be11aMer/cddbs-prod). Research: [github.com/Be11aMer/cddbs-research-draft](https://github.com/Be11aMer/cddbs-research-draft).*
diff --git a/blog-series/README.md b/blog-series/README.md
new file mode 100644
index 0000000..9686f42
--- /dev/null
+++ b/blog-series/README.md
@@ -0,0 +1,29 @@
+# Building CDDBS — Blog Series
+
+Technical blog series about the Cyber Disinformation Detection Briefing System.
+Written for [dev.to](https://dev.to) publication.
+
+## Posts
+
+| # | Title | Focus |
+|---|-------|-------|
+| 1 | [Architecture & Threat Model](01-architecture-and-threat-model.md) | System overview, 18-narrative threat model, BYOK auth, database schema |
+| 2 | [Inside the Analysis Pipeline](02-the-analysis-pipeline.md) | Article fetch, prompt engineering, LLM call, JSON parsing, async execution |
+| 3 | [Scoring LLM Output Without Another LLM](03-quality-scoring-and-narratives.md) | 7-dimension quality rubric, narrative matching, evidence typing |
+| 4 | [Multi-Platform Disinformation Detection](04-multi-platform-analysis.md) | Twitter/Telegram adapters, platform routing, cross-platform analysis |
+| 5 | [From Prototype to Production](05-operational-maturity.md) | Batch analysis, export formats, metrics, production engineering |
+
+## Publishing
+
+All posts use dev.to frontmatter format. Set `published: true` when ready to publish.
+
+Posts form a linked series via the `series: "Building CDDBS"` frontmatter field.
+
+## Future Posts
+
+As development continues past Sprint 6, additional posts may cover:
+- Telegram Bot API live integration
+- ML-based narrative matching
+- Real-time monitoring and alerting
+- Cross-platform identity correlation
+- Frontend dashboard and visualization deep dives

From 943f2d75399cdc47c5a1b49f52afac4dec285a95 Mon Sep 17 00:00:00 2001
From: Humar
Date: Sun, 15 Mar 2026 11:29:53 +0100
Subject: [PATCH 2/2] Update 01-architecture-and-threat-model.md

---
 blog-series/01-architecture-and-threat-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blog-series/01-architecture-and-threat-model.md b/blog-series/01-architecture-and-threat-model.md
index 1491e66..d2767ed 100644
--- a/blog-series/01-architecture-and-threat-model.md
+++ b/blog-series/01-architecture-and-threat-model.md
@@ -12,7 +12,7 @@ CDDBS — the Cyber Disinformation Detection Briefing System — is an open-sour
 
 The result is a professional intelligence briefing — the kind an analyst at a think tank or government agency would write — produced in under a minute.
 
-This is the first post in a series where I'll walk through the technical architecture, the pipeline internals, the quality assurance system, and the operational infrastructure behind it. This isn't a weekend project write-up. CDDBS has been through five development sprints, 169 tests, and a production deployment on Render. The goal of this series is to show how the pieces fit together — and why we made the decisions we did.
+This is the first post in a series where I'll walk through the technical architecture, the pipeline internals, the quality assurance system, and the operational infrastructure behind it. This isn't a weekend project write-up. CDDBS has been through six development sprints, 142 tests, and a production deployment on Render.
The goal of this series is to show how the pieces fit together — and why we made the decisions we did. ## The Problem We're Solving