
Detection Engines

John Williams edited this page Mar 9, 2026 · 1 revision


ARGUS runs four detection engines in parallel on every submitted profile. Each engine is an independent, swappable module that returns a structured result object.


Engine Architecture

Every engine follows the same interface:

async function runEngine(profileData, env) {
  return {
    score: number,        // 0–100; lower = more suspicious
    reason: string,       // plain-English summary of top issues
    signals: [            // individual signal breakdown
      {
        name: string,
        score: number,    // 0–100
        detail: string,
        severity: 'high' | 'medium' | 'low'
      }
    ]
  }
}

To swap an engine, replace the model inside its file. The interface stays the same. Nothing else in the system needs to change.
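A minimal engine conforming to this interface might look like the sketch below. The heuristic (penalizing empty bios) and the `profileData.bio` field are illustrative assumptions, not part of the ARGUS codebase; only the returned shape follows the interface above.

```javascript
// Sketch of a minimal conforming engine. The bio heuristic and the
// profileData.bio field are illustrative; only the returned shape
// (score / reason / signals) is dictated by the engine interface.
async function runExampleEngine(profileData, env) {
  const signals = [];

  // Hypothetical signal: profiles with no bio text at all look thinner.
  const hasBio = Boolean(profileData.bio && profileData.bio.trim());
  signals.push({
    name: 'bio_present',
    score: hasBio ? 90 : 30,            // lower = more suspicious
    detail: hasBio ? 'Bio text found' : 'Profile has no bio text',
    severity: hasBio ? 'low' : 'medium',
  });

  // Engine score: simple average of its signal scores.
  const score = Math.round(
    signals.reduce((sum, s) => sum + s.score, 0) / signals.length
  );

  return { score, reason: signals.map((s) => s.detail).join('; '), signals };
}
```

Because every engine returns this same shape, the pipeline can dispatch all of them in parallel (e.g. with `Promise.allSettled`) and aggregate whatever succeeds.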


Engine 1 — Image Detection

Weight: 20% of final trust score

Analyzes the profile photo and any images in posts.

Signals

| Signal | Model | What It Catches |
| --- | --- | --- |
| GAN Artifact Detection | Hive AI (free tier) | StyleGAN, Midjourney, DALL-E faces |
| C2PA Provenance Check | CAI open-source SDK | Missing or stripped origin credentials |
| Reverse Image Lookup | Google Vision (or SerpAPI) | Stock photos, recycled images, cross-platform reuse |
| EXIF Metadata Analysis | ExifTool | Stripped metadata (common in AI images), GPS/device anomalies |

How GAN Detection Works

AI-generated faces leave mathematical artifacts in pixel patterns that are invisible to humans but detectable by trained classifiers. StyleGAN generates faces with characteristic frequency patterns in the Fourier domain. Models like UniversalFakeDetect and Hive AI exploit these patterns.

Key tells:

  • Blurry teeth/hair borders — GANs struggle with fine detail
  • Unusual ear symmetry — GANs often produce unrealistically symmetric faces
  • Background inconsistency — artifacts at face/background boundary
  • Eye reflection asymmetry — real eyes reflect the same light source; GAN eyes often don't

C2PA Integration

The Content Authenticity Initiative (contentauthenticity.org), backed by Adobe, BBC, Microsoft, Reuters, and AP, provides a cryptographic provenance standard for digital media. If an image carries valid C2PA Content Credentials, ARGUS boosts its image score significantly. If credentials are present but fail verification, the score drops heavily. If credentials are absent entirely (still the case for the majority of images today), the engine applies only a mild, near-neutral penalty that will grow as C2PA adoption increases.
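The three outcomes described above could map onto a signal along these lines; the `status` values and the specific numbers are placeholders for illustration, not the weights ARGUS actually ships.

```javascript
// Illustrative mapping of C2PA verification outcomes to a signal.
// `status` would come from a C2PA verifier such as the CAI SDK; the
// scores below are placeholder values, not ARGUS's real weights.
function c2paSignal(status) {
  switch (status) {
    case 'valid':    // credentials present and cryptographically intact
      return { name: 'c2pa_provenance', score: 95, severity: 'low',
               detail: 'Valid Content Credentials found' };
    case 'tampered': // credentials present but verification failed
      return { name: 'c2pa_provenance', score: 5, severity: 'high',
               detail: 'Content Credentials present but tampered' };
    default:         // absent: near-neutral today, stricter as adoption grows
      return { name: 'c2pa_provenance', score: 50, severity: 'medium',
               detail: 'No Content Credentials attached' };
  }
}
```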

Swapping the Image Model

// In workers/pipeline/analyze.js → runImageEngine()
// Currently uses: Hive AI API

// To swap to DeepSafe (open source, self-hosted):
const ganResult = await checkDeepSafe(profileData.photo_url, env);

// To swap to HuggingFace UniversalFakeDetect:
const ganResult = await checkHuggingFace('Wvolf/UniversalFakeDetect', photoBuffer);

Engine 2 — Text Detection

Weight: 15% of final trust score

Analyzes bio text and recent post content as a corpus.

Signals

| Signal | Model | What It Catches |
| --- | --- | --- |
| AI Text Probability | GPTZero API (free) | ChatGPT, Claude, Gemini generated text |
| Stylometric Variance | Custom NLTK analysis | Near-zero variance = synthetic patterns |
| Content Agenda Concentration | Custom TF-IDF | 90%+ posts on single topic = likely astroturf |
| Linguistic Pattern Analysis | Custom classifier | Naturalness scoring |

Why Stylometric Variance Matters

Real humans have inconsistent writing. They make typos, correct themselves, vary sentence length, shift tone between casual and professional posts, reference their personal life, occasionally go off-topic. Synthetic accounts — whether AI-operated or human-operated and scripted — are too consistent.

ARGUS measures the variance in sentence length, vocabulary diversity (type-token ratio), punctuation patterns, and topic distribution across the last 20+ posts. A real account with 300 posts will have a stylometric variance score of 0.5–0.8. A synthetic account typically scores below 0.2.
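One way to approximate such a variance score, combining sentence-length spread with vocabulary diversity, is sketched below. This is an assumed formula for illustration, not ARGUS's exact computation.

```javascript
// Sketch of a stylometric variance measure (not ARGUS's exact formula):
// averages sentence-length spread with type-token ratio, yielding 0..1.
function stylometricVariance(posts) {
  const sentences = posts.flatMap((p) =>
    p.split(/[.!?]+/).map((s) => s.trim()).filter(Boolean)
  );
  const lengths = sentences.map((s) => s.split(/\s+/).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, l) => a + (l - mean) ** 2, 0) / lengths.length;
  // Coefficient of variation, capped at 1: 0 means perfectly uniform lengths.
  const lengthSpread = Math.min(Math.sqrt(variance) / mean, 1);

  // Type-token ratio: unique words divided by total words.
  const words = posts.join(' ').toLowerCase().match(/[a-z']+/g) || [];
  const ttr = new Set(words).size / words.length;

  return (lengthSpread + ttr) / 2;
}
```

A batch of near-identical scripted posts scores close to 0; genuinely varied writing scores much higher.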

Agenda Concentration

ARGUS runs basic topic modeling across recent posts and calculates what percentage of them targets a single topic or narrative. Real accounts posting about their professional industry will naturally discuss multiple aspects of it — product releases, industry news, personal opinions, off-topic interests. An astroturf account pushing a specific agenda will hit 85–95% concentration on a single narrative. Combined with the stylometric and AI-probability scores, this becomes a powerful coordination signal.
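Assuming each recent post has already been assigned a topic label by the upstream TF-IDF step, the concentration measure itself reduces to the share of the dominant topic:

```javascript
// Share of posts devoted to the single most common topic label.
// Assumes upstream topic modeling has already labeled each post.
function agendaConcentration(topicLabels) {
  const counts = new Map();
  for (const t of topicLabels) counts.set(t, (counts.get(t) || 0) + 1);
  return Math.max(...counts.values()) / topicLabels.length;
}
```

A result above roughly 0.85 corresponds to the 85–95% astroturf band described above.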


Engine 3 — Behavioral Analysis

Weight: 30% of final trust score

Analyzes account-level behavioral patterns. Does not require access to private data — all signals are derived from publicly visible information.

Signals

| Signal | Data Source | What It Catches |
| --- | --- | --- |
| Account Age vs. Activity | Platform public data | New account, massive activity = suspicious |
| Pre-Account Digital Footprint | Google index + cross-platform search | Zero history before account creation |
| Posting Time Pattern | Post timestamps | Robotic interval posting |
| Personal Content Ratio | Post content classification | No personal life content = likely operation |
| Career Cross-Reference | LinkedIn employer data | Unverifiable job history |

The Digital Scar Tissue Principle

Real humans accumulate digital scar tissue over time — inconsistent posts, old forum comments on niche topics, embarrassing early posts, varied writing styles, references to current events. Fake accounts, regardless of whether AI or humans operate them, are too clean. They appear fully formed, with polished profiles and consistent messaging, and leave no pre-existing trace on the web.

This is why the pre-account digital footprint check is one of the strongest signals: an account claiming 15 years of industry experience with zero Google results, no conference mentions, no old forum posts, and no cross-platform presence before 2024 is almost certainly fabricated.

Activity Rate Thresholds

| Account Age | Posts/Day | Connections/Day | Risk Level |
| --- | --- | --- | --- |
| < 7 days | > 5 | > 10 | CRITICAL |
| 7–30 days | > 10 | > 15 | HIGH |
| 7–30 days | > 5 | > 8 | MEDIUM |
| > 90 days | Any | Any | Reduced concern |

These are general guidelines. The engine weights multiple factors together, not just one.
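Read literally, the thresholds could be encoded as follows. Since the table doesn't say whether the posts/day and connections/day thresholds combine with AND or OR, this sketch assumes OR, and ages of 31–90 days fall through to no elevated rating.

```javascript
// Direct encoding of the threshold table (assumption: thresholds
// combine with OR; ages of 31-90 days fall through to 'NONE').
// The real engine blends this with other factors rather than
// returning a risk level on its own.
function activityRisk(accountAgeDays, postsPerDay, connectionsPerDay) {
  if (accountAgeDays < 7 && (postsPerDay > 5 || connectionsPerDay > 10)) {
    return 'CRITICAL';
  }
  if (accountAgeDays >= 7 && accountAgeDays <= 30) {
    if (postsPerDay > 10 || connectionsPerDay > 15) return 'HIGH';
    if (postsPerDay > 5 || connectionsPerDay > 8) return 'MEDIUM';
  }
  if (accountAgeDays > 90) return 'REDUCED';
  return 'NONE';
}
```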


Engine 4 — Network Graph

Weight: 35% of final trust score

The highest-weighted engine. Analyzes the account's connection network for coordinated patterns. This is the engine that catches sophisticated fakes that defeat every other engine individually.

Why Network Is Highest Weight

You can buy a real photo. You can hire a human to write a convincing bio. You can post varied content. You cannot easily manufacture a five-year-old network of diverse, organic humans who knew you before you created the account.

The network graph is the hardest signal to fake at scale and the most expensive for bad actors to maintain.

Coordination Patterns Detected

Pattern 1 — New Account Clustering

-- Count a profile's connections that are themselves new accounts;
-- the engine compares this against the profile's total connection
-- count and flags when the ratio exceeds ~80%.
SELECT COUNT(*) FROM connections
WHERE account_age_days < 90

If 80%+ of a profile's connections are also brand-new accounts, the profile is likely part of a freshly launched network.

Pattern 2 — Synchronized Posting

Multiple accounts in the network post about the same topic within a 60-minute window. Organic discussion spreads over hours and days; coordinated campaigns spike.
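The window check can be sketched as below; the post shape `{ account, topic, ts }` (timestamps in epoch milliseconds) and the three-account minimum are assumptions for illustration.

```javascript
// Find topics where 3+ distinct accounts posted within one 60-minute
// window. Post shape { account, topic, ts } is assumed for this sketch.
function synchronizedClusters(posts, windowMs = 60 * 60 * 1000) {
  const byTopic = new Map();
  for (const p of posts) {
    if (!byTopic.has(p.topic)) byTopic.set(p.topic, []);
    byTopic.get(p.topic).push(p);
  }
  const clusters = [];
  for (const [topic, group] of byTopic) {
    group.sort((a, b) => a.ts - b.ts);
    for (let i = 0; i < group.length; i++) {
      const accounts = new Set();
      for (let j = i;
           j < group.length && group[j].ts - group[i].ts <= windowMs;
           j++) {
        accounts.add(group[j].account);
      }
      if (accounts.size >= 3) {
        clusters.push({ topic, accounts: accounts.size });
        break; // one hit per topic is enough for this sketch
      }
    }
  }
  return clusters;
}
```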

Pattern 3 — Content Similarity Clustering

Cosine similarity > 0.85 across comments from different accounts. When 20 accounts all post slightly reworded versions of the same comment on the same post, that's coordinated astroturfing.
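A bag-of-words cosine similarity is the simplest stand-in for the measure (production systems often compare embeddings instead, but the thresholding logic is the same):

```javascript
// Bag-of-words cosine similarity between two comments, in [0, 1].
function cosineSimilarity(a, b) {
  const vec = (text) => {
    const counts = new Map();
    for (const w of text.toLowerCase().match(/[a-z']+/g) || []) {
      counts.set(w, (counts.get(w) || 0) + 1);
    }
    return counts;
  };
  const va = vec(a), vb = vec(b);
  let dot = 0;
  for (const [w, c] of va) dot += c * (vb.get(w) || 0);
  const norm = (v) =>
    Math.sqrt([...v.values()].reduce((s, c) => s + c * c, 0));
  return dot / (norm(va) * norm(vb) || 1);
}
```

Slightly reworded copies of the same comment score high; unrelated comments land near zero.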

Pattern 4 — Flagged Network Membership

If an account is connected to 5+ accounts already flagged in ARGUS's database, it inherits a coordination signal. Bad-actor networks reuse connection structures.

Graph Database

Network data is stored in Cloudflare D1 (SQLite) using junction tables rather than a full graph database. This is sufficient for the coordination detection patterns above. For more sophisticated graph traversal at scale, the architecture supports migrating the graph layer to a dedicated solution while keeping the rest of the system intact.


Score Aggregation

FINAL TRUST SCORE =
  [ (image_score  × 0.20) +
    (text_score   × 0.15) +
    (behavioral   × 0.30) +
    (network      × 0.35) ]
  ÷ (sum of weights for engines that ran successfully)

If an engine fails (API down, no data available), its weight is redistributed proportionally to the other engines that succeeded. A score is never published with fewer than 2 engines successfully completed.
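The full rule, including the redistribution and the two-engine minimum, fits in a few lines. The function and field names here are illustrative rather than taken from the codebase's aggregateScore().

```javascript
// Weighted average over the engines that succeeded. Dividing by the
// sum of the weights that ran redistributes failed engines' weight
// proportionally. Names here are illustrative, not the codebase's.
const WEIGHTS = { image: 0.20, text: 0.15, behavioral: 0.30, network: 0.35 };

function aggregateTrustScore(results) {
  // results: { engineName: score }, where a null score marks a failed engine
  const ran = Object.entries(results).filter(([, s]) => s !== null);
  if (ran.length < 2) return null; // never publish with fewer than 2 engines
  let weighted = 0;
  let totalWeight = 0;
  for (const [name, score] of ran) {
    weighted += score * WEIGHTS[name];
    totalWeight += WEIGHTS[name];
  }
  return Math.round(weighted / totalWeight);
}
```

For example, if only the behavioral and network engines run, their effective weights become 0.30/0.65 and 0.35/0.65.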


Adding a New Engine

  1. Create workers/pipeline/engines/[name]_engine.js
  2. Export the standard interface function
  3. Add to the parallel dispatch in workers/pipeline/analyze.js
  4. Add weight to the aggregateScore() weights object
  5. Add signal documentation to this wiki page
  6. Submit a PR

See Contributing for full guidelines.
