Cognitive risk analysis platform for detecting social engineering attacks in corporate communications.
CogniX Surface analyzes text communications for cognitive attack patterns (phishing, BEC, pretexting, CEO fraud). The system combines:
- Semantic NLP: sentence-transformers embeddings (`all-MiniLM-L6-v2`)
- Feature engineering: 11 features, made up of 8 psychological dimensions (urgency, authority, trust, fear, social proof, reciprocity, commitment, liking), 2 linguistic signals (sentiment, text length), and 1 semantic similarity signal
- Explainable scoring: configurable weights, exponential transformation, per-feature contribution breakdown
- Interactive web dashboard: FastAPI + Bootstrap + Vega-Lite
- Operational triage: prioritized queue with SLA, SQLite persistence, API Key authentication
The goal is to accelerate analyst workflows (prioritization, investigation, follow-up), not to replace human decision-making.
| Metric | Value |
|---|---|
| Backend API | app/dashboard.py — 20 REST endpoints |
| Frontend | app/templates/dashboard.html — Bootstrap 5 + Vanilla JS + Vega-Embed |
| Persistence | SQLite WAL (app/database.py) — 4 tables |
| Automated tests | 153 (13 test files) |
| Demo dataset | datacommunications.txt.txt — 115 messages, 20 users |
| Containerization | Docker + docker-compose |
| Authentication | API Key (X-API-Key) on 8 sensitive endpoints |
| Rate limiting | slowapi (120/min global, 3/min on pipeline) |
| Step | Module | Description |
|---|---|---|
| 1 | `ingestion/loader.py` | CSV loading (`;` separator), encoding fallback, deduplication |
| 2 | `analysis/nlp_engine.py` | Semantic embeddings (sentence-transformers, batch 64) |
| 3 | `analysis/analyzer.py` | VADER sentiment analysis + keyword counting across 7 categories |
| 4 | `analysis/feature_engineering.py` | Regex pattern matching + min-max normalization + `1-exp(-x)` transform |
| 5 | `model/risk_engine.py` | Weighted scoring with absolute normalization, per-feature contributions, dominant driver |
| 6 | `app/dashboard.py` | REST API + interactive dashboard visualization |
- Core KPIs: volume, risk distribution, average/max risk, high-risk percentage
- 9 visualizations: histogram, donut, driver bar, user scatter, contribution heatmap, correlation matrix, boxplot, weights, feature averages
- Advanced filters: risk range, bands (Low/Medium/High), driver, users, text query, top-N
- Tables: sortable, searchable, paginated (offset/limit with `has_more` metadata)
- Explainability: detail cards with per-feature contribution breakdown for high-risk messages
- Filter presets: save/load filter configurations (max 50, persisted in SQLite)
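The offset/limit pagination with `has_more` metadata listed above can be sketched roughly as follows. This is an illustration only: the `paginate` helper and every response field other than `has_more` are hypothetical names, not taken from `app/dashboard.py`.

```python
from typing import Any


def paginate(rows: list[Any], offset: int = 0, limit: int = 50) -> dict[str, Any]:
    """Return one page of rows plus metadata (hypothetical response shape)."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    page = rows[offset : offset + limit]
    return {
        "items": page,
        "offset": offset,
        "limit": limit,
        "total": len(rows),
        # True while rows remain after the current page.
        "has_more": offset + len(page) < len(rows),
    }
```

A client can keep requesting pages, advancing `offset` by `limit`, until `has_more` is false.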
- Real-time pipeline progress via SSE (`/api/run/stream`) with `progress`, `done`, `fatal` events
- Configurable auto-refresh
- KPI Timeline: snapshot history after each run (max 200, persisted in SQLite)
- Run history with KPI delta comparison between runs
- Run history persistence via IndexedDB + localStorage fallback
- Audit log: action recording (status changes, weight updates, webhooks) in SQLite
- Native browser notifications
- Multi-rule triggers: high-risk percentage, average risk, high-risk count
- Anti-spam cooldown + daily max-per-rule cap
- Backend webhook relay with:
  - HMAC-SHA256 signing (`X-CogniX-Signature`)
  - Exponential retry (3 attempts, backoff up to 4s)
  - No retry on 4xx errors
- Dedicated alert history in UI
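The webhook relay behaviour described above (HMAC-SHA256 signature, up to 3 attempts, backoff capped at 4s, no retry on 4xx) might look roughly like the sketch below. It is not the project's implementation: the hex-digest signature format, the 2s/4s delay schedule, and the injected `send` callable are assumptions made for illustration.

```python
import hashlib
import hmac
import json
import time
from typing import Callable


def sign_payload(secret: str, body: bytes) -> str:
    """HMAC-SHA256 of the raw body as a hex digest (format is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def relay_with_retry(
    send: Callable[[bytes, dict], int],  # hypothetical transport: returns HTTP status
    payload: dict,
    secret: str,
    max_attempts: int = 3,
    max_backoff: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-CogniX-Signature": sign_payload(secret, body),
    }
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = send(body, headers)
        if status < 500:
            # Success, or a 4xx client error that a retry would not fix.
            return status
        if attempt < max_attempts:
            # Exponential backoff (2s, then 4s), capped at max_backoff.
            sleep(min(2.0 ** attempt, max_backoff))
    return status
```

A receiver would recompute the digest over the raw request body with the shared secret and compare it to the header value using `hmac.compare_digest`.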
- Case queue with workflow: `new` → `in_progress` → `mitigated` / `false_positive`
- Automatic priority: `P1` (High), `P2` (Medium), `P3` (Low) based on risk score
- Automatic SLA deadlines for open items with overdue flag; queue ordering prioritizes overdue items
- Assignee and operational notes (XSS-sanitized)
- Bulk update: mass update up to 250 items per request
- Automatic sync from results + manual bootstrap
- Full persistence in SQLite (triage, presets, timeline, audit)
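As an illustration of the triage rules above, the sketch below derives a priority from the risk score, computes an SLA deadline, and orders the queue with overdue items first. The SLA windows in `SLA_HOURS`, the `TriageItem` class, and the tie-breaking order are hypothetical; the priority thresholds assume the Low/Medium/High risk bands (0.4 and 0.7).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical SLA windows per priority; the real values are not documented here.
SLA_HOURS = {"P1": 4, "P2": 24, "P3": 72}
OPEN_STATUSES = {"new", "in_progress"}


def priority_for(score: float) -> str:
    """Map a risk score to a priority using the High/Medium/Low band thresholds."""
    if score >= 0.7:
        return "P1"
    if score >= 0.4:
        return "P2"
    return "P3"


@dataclass
class TriageItem:
    item_id: str
    risk_score: float
    created_at: datetime
    status: str = "new"

    @property
    def priority(self) -> str:
        return priority_for(self.risk_score)

    @property
    def deadline(self) -> datetime:
        return self.created_at + timedelta(hours=SLA_HOURS[self.priority])

    def is_overdue(self, now: datetime) -> bool:
        # Only open items can breach their SLA.
        return self.status in OPEN_STATUSES and now > self.deadline


def order_queue(items: list[TriageItem], now: datetime) -> list[TriageItem]:
    """Overdue items first, then by priority, then by descending risk score."""
    return sorted(
        items,
        key=lambda it: (not it.is_overdue(now), it.priority, -it.risk_score),
    )
```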
- Light/dark mode
- Custom color themes and backgrounds
- Custom sidebar appearance
- Preferences persisted in browser storage
- API Key authentication (`X-API-Key`) with constant-time comparison (`hmac.compare_digest`)
- Input sanitization: `html.escape()` on notes, assignee, preset names; `max_length` on all Pydantic string fields
- Rate limiting: 120 req/min global, 3 req/min on pipeline endpoints
- Configurable CORS via env (`COGNIX_CORS_ORIGINS`)
- Environment variables via `.env` (python-dotenv)
- Dev mode: authentication disabled when `COGNIX_API_KEY` is empty
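The key check and input sanitization described above can be sketched with the standard library alone. The helper names `is_authorized` and `sanitize_text` and the 500-character limit are hypothetical; the actual FastAPI dependency and Pydantic models may differ.

```python
import hmac
import html
import os
from typing import Optional


def is_authorized(provided_key: Optional[str], expected_key: Optional[str] = None) -> bool:
    """Check an X-API-Key header value against the configured key."""
    if expected_key is None:
        expected_key = os.environ.get("COGNIX_API_KEY", "")
    if not expected_key:
        # Dev mode: an empty COGNIX_API_KEY disables authentication.
        return True
    if not provided_key:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )


def sanitize_text(value: str, max_length: int = 500) -> str:
    """Reject over-long input, then escape HTML special characters."""
    if len(value) > max_length:
        raise ValueError(f"value exceeds {max_length} characters")
    return html.escape(value)
```

Escaping at write time means notes and assignee names can be rendered by the dashboard without being interpreted as markup.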
Cognitive_Attack_Mapper/
├── .dockerignore
├── .env # Environment variables (do not commit)
├── .env.example # Environment template
├── docker-compose.yml
├── Dockerfile
├── README.md # Italian version
├── README.en.md # This file
├── requirements.txt # 14 Python dependencies
├── datacommunications.txt.txt # Demo dataset (115 messages, 20 users)
│
├── analysis/
│ ├── __init__.py
│ ├── analyzer.py # VADER sentiment + keyword counting
│ ├── constants.py # Keywords, regex, risk band thresholds
│ ├── feature_engineering.py # Regex matching + normalization + transform
│ └── nlp_engine.py # Sentence-transformers embeddings
│
├── app/
│ ├── __init__.py
│ ├── dashboard.py # Main FastAPI app (20 endpoints)
│ ├── database.py # SQLite persistence layer (4 tables, WAL mode)
│ ├── main.py # CLI entry point
│ ├── static/
│ │ └── favicon.svg
│ └── templates/
│   └── dashboard.html # Jinja2 HTML frontend (Bootstrap + Vega-Embed)
│
├── config/
│ ├── __init__.py
│ └── risk_weights.json # 11 risk weights (sum = 1.00)
│
├── data/ # SQLite DB created at runtime
├── docs/
│ ├── executive-one-pager.md
│ ├── executive-one-pager.en.md
│ ├── technical-stakeholder-security.md
│ └── technical-stakeholder-security.en.md
│
├── ingestion/
│ ├── __init__.py
│ └── loader.py # CSV loader with encoding fallback
│
├── logs/ # Loguru logs at runtime
├── model/
│ ├── __init__.py
│ └── risk_engine.py # Weighted scoring + explanations
│
├── tests/ # 13 files, 153 tests
│ ├── __init__.py
│ ├── test_analyzer.py # 8 tests — keyword counting, feature extraction
│ ├── test_api_contracts.py # 11 tests — API contracts, 400/404 validation
│ ├── test_edge_cases.py # 22 tests — unicode, emoji, long text, regex
│ ├── test_feature_engineering.py # 1 test — feature column creation
│ ├── test_hardening.py # 32 tests — DB, auth, sanitization, Docker
│ ├── test_integration.py # 2 tests — end-to-end pipeline + determinism
│ ├── test_loader.py # 1 test — data cleanup
│ ├── test_negative.py # 8 tests — missing files, encoding, invalid weights
│ ├── test_nlp_engine.py # 7 tests — shape, normalization, batch, model
│ ├── test_p1_features.py # 58 tests — pagination, webhook retry, presets, KPI
│ ├── test_quickwins_v2.py # 16 tests — Swagger, CORS, bulk triage, rate limit
│ ├── test_risk_engine.py # 2 tests — score output + weight loading
│ └── test_triage_alert_weights.py # 57 tests — triage ops, weights, webhook HMAC, LRU cache
│
├── utils/
│ ├── __init__.py
│ └── logger.py # Loguru configuration
│
└── visualization/
  ├── __init__.py
  └── report.py # Rich table CLI report
- Python 3.10+
- Virtual environment recommended (`.venv`)
- 14 dependencies in `requirements.txt`:
| Package | Version | Purpose |
|---|---|---|
| pandas | >=1.3,<3.0 | Data manipulation |
| numpy | >=1.21,<2.0 | Numeric computation |
| scikit-learn | >=1.0,<2.0 | Normalization |
| networkx | >=2.6 | Graph analysis (future) |
| sentence-transformers | >=2.2.0,<3.0 | NLP embeddings |
| rich | >=13.0 | CLI reporting |
| loguru | >=0.6.0 | Structured logging |
| nltk | >=3.8 | Tokenization, VADER sentiment |
| altair | >=4.2,<6.0 | Vega-Lite chart specs |
| fastapi | >=0.100,<1.0 | API framework |
| uvicorn[standard] | >=0.20,<1.0 | ASGI server |
| jinja2 | >=3.1,<4.0 | HTML templating |
| slowapi | >=0.1.9 | Rate limiting |
| python-dotenv | >=1.0,<2.0 | .env variable loading |
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/macOS
pip install -r requirements.txt
Copy `.env.example` to `.env` and customize:
# API Key to protect sensitive endpoints (run, weights, triage, webhook).
# If empty, authentication is disabled (development mode).
COGNIX_API_KEY=
# HMAC-SHA256 secret for signing outgoing webhook payloads.
COGNIX_WEBHOOK_SECRET=
# SQLite database path for persistence.
# Default: ./data/cognix.db
COGNIX_DB_PATH=./data/cognix.db
# Allowed CORS origins (comma-separated). Use * for development only.
COGNIX_CORS_ORIGINS=*
# Log level: DEBUG, INFO, WARNING, ERROR
LOG_LEVEL=INFO
# Global rate limit (requests/minute)
COGNIX_RATE_LIMIT=120/minute
& .\.venv\Scripts\python.exe -m uvicorn app.dashboard:app --host 127.0.0.1 --port 8000 --reload
Open: http://127.0.0.1:8000
docker compose up --build
Service available at http://localhost:8000. Data persists in the `cognix-data` volume.
Recommended setup:
- `.vscode/launch.json` with `Dashboard (Uvicorn)`
- `.vscode/tasks.json` with `Run Dashboard (Uvicorn)`
- Configure dataset and weights in the sidebar
- Start Run Analysis
- Follow SSE progress bar
- Apply filters and analyze main tabs (Results, Users, Charts)
- Review Alerts and Audit sections
- Manage cases in the Triage tab (status, owner, notes, bulk update)
- Export CSV for reporting
Default paths:
- Dataset: `datacommunications.txt.txt`
- Weights: `config/risk_weights.json`
| Method | Path | Description | Protected |
|---|---|---|---|
| `GET` | `/` | Dashboard HTML page | No |
| `GET` | `/api/run/stream` | Pipeline SSE (progress → done/fatal) | No (3/min) |
| `POST` | `/api/run` | Synchronous pipeline execution | Yes |
| `POST` | `/api/results` | KPIs + chart specs + filtered/paginated tables | No |
| `GET` | `/api/user/{username}` | User drill-down details | No |
| `GET` | `/api/health` | Health check | No |
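A client consuming `/api/run/stream` needs to split the SSE stream into events. The parser below is a minimal sketch of that step; it assumes each `data:` field carries a JSON object, which is not confirmed here, and the field names in the usage example are made up.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())


# Usage with a canned stream; payload fields are illustrative only.
stream = [
    "event: progress\n", 'data: {"step": 2, "pct": 40}\n', "\n",
    "event: done\n", 'data: {"rows": 115}\n', "\n",
]
for name, payload in parse_sse(stream):
    print(name, payload)
```

A real client would stop reading once it sees a `done` or `fatal` event.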
| Method | Path | Description | Protected |
|---|---|---|---|
| `GET` | `/api/export/csv` | Full CSV export | No |
| `GET` | `/api/export/users` | Per-user summary export | No |
| `POST` | `/api/export/filtered` | Filtered CSV export | No |
| Method | Path | Description | Protected |
|---|---|---|---|
| `GET` | `/api/weights` | Current weights | No |
| `POST` | `/api/weights` | Update weights + optional rescore | Yes |
| Method | Path | Description | Protected |
|---|---|---|---|
| `POST` | `/api/triage/list` | Triage queue with status/priority/owner/text filters | No |
| `PATCH` | `/api/triage/item/{id}` | Update single item status/assignee/note | Yes |
| `POST` | `/api/triage/bulk-update` | Mass update (max 250 items) | Yes |
| `POST` | `/api/triage/bootstrap` | Sync triage queue from analysis results | Yes |
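Protected endpoints such as `PATCH /api/triage/item/{id}` expect the `X-API-Key` header. The sketch below only builds the request object and does not send it. The JSON field names (`status`, `assignee`, `note`) are inferred from the endpoint description and should be treated as assumptions, as should the `build_triage_update` helper itself.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_triage_update(
    base_url: str,
    item_id: str,
    api_key: str,
    status: str,
    assignee: Optional[str] = None,
    note: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) an authenticated triage update request."""
    body = {"status": status}  # field names are assumptions
    if assignee is not None:
        body["assignee"] = assignee
    if note is not None:
        body["note"] = note
    return urllib.request.Request(
        url=f"{base_url}/api/triage/item/{urllib.parse.quote(item_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call against a running instance.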
| Method | Path | Description | Protected |
|---|---|---|---|
| `POST` | `/api/alerts/webhook` | Relay alert to external webhook (with HMAC) | Yes |
| Method | Path | Description | Protected |
|---|---|---|---|
| `GET` | `/api/filter-presets` | List saved presets | No |
| `POST` | `/api/filter-presets` | Create/save preset (max 50) | Yes |
| `DELETE` | `/api/filter-presets/{name}` | Delete preset | Yes |
| Method | Path | Description | Protected |
|---|---|---|---|
| `GET` | `/api/kpi-timeline` | KPI snapshot history (max 200) | No |
Expected format (; separator, no header row):
user;message text
Example from the demo dataset (20 realistic Italian users, bilingual IT/EN messages):
marco.rossi;Buongiorno a tutti, la riunione e' stata spostata alle 15:00 in sala Galileo
giulia.bianchi;Ricordo che le richieste ferie per agosto vanno inserite sul portale HR entro venerdi
alessia.conti;URGENT: Your corporate account has been suspended due to a security breach...
The dataset contains 115 messages with a realistic distribution: ~87% Low, ~8% Medium, ~5% High risk.
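To make the input format concrete, here is a minimal standard-library sketch of reading `user;message text` lines with an encoding fallback and deduplication. It is not the code in `ingestion/loader.py`: the encoding order, splitting on the first `;` only, and the `parse_messages` name are assumptions.

```python
def parse_messages(
    raw: bytes, encodings: tuple[str, ...] = ("utf-8", "latin-1")
) -> list[tuple[str, str]]:
    """Decode raw bytes and return unique (user, message) rows in file order."""
    text = None
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue  # try the next encoding
    if text is None:
        raise ValueError("could not decode input with any configured encoding")

    seen: set[tuple[str, str]] = set()
    rows: list[tuple[str, str]] = []
    for line in text.splitlines():
        if ";" not in line:
            continue  # skip blank or malformed lines
        # Split on the first ';' only, so the message itself may contain ';'.
        user, message = line.split(";", 1)
        row = (user.strip(), message.strip())
        if row[0] and row[1] and row not in seen:
            seen.add(row)
            rows.append(row)
    return rows
```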
| Feature | Weight | Type | Description |
|---|---|---|---|
| `urgency_score` | 0.18 | Keyword/Regex | Urgency: urgent, immediately, asap, now, quick |
| `authority_score` | 0.15 | Keyword/Regex | Authority: manager, director, admin, IT |
| `semantic_signal` | 0.12 | Embedding | Semantic similarity to attack templates |
| `social_proof_score` | 0.10 | Keyword/Regex | Social pressure: everyone, everybody, tutti, already approved |
| `text_length_signal` | 0.08 | Length | Text length signal (normalized) |
| `sentiment_risk_signal` | 0.08 | VADER | Negative/manipulative sentiment |
| `reciprocity_score` | 0.08 | Keyword/Regex | Reciprocity: favor, return the favor, ricambia, per favore |
| `commitment_score` | 0.07 | Keyword/Regex | Commitment: as promised, as agreed, come concordato, come promesso |
| `trust_score` | 0.06 | Keyword/Regex | Trust: trust, confidential, secure, official, verified |
| `liking_score` | 0.04 | Keyword/Regex | Liking: dear friend, caro amico, ti stimo |
| `fear_score` | 0.04 | Keyword/Regex | Fear: account suspended, legal action, penalty, sospeso, bloccato |
Weights sum to exactly 1.00. They are customizable via API (POST /api/weights) or by editing config/risk_weights.json.
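A linear weighted score with per-feature contributions and a dominant driver can be sketched as below. The weights are copied from the table; the formula itself (weighted sum of features clipped to [0, 1], divided by the total weight) is an assumption about how `model/risk_engine.py` normalizes, and `score_message` is a hypothetical name.

```python
WEIGHTS = {
    "urgency_score": 0.18, "authority_score": 0.15, "semantic_signal": 0.12,
    "social_proof_score": 0.10, "text_length_signal": 0.08,
    "sentiment_risk_signal": 0.08, "reciprocity_score": 0.08,
    "commitment_score": 0.07, "trust_score": 0.06,
    "liking_score": 0.04, "fear_score": 0.04,
}


def score_message(features: dict[str, float], weights: dict[str, float] = WEIGHTS) -> dict:
    """Return the risk score, each feature's contribution, and the top driver."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    contributions = {
        # Missing features count as 0; values are clipped to [0, 1].
        name: w * min(max(features.get(name, 0.0), 0.0), 1.0) / total_weight
        for name, w in weights.items()
    }
    return {
        "risk_score": sum(contributions.values()),
        "contributions": contributions,
        "dominant_driver": max(contributions, key=contributions.get),
    }
```

Dividing by the total weight keeps the score in [0, 1] even if edited weights no longer sum to exactly 1.00.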
For count-based features (keyword/regex matches), the raw match count x is passed through the saturating transform 1 - exp(-x):
1 match → 0.632, 2 matches → 0.865, 3 matches → 0.950
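The values quoted above follow from the `1-exp(-x)` transform named in the pipeline table. A short check, with `saturate` as a hypothetical helper name:

```python
import math


def saturate(count: float) -> float:
    """Map a non-negative match count into [0, 1) with diminishing returns."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return 1.0 - math.exp(-count)


for n in (1, 2, 3):
    print(n, round(saturate(n), 3))
```

The first match moves the feature most of the way toward 1, so repeated keywords add progressively less risk.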
For features already in
Final normalized score:
Risk bands:
| Band | Range | Color |
|---|---|---|
| Low | [0.0, 0.4) | Blue |
| Medium | [0.4, 0.7) | Orange |
| High | [0.7, 1.0] | Red |
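The band boundaries in the table translate directly into a small classifier. The function name `risk_band` is hypothetical; in the project the thresholds are listed as living in `analysis/constants.py`.

```python
def risk_band(score: float) -> str:
    """Classify a normalized risk score into Low / Medium / High."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0.0, 1.0]")
    if score < 0.4:
        return "Low"      # [0.0, 0.4)
    if score < 0.7:
        return "Medium"   # [0.4, 0.7)
    return "High"         # [0.7, 1.0]
```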
Automatic persistence with 4 tables (WAL mode, thread-safe):
| Table | Content |
|---|---|
| `triage_items` | Triage queue (id, JSON data, updated_at) |
| `filter_presets` | Saved filter presets (name, JSON data, created_at) |
| `kpi_timeline` | KPI snapshots after each run (JSON data, created_at) |
| `audit_log` | Action audit trail (action, details, created_at) |
The database is created automatically on first startup at COGNIX_DB_PATH (default: ./data/cognix.db).
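For orientation, opening a SQLite database in WAL mode and writing an audit entry might look like the sketch below. Only the `audit_log` column names (action, details, created_at) come from the table above; the `id` column, the column types, and the helper names are assumptions rather than the schema in `app/database.py`.

```python
import json
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def open_db(path: str) -> sqlite3.Connection:
    """Create the parent directory and the DB file if needed, then enable WAL."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "  # assumed surrogate key
        "action TEXT NOT NULL, details TEXT NOT NULL, created_at TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_action(conn: sqlite3.Connection, action: str, details: dict) -> None:
    """Append one audit entry with a UTC timestamp."""
    conn.execute(
        "INSERT INTO audit_log (action, details, created_at) VALUES (?, ?, ?)",
        (action, json.dumps(details), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = open_db(os.path.join(tmp, "data", "cognix.db"))
        record_action(db, "weights_update", {"urgency_score": 0.2})
        print(db.execute("SELECT action, details FROM audit_log").fetchall())
        db.close()
```

WAL mode lets readers proceed while a write is in progress, which suits a dashboard that polls while a pipeline run is recording results.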
Run all 153 tests:
.venv\Scripts\python.exe -m pytest tests/ -v --tb=short
Coverage by area:
| Area | Tests | File |
|---|---|---|
| Text analysis | 8 | test_analyzer.py |
| API contracts | 11 | test_api_contracts.py |
| Edge cases | 22 | test_edge_cases.py |
| Feature engineering | 1 | test_feature_engineering.py |
| Hardening (DB, auth, XSS, Docker) | 32 | test_hardening.py |
| Pipeline integration | 2 | test_integration.py |
| Data loader | 1 | test_loader.py |
| Negative cases | 8 | test_negative.py |
| NLP engine | 7 | test_nlp_engine.py |
| P1 features (pagination, webhook, presets, KPI) | 58 | test_p1_features.py |
| Quick-wins v2 (Swagger, CORS, bulk, rate limit) | 16 | test_quickwins_v2.py |
| Risk engine | 2 | test_risk_engine.py |
| Triage, alerts, weights, LRU cache | 57 | test_triage_alert_weights.py |
docker compose up --build
- Base image: `python:3.12-slim`
- Exposed port: 8000
- Data volume: `cognix-data` → `/app/data` (SQLite DB)
- Health check: every 30s on `/api/system/status`
- Env file: `.env` loaded automatically
- Restart policy: `unless-stopped`
An API endpoint returned a plain-text 500 instead of JSON. Check the backend logs and make sure a run (POST /api/run or GET /api/run/stream) has completed before requesting results.
If Uvicorn runs without --reload, restart the server.
Ensure the URL starts with http:// or https://, the target is reachable, and the target accepts JSON POST requests. The relay makes up to 3 attempts on 5xx errors and does not retry on 4xx errors.
Run an analysis first, then trigger POST /api/triage/bootstrap or use the "Sync from results" button in the Triage tab.
If COGNIX_API_KEY is set in .env, all protected endpoints require the X-API-Key header. To disable authentication, leave the variable empty.
The global limit is 120 req/min. Pipeline endpoints (/api/run, /api/run/stream) have a 3 req/min limit. Wait for the reset indicated in the X-RateLimit-Reset header.
- Linear weighted scoring (not supervised/adaptive model)
- No native out-of-the-box SIEM connector
- SSE endpoint (`/api/run/stream`) not protected by API Key
- No RBAC (role-based access control)
- SSRF protection on webhook endpoints
- Security headers (CSP, HSTS, X-Frame-Options)
- Authentication on SSE endpoint
- Dedicated SIEM/SOAR connectors
- Advanced RBAC/governance
- GitHub Actions CI for automated testing
- EN: `docs/technical-stakeholder-security.en.md`
- EN: `docs/executive-one-pager.en.md`
- IT: `README.md`
- IT: `docs/technical-stakeholder-security.md`
- IT: `docs/executive-one-pager.md`