A SIEM Detection Engineering rule pack to detect proprietary data exfiltration to unauthorized SaaS LLMs via outbound payload volume analysis.
Most enterprises attempt to manage "Shadow AI" by blocking domains like chatgpt.com at the DNS level. This fails for three critical reasons:
- Bypass: Developers easily bypass local DNS sinkholes using DNS-over-HTTPS (DoH) or personal VPNs.
- API Wrappers: Data is often exfiltrated via backend APIs or obscure Vercel/HuggingFace wrappers, not the main front-end domains.
- The Authorized Use Trap: If your organization allows AI tools for general productivity, DNS logs cannot differentiate between a harmless query (500 bytes) and a developer pasting 10,000 lines of proprietary source code (50,000+ bytes).
You cannot exfiltrate 5 megabytes of source code in a 50-byte packet.
To detect genuine GenAI-related data loss events, Security Operations Centers (SOCs) must pivot from DNS logs to proxy or Next-Generation Firewall (NGFW) logs.
This repository contains SIEM queries designed to aggregate bytes_out (outbound traffic volume) to known AI infrastructure. It isolates users and endpoints uploading anomalously large payloads, separating legitimate chat queries from massive data uploads.
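The aggregation idea can be sketched outside any SIEM in a few lines of Python. This is a minimal illustration, not the logic shipped in the rule pack: the event field names (`user`, `dest_host`, `bytes_out`), the three-entry domain set, and the helper names are assumptions made for the example, and the threshold simply mirrors the ~2MB default mentioned in this README.

```python
from collections import defaultdict

# Illustrative values only; the maintained list lives in ai_domains_list.csv.
AI_DOMAINS = {"chatgpt.com", "api.openai.com", "huggingface.co"}
BYTE_THRESHOLD = 2_000_000  # roughly the ~2MB default; tune to your baseline


def is_ai_domain(host, domains=AI_DOMAINS):
    """Return True if host equals a listed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def flag_large_uploads(events, threshold=BYTE_THRESHOLD):
    """Sum bytes_out per (user, dest_host) and keep totals above threshold."""
    totals = defaultdict(int)
    for event in events:
        if is_ai_domain(event["dest_host"]):
            totals[(event["user"], event["dest_host"])] += int(event["bytes_out"])
    return {key: total for key, total in totals.items() if total > threshold}


# Hypothetical proxy log records (field names are assumptions).
sample_events = [
    {"user": "alice", "dest_host": "chatgpt.com", "bytes_out": 512},
    {"user": "bob", "dest_host": "api.openai.com", "bytes_out": 1_500_000},
    {"user": "bob", "dest_host": "api.openai.com", "bytes_out": 1_200_000},
    {"user": "carol", "dest_host": "example.org", "bytes_out": 9_000_000},
]

flagged = flag_large_uploads(sample_events)
print(flagged)  # {('bob', 'api.openai.com'): 2700000}
```

Alice's short chat query stays far below the threshold, Carol's large upload goes to a domain outside the list, and only Bob's combined uploads to an AI API are flagged. Summing per user and destination, rather than inspecting single requests, is what catches an upload split across several sessions.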
- `/threat_intel`: A continually updated CSV of known AI SaaS and API domains (`ai_domains_list.csv`).
- `/splunk`: SPL queries for Splunk Enterprise/Cloud.
- `/sentinel`: KQL queries for Microsoft Sentinel.
- `/elastic`: Lucene/ES|QL queries for Elastic Security.
- Download `ai_domains_list.csv` from the `/threat_intel` directory and import it as a lookup table in your SIEM.
- Navigate to your respective SIEM folder (e.g., `/sentinel`).
- Copy the detection logic and adjust the `ByteThreshold` (default is set to ~2MB) to match your enterprise baseline.
- Deploy as a scheduled hunt or an active alert rule.
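One possible way to pick a `ByteThreshold` that matches your baseline is to take a high percentile of historical per-user upload totals to AI domains. The sketch below is an assumption-laden illustration in Python, not part of the rule pack: the CSV column name `domain`, the helper names `load_domains` and `suggest_threshold`, and the sample history are all hypothetical, so check the real header of `ai_domains_list.csv` before reusing any of it.

```python
import csv
import io
import statistics


def load_domains(csv_file, column="domain"):
    """Read domains from an open CSV file; the column name is an assumption."""
    domains = set()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip().lower()
        if value:
            domains.add(value)
    return domains


def suggest_threshold(totals, default=2_000_000):
    """Return the 99th percentile of historical totals, or default if too few."""
    if len(totals) < 2:
        return default
    # method="inclusive" keeps the result within the observed min and max.
    return int(statistics.quantiles(totals, n=100, method="inclusive")[98])


# Stand-in for the real file, e.g. open("threat_intel/ai_domains_list.csv").
sample_csv = io.StringIO("domain,category\nchatgpt.com,chat\nAPI.OpenAI.com,api\n")
domains = load_domains(sample_csv)

# Hypothetical per-user daily bytes_out totals to AI domains.
history = [120_000, 300_000, 450_000, 800_000, 1_100_000, 2_500_000]
threshold = suggest_threshold(history)
print(sorted(domains), threshold)
```

A percentile tracks what is normal for your environment, while the fallback keeps the ~2MB default in place until enough history has been collected. Treat the result as a starting point and review it against known-good and known-bad uploads before alerting on it.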
These queries analyze network metadata and data transfer volumes. Before deploying these rules in a production environment, ensure your organization has the legal authority to monitor employee outbound traffic volumes. You must adhere to local privacy, telecommunications, and labor laws (e.g., GDPR in the EU, PIPL in China). This tool is for authorized threat hunting and DLP monitoring only.
We build governance frameworks for the Agentic AI era. Finding Shadow AI is step one. Governing it without killing developer velocity is step two.
Need help building an Enterprise AI Gateway or an enforceable Acceptable Use Policy?
Contact us for an AgentClaw Controls Toolkit (ACT) assessment at contact@move78int.com.
