Reference for the 9 threat classes scanned by hb-scan. Each class maps to industry standards and regulations and contains specific detection rules.
Category ID: secret_exposure
Severity: High
Score Weight: 20 / 100
Detects credentials -- API keys, tokens, passwords, private keys -- that appear in AI tool conversations. This is the highest-impact category because a single exposed credential can lead to account compromise, data breach, or financial loss.
- 102 provider-specific credential patterns covering 70+ services (AWS, GCP, Azure, GitHub, Stripe, Twilio, OpenAI, Anthropic, and many more). Sourced from gitleaks, TruffleHog, detect-secrets, and GitGuardian.
- Generic API key assignments (`api_key = "AKIAIOSFODNN7EXAMPLE"`)
- Password assignments with real values
- Bearer tokens
- JWT tokens
- SSH and PGP private keys
Rules exclude test/placeholder values (example, fake, test, changeme, your_), environment variable references (os.getenv, process.env), and variable names that look like schema definitions.
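The exclusion rules above can be sketched as a post-filter applied to regex matches. This is a minimal illustration, not hb-scan's implementation: the `GENERIC_API_KEY` pattern and the `find_generic_keys` helper are hypothetical, and only the marker strings come from this page.

```python
import re

# Hypothetical simplified pattern: a key-like name assigned a quoted value of 8+ chars.
GENERIC_API_KEY = re.compile(
    r"""(?i)(api[_-]?key|secret|token)\s*[:=]\s*["']([^"']{8,})["']"""
)

# Placeholder markers and environment lookups named in the exclusion rules above.
PLACEHOLDER_MARKERS = ("example", "fake", "test", "changeme", "your_")
ENV_REFERENCES = ("os.getenv", "process.env")


def find_generic_keys(line: str) -> list[str]:
    """Return secret-looking values, skipping env lookups and placeholders."""
    if any(ref in line for ref in ENV_REFERENCES):
        return []  # the value is read at runtime, not pasted into the prompt
    hits = []
    for match in GENERIC_API_KEY.finditer(line):
        value = match.group(2)
        if any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            continue  # test or placeholder value
        hits.append(value)
    return hits
```

The sample value in the generic API key bullet, AKIAIOSFODNN7EXAMPLE (AWS's documented example key), contains `example`, so a filter like this one would skip it.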
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM02 -- Sensitive Information Disclosure |
| NIST AI 600-1 | Section 2.4 -- Data Privacy, Memorisation Risks |
| ISO/IEC 42001 | A.8 -- Information and Transparency |
| ISO/IEC 27001 | A.5.14 -- Information Transfer |
| CIS Controls v8.1 | Control 3 -- Data Protection |
| MITRE ATLAS | AML.T0051.002 -- LLM Data Leakage |
- `aws-access-key-id`: Matches AWS keys with an AKIA/ASIA/ABIA/ACCA prefix
- `github-pat`: Matches GitHub personal access tokens (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_` prefixes)
- `stripe-secret-key`: Matches Stripe secret keys (`sk_live_`, `sk_test_` prefixes)
- `generic-api-key`: Matches generic `api_key = "value"` patterns in user prompts
- `bearer-token`: Matches `Bearer` authorization tokens
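Provider-specific rules typically key on a fixed token prefix plus a length and character class. The sketch below shows that idea for a few of the rules listed; the regexes are simplified assumptions (for example, the 36-character GitHub token body and the 20-character bearer minimum), not hb-scan's actual definitions.

```python
import re

# Illustrative patterns for a few of the rule IDs above (assumed, simplified).
PROVIDER_PATTERNS = {
    # AWS access key IDs: 4-character prefix followed by 16 uppercase letters/digits.
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA|ABIA|ACCA)[A-Z0-9]{16}\b"),
    # GitHub tokens: ghp_/gho_/ghu_/ghs_/ghr_ prefix followed by 36 alphanumerics.
    "github-pat": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # Stripe secret keys in live or test mode.
    "stripe-secret-key": re.compile(r"\bsk_(?:live|test)_[A-Za-z0-9]{10,}\b"),
    # Bearer tokens, e.g. in an Authorization header.
    "bearer-token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
}


def scan_line(line: str) -> list[str]:
    """Return the IDs of every provider pattern that matches the line."""
    return [rule_id for rule_id, pattern in PROVIDER_PATTERNS.items() if pattern.search(line)]
```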
Category ID: unsafe_code_acceptance
Severity: High
Score Weight: 12 / 100
Detects security vulnerabilities in code generated or suggested by the AI. Research from Veracode (2025) found that 45% of AI-generated code contains OWASP Top 10 vulnerabilities. CSA reports 62% of AI-generated code has security flaws.
- eval() or exec() usage on untrusted input
- SQL queries built via string concatenation (SQL injection)
- Hardcoded credentials in generated code
- Command injection patterns
- Insecure deserialization
- Path traversal
- CORS misconfiguration
- Missing authentication checks
- Disabled TLS verification
Most rules in this category use detection: llm because distinguishing real vulnerabilities from discussions about security requires semantic understanding. These rules are skipped in Tier 1.1 (regex only) and will activate in Tier 1.2.
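The tiering described above amounts to filtering the rule set by its `detection` field before scanning. A minimal sketch, assuming a hypothetical `Rule` record whose `detection` values are `"regex"`, `"llm"`, or `"session_heuristic"`; the detection type given to each example rule is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    detection: str  # assumed values: "regex", "llm", "session_heuristic"


def active_rules(rules: list[Rule], tier: str) -> list[Rule]:
    """Keep only the rules that can run in the given tier."""
    if tier == "1.1":
        # Tier 1.1 is regex-only, so LLM-backed rules are skipped.
        return [rule for rule in rules if rule.detection != "llm"]
    # Assumption: from Tier 1.2 onward every rule is eligible.
    return list(rules)
```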
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM05 -- Improper Output Handling |
| OWASP Agentic Top 10 2026 | ASI05 -- Unexpected Code Execution |
| NIST SP 800-218A | PW.4 -- Secure Coding Practices |
| ISO/IEC 42001 | A.6 -- AI System Lifecycle |
| ISO/IEC 27001 | A.8.28 -- Secure Coding |
| CIS Controls v8.1 | Control 16 -- Application Software Security |
| EU AI Act | Article 15 -- Accuracy, Robustness, Cybersecurity |
- `eval-exec-usage`: AI-generated code using `eval()`/`exec()` (LLM detection)
- `sql-string-concatenation`: SQL built via string concatenation (LLM detection)
- `hardcoded-secret-in-code`: Credentials embedded in generated source code
- `disabled-tls-verify`: Code that sets `verify=False` or otherwise disables SSL verification
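A regex rule such as `disabled-tls-verify` can look for the common ways code switches off certificate checks. The patterns below (Python `requests`' `verify=False`, `ssl._create_unverified_context()`, and Node.js `rejectUnauthorized: false` or `NODE_TLS_REJECT_UNAUTHORIZED=0`) are an illustrative assumption about what such a rule covers, not its actual definition.

```python
import re

# Assumed patterns for disabled certificate verification in Python and Node.js code.
TLS_DISABLED = re.compile(
    r"verify\s*=\s*False"                             # requests.get(url, verify=False)
    r"|ssl\._create_unverified_context\s*\("          # Python stdlib unverified context
    r"|rejectUnauthorized\s*:\s*false"                # Node.js https/tls options
    r"|NODE_TLS_REJECT_UNAUTHORIZED\s*=\s*['\"]?0"    # Node.js environment override
)


def disables_tls(code: str) -> bool:
    """True if the code snippet turns off TLS certificate verification."""
    return TLS_DISABLED.search(code) is not None
```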
Category ID: dangerous_command
Severity: High
Score Weight: 10 / 100
Detects dangerous commands executed by the AI through its tool-use capabilities. AI tools like Claude Code can run shell commands, and without proper oversight, they may execute privileged or destructive operations.
- sudo usage (privileged execution)
- rm -rf on broad or system paths (for example /, /*, ~/, /usr, and /etc)
- chmod 777 (world-writable permissions)
- Writes to system directories (/etc, /usr, /System)
- curl piped to sh/bash (remote code execution)
- Network listeners (netcat, socat binds)
- Git force-push to main/master
- Docker with --privileged flag
These rules use tool_filter: Bash to only scan Bash tool invocations, reducing false positives from code discussions.
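The `tool_filter: Bash` restriction means a tool call's text is matched only when the tool is Bash. The sketch assumes tool calls arrive as hypothetical `(tool_name, command)` pairs; the rule IDs `curl-pipe-shell` and `docker-privileged` are invented for illustration, and all three patterns are simplified.

```python
import re

# Simplified patterns for three of the behaviours listed above (assumed).
DANGEROUS = {
    "curl-pipe-shell": re.compile(r"\bcurl\b[^|]*\|\s*(?:sudo\s+)?(?:ba)?sh\b"),
    "chmod-world-writable": re.compile(r"\bchmod\s+(?:-R\s+)?(?:777|a\+rwx)\b"),
    "docker-privileged": re.compile(r"\bdocker\s+run\b.*--privileged\b"),
}


def scan_tool_calls(calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (rule_id, command) findings for Bash tool calls only."""
    findings = []
    for tool, command in calls:
        if tool != "Bash":  # tool_filter: ignore Read, Edit, and other tools
            continue
        for rule_id, pattern in DANGEROUS.items():
            if pattern.search(command):
                findings.append((rule_id, command))
    return findings
```

The same `curl ... | sh` text inside a Read result (for example, a README being viewed) produces no finding, which is the false-positive reduction the filter is meant to provide.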
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM06 -- Excessive Agency |
| OWASP Agentic Top 10 2026 | ASI02 -- Tool Misuse and Exploitation |
| OWASP Agentic Top 10 2026 | ASI05 -- Unexpected Code Execution |
| ISO/IEC 42001 | A.6 -- AI System Lifecycle |
| CIS Controls v8.1 | Control 4 -- Secure Configuration |
| MITRE ATLAS | Agent Tool Invocation |
- `sudo-usage`: Any use of sudo (excludes apt-get update/install, brew, and pip)
- `rm-rf-destructive`: `rm -rf` targeting /, ~, /usr, /etc, /var, and other system paths
- `chmod-world-writable`: `chmod 777` or `chmod a+rwx`
- `git-force-push-main`: `git push --force` to main or master
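Two of the rules above need more than a bare keyword match: `sudo-usage` has an allow-list, and `rm-rf-destructive` must tell `/` apart from `/tmp/build`. A minimal sketch under assumed patterns; only the exclusions and target paths come from the rule descriptions.

```python
import re

# sudo at the start of the command or after ; & | (assumed shape).
SUDO = re.compile(r"(?:^|[;&|]\s*)sudo\s+(\S+)(?:\s+(\S+))?")
ALLOWED_PROGRAMS = {"brew", "pip"}
ALLOWED_SUBCOMMANDS = {("apt-get", "update"), ("apt-get", "install")}

RM_RF = re.compile(
    r"\brm\s+-(?:rf|fr)\s+"
    r"(?:/\*?|~/?|/(?:usr|etc|var|bin|lib|System)/?)"  # root, home, or a system dir
    r"(?=\s|$)"  # the path must end here, so /tmp/build does not count as "/"
)


def sudo_usage(command: str) -> bool:
    """True if any sudo invocation in the command falls outside the exclusions."""
    for match in SUDO.finditer(command):
        program, subcommand = match.group(1), match.group(2)
        if program in ALLOWED_PROGRAMS or (program, subcommand) in ALLOWED_SUBCOMMANDS:
            continue
        return True
    return False


def rm_rf_destructive(command: str) -> bool:
    return RM_RF.search(command) is not None
```

The lookahead `(?=\s|$)` is what keeps `rm -rf /tmp/build` from being read as `rm -rf /`.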
Category ID: sensitive_data_sharing
Severity: High
Score Weight: 15 / 100
Detects sensitive configuration data and secrets shared in AI conversations, beyond individual credentials. This covers entire files or connection strings that contain multiple secrets.
- .env file contents pasted into prompts (checks for 40+ known secret variable names)
- SSH private key files read by the AI tool
- Database connection strings with embedded passwords
- Cloud provider configuration files (.aws/credentials, .gcp/credentials)
- Certificate private keys
- Kubernetes secrets and configs
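.env detection can be approximated by parsing `NAME=value` lines and checking names against a known-secrets list. The five names in `SECRET_ENV_NAMES` are a hypothetical stand-in for the 40+ the rule checks, and reusing the secret_exposure placeholder markers here is an assumption.

```python
import re

# Hypothetical stand-in for the 40+ secret variable names the rule checks.
SECRET_ENV_NAMES = {
    "AWS_SECRET_ACCESS_KEY", "DATABASE_PASSWORD", "JWT_SECRET",
    "OPENAI_API_KEY", "STRIPE_SECRET_KEY",
}
# Assumption: the same placeholder markers as the secret_exposure rules.
PLACEHOLDER_MARKERS = ("example", "fake", "test", "changeme", "your_")
# NAME=value, optionally prefixed with "export".
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Z][A-Z0-9_]*)\s*=\s*(.*?)\s*$")


def env_secrets(text: str) -> list[str]:
    """Return known secret names that are assigned a non-placeholder value."""
    found = []
    for line in text.splitlines():
        match = ENV_LINE.match(line)
        if not match:
            continue
        name, value = match.group(1), match.group(2).strip("'\"")
        if name not in SECRET_ENV_NAMES or not value:
            continue
        if any(marker in value.lower() for marker in PLACEHOLDER_MARKERS):
            continue
        found.append(name)
    return found
```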
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM02 -- Sensitive Information Disclosure |
| NIST AI 600-1 | Section 2.4 -- Data Privacy |
| ISO/IEC 42001 | A.7 -- Data for AI Systems |
| ISO/IEC 27001 | A.5.14 -- Information Transfer |
| ISO/IEC 27001 | A.5.33 -- Protection of Records |
| CIS Controls v8.1 | Control 3 -- Data Protection |
| MITRE ATLAS | AML.T0051.002 -- LLM Data Leakage |
- `env-file-contents`: .env file with real secret values shared in prompts
- `ssh-key-path-shared`: SSH private key file read by the AI tool
- `database-connection-string`: Connection URI with an embedded password
- `cloud-credentials-shared`: AWS/GCP/Azure credential files accessed
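For `database-connection-string`, Python's `urllib.parse.urlsplit` can tell whether a URI's userinfo section carries a password. The scheme list in `DB_URI` is an illustrative, incomplete assumption, and `connection_strings_with_password` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Candidate URIs for a few common database schemes (illustrative, incomplete).
DB_URI = re.compile(r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://\S+")


def connection_strings_with_password(text: str) -> list[str]:
    """Return database URIs whose userinfo section includes a password."""
    hits = []
    for match in DB_URI.finditer(text):
        uri = match.group(0)
        if urlsplit(uri).password:  # None when there is no user:password@ part
            hits.append(uri)
    return hits
```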
Category ID: supply_chain_risk
Severity: Medium to High
Score Weight: 8 / 100
Detects risky package installations performed or suggested by the AI. A 2025 USENIX study found that 19.7% of AI-suggested packages are "hallucinated" -- they do not exist on official registries. Attackers can register these names and distribute malware (slopsquatting).
- pip install from git/URL sources (bypasses registry verification)
- npm install from git/GitHub URLs
- curl piped to pip (remote script execution)
- pip install --extra-index-url (potential dependency confusion)
- npm with --ignore-scripts disabled
- Packages from unverified registries
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM03 -- Supply Chain Vulnerabilities |
| NIST SP 800-218A | PW.3 -- Supply Chain Integrity |
| CIS Controls v8.1 | Control 2 -- Inventory and Control of Software Assets |
| EU AI Act | Article 15 -- Accuracy, Robustness, Cybersecurity |
- `pip-install-from-url`: pip install from `git+` or `https://` sources
- `npm-install-from-git`: npm install from git/GitHub URLs
- `curl-pipe-to-pip`: Remote script piped to pip
- `pip-extra-index`: pip with `--extra-index-url` (dependency confusion risk)
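Three of the supply chain rules can be sketched as one regex per rule over the command text. The patterns are simplified assumptions; `curl-pipe-to-pip` is left out because its exact shape is not specified here.

```python
import re

# Assumed, simplified patterns keyed by the rule IDs above.
SUPPLY_CHAIN_RULES = {
    "pip-install-from-url": re.compile(r"\bpip3?\s+install\b.*\s(?:git\+\S+|https?://\S+)"),
    "npm-install-from-git": re.compile(
        r"\bnpm\s+(?:install|i)\b.*\s(?:git(?:\+\w+)?://\S+|github:\S+|git@github\.com:\S+)"
    ),
    "pip-extra-index": re.compile(r"\bpip3?\s+install\b.*--extra-index-url\b"),
}


def supply_chain_findings(command: str) -> list[str]:
    """Return the IDs of every supply chain rule the command triggers."""
    return [rule_id for rule_id, pattern in SUPPLY_CHAIN_RULES.items() if pattern.search(command)]
```

A single command can trigger several rules: `pip install --extra-index-url https://... pkg` matches `pip-extra-index` and, because it contains an https URL, this simplified `pip-install-from-url` as well.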
Category ID: scope_violation
Severity: Medium to High
Score Weight: 5 / 100
Detects cases where the AI tool accessed files or directories outside the current project scope. AI coding tools should operate within project boundaries, but they can read files from anywhere on the filesystem.
- Home directory config files (.bashrc, .zshrc, .npmrc, .gitconfig, .docker)
- SSH configuration and key files
- AWS credentials and config (~/.aws/)
- GCP credentials (~/.config/gcloud/)
- Kubernetes config (~/.kube/config)
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM06 -- Excessive Agency |
| OWASP Agentic Top 10 2026 | ASI03 -- Identity and Privilege Abuse |
| CIS Controls v8.1 | Control 6 -- Access Control Management |
| ISO/IEC 42001 | A.5 -- Assessing Impacts |
| MITRE ATLAS | Agent Tool Invocation |
- `home-config-access`: AI read ~/.bashrc, ~/.npmrc, or similar config files
- `ssh-config-access`: AI read ~/.ssh/config, known_hosts, or key files
- `aws-config-access`: AI read ~/.aws/credentials or config
- `kube-config-access`: AI read ~/.kube/config
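Scope checks reduce to path arithmetic: normalise the accessed path, then test whether it falls inside the project root or under one of the sensitive home locations listed above. A sketch, assuming the scanner knows the project root and home directory as plain strings; `is_sensitive_out_of_scope` is a hypothetical name, and `PurePath.is_relative_to` requires Python 3.9+.

```python
import posixpath
from pathlib import PurePosixPath

# Home-relative locations drawn from the list above.
SENSITIVE_HOME_PATHS = (
    ".bashrc", ".zshrc", ".npmrc", ".gitconfig", ".docker",
    ".ssh", ".aws", ".config/gcloud", ".kube/config",
)


def is_sensitive_out_of_scope(path: str, project_root: str, home: str) -> bool:
    """True if `path` is outside the project and under a sensitive home location."""
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    # Collapse ".." segments lexically; a real scanner would also resolve symlinks.
    p = PurePosixPath(posixpath.normpath(path))
    if p.is_relative_to(project_root):
        return False  # inside the project: in scope
    home_p = PurePosixPath(home)
    return any(p.is_relative_to(home_p / entry) for entry in SENSITIVE_HOME_PATHS)
```

Normalising first matters: `/home/dev/project/../.kube/config` looks like a project path until the `..` is collapsed.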
Category ID: ip_trade_secret_leakage
Severity: High
Score Weight: 15 / 100
Detects intellectual property and trade secrets shared with AI tools. Under the Defend Trade Secrets Act (18 U.S.C. 1836), sharing trade secrets with a public AI service may destroy their legal protection. A 2024 Cisco study found that 48% of organizations enter non-public company information into generative AI tools.
- Content with CONFIDENTIAL, PROPRIETARY, TRADE SECRET, or UNDER NDA markings
- Financial metrics with real values (revenue, profit, EBITDA with dollar amounts)
- Internal infrastructure URLs (staging, dev, internal subdomains)
- Customer lists and deal data
- Patent-pending technical details
| Standard | Control |
|---|---|
| ISO/IEC 42001 | A.8 -- Information and Transparency |
| ISO/IEC 27001 | A.5.14 -- Information Transfer |
| CIS Controls v8.1 | Control 3 -- Data Protection |
| MITRE ATLAS | AML.T0024 -- Exfiltration via ML Inference API |
| Defend Trade Secrets Act | 18 U.S.C. 1836 |
- `confidential-keyword`: Content with CONFIDENTIAL/PROPRIETARY markings (LLM detection)
- `financial-data-sharing`: Revenue, profit, or EBITDA with dollar amounts
- `internal-url-sharing`: Internal/staging/dev infrastructure URLs
- `customer-data-sharing`: Customer lists or deal details
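The two regex rules in this category can be approximated as follows. Both patterns are assumptions, kept deliberately narrow: a dollar amount must appear within 40 characters after the keyword, and an internal URL must have `staging`, `dev`, or `internal` as a complete hostname label. `ip_findings` is a hypothetical helper.

```python
import re

# Keyword, up to 40 chars (no sentence break), then a dollar amount with optional unit.
FINANCIAL = re.compile(
    r"(?i)\b(?:revenue|profit|ebitda)\b[^.\n]{0,40}?"
    r"\$\s?\d[\d,]*(?:\.\d+)?\s?(?:[kmb]|million|billion)?\b"
)
# A URL whose hostname contains a staging, dev, or internal label.
INTERNAL_URL = re.compile(r"(?i)\bhttps?://(?:[\w-]+\.)*(?:staging|dev|internal)(?:\.[\w-]+)+")


def ip_findings(text: str) -> list[str]:
    found = []
    if FINANCIAL.search(text):
        found.append("financial-data-sharing")
    if INTERNAL_URL.search(text):
        found.append("internal-url-sharing")
    return found
```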
Category ID: regulatory_data_exposure
Severity: High
Score Weight: 15 / 100
Detects regulated personal data shared with AI tools. This includes data protected by GDPR, HIPAA, PCI-DSS, and other regulations. Italy fined OpenAI EUR 15 million for GDPR violations in 2024. Most public AI tools do not have the required data processing agreements for handling regulated data.
- GDPR personal data (names with email, phone, address, date of birth, national ID)
- HIPAA protected health information (patient data, diagnoses, prescriptions)
- PCI cardholder data (credit card numbers, CVVs)
- FERPA educational records
- CCPA consumer personal information
- SOX financial records
Most rules in this category use detection: llm because identifying real personal data (vs. test data, examples, or discussions about privacy) requires semantic understanding. These rules are skipped in Tier 1.1 and will activate in Tier 1.2.
| Standard | Control |
|---|---|
| NIST AI 600-1 | Section 2.4 -- Data Privacy |
| ISO/IEC 42001 | A.7 -- Data for AI Systems |
| ISO/IEC 27001 | A.5.33 -- Protection of Records |
| GDPR | Article 5(1)(c) -- Data Minimization |
| HIPAA | BAA Requirement |
| PCI-DSS | Requirement 3 -- Protect Stored Cardholder Data |
- `gdpr-personal-data`: Real personal data of real individuals (LLM detection)
- `hipaa-health-data`: Protected health information with patient identifiers (LLM detection)
- `pci-card-number`: Credit card numbers that pass the Luhn checksum
- `ssn-pattern`: Social Security Number patterns
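`pci-card-number` pairs a digit-run regex with the Luhn checksum, which filters out most random 13-to-19-digit strings. The Luhn step below is the standard algorithm; the candidate regex and the SSN pattern (which rejects area numbers 000, 666, and 900 to 999, group 00, and serial 0000, none of which are issued) are assumptions about how such rules could be written.

```python
import re

CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators
SSN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Standard Luhn check over the digits of `number`."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit; double every second one, subtracting 9 if it exceeds 9.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    return [m.group(0) for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group(0))]
```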
Category ID: excessive_reliance
Severity: Info
Score Weight: 0 / 100 (informational only, does not reduce score)
Detects sessions where the AI operated with minimal human oversight. This is an informational category designed to raise awareness about automation bias. In a 2024 GitHub/Accenture study, developers retained 88% of Copilot-generated code. Courts and tribunals have held organizations and professionals accountable for AI-generated outputs (Air Canada chatbot case, Mata v. Avianca).
- Auto-pilot sessions: 50+ tool executions with fewer than 3 substantive user messages
- This indicates the AI was executing commands in a largely autonomous fashion
This category uses a session_heuristic match type instead of regex. It evaluates session-level statistics rather than text patterns.
| Standard | Control |
|---|---|
| OWASP LLM Top 10 2025 | LLM09 -- Misinformation (Overreliance) |
| EU AI Act | Article 14 -- Human Oversight |
| NIST AI RMF | Measure 2.8 -- Human Oversight Statistics |
| ISO/IEC 42001 | A.5 -- Assessing Impacts |
| Mata v. Avianca | Lawyers sanctioned for citing AI-hallucinated cases |
| Air Canada chatbot | Company held liable for AI chatbot's fabricated policy |
- `auto-pilot-session`: Session with 50+ tool executions and fewer than 3 substantive user messages
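The `session_heuristic` match type works on counts rather than text. A minimal sketch, assuming a hypothetical `Event` record per user message or tool call and an assumed definition of "substantive" (at least four words); only the 50 and 3 thresholds come from the rule.

```python
from dataclasses import dataclass

AUTO_PILOT_MIN_TOOL_CALLS = 50  # from the rule: 50+ tool executions
AUTO_PILOT_MAX_USER_MSGS = 3    # from the rule: fewer than 3 substantive messages
SUBSTANTIVE_MIN_WORDS = 4       # hypothetical definition of "substantive"


@dataclass(frozen=True)
class Event:
    role: str  # assumed values: "user" or "tool"
    text: str = ""


def is_auto_pilot(events: list[Event]) -> bool:
    """Flag sessions with many tool calls and little substantive user input."""
    tool_calls = sum(1 for e in events if e.role == "tool")
    substantive = sum(
        1 for e in events
        if e.role == "user" and len(e.text.split()) >= SUBSTANTIVE_MIN_WORDS
    )
    return tool_calls >= AUTO_PILOT_MIN_TOOL_CALLS and substantive < AUTO_PILOT_MAX_USER_MSGS
```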
| ID | Category | Weight | Severity | Detection | Rule Count |
|---|---|---|---|---|---|
| DH-01 | Secret Exposure | 20 | High | Regex | 105 |
| DH-02 | Unsafe Code | 12 | High | Mostly LLM | 9 |
| DH-03 | Dangerous Commands | 10 | High | Regex | 8 |
| DH-04 | Sensitive Data | 15 | High | Regex + LLM | 7 |
| DH-05 | Supply Chain | 8 | Medium-High | Regex | 6 |
| DH-06 | Scope Violation | 5 | Medium-High | Regex | 5 |
| DH-07 | IP / Trade Secrets | 15 | High | Regex + LLM | 5 |
| DH-08 | Regulatory Data | 15 | High | Mostly LLM | 6 |
| DH-09 | Oversight | 0 | Info | Heuristic | 1 |
| Total | | 100 | | | 152 |
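This page does not specify how findings translate into a score. Purely as an illustration, the sketch below assumes each category with at least one finding deducts its full weight from 100; that deduction model and the `hypothetical_score` helper are hypothetical, and only the weights come from the table.

```python
# Category weights copied from the summary table (DH-01 to DH-09).
WEIGHTS = {
    "secret_exposure": 20,
    "unsafe_code_acceptance": 12,
    "dangerous_command": 10,
    "sensitive_data_sharing": 15,
    "supply_chain_risk": 8,
    "scope_violation": 5,
    "ip_trade_secret_leakage": 15,
    "regulatory_data_exposure": 15,
    "excessive_reliance": 0,  # informational only, never reduces the score
}


def hypothetical_score(flagged: set[str]) -> int:
    """Illustrative only: 100 minus the full weight of every flagged category."""
    return 100 - sum(WEIGHTS[category] for category in flagged)
```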