You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Gate: 0 (must pass before Phase 1 compute)
Date: 2026-03-20
Target venue: USENIX Security 2026 [HYPOTHESIZED]
**lock_commit: eb454e0Profile: contract-track
Budget: ~$3-5 Claude API (Haiku)
Novelty Claim
First empirical measurement of regression rate when LLMs generate security patches, showing what fraction introduce new vulnerabilities detectable by static analysis.
Comparison Baselines
#
Method
Citation
How We Compare
Why This Baseline
1
No-patch baseline
Control
Vulnerability count in original code
Lower bound — any patch must not increase vuln count
2
Template-based fix
Rule-based
CWE-specific fix templates (e.g., parameterized queries for SQLi)
Shows whether LLM adds value beyond known patterns
3
Human patches (NVD reference)
NVD database
Compare regression rates: LLM vs human patches
Gold standard for patch quality
Pre-Registered Reviewer Kill Shots
#
Criticism
Planned Mitigation
1
"Static analysis has high false positive rates"
Report both raw and verified findings. Use semgrep with high-confidence rules only. Manual verification on 10% sample.
2
"Synthetic code snippets don't represent real codebases"
Use CWE top 25 patterns from real CVE reports. Snippets are minimal reproducible examples, not toy code. Limitation acknowledged.
3
"Prompt engineering determines results, not model capability"
Ablation E4 tests 3 prompt detail levels. Report all prompts for reproducibility.
Use semgrep high-confidence rules only. Manual verification on 10% sample. Report verified and unverified rates separately.
Prompt engineering confound — results depend on prompt, not model
Construct validity
Ablation E4 tests 3 prompt levels. Report all prompts.
CWE sampling bias — top 25 may not represent full vulnerability landscape
External validity
Top 25 covers ~75% of real-world vulns by frequency. Acknowledged limitation.
LLM may have seen specific CVE fixes in training data
Internal validity
Use recently published CVEs where possible. The measurement is still valid: we're testing what practitioners would actually get from the tool.
Depth Escalation (R34)
Depth Commitment
ONE primary finding: LLM patch safety is entirely CWE-dependent — 100% fix rate for crypto, 50% regression for SQL injection.
Mechanism Analysis Plan
Finding
Proposed Mechanism
Experiment
Crypto fixes work (100%)
Pattern replacement: md5→sha256 is context-independent
E3 CWE analysis
SQL fixes regress (50%)
Context-dependent reasoning failure: model rewrites SQL but introduces new concatenation
E3 CWE analysis
Adaptive Adversary Plan
Robustness Claim
Weak Test
Adaptive Test
LLM patches fix vulnerabilities
Standard CWE snippets
Adversarial snippets with misleading comments that encourage wrong fix patterns
Regression detection catches issues
Standard static analysis patterns
Obfuscated vulnerability patterns that evade regex detection
Note: Adaptive adversary testing is acknowledged as future work. Current study establishes baseline fix/regression rates on standard snippets.
Published Baseline Reproduction
Compare against Pearce et al. (2023) fix rates where possible.
Parameter Sensitivity Plan
Parameter
Range
Expected Effect
Prompt detail level
minimal/CWE/guided
More guidance = lower regression (E4)
CWE category
5 categories
Fix rate varies by CWE (E3)
Defense Harm Test
N/A — measuring patch quality, not deploying a defense.
Formal Contribution Statement
We contribute CWE-stratified patch regression rates showing AI patching is safe for crypto but dangerous for SQL injection.
Audience Alignment
Audience: Security practitioners using AI coding assistants (Copilot, Claude) + AI builders evaluating code generation safety
Portfolio position: "Security FROM AI" — LLM output as potential attack surface. Complements FP-18 (watermark detection). First code-generation project.
Distribution plan: Blog on rexcoleman.dev → LinkedIn → Reddit r/netsec + r/programming → DEF CON AI Village. "X% of AI patches introduce new vulns" is a shareable headline.
Experiment Matrix
ID
Question
IV
Levels
DV
Seeds
E0
Sanity: LLM generates syntactically valid patches
N/A
3 known CWEs
Valid patch output
1
E1
Fix rate: does the patch resolve the target vulnerability?
CWE category
5 categories
Fix rate (%)
5
E2
Regression rate: does the patch introduce new vulnerabilities?
CWE category
5 categories
Regression rate (%)
5
E3
CWE category analysis: which types have highest regression?
CWE category
injection, XSS, memory, logic, crypto
Regression rate per category
5
E4
Prompt detail ablation: does guidance reduce regression?