# UBSec/AI-Security-Papers

Reading list for my research interest in Generative AI Security.

## Generative AI Security Papers

### Large Language/Reasoning Models Safety

#### Survey | Measurements | Benchmarks

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| The Digital Cybersecurity Expert: How Far Have We Come? | Venue | PDF | - |
| Safety in Large Reasoning Models: A Survey | Venue | PDF | - |
| Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | Venue | PDF | - |
| From System 1 to System 2: A Survey of Reasoning Large Language Models | Venue | PDF | - |
| Safety at Scale: A Comprehensive Survey of Large Model Safety | Venue | PDF | - |
| Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | Venue | PDF | - |
| Reasoning Models Don't Always Say What They Think | - | PDF | - |
| The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 | Venue | PDF | - |
| SAFECHAIN: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities | Venue | PDF | - |
| DeepSeek-R1 Thoughtology: Let's `<think>` about LLM Reasoning | Venue | PDF | - |
| o3-mini vs DeepSeek-R1: Which One is Safer? | Venue | PDF | - |
| Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings | Venue | PDF | - |
| Safety Evaluation of DeepSeek Models in Chinese Contexts | Venue | PDF | - |
| SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models | Venue | PDF | - |
| Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning | Venue | PDF | - |

#### Attacks

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| OverThink: Slowdown Attacks on Reasoning LLMs | Venue | PDF | - |
| Trading Inference-Time Compute for Adversarial Robustness | Venue | PDF | - |
| A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos | Venue | PDF | - |
| H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models | Venue | PDF | - |
| Process or Result? Manipulated Ending Tokens Can Mislead Reasoning LLMs to Ignore the Correct Reasoning Steps | Venue | PDF | - |
| ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs | Venue | PDF | - |
| BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack | Venue | PDF | - |
| DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs | Venue | PDF | - |
| SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models | Venue | PDF | - |
| Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models | Venue | PDF | - |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Venue | PDF | - |
| LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Venue | PDF | - |

#### Defenses

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| STAR-1: Safer Alignment of Reasoning LLMs with 1K Data | Venue | PDF | - |
| RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability | Venue | PDF | - |
| Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment | Venue | PDF | - |
| Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning | Venue | PDF | - |

### Code Generation Security

#### Survey | Measurements | Benchmarks

#### Attacks

#### Defenses

#### Media | Reports | Tools


### AI Agent Security

#### Survey | Measurements | Benchmarks

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents | Venue | PDF | - |
| A Survey on Trustworthy LLM Agents: Threats and Countermeasures | Venue | PDF | - |
| R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | Venue | PDF | - |
| AI Agents Under Threat: A Survey of Key Security Challenges | Venue | PDF | - |
| Emerging Cyber Attack Risks of Medical AI Agents | Venue | PDF | - |
| AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents | Venue | PDF | - |
| Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Venue | PDF | - |
| Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | Venue | PDF | - |
| RedCode: Risky Code Execution and Generation Benchmark for Code Agents | - | PDF | - |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Venue | PDF | - |
| Security of AI Agents | Venue | PDF | - |
| Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents | Venue | PDF | - |

#### Attacks

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning | Venue | PDF | - |
| Towards Action Hijacking of Large Language Model-based Agent | Venue | PDF | - |

#### Defenses

| Paper | Venue | PDF | Code |
| --- | --- | --- | --- |
| SHIELDAGENT: Shielding Agents via Verifiable Safety Policy Reasoning | Venue | PDF | - |
| Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents | Venue | PDF | - |
| PentestAgent: Incorporating LLM Agents to Automated Penetration Testing | Venue | PDF | - |

#### Media | Reports | Tools

