| Paper | Venue | Code |
|---|---|---|
| [The Digital Cybersecurity Expert: How Far Have We Come?](https://arxiv.org/pdf/2504.11783) | - | - |
| [Safety in Large Reasoning Models: A Survey](https://arxiv.org/pdf/2504.17704) | - | - |
| [Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models](https://arxiv.org/pdf/2503.16419) | - | - |
| [From System 1 to System 2: A Survey of Reasoning Large Language Models](https://arxiv.org/pdf/2502.17419) | - | - |
| [Safety at Scale: A Comprehensive Survey of Large Model Safety](https://arxiv.org/pdf/2502.05206) | - | - |
| [Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies](https://arxiv.org/pdf/2501.17030v1) | - | - |
| [Reasoning Models Don't Always Say What They Think](https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf) | - | - |
| [The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1](https://arxiv.org/pdf/2502.12659) | - | - |
| [SAFECHAIN: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities](https://arxiv.org/pdf/2502.12025) | - | - |
| [DeepSeek-R1 Thoughtology: Let's \<think\> about LLM Reasoning](https://arxiv.org/pdf/2504.07128) | - | - |
| [o3-mini vs DeepSeek-R1: Which One is Safer?](https://arxiv.org/pdf/2501.18438) | - | - |
| [Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings](https://arxiv.org/pdf/2503.15092) | - | - |
| [Safety Evaluation of DeepSeek Models in Chinese Contexts](https://arxiv.org/pdf/2502.11137) | - | - |
| [SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models](https://arxiv.org/pdf/2504.08813) | - | - |
| [Are Smarter LLMs Safer? Exploring Safety-Reasoning Trade-offs in Prompting and Fine-Tuning](https://arxiv.org/pdf/2502.09673) | - | - |
| Paper | Venue | Code |
|---|---|---|
| OverThink: Slowdown Attacks on Reasoning LLMs | - | - |
| Trading Inference-Time Compute for Adversarial Robustness | - | - |
| A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos | - | - |
| H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models | - | - |
| Process or Result? Manipulated Ending Tokens Can Mislead Reasoning LLMs to Ignore the Correct Reasoning Steps | - | - |
| ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs | - | - |
| BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack | - | - |
| DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs | - | - |
| [SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models](https://arxiv.org/pdf/2504.08813) | - | - |
| Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models | - | - |
| Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | - | - |
| LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | - | - |
| Paper | Venue | Code |
|---|---|---|
| [STAR-1: Safer Alignment of Reasoning LLMs with 1K Data](https://arxiv.org/pdf/2504.01903) | - | - |
| [RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability](https://arxiv.org/pdf/2504.10081) | - | - |
| [Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment](https://arxiv.org/pdf/2502.12025) | - | - |
| [Enhancing Model Defense Against Jailbreaks with Proactive Safety Reasoning](https://arxiv.org/pdf/2501.19180) | - | - |

| Paper | Venue | Code |
|---|---|---|
| [SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis](https://www.computer.org/csdl/proceedings-article/sp/2025/223600c791/26hiV8eg35u) | S&P 2025 | - |
| [Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask](https://arxiv.org/pdf/2504.13474) | - | - |
| [CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities](https://arxiv.org/abs/2503.17332) | - | - |
| How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs | - | - |
| [RedCode: Risky Code Execution and Generation Benchmark for Code Agents](https://proceedings.neurips.cc/paper_files/paper/2024/file/bfd082c452dffb450d5a5202b0419205-Paper-Datasets_and_Benchmarks_Track.pdf) | NeurIPS 2024 | - |
| [AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents](https://arxiv.org/abs/2502.05957) | - | - |
| [Large Language Models for Code: Security Hardening and Adversarial Testing](https://arxiv.org/pdf/2302.05319) | CCS 2023 | - |
| [PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)](https://dl.acm.org/doi/10.1145/3658644.3690298) | CCS 2024 | - |
| Meta CodeShield | - | [GitHub](https://github.com/meta-llama/PurpleLlama/tree/main/CodeShield) |
| Paper | Venue | Code |
|---|---|---|
| [AGENT-SAFETYBENCH: Evaluating the Safety of LLM Agents](https://arxiv.org/pdf/2412.14470) | - | - |
| [A Survey on Trustworthy LLM Agents: Threats and Countermeasures](https://arxiv.org/pdf/2503.09648) | - | - |
| [R-Judge: Benchmarking Safety Risk Awareness for LLM Agents](https://arxiv.org/pdf/2401.10019) | - | - |
| [AI Agents Under Threat: A Survey of Key Security Challenges](https://arxiv.org/pdf/2406.02630) | - | - |
| [Emerging Cyber Attack Risks of Medical AI Agents](https://arxiv.org/pdf/2504.03759) | - | - |
| [AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents](https://arxiv.org/pdf/2406.13352) | - | - |
| [Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents](https://arxiv.org/abs/2410.02644) | - | - |
| [Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents](https://arxiv.org/abs/2502.11355) | - | - |
| [RedCode: Risky Code Execution and Generation Benchmark for Code Agents](https://proceedings.neurips.cc/paper_files/paper/2024/file/bfd082c452dffb450d5a5202b0419205-Paper-Datasets_and_Benchmarks_Track.pdf) | NeurIPS 2024 | - |
| [CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities](https://arxiv.org/abs/2503.17332) | - | - |
| [Security of AI Agents](https://arxiv.org/pdf/2406.08689) | - | - |
| [Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents](https://arxiv.org/pdf/2411.09523) | - | - |
| [Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions](https://arxiv.org/pdf/2503.23278) | - | - |

| Paper | Venue | Code |
|---|---|---|
| [UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning](https://arxiv.org/abs/2503.01908) | - | - |
| [Towards Action Hijacking of Large Language Model-based Agent](https://arxiv.org/pdf/2412.10807) | - | - |

| Paper | Venue | Code |
|---|---|---|
| [SHIELDAGENT: Shielding Agents via Verifiable Safety Policy Reasoning](https://arxiv.org/pdf/2503.22738) | - | - |
| [Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents](https://arxiv.org/pdf/2412.18371) | - | - |
| [PentestAgent: Incorporating LLM Agents to Automated Penetration Testing](https://arxiv.org/pdf/2411.05185) | AsiaCCS 2025 | - |
| [LlamaFirewall: An Open Source Guardrail System for Building Secure AI Agents](https://ai.meta.com/research/publications/llamafirewall-an-open-source-guardrail-system-for-building-secure-ai-agents/) | - | - |
| [Introducing the Model Context Protocol](https://www.anthropic.com/news/model-context-protocol) | - | - |
| Model Context Protocol | - | [GitHub](https://github.com/modelcontextprotocol) |