AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
Adversarial MCP server benchmark suite for testing tool-calling security, drift detection, and proxy defenses
A multi-agent safety engineering framework that subjects systems to adversarial audit. Orchestrates specialized agents (Engineer, Psychologist, Physicist) to find process risks and human factors.
Extremely hard, multi-turn, open-source-grounded coding evaluations that reliably break every current frontier model (Claude, GPT, Grok, Gemini, Llama, etc.) on numerical stability, zero-allocation, autograd, SIMD, and long-chain correctness.
Analysis of ChatGPT-5 reviewer failure: speculative reasoning disguised as certainty. Captures how evidence-only review drifted into hypotheses, later admitted as review-process failure. Includes logs, checksums, screenshots, and external video.
Investigation into ChatGPT-5 reviewer misalignment: PDF claimed screenshots as evidence, but assistant denied their visibility. Includes JSONL + human-readable logs, screenshots, checksums, and video. Highlights structural risks in AI reviewer reliability.
Adversarial testing and robustness evaluation for the Crucible framework
Forensic-style adversarial audit of Google Gemini 2.5 Pro revealing hidden cross-session memory. Includes structured reports, reproducible contracts, SHA-256 checksums, and video evidence of 28-day semantic recall and affective priming. Licensed under CC-BY 4.0.
Independent research on ChatGPT-5 reviewer bias. Documents how the AI carried assumptions across PDF versions (v15→v16), wrongly denying evidence despite instructions. Includes JSONL logs, screenshots, checksums, and video evidence. Author: Priyanshu Kumar.
🔒 Implement a security proxy for Model Context Protocol using ensemble anomaly detection to classify requests as benign or attack for enhanced safety.
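A minimal sketch of the ensemble idea described above: several independent heuristic detectors each vote on an incoming tool-call request, and a majority vote decides benign vs. attack. The detector names, thresholds, and allowlist here are illustrative assumptions, not the repository's actual implementation.

```python
# Hypothetical ensemble anomaly detection for a request proxy.
# All detector logic, thresholds, and the tool allowlist below are
# illustrative assumptions, not taken from the repository.

def keyword_detector(request: dict) -> bool:
    """Flag requests whose arguments contain known injection phrases."""
    suspicious = ("ignore previous instructions", "rm -rf", "../")
    text = str(request.get("arguments", "")).lower()
    return any(s in text for s in suspicious)

def size_detector(request: dict, limit: int = 4096) -> bool:
    """Flag unusually large payloads."""
    return len(str(request)) > limit

def allowlist_detector(request: dict, allowed=("search", "read_file")) -> bool:
    """Flag calls to tools outside a configured allowlist."""
    return request.get("tool") not in allowed

DETECTORS = [keyword_detector, size_detector, allowlist_detector]

def classify(request: dict) -> str:
    """Majority vote across detectors: 'attack' if most detectors fire."""
    votes = sum(d(request) for d in DETECTORS)
    return "attack" if votes > len(DETECTORS) / 2 else "benign"
```

In practice, each heuristic could be replaced by a trained anomaly model; the majority vote is what makes the proxy an ensemble rather than a single point of failure.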