AshKit is a Python-based toolkit designed for researchers and developers to test Large Language Model (LLM) vulnerabilities. It offers two primary modes: Model Profiling to generate broad vulnerability reports and a Strategy Discovery Engine to automatically evolve, test, and save the most effective jailbreaks for a specific goal. It utilizes local Ollama models, LangGraph for workflow orchestration, and Streamlit for an interactive user interface.
- Local LLM Interaction: Leverages Ollama to run LLMs locally for the target model being tested, the crafter model generating prompts, and the AI judge evaluating responses.
- Two Testing Modes:
- Model Profiling: Automatically runs all tasks against all strategies to generate a comprehensive vulnerability profile of an LLM, complete with live visualizations showing its weaknesses as they are discovered.
- Strategy Discovery Engine: A powerful evolutionary simulation that focuses on a single task. It uses a genetic algorithm to breed new, combined strategies from the entire history of attempts. It automatically eliminates failing strategies, evolves new ones to fill the pool, and saves high-performing strategies to your permanent collection.
- AI-Powered Evaluation: An "AI Judge" (another LLM) assesses whether the target LLM's response constitutes a successful jailbreak based on a 0-10 compliance scale.
- Interactive Web UI: A Streamlit interface allows users to configure models, manage tasks/strategies, run tests, view real-time progress, and download results.
- Pause & Resume: Long-running discovery sessions can be paused and resumed at any time, preserving the full state of the simulation.
- Customizable & Evolvable Strategies: Easily define new tasks and attack strategies via JSON files or the UI. The discovery engine automatically combines existing strategies to create new, more effective ones.
- Comprehensive Logging: Records detailed information about each test attempt to a
results/jailbreak_log.jsonlfile.
Before you begin, ensure you have the following installed:
- Python: Version 3.8 or higher.
- Ollama: Installed and running. Download from ollama.com.
- Ollama Models: Download the LLMs you intend to use. For example:
ollama pull qwen3:8b
- Clone or Download the Repository.
- Windows Users: Run
run_app.bat. It creates a virtual environment, installs dependencies, and starts the app. - Manual Setup (All Platforms):
# Navigate to the project directory python -m venv ashkit_env # Activate the environment (Win: ashkit_env\Scripts\activate | macOS/Linux: source ashkit_env/bin/activate) pip install -r requirements.txt streamlit run app.py
-
Start the Application: Run the app using the script or manual commands. It should open at
http://localhost:8501. -
Configure Models (Sidebar): Set the Ollama model names for the Target, Judge, and Crafter.
-
Select a Mode:
- Model Profiling Tab:
- Click "Start Full Model Profile" to run all tasks against all strategies.
- Observe the progress and view the visualizations update in real-time.
- Strategy Discovery Engine Tab:
- Select a single task you want to accomplish from the dropdown and configure the Pool Size. The engine will ensure the pool of active strategies is always this size by evolving new ones when needed.
- Click "
▶️ Start Discovery". - Observe the engine run. Strategies that consistently fail will be "Eliminated". New strategies will be "Evolved" from the genetic material of all prior strategies (even eliminated ones).
- If a new, evolved strategy scores 8/10 or higher, it will be automatically saved to your
data/strategies.jsonfile. - You can Pause the simulation at any time, or Resume it to continue discovering more solutions and strategies.
- When the target number of solutions is found, the simulation will automatically pause. You can resume it to keep searching.
- Model Profiling Tab:
-
Manage Data (Sidebar): Navigate to the "Manage Data" page to add, edit, or delete the tasks and strategies used in the tests. You will see your auto-saved strategies appear here.
-
View Logs & Results: A table of all historical results is available at the bottom of the Red Teaming page, which can be refreshed from the log file and downloaded as a CSV.
AshKit/
├── app.py # Streamlit UI application
├── evolutionary_runner.py # Logic for the Strategy Discovery Engine
├── graph_runner.py # Core LangGraph execution logic
├── langgraph_setup.py # LangGraph definition
├── llm_interface.py # Ollama interaction functions
├── judge.py # AI Judge logic
├── visuals.py # Visualization functions
├── utils.py # Utility functions (data loading, etc.)
├── management_page.py # UI for managing tasks/strategies
├── requirements.txt # Python dependencies
├── run_app.bat # Windows startup script
├── README.md # This file
├── data/
│ ├── tasks.json
│ └── strategies.json
└── results/
└── jailbreak_log.jsonl