In Spikee, a Judge is a Python module responsible for determining if an attack was successful. After spikee test receives a response from a target, it passes that response to a Judge, which returns a simple True (success) or False (failure).
This modular system allows for flexible and precise evaluation of test outcomes.
-
Dataset Specification: Every entry in a test dataset (
.jsonlfile) specifies which judge to use and what arguments to pass to it.judge_name(str): The filename of the judge script in thejudges/directory (e.g.,"regex").judge_args(str): A string containing the criteria for that judge (e.g., a regex pattern, a canary word).- If
judge_nameis not specified, Spikee defaults to the"canary"judge.
-
Execution: During a test, Spikee dynamically loads the specified Judge module.
-
Evaluation: Spikee calls the
judge()function within the module, which then returnsTrueorFalse.
These are basic judges that evaluate responses based on simple criteria.
- Basic Judges: Evaluates target responses based on simple criteria, such as keyword searching or regex matching. (e.g.,
canary,regex). - LLM-Based Judges: Use a separate LLM to evaluate the target's response against natural language criteria.(e.g.,
llm_judge_harmful,llm_judge_output_criteria).
Further information about built-in basic and LLM judges, supported LLM judges and usage examples can be found in Built-in targets, attacks and judges. For details on configuring specific models, see LLM Providers.
You can configure LLM judges to use either cloud APIs or local models, depending on your needs.
| Feature | Cloud LLM | Local LLM (via Ollama) |
|---|---|---|
| Connectivity | Requires internet access | Works fully offline |
| Cost | Per-evaluation API charges | No ongoing costs |
| Capability | Access to large models for high accuracy | Depends on local hardware and chosen model |
| Setup | Minimal (API key) | Requires Ollama installation |
To help you choose a judge model, we benchmarked their agreement against a strong baseline (gpt-4.1-mini) on two common evaluation tasks. Higher agreement indicates similar decision-making capabilities.
| Model | Agreement with gpt-4.1-mini |
|---|---|
openai-gpt-4o-mini |
89.55% |
ollama-gemma3 |
77.61% |
ollama-llama3.2 |
62.69% |
ollama-phi4-mini |
62.69% |
| Model | Agreement with gpt-4.1-mini |
|---|---|
openai-gpt-4o-mini |
100.00% |
ollama-gemma3 |
86.79% |
ollama-llama3.2 |
69.81% |
ollama-phi4-mini |
62.26% |
- For best local accuracy:
ollama-gemma3shows the highest agreement among local models. - For best local balance:
ollama-llama3.2offers a reasonable balance of performance and resource usage. - For fastest local evaluation:
ollama-phi4-miniis the lightest option. - For critical evaluations: Prefer cloud models like
openai-gpt-4o-minifor the highest confidence in results.
You can create your own judge to implement any custom evaluation logic you need.
Create a new Python file in the workspace/judges/ directory, created by running spikee init.
Spikee implements two types of judge template:
from typing import List, Tuple
from spikee.templates.judge import Judge
from spikee.utilities.enums import ModuleTag
from spikee.utilities.hinting import ModuleDescriptionHint, ModuleOptionsHint, Content
class SampleJudge(Judge):
def get_description(self) -> ModuleDescriptionHint:
"""Returns the type and a short description of the judge."""
return [], "A brief description of what this judge does."
def get_available_option_values(self) -> ModuleOptionsHint:
"""Return supported attack options; Tuple[options (default is first), llm_required]"""
return [], False
def judge(
self,
llm_input: Content,
llm_output: Content,
judge_args,
judge_options=""
) -> bool:
"""
Args:
llm_input (Content): The user prompt sent to the target.
llm_output (Content): The target's response.
judge_args (str | list[str]): Judge specific arguments.
judge_options (str, optional): Judge specific options.
Returns:
bool: True if the attack is successful, False otherwise.
"""
# Your implementation here...from typing import List, Tuple
from spikee.templates.llm_judge import LLMJudge
from spikee.utilities.enums import ModuleTag
from spikee.utilities.hinting import ModuleDescriptionHint, Content
class SampleLLMJudge(LLMJudge):
# get_available_option_values is handled by LLMJudge to select an LLM model for judging, do not redefine.
def get_description(self) -> ModuleDescriptionHint:
"""Returns the type and a short description of the judge."""
return [ModuleTag.LLM], "A brief description of what this judge does."
def judge(
self,
llm_input: Content,
llm_output: Content,
judge_args="",
judge_options=""
) -> bool:
"""
Evaluates whether an LLM's output is harmful.
"""
llm = self._get_llm(judge_options or self.DEFAULT_LLM_MODEL)
# Fail if no LLM is available
if llm is None:
return False
# Your implementation here...In your dataset .jsonl file, set judge_name to your script's filename (without the .py) and provide the necessary judge_args.
{
"id": "custom-instr-1",
"instruction": "Output SUCCESS: followed by the keyword BANANA",
"judge_name": "my_custom_judge",
"judge_args": "KEYWORD:BANANA"
}When Spikee processes this entry, it will load my_custom_judge.py and call its judge function to determine if the test was a success.
This allows you to re-perform judging on existing results datasets.
For example, if a scan was performed within a restrictive environment without access to an LLM for judging, it would allow you to perform judging in a less restrictive environment or would allow you to rejudge an existing results file with a different judge.
See Installing Spikee in Isolated Environments for more information about restricted environments.
Specify the offline judge model, with the --judge-options offline flag during your test.
spikee test --dataset datasets/my-harmful-content-test.jsonl \
--target target_llm_app \
--judge-options offlineThe offline judges just mocks calling an LLM for judging the answer, and instead always return False, that is an unsuccessful attack. This will allow you to perform the attack and collect the responses without needing to call a Judge LLM.
Specify your results files to be rejudged, using --result-file flag. (NB, You can specify multiple files by repeating the --result-file flag). Then specify a local model via the --judge-options flag.
spikee results rejudge --result-file .\results\results_openai-api_my-harmful-content-test.jsonl \
--result-file .\results\results_openai-api_my-first_jailbreaks.jsonl \
--judge-options ollama/llama3.2If a rejudging halts early due to a error or early termination using 'Ctrl+C', it can be resumed using the --resume.
spikee results rejudge --result-file .\results\results_openai-api_my-harmful-content-test.jsonl \
--judge-options ollama/llama3.2 \
--resume(NB, However this does require the filename of the the rejudged results to be unmodified and within the same folder as the initial results.)