Plugins are Python scripts that transform a payload during dataset generation. They are typically used to assess transformation-based jailbreaking techniques, or to convert prompts into a target-friendly format.
Sample plugins can be found within the workspace/plugins/ directory, created by running spikee init. Further information about built-in plugins and usage examples can be found in Built-in Plugins.
Both Plugins and Dynamic Attacks can generate variations of a payload, but they serve different purposes in the testing workflow:
- Plugins (Pre-Test Transformation):
  - When they run: During spikee generate.
  - What they do: Create multiple variations of a payload. Each variation is saved as a separate, independent entry in the final dataset file.
  - Result: When you run spikee test, every single variation generated by the plugin is tested against the target. This is useful for systematically evaluating a target's resilience to a known set of transformations (e.g., "Is the target vulnerable to Base64 encoding? To Leetspeak?").
- Dynamic Attacks (Real-Time Transformation):
  - When they run: During spikee test, but only if the initial, standard prompt fails.
  - What they do: Generate and test variations one by one in real-time. The attack stops as soon as a variation succeeds.
  - Result: Only the first successful variation (or the final failed attempt) is logged. This is useful for efficiently finding any successful bypass, rather than testing every possible variation.
In short, use Plugins to build a comprehensive dataset of known transformations. Use Dynamic Attacks to find a single successful bypass with adaptive, real-time logic.
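The difference in control flow can be sketched as follows. This is a conceptual illustration only, not Spikee's internal code: `variations`, `attack_succeeds`, `plugin_style_run`, and `dynamic_attack_style_run` are hypothetical names, and the toy target simply blocks any prompt containing the word "ignore".

```python
import base64
from typing import List, Optional


def variations(payload: str) -> List[str]:
    """Hypothetical transformations: Base64 and a simple leetspeak substitution."""
    b64 = base64.b64encode(payload.encode()).decode()
    leet = payload.translate(str.maketrans("aeio", "4310"))
    return [b64, leet]


def attack_succeeds(prompt: str) -> bool:
    """Toy stand-in for a target: the attack works unless the filter sees 'ignore'."""
    return "ignore" not in prompt.lower()


def plugin_style_run(payload: str) -> List[bool]:
    # Plugin model: every variation is generated up front and every one is tested.
    dataset = variations(payload)
    return [attack_succeeds(prompt) for prompt in dataset]


def dynamic_attack_style_run(payload: str) -> Optional[str]:
    # Dynamic attack model: variations are only tried if the standard prompt fails,
    # and the loop stops at the first variation that succeeds.
    if attack_succeeds(payload):
        return payload
    for candidate in variations(payload):
        if attack_succeeds(candidate):
            return candidate
    return None
```

In this sketch the plugin-style run reports a result for each variation, while the dynamic-attack-style run returns only the first prompt that got through (or None if none did).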
Every plugin is a Python module located in the plugins/ directory of your workspace. Spikee identifies plugins by their filename.
from spikee.templates.plugin import Plugin
from spikee.templates.basic_plugin import BasicPlugin
from spikee.utilities.enums import ModuleTag
from typing import List, Union, Tuple


class SamplePlugin(Plugin):
    def get_description(self) -> Tuple[List[ModuleTag], str]:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported attack options; Tuple[options (default is first), llm_required]"""
        return [], False

    def transform(
        self,
        text: str,
        exclude_patterns: List[str] = [],
        plugin_option: str = ""
    ) -> Union[str, List[str]]:
        """Transforms the input text according to the user-defined logic, returning one or more variations.

        Args:
            text (str): The input prompt to transform.
            exclude_patterns (List[str], optional): Regex patterns for substrings to preserve.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            Union[str, List[str]]: A single transformed string, or a list of variations.
        """
        # Your implementation here...


class SampleBasicPlugin(BasicPlugin):
    def get_description(self) -> Tuple[List[ModuleTag], str]:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported attack options; Tuple[options (default is first), llm_required]"""
        return [], False

    def plugin_transform(
        self,
        text: str,
        plugin_option: str = "",
    ) -> str:
        """Transforms the input text according to the user-defined logic, returning a single variation.

        Args:
            text (str): The input prompt to transform.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            str: The transformed text.
        """
        # Your implementation here...

The transform function is the core of every plugin. It receives a payload string and returns one or more transformed versions.
- text: str: The input payload, which is typically a combination of a jailbreak and a malicious instruction.
- exclude_patterns: List[str]: A list of regular expression patterns. Your plugin must not transform any part of the text that matches one of these patterns. This is critical for preserving sensitive parts of a prompt, like URLs or specific keywords.
- plugin_option: str (Optional): A string passed from the command line via --plugin-options (e.g., "my_plugin:mode=full;variants=10"). If your plugin doesn't need configuration, you can omit this parameter.
The function can return either of the following:
- str: Return a single transformed string. Spikee will create one new test case from this.
- List[str]: Return a list of transformed strings. Spikee will create a separate test case for each string in the list, allowing you to test multiple variations at once.
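As a rough illustration of plugin_option and the two return shapes, the sketch below is a standalone function that does not import Spikee. The names `parse_option` and `make_variations` and the meaning given to `variants` are hypothetical, chosen only for this example. It also assumes the plugin receives the part of the option string after the plugin name (for example `mode=full;variants=10`), which should be checked against Spikee's actual behaviour. Handling of exclude_patterns is left out here for brevity.

```python
from typing import Dict, List, Union


def parse_option(plugin_option: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' into a dict (hypothetical helper)."""
    options: Dict[str, str] = {}
    for part in plugin_option.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip empty or malformed parts that have no '='
            options[key.strip()] = value.strip()
    return options


def make_variations(text: str, plugin_option: str = "") -> Union[str, List[str]]:
    """Return one string by default, or a list when 'variants' is greater than 1."""
    options = parse_option(plugin_option)
    count = int(options.get("variants", "1"))
    candidates = [text.upper(), text.lower(), text.title(), text.swapcase()]
    if count <= 1:
        return candidates[0]   # str: one test case
    return candidates[:count]  # List[str]: one test case per entry


single = make_variations("Hello World")
several = make_variations("Hello World", "variants=2")
```

Here `single` is a plain string and `several` is a list of two strings, matching the two return types listed above.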
For more advanced plugins, you can accept a configuration string and advertise the available options.
from typing import List, Union, Tuple


def get_available_option_values() -> Tuple[List[str], bool]:
    """Return supported attack options; Tuple[options (default is first), llm_required]"""
    return ["mode=strict", "mode=full"], False  # "mode=strict" is the default


def transform(text: str, exclude_patterns: List[str] = [], plugin_option: str = "") -> Union[str, List[str]]:
    """Transforms the payload based on the provided option."""
    # Your transformation logic here...

For more advanced plugins, you can support plugin_options by implementing the get_available_option_values function. By default, it should return None, indicating no options are supported.
from spikee.templates.plugin import Plugin
from typing import List, Tuple, Union


class SamplePlugin(Plugin):
    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported attack options; Tuple[options (default is first), llm_required]"""
        return ["mode=strict", "mode=full"], False  # "mode=strict" is the default

    def transform(
        self,
        text: str,
        exclude_patterns: List[str] = [],
        plugin_option: str = "",
    ) -> Union[str, List[str]]:
        # Your implementation here...

Correctly handling exclude_patterns is the most important part of writing a robust plugin. You must leave the excluded parts of the string completely untouched. The recommended way to do this is with re.split, as implemented within the BasicPlugin.
# Example transformation function converting all text to uppercase with exclude_patterns support
import re
from typing import List


# Shown without its enclosing plugin class; `self` is unused in this example.
def transform(self, text: str, exclude_patterns: List[str] = []) -> str:
    if not exclude_patterns:
        # No exclusions, transform the whole text
        return apply_transformation(text)

    # 1. Create a single regex pattern that captures any of the exclude patterns.
    #    The parentheses around the pattern are crucial for re.split to keep the delimiters.
    combined_pattern = "(" + "|".join(exclude_patterns) + ")"

    # 2. Split the text by the combined pattern.
    #    Even-indexed chunks are normal text; odd-indexed chunks are the exclusions
    #    (this holds as long as the patterns contain no capturing groups of their own).
    chunks = re.split(combined_pattern, text)

    # 3. Transform only the non-excluded chunks.
    transformed_chunks = []
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            # This is normal text, apply the transformation
            transformed_chunks.append(apply_transformation(chunk))
        else:
            # This is an excluded part, keep it as is
            transformed_chunks.append(chunk)

    # 4. Rejoin the chunks into a single string.
    return "".join(transformed_chunks)


def apply_transformation(text: str) -> str:
    return text.upper()
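To see why the capturing group matters, the short check below runs re.split on a sample payload with and without the outer parentheses. The payload and the URL pattern are made up for illustration and are not taken from Spikee.

```python
import re

text = "visit https://example.com/a now"
pattern = r"https?://\S+"  # illustrative exclude pattern

# Without a capturing group, the matched URL is dropped from the result.
without_group = re.split(pattern, text)

# With a capturing group, the matched URL is kept, at an odd index.
with_group = re.split("(" + pattern + ")", text)

# Transform only the even-indexed chunks, then rejoin.
result = "".join(
    chunk.upper() if i % 2 == 0 else chunk
    for i, chunk in enumerate(with_group)
)
print(without_group)  # ['visit ', ' now']
print(with_group)     # ['visit ', 'https://example.com/a', ' now']
print(result)         # VISIT https://example.com/a NOW
```

One caveat: if an exclude pattern contains capturing groups of its own, re.split returns an extra entry for each group, and the even/odd assumption no longer holds. Writing such patterns with non-capturing groups, `(?:...)`, avoids this.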