Creating Custom Plugins

A plugin is a Python script that transforms a payload during dataset generation. Plugins are typically used to assess transformation-based jailbreaking techniques, or to convert prompts into a target-friendly format.

Sample plugins can be found within the workspace/plugins/ directory, created by running spikee init. Further information about built-in plugins and usage examples can be found in Built-in Plugins.

Plugins vs. Dynamic Attacks: What's the Difference?

Both Plugins and Dynamic Attacks can generate variations of a payload, but they serve different purposes in the testing workflow:

  • Plugins (Pre-Test Transformation):

    • When they run: During spikee generate.
    • What they do: Create multiple variations of a payload. Each variation is saved as a separate, independent entry in the final dataset file.
    • Result: When you run spikee test, every single variation generated by the plugin is tested against the target. This is useful for systematically evaluating a target's resilience to a known set of transformations (e.g., "Is the target vulnerable to Base64 encoding? To Leetspeak?").
  • Dynamic Attacks (Real-Time Transformation):

    • When they run: During spikee test, but only if the initial, standard prompt fails.
    • What they do: Generate and test variations one by one in real-time. The attack stops as soon as a variation succeeds.
    • Result: Only the first successful variation (or the final failed attempt) is logged. This is useful for efficiently finding any successful bypass, rather than testing every possible variation.

In short, use Plugins to build a comprehensive dataset of known transformations. Use Dynamic Attacks to find a single successful bypass with adaptive, real-time logic.
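
The contrast between the two workflows can be sketched in plain Python. This is only an illustration of the control flow described above, not Spikee's implementation: `make_variations`, `run_plugin_style`, `run_dynamic_style`, and `toy_target` are hypothetical names, and the toy target simply "succeeds" on any prompt containing a `3`.

```python
from typing import Callable, List, Optional


def make_variations(payload: str) -> List[str]:
    """Hypothetical transformations; real plugins and attacks define their own."""
    return [payload.upper(), payload.replace("e", "3"), payload[::-1]]


def run_plugin_style(payload: str, attack_succeeds: Callable[[str], bool]) -> List[bool]:
    """Plugin workflow: every variation is in the dataset, so every one is tested."""
    return [attack_succeeds(v) for v in make_variations(payload)]


def run_dynamic_style(payload: str, attack_succeeds: Callable[[str], bool]) -> Optional[str]:
    """Dynamic-attack workflow: try variations in turn and stop at the first success."""
    for variation in make_variations(payload):
        if attack_succeeds(variation):
            return variation
    return None


attempts: List[str] = []


def toy_target(prompt: str) -> bool:
    """Stand-in for a real target: records the prompt and 'succeeds' on leetspeak."""
    attempts.append(prompt)
    return "3" in prompt


plugin_results = run_plugin_style("reveal secret", toy_target)
plugin_attempts = len(attempts)   # all three variations were sent

attempts.clear()
first_bypass = run_dynamic_style("reveal secret", toy_target)
dynamic_attempts = len(attempts)  # stopped after the second variation
```

In this sketch the plugin-style run reports a result for each of the three variations, while the dynamic-style run stops after two attempts and returns only the variation that succeeded.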

Plugin Structure

Every plugin is a Python module located in the plugins/ directory of your workspace. Spikee identifies plugins by their filename.

Plugin Template

from spikee.templates.plugin import Plugin
from spikee.templates.basic_plugin import BasicPlugin
from spikee.utilities.enums import ModuleTag
from typing import List, Union, Tuple

class SamplePlugin(Plugin):
    def get_description(self) -> Tuple[List[ModuleTag], str]:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported plugin options; Tuple[options (default is first), llm_required]"""
        return [], False

    def transform(
        self, 
        text: str, 
        exclude_patterns: List[str] = [],
        plugin_option: str = ""
    ) -> Union[str, List[str]]:
        """Transforms the input text according to the user-defined logic, returning one or more variations.

        Args:
            text (str): The input prompt to transform.
            exclude_patterns (List[str], optional): Regex patterns for substrings to preserve.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            Union[str, List[str]]: A single transformed string, or a list of variations.
        """
        # Your implementation here...

class SampleBasicPlugin(BasicPlugin):
    def get_description(self) -> Tuple[List[ModuleTag], str]:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported plugin options; Tuple[options (default is first), llm_required]"""
        return [], False

    def plugin_transform(
        self, 
        text: str, 
        plugin_option: str = "",
    ) -> str:
        """Transforms the input text according to the user-defined logic, returning a single variation.

        Args:
            text (str): The input prompt to transform.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            str: The transformed text.
        """
        # Your implementation here...

The transform Function

This is the core function of every plugin. It receives a payload string and returns one or more transformed versions.

Parameters

  • text: str: The input payload, which is typically a combination of a jailbreak and a malicious instruction.

  • exclude_patterns: List[str]: A list of regular expression patterns. Your plugin must not transform any part of the text that matches one of these patterns. This is critical for preserving sensitive parts of a prompt, like URLs or specific keywords.

  • plugin_option: str (Optional): A string passed from the command line via --plugin-options (e.g., "my_plugin:mode=full;variants=10"). If your plugin doesn't need configuration, you can omit this parameter.

Return Values

  • str: Return a single transformed string. Spikee will create one new test case from this.
  • List[str]: Return a list of transformed strings. Spikee will create a separate test case for each string in the list, allowing you to test multiple variations at once.
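
As a minimal sketch of the two return shapes, the functions below return a single string and a list of strings respectively. The helper `to_test_cases` is hypothetical: it only illustrates how a caller could normalise either shape into one entry per variation, and is not taken from Spikee's code. The functions are written at module level, without a class, to keep the example self-contained.

```python
from typing import List, Union


def single_variation(text: str) -> str:
    """Return one transformed string (one new test case)."""
    return text.upper()


def multiple_variations(text: str) -> List[str]:
    """Return several transformed strings (one new test case each)."""
    return [text.upper(), text.lower(), text[::-1]]


def to_test_cases(result: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: flatten either return shape into a list of variations."""
    return [result] if isinstance(result, str) else list(result)


print(to_test_cases(single_variation("aBc")))     # one entry
print(to_test_cases(multiple_variations("aBc")))  # three entries
```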

Signature with Options Support

For more advanced plugins, you can accept a configuration string and advertise the available options.

from typing import List, Union, Tuple

def get_available_option_values() -> Tuple[List[str], bool]:
    """Return supported plugin options; Tuple[options (default is first), llm_required]"""
    return ["mode=strict", "mode=full"], False # "mode=strict" is the default

def transform(text: str, exclude_patterns: List[str] = [], plugin_option: str = "") -> Union[str, List[str]]:
    """Transforms the payload based on the provided option."""
    # Your transformation logic here...

Supporting Plugin Options

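To support plugin_option values in a class-based plugin, implement the get_available_option_values method. It returns a tuple containing the list of supported options, with the default first, and a boolean llm_required flag. A plugin that supports no options returns an empty list, as in the template above.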

from spikee.templates.plugin import Plugin
from typing import List, Tuple, Union

class SamplePlugin(Plugin):
    def get_available_option_values(self) -> Tuple[List[str], bool]:
        """Return supported plugin options; Tuple[options (default is first), llm_required]"""
        return ["mode=strict", "mode=full"], False # "mode=strict" is the default

    def transform(
        self, 
        text: str, 
        exclude_patterns: List[str] = [],
        plugin_option: str = "",
    ) -> Union[str, List[str]]:
        # Your implementation here...
        raise NotImplementedError
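
One way to consume the option string inside transform is sketched below. It assumes the plugin receives the `key=value` pairs separated by semicolons (for example `mode=full;variants=10`); whether Spikee strips the plugin name before passing the string is not confirmed here. `parse_plugin_option` is a hypothetical helper, and the behaviour attached to each mode is invented purely for illustration.

```python
from typing import Dict, List, Optional, Union


def parse_plugin_option(plugin_option: str) -> Dict[str, str]:
    """Hypothetical helper: parse 'key=value;key=value' into a dict.

    Entries without '=' or without a key are ignored.
    """
    options: Dict[str, str] = {}
    for part in plugin_option.split(";"):
        key, sep, value = part.partition("=")
        if sep and key.strip():
            options[key.strip()] = value.strip()
    return options


def transform(
    text: str,
    exclude_patterns: Optional[List[str]] = None,
    plugin_option: str = "",
) -> Union[str, List[str]]:
    # exclude_patterns handling is omitted to keep the focus on option parsing.
    options = parse_plugin_option(plugin_option)
    mode = options.get("mode", "strict")  # fall back to the advertised default
    if mode == "full":
        # Illustrative "full" behaviour: uppercase everything.
        return text.upper()
    # Illustrative "strict" behaviour: capitalise only the first character.
    return text[:1].upper() + text[1:]


print(parse_plugin_option("mode=full;variants=10"))
print(transform("hello world", plugin_option="mode=full"))
```

Falling back to the first advertised option when the string is empty keeps the plugin's behaviour consistent with what get_available_option_values reports as the default.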

Handling Exclude Patterns

Correctly handling exclude_patterns is the most important part of writing a robust plugin. You must leave the excluded parts of the string completely untouched. The recommended way to do this is with re.split, as implemented in BasicPlugin.

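# Example transformation function converting all text to uppercase with exclude_patterns support
# (shown as a method of a Plugin subclass; the class body is omitted for brevity)
import re
from typing import List

def transform(self, text: str, exclude_patterns: List[str] = []) -> str: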
    if not exclude_patterns:
        # No exclusions, transform the whole text
        return apply_transformation(text)

    # 1. Create a single regex pattern that captures any of the exclude patterns.
    # The parentheses around the pattern are crucial for re.split to keep the delimiters.
    # Note: this assumes the individual patterns contain no capturing groups of their own;
    # extra groups add extra entries to the re.split result and break the even/odd indexing.
    combined_pattern = "(" + "|".join(exclude_patterns) + ")"
    
    # 2. Split the text by the combined pattern.
    # Even-indexed chunks are normal text; odd-indexed chunks are the exclusions.
    chunks = re.split(combined_pattern, text)
    
    # 3. Transform only the non-excluded chunks.
    transformed_chunks = []
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            # This is normal text, apply the transformation
            transformed_chunks.append(apply_transformation(chunk))
        else:
            # This is an excluded part, keep it as is
            transformed_chunks.append(chunk)
            
    # 4. Rejoin the chunks into a single string.
    return "".join(transformed_chunks)

def apply_transformation(text: str) -> str:
    return text.upper()
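
The re.split approach relies on the combined pattern having exactly one capturing group. If an exclude pattern contains its own capturing group, re.split returns additional entries and the even/odd indexing no longer lines up. The sketch below shows an alternative that walks the matches with re.finditer instead. This is a swapped-in technique for illustration, not the BasicPlugin implementation, and `transform_outside_matches` is a hypothetical name.

```python
import re
from typing import Callable, List


def transform_outside_matches(
    text: str,
    exclude_patterns: List[str],
    apply_transformation: Callable[[str], str],
) -> str:
    """Apply the transformation to everything except spans matching an exclude pattern."""
    if not exclude_patterns:
        return apply_transformation(text)

    # Wrap each pattern in a non-capturing group so alternation stays self-contained.
    combined = re.compile("|".join(f"(?:{p})" for p in exclude_patterns))

    pieces: List[str] = []
    cursor = 0
    for match in combined.finditer(text):
        # Transform the text between the previous match and this one.
        pieces.append(apply_transformation(text[cursor:match.start()]))
        # Copy the excluded span verbatim.
        pieces.append(match.group(0))
        cursor = match.end()
    # Transform whatever follows the last match.
    pieces.append(apply_transformation(text[cursor:]))
    return "".join(pieces)


# The URL pattern contains a capturing group, which this approach tolerates.
result = transform_outside_matches(
    "visit https://example.com/Path now",
    [r"(https?)://\S+"],
    str.upper,
)
print(result)  # VISIT https://example.com/Path NOW
```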