Invalid score criteria #1

@latekvo

SpiderShield version

0.3.4

Python version

irrelevant

Subsystem

Static Scanner (scan)

What happened?

I've set up SpiderShield as a CI step in our internal project. I think the premise is amazing.
I ran into the following during setup:

  1. Achieving a perfect 10.0/10.0 is impossible when multiple tools share a similar input schema. This is a consequence of issue 2:
  2. has_param_docs and has_param_examples seem like very bad criteria:

The has_param_docs criterion goes against how MCP servers actually work.
In every MCP implementation I've found, the inputSchema (or similarly named) argument is required and is always passed to the agent alongside the description, while the description itself is usually optional.**** To reach a perfect score on this criterion, has_param_docs effectively requires us to duplicate the entire inputSchema into the description.
Moreover, SpiderShield doesn't even consider the possibility that individual tool parameters carry their own descriptions, and they do.*
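
To illustrate the point about per-parameter descriptions, here is a minimal sketch of a check that reads them from the schema instead of from the tool description. This is not SpiderShield's actual implementation; the function name is borrowed from the criterion for readability, and accepting both the "inputSchema" and "input_schema" key spellings is an assumption.

```python
from typing import Any


def has_param_docs(tool: dict[str, Any]) -> bool:
    """Hypothetical check: True if every parameter is documented in the schema."""
    # Assumption: accept both the MCP-style "inputSchema" key and the
    # "input_schema" spelling used in Anthropic's example.
    schema = tool.get("inputSchema") or tool.get("input_schema") or {}
    properties = schema.get("properties", {})
    # A tool with no parameters has nothing left to document, so it passes.
    return all(
        isinstance(spec, dict) and str(spec.get("description", "")).strip()
        for spec in properties.values()
    )
```

With a check along these lines, a tool whose parameters are described only inside the schema would pass without repeating anything in the top-level description.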

Secondly, has_param_examples might be causing more harm than good. It is the agent harness's responsibility to ensure that, given a schema, the agent uses a valid tool-calling notation.**
Explicit hardcoded usage examples may confuse agents that expect a different notation, since every model provider uses its own tool-calling notation.***

*Look at Anthropic's own "good" tool example:

{
  "name": "get_stock_price",
  "description": "Retrieves the current stock price for a given ticker symbol.
    [...]
    It will not provide any other information about the stock or company.",
  "input_schema": {
    "properties": {
      "ticker": {
        "type": "string",
        "description": "The stock ticker symbol, e.g. AAPL for Apple Inc."
      }
    }
  }
}

**OpenAI docs:

Under the hood, functions are injected into the system message in a syntax the model has been trained on. (link)

The Anthropic docs say something similar: link

***The same call across providers:

// Anthropic
{ "type": "tool_use", "name": "get_weather", "input": { "location": "NYC" } }

// OpenAI
{ "function": { "name": "get_weather", "arguments": "{\"location\":\"NYC\"}" } }

// Gemini
{ "id": "...", "name": "get_weather", "args": { "location": "NYC" } }
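
As a rough sketch of why the notation belongs to the harness, the three shapes above can be produced from one logical call. The function below is hypothetical and mirrors only the simplified snippets in this issue, not the providers' full wire formats; the Gemini "id" field is left out because its value is elided above.

```python
import json
from typing import Any


def to_provider_call(provider: str, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: render one logical tool call in a provider's shape."""
    if provider == "anthropic":
        return {"type": "tool_use", "name": name, "input": arguments}
    if provider == "openai":
        # In the snippet above, the arguments are a JSON-encoded string.
        return {"function": {"name": name, "arguments": json.dumps(arguments)}}
    if provider == "gemini":
        return {"name": name, "args": arguments}
    raise ValueError(f"unknown provider: {provider}")


call = to_provider_call("openai", "get_weather", {"location": "NYC"})
```

The tool definition stays the same in all three cases. Only the rendering differs, so an example hardcoded in one notation inside a description can match at most one of them.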

****The Tool schema definition makes inputSchema required and description optional (link):

"required": ["inputSchema", "name"]

The spec describes description as:

"A human-readable description of the tool. This can be used by clients to improve the LLM's understanding of available tools. It can be thought of like a 'hint' to the model."
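
To make the required/optional split concrete, here is a small sketch that checks a tool definition against just the required list quoted above. The helper name is hypothetical and this is not a full schema validator.

```python
from typing import Any

# Taken from the quoted Tool schema: "required": ["inputSchema", "name"]
REQUIRED_TOOL_FIELDS = ("inputSchema", "name")


def missing_required_fields(tool: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list the required Tool fields absent from `tool`."""
    return [field for field in REQUIRED_TOOL_FIELDS if field not in tool]


# A tool with no description at all still satisfies the required list.
no_description = {"name": "get_weather", "inputSchema": {"type": "object"}}
# A tool with only a description satisfies none of it.
only_description = {"description": "Gets the weather."}
```

Here `missing_required_fields(no_description)` is empty, while `missing_required_fields(only_description)` reports both `inputSchema` and `name`.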

Expected behavior

  • has_param_docs should be removed
  • has_param_examples should be either reimplemented or removed

Steps to reproduce

not applicable, open-ended observation

Error output / traceback

not applicable


Labels: bug
