
Google Provider can support images being returned from tool calls #910

@jcheng5

Below are some research notes gleaned from pointing Claude Code at Google's Python SDK (I couldn't find a helpful reference).

Basic JSON Structure

  1. Function Response with Inline Image (base64-encoded bytes)
{
  "name": "test",
  "response": {
    "temp": 14.5
  },
  "parts": [
    {
      "inline_data": {
        "mime_type": "image/png",
        "data": "dGVzdDEyMw=="
      }
    }
  ]
}
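In plain Python, the inline shape above can be produced from raw image bytes with something like the sketch below. `inline_image_response` is a hypothetical helper (not part of Google's SDK); it just emits the dict, base64-encoding the bytes because JSON has no binary type.

```python
import base64
from typing import Any


def inline_image_response(
    name: str,
    response: dict[str, Any],
    image: bytes,
    mime_type: str = "image/png",
) -> dict[str, Any]:
    """Build a function response carrying one inline (base64) image.

    Hypothetical helper: it only mirrors the JSON shape above.
    """
    return {
        "name": name,
        "response": response,
        "parts": [
            {
                "inline_data": {
                    "mime_type": mime_type,
                    # JSON has no binary type, so the bytes travel as base64 text
                    "data": base64.b64encode(image).decode("ascii"),
                }
            }
        ],
    }


# Reproduces the example above: b"test123" encodes to "dGVzdDEyMw=="
example = inline_image_response("test", {"temp": 14.5}, b"test123")
```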
  2. Function Response with URI-based Image
{
  "name": "test",
  "response": {
    "temp": 14.5
  },
  "parts": [
    {
      "file_data": {
        "file_uri": "gs://bucket/image.jpg",
        "mime_type": "image/jpeg"
      }
    }
  ]
}
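For the URI variant, a similar sketch can fill in `mime_type` from the file extension when the caller doesn't supply one, since the schema below marks it as required. `file_uri_response` is hypothetical, and guessing via Python's `mimetypes` table is my own assumption, not something the SDK is known to do.

```python
import mimetypes
from typing import Any, Optional


def file_uri_response(
    name: str,
    response: dict[str, Any],
    file_uri: str,
    mime_type: Optional[str] = None,
) -> dict[str, Any]:
    """Build a function response that references an image by URI.

    Hypothetical helper. If mime_type is omitted it is guessed from the
    URI's extension; the guessing is an assumption, not SDK behavior.
    """
    if mime_type is None:
        mime_type, _encoding = mimetypes.guess_type(file_uri)
    if mime_type is None:
        # The schema marks mime_type as required, so refuse to leave it out
        raise ValueError(f"cannot infer a MIME type for {file_uri!r}")
    return {
        "name": name,
        "response": response,
        "parts": [{"file_data": {"file_uri": file_uri, "mime_type": mime_type}}],
    }


example = file_uri_response("test", {"temp": 14.5}, "gs://bucket/image.jpg")
```

Passing `mime_type` explicitly skips the guess entirely, which is probably the safer default for anything other than common image extensions.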
  3. Function Response with Multiple Media Parts
{
  "name": "generate_chart",
  "response": {
    "status": "success"
  },
  "parts": [
    {
      "inline_data": {
        "mime_type": "image/png",
        "data": "Y2hhcnRfaW1hZ2VfZGF0YQ=="
      }
    },
    {
      "file_data": {
        "file_uri": "gs://bucket/chart.pdf",
        "mime_type": "application/pdf"
      }
    }
  ]
}
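A tool that returns several artifacts could map each one to a part by type: raw bytes become `inline_data`, a URI string becomes `file_data`. This is a hypothetical sketch (`media_part` and `multi_media_response` are made-up names) that reproduces the example above.

```python
import base64
from typing import Any, Union


def media_part(source: Union[bytes, str], mime_type: str) -> dict[str, Any]:
    """Raw bytes become inline_data; a string is treated as a file URI."""
    if isinstance(source, bytes):
        data = base64.b64encode(source).decode("ascii")
        return {"inline_data": {"mime_type": mime_type, "data": data}}
    return {"file_data": {"file_uri": source, "mime_type": mime_type}}


def multi_media_response(
    name: str,
    response: dict[str, Any],
    sources: list[tuple[Union[bytes, str], str]],
) -> dict[str, Any]:
    """Build one part per (source, mime_type) pair, preserving their order."""
    return {
        "name": name,
        "response": response,
        "parts": [media_part(src, mime) for src, mime in sources],
    }


chart = multi_media_response(
    "generate_chart",
    {"status": "success"},
    [
        (b"chart_image_data", "image/png"),  # inline PNG
        ("gs://bucket/chart.pdf", "application/pdf"),  # PDF by reference
    ],
)
```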
  4. Function Response with Only Parts (no response field)
{
  "name": "test",
  "parts": [
    {
      "inline_data": {
        "mime_type": "image/png",
        "data": "dGVzdDEyMw=="
      }
    }
  ]
}
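To cover both the response-only and parts-only shapes, a builder can simply leave out whichever optional field wasn't supplied. `build_function_response` is a hypothetical helper, and treating an empty parts list as absent is a design choice on my part, not something the SDK is known to require.

```python
from typing import Any, Optional


def build_function_response(
    name: str,
    response: Optional[dict[str, Any]] = None,
    parts: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Emit only the optional fields that were actually supplied."""
    result: dict[str, Any] = {"name": name}
    if response is not None:
        result["response"] = response
    if parts:  # an empty list is treated the same as no parts at all
        result["parts"] = parts
    return result


image_part = {"inline_data": {"mime_type": "image/png", "data": "dGVzdDEyMw=="}}
parts_only = build_function_response("test", parts=[image_part])
# parts_only has no "response" key, matching the example above
```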

When Used in a Content Message (Part)

When returning a function response as a content part, nest it under the function_response key:

{
  "function_response": {
    "name": "test",
    "response": {
      "result": "ok"
    },
    "parts": [
      {
        "inline_data": {
          "mime_type": "image/png",
          "data": "dGVzdDEyMw=="
        }
      }
    ]
  }
}
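A sketch of wrapping a function response into the content part above, round-tripping it through JSON, and decoding the inline images back to bytes. Both helper names are hypothetical, and the enclosing message (its role and any sibling parts) isn't covered by these notes, so it's left out.

```python
import base64
import json
from typing import Any


def as_content_part(function_response: dict[str, Any]) -> dict[str, Any]:
    """Nest a function response under the "function_response" key of a part."""
    return {"function_response": function_response}


def inline_images(part: dict[str, Any]) -> list[bytes]:
    """Decode every inline_data payload in a function_response part."""
    fr = part["function_response"]
    return [
        base64.b64decode(p["inline_data"]["data"])
        for p in fr.get("parts", [])
        if "inline_data" in p  # file_data parts carry a URI, not bytes
    ]


part = as_content_part({
    "name": "test",
    "response": {"result": "ok"},
    "parts": [{"inline_data": {"mime_type": "image/png", "data": "dGVzdDEyMw=="}}],
})
wire = json.dumps(part)  # serialize for a JSON request body
restored = json.loads(wire)
images = inline_images(restored)  # [b"test123"]
```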

JSON Schema

{
  "FunctionResponse": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Required. The name of the function"
      },
      "id": {
        "type": "string",
        "description": "Optional. The id of the function call (required for Gemini API, not Vertex)"
      },
      "response": {
        "type": "object",
        "description": "Optional. The function response in JSON object format"
      },
      "parts": {
        "type": "array",
        "description": "Optional. List of media parts",
        "items": {
          "$ref": "#/FunctionResponsePart"
        }
      },
      "will_continue": {
        "type": "boolean",
        "description": "Optional. For non-blocking function calls"
      },
      "scheduling": {
        "type": "string",
        "enum": ["SCHEDULING_UNSPECIFIED", "SILENT", "WHEN_IDLE", "INTERRUPT"],
        "description": "Optional. How the response should be scheduled"
      }
    },
    "required": ["name"]
  },
  "FunctionResponsePart": {
    "type": "object",
    "description": "One of inline_data or file_data must be present",
    "properties": {
      "inline_data": {
        "$ref": "#/FunctionResponseBlob"
      },
      "file_data": {
        "$ref": "#/FunctionResponseFileData"
      }
    }
  },
  "FunctionResponseBlob": {
    "type": "object",
    "properties": {
      "mime_type": {
        "type": "string",
        "description": "Required. MIME type (e.g., 'image/png')"
      },
      "data": {
        "type": "string",
        "description": "Required. Base64-encoded bytes",
        "format": "base64"
      },
      "display_name": {
        "type": "string",
        "description": "Optional. Display name"
      }
    },
    "required": ["mime_type", "data"]
  },
  "FunctionResponseFileData": {
    "type": "object",
    "properties": {
      "file_uri": {
        "type": "string",
        "description": "Required. URI (e.g., 'gs://bucket/file.jpg')"
      },
      "mime_type": {
        "type": "string",
        "description": "Required. MIME type"
      },
      "display_name": {
        "type": "string",
        "description": "Optional. Display name"
      }
    },
    "required": ["file_uri", "mime_type"]
  }
}
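The same schema could be mirrored as Python dataclasses whose `to_dict()` drops unset optional fields. This is a hypothetical sketch, not the SDK's own types (the class names here are shortened), and it assumes `data` is held as raw bytes and base64-encoded only on serialization.

```python
import base64
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Optional


class Scheduling(str, Enum):
    SCHEDULING_UNSPECIFIED = "SCHEDULING_UNSPECIFIED"
    SILENT = "SILENT"
    WHEN_IDLE = "WHEN_IDLE"
    INTERRUPT = "INTERRUPT"


@dataclass
class Blob:
    """Mirrors FunctionResponseBlob; data holds raw (not yet encoded) bytes."""

    mime_type: str
    data: bytes
    display_name: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        out: dict[str, Any] = {
            "mime_type": self.mime_type,
            "data": base64.b64encode(self.data).decode("ascii"),
        }
        if self.display_name is not None:
            out["display_name"] = self.display_name
        return out


@dataclass
class FileData:
    """Mirrors FunctionResponseFileData."""

    file_uri: str
    mime_type: str
    display_name: Optional[str] = None

    def to_dict(self) -> dict[str, Any]:
        # All fields are plain strings, so dropping None values is enough
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class Part:
    """Mirrors FunctionResponsePart: exactly one of the two fields is set."""

    inline_data: Optional[Blob] = None
    file_data: Optional[FileData] = None

    def to_dict(self) -> dict[str, Any]:
        if (self.inline_data is None) == (self.file_data is None):
            raise ValueError("set exactly one of inline_data or file_data")
        if self.inline_data is not None:
            return {"inline_data": self.inline_data.to_dict()}
        return {"file_data": self.file_data.to_dict()}


@dataclass
class FunctionResponse:
    name: str
    id: Optional[str] = None
    response: Optional[dict[str, Any]] = None
    parts: list[Part] = field(default_factory=list)
    will_continue: Optional[bool] = None
    scheduling: Optional[Scheduling] = None

    def to_dict(self) -> dict[str, Any]:
        """Serialize to the JSON shape above, omitting unset optional fields."""
        out: dict[str, Any] = {"name": self.name}
        if self.id is not None:
            out["id"] = self.id
        if self.response is not None:
            out["response"] = self.response
        if self.parts:
            out["parts"] = [p.to_dict() for p in self.parts]
        if self.will_continue is not None:
            out["will_continue"] = self.will_continue
        if self.scheduling is not None:
            out["scheduling"] = self.scheduling.value
        return out


fr = FunctionResponse(
    name="test",
    response={"temp": 14.5},
    parts=[Part(inline_data=Blob("image/png", b"test123"))],
)
```

Keeping bytes raw until `to_dict()` means tool code never has to think about base64; the encoding happens in one place.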

Key Points

  1. Both response and parts are optional, but you typically include at least one.
  2. Each part must contain either inline_data or file_data, not both.
  3. Multiple parts are allowed, so you can return multiple images or mix media types (e.g., an inline PNG alongside a PDF referenced by URI).
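The key points above translate into a few checks on a raw dict. `validate_function_response` is a hypothetical helper; it reports a missing response/parts pair only as a warning because both fields are optional, and treats the schema's required fields as hard errors.

```python
import base64
from typing import Any

# Required keys per part type, taken from the schema above
REQUIRED = {
    "inline_data": ("mime_type", "data"),
    "file_data": ("file_uri", "mime_type"),
}


def validate_function_response(fr: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks well-formed."""
    problems: list[str] = []
    if not isinstance(fr.get("name"), str):
        problems.append("name is required and must be a string")
    if "response" not in fr and not fr.get("parts"):
        # Both fields are optional, so this is only a warning
        problems.append("warning: neither response nor parts is present")
    for i, part in enumerate(fr.get("parts", [])):
        kinds = [k for k in REQUIRED if k in part]
        if len(kinds) != 1:
            problems.append(f"parts[{i}] must have exactly one of inline_data or file_data")
            continue
        kind = kinds[0]
        body = part[kind]
        for key in REQUIRED[kind]:
            if key not in body:
                problems.append(f"parts[{i}].{kind}.{key} is required")
        if kind == "inline_data" and "data" in body:
            try:
                # validate=True rejects characters outside the base64 alphabet
                base64.b64decode(body["data"], validate=True)
            except (TypeError, ValueError):  # binascii.Error subclasses ValueError
                problems.append(f"parts[{i}].inline_data.data is not valid base64")
    return problems
```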
