Add POST /extract_image_text endpoint for vision/OCR extraction #90

@krisoye

Context

The admin-dashboard screenshot paste feature (krisoye/admin-dashboard#45) requires a KB API endpoint that accepts an image upload and returns extracted text. YouTube video detail pages will use this to enrich transcripts with visual content (charts, slides, diagrams) captured via screenshot.

Design

Endpoint

POST /extract_image_text

Request: multipart/form-data

  • file: UploadFile — image file (image/png, image/jpeg, image/webp, image/gif)
  • Max size: 10 MB; reject larger uploads with 413
  • Reject non-image MIME types with 422
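
A minimal sketch of the request checks, kept free of any web framework so it can be unit-tested on its own. The names `validation_status`, `ALLOWED_MIME_TYPES`, and `MAX_UPLOAD_BYTES` are hypothetical, and reading "10MB" as 10 MiB is an assumption:

```python
from typing import Optional

# Accepted types and the size cap come from the request spec in this issue.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp", "image/gif"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB


def validation_status(content_type: Optional[str], size_bytes: int) -> Optional[int]:
    """Return the HTTP status to reject with, or None if the upload is acceptable."""
    # Drop any parameters after ";" and normalise case before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime not in ALLOWED_MIME_TYPES:
        return 422  # non-image MIME type
    if size_bytes > MAX_UPLOAD_BYTES:
        return 413  # upload exceeds the size cap
    return None
```

In the endpoint this would presumably run before the file is written to disk, with the returned status mapped to an HTTP error. The declared MIME type is client-supplied, so this check does not prove the bytes are a real image.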

Response:

{
  "success": true,
  "extracted_text": "...",
  "method": "vision"
}

Error response:

{
  "success": false,
  "extracted_text": "",
  "error": "..."
}
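
The two response shapes above could be produced by a pair of small helpers so every code path returns the same keys. This is only a sketch; `success_response` and `error_response` are hypothetical names, not existing functions in `src/kb_server.py`:

```python
from typing import Any, Dict


def success_response(extracted_text: str, method: str) -> Dict[str, Any]:
    """Success body: the extracted text plus the method that produced it."""
    return {"success": True, "extracted_text": extracted_text, "method": method}


def error_response(message: str) -> Dict[str, Any]:
    """Error body: empty text and a human-readable error message."""
    return {"success": False, "extracted_text": "", "error": message}
```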

Extraction Pipeline (priority order)

  1. Claude vision (primary) — call Claude API with claude-opus-4-6, send image as base64, prompt: "Extract all text visible in this image. Include any numbers, labels, chart annotations, and slide content. Return only the extracted text, no commentary."
  2. document-analysis-mcp OCR (fallback) — call pdf_ocr_tool on port 8766 if Claude API unavailable
  3. Error — return {success: false, error: "No extraction method available"} if both unavailable
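
The priority order above can be sketched as a loop over injectable extractors. The extractor contract here (return text, or `None` or raise when the backend is unavailable) and the `"ocr"` method label are assumptions for illustration; only `"vision"` appears in the response example in this issue:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical contract: an extractor takes the image path and returns the
# extracted text, or None when its backend is unavailable.
Extractor = Callable[[str], Optional[str]]


def run_pipeline(image_path: str, extractors: List[Tuple[str, Extractor]]) -> Dict[str, Any]:
    """Try each (method, extractor) pair in priority order; return the first result."""
    for method, extract in extractors:
        try:
            text = extract(image_path)
        except Exception:
            # Treat any backend failure as "unavailable" and fall through.
            continue
        if text is not None:
            return {"success": True, "extracted_text": text, "method": method}
    return {
        "success": False,
        "extracted_text": "",
        "error": "No extraction method available",
    }
```

Passing the extractors in as a list keeps the fallback logic testable without a Claude API key or a running OCR service: tests can supply stubs, while the endpoint would pass the real Claude vision and `pdf_ocr_tool` callers.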

Implementation Notes

  • Save uploaded file to a temp path, clean up in finally block
  • Use ANTHROPIC_API_KEY env var for Claude vision (already available in the service environment)
  • Image is passed to the Claude API as a base64-encoded image content block (media type plus base64 data) in the user message
  • Mirrors the pattern of /ingest_audio and /ingest_paper — multipart upload, temp file, extractor pipeline, cleanup
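
A rough sketch of two of the notes above: the temp-file lifecycle with cleanup in `finally`, and the base64 image payload. `with_temp_file` and `image_content_block` are hypothetical helpers. The block shape follows the Anthropic Messages API's base64 image source as I understand it (raw base64 plus a media type, without a `data:` URL prefix), so it is worth checking against the current API docs:

```python
import base64
import os
import tempfile
from typing import Any, Callable, Dict


def image_content_block(image_bytes: bytes, media_type: str) -> Dict[str, Any]:
    """Build a base64 image content block for a Claude API user message."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def with_temp_file(data: bytes, suffix: str, handler: Callable[[str], Any]) -> Any:
    """Write data to a temp file, pass its path to handler, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return handler(path)
    finally:
        # Runs on success and on error, so the temp file never lingers.
        if os.path.exists(path):
            os.remove(path)
```

The handler passed to `with_temp_file` would be the extraction pipeline; because removal happens in `finally`, an exception raised by either extractor still leaves no file behind.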

Files to Modify

  • src/kb_server.py — add the endpoint
  • .env.production (in deploy/) — document ANTHROPIC_API_KEY if not already present

Acceptance Criteria

  • POST /extract_image_text with a PNG screenshot returns extracted text
  • Works for PNG, JPEG, WebP
  • Returns 422 for non-image files
  • Returns 413 for files over 10MB
  • Falls back to OCR if Claude API key not configured
  • Temp file always cleaned up (even on error)

Dependencies

  • None — can be implemented independently
  • Required by: krisoye/admin-dashboard#45

Auto-Close: PR should include Closes krisoye/knowledge-bank-tools#<number> in description.
