Context
The admin-dashboard screenshot paste feature (krisoye/admin-dashboard#45) requires a KB API endpoint that accepts an image upload and returns extracted text. YouTube video detail pages will use this to enrich transcripts with visual content (charts, slides, diagrams) captured via screenshot.
Design
Endpoint
POST /extract_image_text
Request: multipart/form-data
file: UploadFile — image file (image/png, image/jpeg, image/webp, image/gif)
- Max size: 10MB
- Reject non-image MIME types with 422
Response:
{
"success": true,
"extracted_text": "...",
"method": "vision"
}
Error response:
{
"success": false,
"extracted_text": "",
"error": "..."
}
Extraction Pipeline (priority order)
- Claude vision (primary) — call Claude API with
claude-opus-4-6, send image as base64, prompt: "Extract all text visible in this image. Include any numbers, labels, chart annotations, and slide content. Return only the extracted text, no commentary."
- document-analysis-mcp OCR (fallback) — call
pdf_ocr_tool on port 8766 if Claude API unavailable
- Error — return
{success: false, error: "No extraction method available"} if both unavailable
Implementation Notes
- Save uploaded file to a temp path, clean up in
finally block
- Use
ANTHROPIC_API_KEY env var for Claude vision (already available in the service environment)
- Image is passed as base64-encoded data URL in the Claude API message
- Mirrors the pattern of
/ingest_audio and /ingest_paper — multipart upload, temp file, extractor pipeline, cleanup
Files to Modify
src/kb_server.py — add the endpoint
.env.production (in deploy/) — document ANTHROPIC_API_KEY if not already present
Acceptance Criteria
Dependencies
- None — can be implemented independently
- Required by: krisoye/admin-dashboard#45
Auto-Close: PR should include Closes krisoye/knowledge-bank-tools#<number> in description.
Context
The admin-dashboard screenshot paste feature (krisoye/admin-dashboard#45) requires a KB API endpoint that accepts an image upload and returns extracted text. YouTube video detail pages will use this to enrich transcripts with visual content (charts, slides, diagrams) captured via screenshot.
Design
Endpoint
POST /extract_image_textRequest: multipart/form-data
file: UploadFile — image file (image/png, image/jpeg, image/webp, image/gif)Response:
{ "success": true, "extracted_text": "...", "method": "vision" }Error response:
{ "success": false, "extracted_text": "", "error": "..." }Extraction Pipeline (priority order)
claude-opus-4-6, send image as base64, prompt: "Extract all text visible in this image. Include any numbers, labels, chart annotations, and slide content. Return only the extracted text, no commentary."pdf_ocr_toolon port 8766 if Claude API unavailable{success: false, error: "No extraction method available"}if both unavailableImplementation Notes
finallyblockANTHROPIC_API_KEYenv var for Claude vision (already available in the service environment)/ingest_audioand/ingest_paper— multipart upload, temp file, extractor pipeline, cleanupFiles to Modify
src/kb_server.py— add the endpoint.env.production(in deploy/) — documentANTHROPIC_API_KEYif not already presentAcceptance Criteria
POST /extract_image_textwith a PNG screenshot returns extracted textDependencies
Auto-Close: PR should include
Closes krisoye/knowledge-bank-tools#<number>in description.