A comprehensive application for comparing OCR (Optical Character Recognition) capabilities across multiple AWS AI services: Amazon Textract, Amazon Bedrock, and Amazon Bedrock Data Automation (BDA).
This application provides a unified interface for extracting text and structured data from images using three different AWS AI services:
- Amazon Textract: AWS's dedicated OCR service for extracting text, forms, and tables from documents
- After calling the Amazon Textract API to extract text, the application uses LLM to structure the extracted data into JSON format according to the provided schema.
- Amazon Bedrock: Using foundation models like Claude for document understanding and extraction
- Uses foundation models directly for both extraction and JSON structuring in a single step.
- Amazon Bedrock Data Automation (BDA): AWS's specialized image/document analysis service
- Custom Blueprint Method: Creates a custom document processing blueprint based on the provided JSON schema
- LLM Post-processing Method: Uses standard BDA extraction followed by Bedrock LLM to structure the data (default method)
The application enables side-by-side comparison of these services' accuracy, cost, and processing time across different document types, helping you choose the optimal service for your specific OCR needs.
- Multi-Engine OCR Processing: Process the same document with Textract, Bedrock, and BDA simultaneously
- Interactive UI: User-friendly interface for testing and comparing OCR engines
- Performance Comparison: Side-by-side comparison of extraction quality, processing time, and cost
- Accuracy Evaluation: Compare extracted data against ground truth for objective evaluation
- JSON Schema Support: Structure extracted data according to custom schemas
- Cost Calculation: Real-time cost estimation for each service
- Batch Processing: Process multiple sample documents at once
- Result Visualization: Visual annotation of detected text elements
The application follows a modular architecture with several key components:
- Engine Implementations: Separate modules for each AWS service (Textract, Bedrock, BDA)
- User Interface: Gradio-based UI for interactive testing and result visualization
- Core Processing: Parallel execution of OCR engines with standardized result handling
- Sample Management: Utilities for working with test documents and sample data
- Evaluation Tools: Components for accuracy assessment and comparison
- Python 3.10+
- AWS Account with access to:
- Amazon Textract
- Amazon Bedrock (with access to supported models)
- Amazon Bedrock Data Autmation
- AWS credentials configured locally
- S3 bucket for BDA processing
-
Clone the repository:
git clone https://github.com/aws-samples/ocr-with-aws-ai-services.git cd ocr-with-aws-ai-services -
Install required packages:
pip install -r requirements.txt -
Configure AWS credentials using one of the following methods:
- AWS CLI:
aws configure - Environment variables
- Credentials file (~/.aws/credentials)
- AWS CLI:
Run the application with:
python app.py
This will start the Gradio web interface, typically accessible at http://localhost:7860 (or the URL displayed in your terminal).
-
Select or Upload an Image:
- Choose from sample images in the dropdown, or
- Upload your own image using the upload control
-
Select OCR Engines:
- Choose one or more OCR engines to use (Textract, Bedrock, BDA)
- Configure engine-specific options as needed
-
Set Processing Options:
- Document type (generic, form, receipt, table, handwritten)
- Output JSON schema (for structured data extraction)
- Model selection for Bedrock
- S3 bucket and blueprint options for BDA
-
Process the Image:
- Click "Process Sample" to analyze the current image
- Click "Process All Samples" to batch process all sample images
-
View Results:
- Navigate between tabs to see results from each engine
- Compare extracted text, structured JSON, and annotated images
- View performance metrics including processing time, cost, and accuracy
- Use the "Compare" tab to see detailed comparison with ground truth
Modify these settings as needed for your environment.
The repository includes sample documents in the sample/ directory:
sample/images/: Test images for OCR processingsample/schema/: JSON schemas for structured extractionsample/truth/: Ground truth data for accuracy evaluation
To add your own samples:
- Add images to
sample/images/ - (Optional) Add corresponding JSON schemas with the same base filename in
sample/schema/ - (Optional) Add ground truth data with the same base filename in
sample/truth/
Processing results are stored in the results/ directory, organized by processing run and sample. Each run includes:
- Extracted text
- JSON structured data
- Annotated visualization images
- Processing metadata (time, cost, accuracy)
- Summary reports
This project is licensed under the MIT License - see the LICENSE file for details.
- AWS AI Services teams for providing the underlying OCR capabilities
- Contributors to the Gradio framework for the UI components
