This repository provides comprehensive solutions for Intelligent Document Processing (IDP) using AWS AI services. It contains both official guidance implementations and workshop materials to help you automate document processing workflows.
Agentic Orchestration: An advanced IDP system built on Amazon Bedrock AgentCore and AI agents that learns and adapts to document variations. Unlike traditional IDP solutions, this agentic approach corrects its own errors and improves extraction accuracy through iterative feedback loops.
Key Features:
- Multi-agent orchestration with specialized agents (Analyzer, Matcher, Extractor, Validator, Troubleshooter)
- Self-improving accuracy through agent-based instruction refinement
- Minimal upfront configuration - provide sample documents and JSON schemas
- Handles format variations automatically without manual configuration
- Serverless, scalable architecture built on AWS managed services
Technologies: Amazon Bedrock AgentCore, Amazon S3 Vectors, Amazon Aurora DSQL, AWS Lambda, Amazon DynamoDB
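The iterative feedback loop described above can be illustrated with a minimal, self-contained sketch. It is not the repository's implementation and does not call Amazon Bedrock AgentCore: `extract`, `validate`, and `refine` are hypothetical stand-ins for the Extractor, Validator, and Troubleshooter agents, and the "key: value" document format is an assumption made only to keep the example runnable.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionAttempt:
    """Result of one extraction round (hypothetical structure)."""
    fields: dict
    errors: list = field(default_factory=list)


def extract(document: str, instructions: list) -> dict:
    """Stand-in for an Extractor agent: reads 'key: value' lines whose keys
    are named in the instructions. A real agent would call a model."""
    wanted = {name.lower() for name in instructions}
    result = {}
    for line in document.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in wanted:
            result[key.strip().lower()] = value.strip()
    return result


def validate(fields: dict, required: list) -> list:
    """Stand-in for a Validator agent: report missing or empty required fields."""
    return [f"missing field: {name}" for name in required if not fields.get(name)]


def refine(instructions: list, errors: list) -> list:
    """Stand-in for a Troubleshooter agent: add each missing field to the
    instructions so the next round looks for it."""
    missing = [error.split(": ", 1)[1] for error in errors]
    return instructions + [name for name in missing if name not in instructions]


def run_with_feedback(document, required, instructions, max_rounds=3):
    """Extract, validate, and refine instructions until validation passes
    or max_rounds is reached."""
    attempt = ExtractionAttempt(fields={})
    for _ in range(max_rounds):
        fields = extract(document, instructions)
        errors = validate(fields, required)
        attempt = ExtractionAttempt(fields, errors)
        if not errors:
            break
        instructions = refine(instructions, errors)
    return attempt
```

For example, calling `run_with_feedback("Name: Jane Doe\nTotal: 42.00", ["name", "total"], ["name"])` misses `total` in the first round, refines the instructions, and succeeds in the second.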
Prompt Flow Orchestration: A prompt-based IDP solution that uses Amazon Bedrock Prompt Flows to orchestrate document classification, extraction, and validation workflows. This guidance uses foundation models with structured prompts to handle various document types with minimal customization.
Key Features:
- Automated document classification and data extraction
- Multi-stage validation with Amazon A2I human review integration
- Document-specific extraction flows (Driver's License, URLA, Bank Statements)
- Handles complex, variable document formats
- Event-driven serverless architecture
Technologies: Amazon Bedrock, Amazon Textract, Amazon A2I, AWS Lambda, Amazon SQS, Amazon SNS, Amazon DynamoDB
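One common way to decide when a document goes to human review is a confidence threshold. The sketch below shows that idea only; it does not call Amazon A2I or Amazon Textract, and the `ExtractedField` type, the 0.0 to 1.0 confidence scale, and the 0.8 threshold are all assumptions chosen for illustration rather than values taken from this guidance.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One extracted value with a confidence score (hypothetical structure)."""
    name: str
    value: str
    confidence: float  # assumed to be normalized to the range 0.0 to 1.0


def fields_needing_review(fields, threshold=0.8):
    """Return the names of fields whose confidence is below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def route(fields, threshold=0.8):
    """Send the document to human review if any field is low-confidence."""
    if fields_needing_review(fields, threshold):
        return "human_review"
    return "auto_approve"
```

A stricter threshold sends more documents to reviewers; a looser one trades review cost for a higher risk of unverified values.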
Workshop Materials: Hands-on workshop materials covering the fundamentals of Intelligent Document Processing with AWS AI services. These workshops provide step-by-step Jupyter notebooks that introduce core IDP concepts and the AWS services behind them.
Topics Covered:
- Document classification
- Document extraction with Amazon Textract
- Document enrichment with Amazon Comprehend
- Human-in-the-loop review with Amazon A2I
- Document processing at scale
- Industry-specific use cases
- Entity training and Gen AI integration
Technologies: Amazon Textract, Amazon Comprehend, Amazon A2I, Amazon SageMaker
Choose the solution that best fits your needs:
- For agentic IDP with self-improving accuracy: Start with Agentic Orchestration
- For prompt-based IDP with human review: Start with Prompt Flow Orchestration
- For learning IDP fundamentals: Start with Workshop Materials
Each project contains detailed deployment instructions, prerequisites, and usage examples.
All solutions follow the core phases of an IDP pipeline:
- Classification - Identify document types
- Extraction - Extract structured data from documents
- Enrichment - Enhance extracted data with additional context
- Validation - Verify accuracy and completeness
- Human Review - Manual review when needed
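The phases above can be pictured as an ordered chain of functions, each taking a document record and returning an updated copy. This is a toy sketch under stated assumptions, not code from any of the projects: the phase functions, the keyword-based classifier, and the `account` field are hypothetical, and a real pipeline would call AWS services at each step.

```python
def classify(doc):
    """Classification: identify the document type (toy keyword rule)."""
    doc_type = "bank_statement" if "statement" in doc["text"].lower() else "unknown"
    return {**doc, "type": doc_type}


def extract(doc):
    """Extraction: pull 'key: value' lines into structured fields."""
    fields = {}
    for line in doc["text"].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {**doc, "fields": fields}


def enrich(doc):
    """Enrichment: add derived context (here, the last four account digits)."""
    fields = dict(doc["fields"])
    if "account" in fields:
        fields["account_last4"] = fields["account"][-4:]
    return {**doc, "fields": fields}


def validate(doc):
    """Validation: check that the type is known and the key field is present."""
    return {**doc, "valid": doc["type"] != "unknown" and "account" in doc["fields"]}


def human_review(doc):
    """Human review: flag documents that failed validation for manual review."""
    return {**doc, "needs_review": not doc["valid"]}


# Phases in the order listed above.
PHASES = [classify, extract, enrich, validate, human_review]


def run_pipeline(text):
    """Run every phase in order over a new document record."""
    doc = {"text": text}
    for phase in PHASES:
        doc = phase(doc)
    return doc
```

Keeping each phase as a separate function that returns a new record makes it straightforward to swap one phase's implementation without touching the others.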
Prerequisites:
- AWS Account with appropriate service access
- AWS CLI installed and configured
- Basic familiarity with AWS services
Specific requirements vary by project - see individual README files for details.
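As a quick way to confirm the AWS CLI prerequisite, the following sketch checks that the `aws` executable is on the `PATH` and that `aws sts get-caller-identity` succeeds with the configured credentials. The helper names are hypothetical and the 15-second timeout is an arbitrary assumption; individual projects may require additional checks.

```python
import shutil
import subprocess


def aws_cli_path():
    """Return the path to the 'aws' executable, or None if it is not on PATH."""
    return shutil.which("aws")


def aws_identity_check(timeout=15):
    """Return (ok, message): whether the CLI is installed and credentials work."""
    path = aws_cli_path()
    if path is None:
        return False, "AWS CLI not found on PATH"
    try:
        # 'sts get-caller-identity' fails if no valid credentials are configured.
        result = subprocess.run(
            [path, "sts", "get-caller-identity"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return False, f"could not run AWS CLI: {exc}"
    if result.returncode != 0:
        return False, result.stderr.strip() or "credentials check failed"
    return True, result.stdout.strip()
```

On success the message holds the JSON identity document printed by the CLI; on failure it holds the reason.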
See CONTRIBUTING for information on reporting security issues.
This library is licensed under the MIT-0 License. See the LICENSE file.
We welcome contributions! Please see our Contributing Guidelines for details on how to submit pull requests, report issues, and contribute to the project.
This project has adopted the Amazon Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opensource-codeofconduct@amazon.com with any additional questions or comments.
The datasets used in these solutions consist entirely of synthetic data. They are designed to mimic real-world information and do not contain any actual personal or sensitive information.
Customers are responsible for making their own independent assessment of the information in this repository. This content: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied.