This project is a Proof of Concept (POC) for Optical Character Recognition (OCR) using Azure Cognitive Services. It extracts text from images and PDFs, processes the extracted data, and generates structured JSON output.
The project implements four OCR approaches, three built on Azure AI services and one on OpenAI models:
- OCR Computer Vision Service
  - Extracts text from images and PDFs.
  - The output format is raw text, requiring additional parsing to convert it into structured JSON.
- OCR Computer Vision Service with Template
  - Uses the same OCR service but applies a predefined template to structure the extracted text into JSON format.
- Document Intelligence Service
  - Leverages Azure's Document Intelligence service with predefined models to accurately parse images and PDFs into structured JSON.
  - If higher accuracy is required, it supports custom model training and usage.
- OpenAI Service
  - Utilizes OpenAI models for improved accuracy.
  - Supports both PDF and image processing.
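The first two options differ only in how raw OCR text becomes JSON. The following is a minimal sketch of the template idea, assuming a template is a map from field names to regular expressions. The field names, patterns, and sample text are hypothetical and are not taken from `scripts/template.js`.

```javascript
// Hypothetical template: each output field maps to a regex with one capture group.
const invoiceTemplate = {
  invoiceNumber: /Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)/i,
  date: /Date\s*:?\s*(\d{4}-\d{2}-\d{2})/i,
  total: /Total\s*:?\s*\$?([\d.,]+)/i,
};

// Apply a template to raw OCR text; fields with no match become null.
function applyTemplate(rawText, template) {
  const result = {};
  for (const [field, pattern] of Object.entries(template)) {
    const match = rawText.match(pattern);
    result[field] = match ? match[1].trim() : null;
  }
  return result;
}

// Illustrative OCR output (made up for this example).
const sampleText = [
  "ACME Corp",
  "Invoice No: INV-1001",
  "Date: 2024-03-15",
  "Total: $1,250.00",
].join("\n");

console.log(JSON.stringify(applyTemplate(sampleText, invoiceTemplate), null, 2));
```

Returning `null` for unmatched fields keeps the output shape stable, so downstream code can tell a missing value apart from a missing key.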
The project is structured to efficiently handle text extraction, processing, and formatting into structured JSON outputs using different AI services.
Before setting up the project, ensure you have the following:
- Node.js installed on your system.
- An Azure Cognitive Services account.
- Clone the repository:
  git clone https://github.com/saranvlmna/poc_ocr.git
  cd poc_ocr
- Install dependencies:
  npm install
- Create a .env file in the root directory and add your Azure Cognitive Services and OpenAI credentials:
  AZURE_VISION_ENDPOINT="your_computer_vision_endpoint"
  AZURE_VISION_KEY="your_computer_vision_api_key"
  AZURE_DOCINTELLIGENCE_ENDPOINT="your_document_intelligence_endpoint"
  AZURE_DOCINTELLIGENCE_KEY="your_document_intelligence_api_key"
  OPEN_AI_ENDPOINT="your_openai_endpoint"
  OPENAI_API_KEY="your_openai_api_key"
  AZURE_OPENAI_API_KEY="your_azure_openai_api_key"
  AZURE_OPENAI_ENDPOINT="your_azure_openai_endpoint"
  AZURE_OPENAI_MODEL_NAME="your_azure_openai_model_name"
  AZURE_OPENAI_MODEL_VERSION="your_azure_openai_model_version"
- Start the server:
  npm run dev
- The server will be running at http://localhost:4578.
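A missing credential usually surfaces only when a request fails, so it can help to check the environment at startup. This is a minimal sketch, assuming every variable from the .env step is required and has already been loaded into `process.env` (for example by a loader such as dotenv). The helper name `findMissingEnvVars` is hypothetical and not part of the project.

```javascript
// Variable names copied from the .env step; treating all as required is an assumption.
const REQUIRED_ENV_VARS = [
  "AZURE_VISION_ENDPOINT",
  "AZURE_VISION_KEY",
  "AZURE_DOCINTELLIGENCE_ENDPOINT",
  "AZURE_DOCINTELLIGENCE_KEY",
  "OPEN_AI_ENDPOINT",
  "OPENAI_API_KEY",
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_MODEL_NAME",
  "AZURE_OPENAI_MODEL_VERSION",
];

// Return the names that are unset or blank in the given environment object.
function findMissingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Warn (rather than exit) so that services with valid credentials still work.
const missing = findMissingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

If you only use one of the services, pass a shorter list as the second argument instead of the full set.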
- Endpoint: /azure/vision
- Method: POST
- Description: Uploads an image file and extracts text using Azure Computer Vision.
- Request Format: multipart/form-data with a file field named file.
- Endpoint: /azure/intelligence
- Method: POST
- Description: Analyzes a PDF document using Azure Document Intelligence.
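For the `/azure/vision` endpoint, a client call might look like the sketch below. It assumes Node.js 18 or later (for the global `fetch`, `FormData`, and `Blob`), the default server address from the setup steps, and a JSON response body. The response format is not documented here, so treat that part as a guess. The function names are hypothetical.

```javascript
// Base URL from the setup steps; adjust if your server runs elsewhere.
const BASE_URL = "http://localhost:4578";

// Build the multipart POST request for /azure/vision without sending it.
function buildVisionRequest(fileBytes, fileName) {
  const form = new FormData();
  // The form field must be named "file", as the request format specifies.
  form.append("file", new Blob([fileBytes]), fileName);
  return {
    url: `${BASE_URL}/azure/vision`,
    options: { method: "POST", body: form },
  };
}

// Send the request; the response is assumed (not confirmed) to be JSON.
async function extractText(fileBytes, fileName) {
  const { url, options } = buildVisionRequest(fileBytes, fileName);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

With the server running, you could read an image from disk with `fs.promises.readFile` and pass the resulting buffer and file name to `extractText`. Leaving the `Content-Type` header unset lets `fetch` add the multipart boundary itself.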
- scripts/template.js: Parses OCR data and generates structured JSON.
- scripts/test.js: Tests the Azure Form Recognizer (the former name of Azure Document Intelligence) with a sample PDF.
This project demonstrates a robust approach to OCR by integrating multiple AI services to enhance accuracy and efficiency in text extraction and structuring.