A production-ready API service for extracting structured data from Registration Certificate (RC) images using OCR and LLM processing.
- Image Processing: Extract text from RC front and back images using Tesseract.js (pure JavaScript OCR)
- LLM Integration: Process extracted text with Ollama LLM for structured data extraction
- RESTful API: Clean, well-documented API endpoints
- Production Ready: Error handling, logging, security headers, graceful shutdown
- TypeScript: Fully typed codebase for better development experience
- Node.js (v18 or higher)
- npm or yarn
- Ollama running locally (for LLM processing)
- Install Ollama from https://ollama.ai
- Pull the required model:
```bash
ollama pull llama3.1:8b
```

- Clone the repository:

```bash
git clone https://gitlabsdev.abhibus.com/research/rc-text-extractor-llm.git
cd rc-data-extractor
```

- Install dependencies:

```bash
npm install
```

- Configure environment variables in `.env` (already included):
```env
# Server Configuration
PORT=3000
NODE_ENV=development

# Ollama LLM Configuration
OLLAMA_URL=http://localhost:11434
MODEL_NAME=llama3.1:8b

# CORS Configuration
ALLOWED_ORIGINS=
```

Run in development mode:

```bash
npm run dev
```

Build and run in production:

```bash
npm run build
npm start
```

`GET /health`
Response:
```json
{
  "status": "OK",
  "message": "RC Data Extractor API is running",
  "timestamp": "2025-08-21T10:30:00.000Z",
  "environment": "development"
}
```

`POST /extract-rc`
Request: Multipart form data with:
- `front`: RC front image file
- `back`: RC back image file
Response:
```json
{
  "success": true,
  "data": {
    "registrationNumber": "MH01AB1234",
    "ownerName": "John Doe",
    "vehicleClass": "Motor Car",
    "fuelType": "Petrol",
    "engineNumber": "ENG123456",
    "chassisNumber": "CHS789012",
    "manufacturingDate": "2020-01-15",
    "registrationDate": "2020-02-01",
    "validUpto": "2035-01-31",
    "address": "123 Main Street, Mumbai, Maharashtra",
    "vehicleMake": "Maruti Suzuki",
    "vehicleModel": "Swift",
    "color": "White"
  },
  "extractionConfidence": 0.85,
  "processingTime": 3245
}
```

```
src/
├── controllers/
│   └── rcController.ts     # Main controller handling RC extraction
├── services/
│   ├── ocrService.ts       # Tesseract OCR integration
│   └── llmService.ts       # Ollama LLM integration
├── types/
│   └── index.ts            # TypeScript type definitions
├── utils/
│   └── imageProcessor.ts   # Image processing utilities
├── uploads/                # Temporary upload directory
└── index.ts                # Main application entry point
```
| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | `3000` |
| `NODE_ENV` | Environment mode | `development` |
| `OLLAMA_URL` | Ollama server URL | `http://localhost:11434` |
| `MODEL_NAME` | LLM model name | `llama3.1:8b` |
| `ALLOWED_ORIGINS` | CORS allowed origins | Empty (allows all in dev) |
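The variables above could be read with a small helper like the following. This is a sketch of the documented defaults, not the project's actual config loader; the `env` helper and `config` object are illustrative names:

```typescript
// Read an environment variable, falling back to the documented default.
function env(name: string, fallback: string): string {
  const value = process.env[name];
  return value !== undefined && value !== "" ? value : fallback;
}

const config = {
  port: Number(env("PORT", "3000")),
  nodeEnv: env("NODE_ENV", "development"),
  ollamaUrl: env("OLLAMA_URL", "http://localhost:11434"),
  modelName: env("MODEL_NAME", "llama3.1:8b"),
  // An empty list corresponds to "allow all" in development.
  allowedOrigins: env("ALLOWED_ORIGINS", "").split(",").filter(Boolean),
};
```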
- CORS configuration with environment-based origins
- Security headers (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
- File upload validation (image files only, size limits)
- Temporary file cleanup after processing
- Error message filtering in production
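The headers listed above could be set with Express-style middleware along these lines. This is a hedged sketch, typed loosely so it stands alone without `@types/express`; the project's actual middleware may differ:

```typescript
// Minimal security-header middleware (Express-compatible signature).
function securityHeaders(
  _req: unknown,
  res: { setHeader(name: string, value: string): void },
  next: () => void
): void {
  res.setHeader("X-Content-Type-Options", "nosniff");
  res.setHeader("X-Frame-Options", "DENY");
  res.setHeader("X-XSS-Protection", "1; mode=block");
  next();
}
```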
- Concurrent OCR processing for front and back images
- Automatic cleanup of uploaded files
- Processing time monitoring and reporting
- Confidence scoring for extraction quality
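One plausible way to derive a score like the `extractionConfidence` field in the response is the fraction of expected fields the LLM actually filled in. This is purely illustrative; the service's real scoring method is not documented here:

```typescript
// Score extraction quality as the share of expected fields that are non-empty.
function extractionConfidence(
  data: Record<string, string | null | undefined>,
  expectedFields: string[]
): number {
  if (expectedFields.length === 0) return 0;
  const filled = expectedFields.filter(
    (f) => typeof data[f] === "string" && data[f]!.trim() !== ""
  ).length;
  return filled / expectedFields.length;
}
```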
- Global error handler with environment-aware error messages
- Graceful shutdown on SIGTERM/SIGINT
- Comprehensive error logging
- Proper HTTP status codes
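Graceful shutdown on SIGTERM/SIGINT might be wired up roughly as follows. A minimal sketch assuming an object with a Node-style `close` method (such as an `http.Server`); the actual entry-point code in `src/index.ts` may differ:

```typescript
// Stop accepting new connections on SIGTERM/SIGINT, then exit cleanly.
function setupGracefulShutdown(server: { close(cb?: () => void): void }): void {
  const shutdown = (signal: string) => {
    console.log(`Received ${signal}, shutting down gracefully`);
    server.close(() => process.exit(0));
  };
  process.on("SIGTERM", () => shutdown("SIGTERM"));
  process.on("SIGINT", () => shutdown("SIGINT"));
}
```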