
RC Data Extractor

A production-ready API service for extracting structured data from Registration Certificate (RC) images using OCR and LLM processing.

Features

  • Image Processing: Extract text from RC front and back images using Tesseract.js (pure JavaScript OCR)
  • LLM Integration: Process extracted text with Ollama LLM for structured data extraction
  • RESTful API: Clean, well-documented API endpoints
  • Production Ready: Error handling, logging, security headers, graceful shutdown
  • TypeScript: Fully typed codebase for better development experience

Prerequisites

  • Node.js (v18 or higher)
  • npm or yarn
  • Ollama running locally (for LLM processing)

Setting up Ollama

  1. Install Ollama from https://ollama.ai
  2. Pull the required model:
ollama pull llama3.1:8b

Installation

  1. Clone the repository:
git clone https://gitlabsdev.abhibus.com/research/rc-text-extractor-llm.git
cd rc-data-extractor
  2. Install dependencies:
npm install
  3. Configure environment variables in .env (already included):
# Server Configuration
PORT=3000
NODE_ENV=development

# Ollama LLM Configuration
OLLAMA_URL=http://localhost:11434
MODEL_NAME=llama3.1:8b

# CORS Configuration
ALLOWED_ORIGINS=

Usage

Development

npm run dev

Production Build

npm run build
npm start

API Endpoints

Health Check

GET /health

Response:

{
  "status": "OK",
  "message": "RC Data Extractor API is running",
  "timestamp": "2025-08-21T10:30:00.000Z",
  "environment": "development"
}
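Client code can verify that a payload matches the health-check shape shown above before trusting it. A minimal sketch; the `HealthResponse` interface and `isHealthResponse` guard are illustrative names, not part of the service:

```typescript
// Shape of the /health response documented above.
interface HealthResponse {
  status: string;
  message: string;
  timestamp: string;
  environment: string;
}

// Runtime type guard: checks that an unknown JSON payload matches HealthResponse.
function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.status === "string" &&
    typeof v.message === "string" &&
    typeof v.timestamp === "string" &&
    typeof v.environment === "string"
  );
}
```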

Extract RC Data

POST /extract-rc

Request: Multipart form data with:

  • front: RC front image file
  • back: RC back image file

Response:

{
  "success": true,
  "data": {
    "registrationNumber": "MH01AB1234",
    "ownerName": "John Doe",
    "vehicleClass": "Motor Car",
    "fuelType": "Petrol",
    "engineNumber": "ENG123456",
    "chassisNumber": "CHS789012",
    "manufacturingDate": "2020-01-15",
    "registrationDate": "2020-02-01",
    "validUpto": "2035-01-31",
    "address": "123 Main Street, Mumbai, Maharashtra",
    "vehicleMake": "Maruti Suzuki",
    "vehicleModel": "Swift",
    "color": "White"
  },
  "extractionConfidence": 0.85,
  "processingTime": 3245
}
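In TypeScript, the response above can be typed as follows. The field names follow the sample response; the interface names and the `parseExtractResponse` helper are illustrative:

```typescript
// Fields returned in the `data` object of a successful /extract-rc response.
interface RCData {
  registrationNumber: string;
  ownerName: string;
  vehicleClass: string;
  fuelType: string;
  engineNumber: string;
  chassisNumber: string;
  manufacturingDate: string;
  registrationDate: string;
  validUpto: string;
  address: string;
  vehicleMake: string;
  vehicleModel: string;
  color: string;
}

// Envelope around the extracted data.
interface ExtractRCResponse {
  success: boolean;
  data: RCData;
  extractionConfidence: number; // 0..1
  processingTime: number;       // milliseconds
}

// Parse a raw response body and sanity-check the envelope fields.
function parseExtractResponse(json: string): ExtractRCResponse {
  const parsed = JSON.parse(json) as ExtractRCResponse;
  if (typeof parsed.success !== "boolean" || typeof parsed.extractionConfidence !== "number") {
    throw new Error("Malformed /extract-rc response");
  }
  return parsed;
}
```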

Project Structure

src/
├── controllers/
│   └── rcController.ts       # Main controller handling RC extraction
├── services/
│   ├── ocrService.ts        # Tesseract OCR integration
│   └── llmService.ts        # Ollama LLM integration
├── types/
│   └── index.ts             # TypeScript type definitions
├── utils/
│   └── imageProcessor.ts    # Image processing utilities
├── uploads/                 # Temporary upload directory
└── index.ts                 # Main application entry point

Configuration

Environment Variables

| Variable        | Description          | Default                   |
|-----------------|----------------------|---------------------------|
| PORT            | Server port          | 3000                      |
| NODE_ENV        | Environment mode     | development               |
| OLLAMA_URL      | Ollama server URL    | http://localhost:11434    |
| MODEL_NAME      | LLM model name       | llama3.1:8b               |
| ALLOWED_ORIGINS | CORS allowed origins | Empty (allows all in dev) |
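Loading these variables with their documented defaults might look like the sketch below; the actual `src` code may organize configuration differently, and the `loadConfig` helper is an assumption:

```typescript
// Typed view of the environment variables listed above, with their defaults.
interface AppConfig {
  port: number;
  nodeEnv: string;
  ollamaUrl: string;
  modelName: string;
  allowedOrigins: string[]; // empty array = allow all in development
}

// Read configuration from an env-style map, falling back to the defaults
// documented in the table above. ALLOWED_ORIGINS is a comma-separated list.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: Number(env.PORT ?? 3000),
    nodeEnv: env.NODE_ENV ?? "development",
    ollamaUrl: env.OLLAMA_URL ?? "http://localhost:11434",
    modelName: env.MODEL_NAME ?? "llama3.1:8b",
    allowedOrigins: (env.ALLOWED_ORIGINS ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
  };
}
```

In the application itself this would typically be called as `loadConfig(process.env)` once at startup.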

Security Features

  • CORS configuration with environment-based origins
  • Security headers (X-Content-Type-Options, X-Frame-Options, X-XSS-Protection)
  • File upload validation (image files only, size limits)
  • Temporary file cleanup after processing
  • Error message filtering in production
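The three security headers above can be centralized in one helper that a middleware applies to every response. A sketch only: the source lists the header names but not their values, so the values below are typical choices, not confirmed from the codebase:

```typescript
// The security headers mentioned above, as a single map.
// Values are common defaults and are assumptions, not taken from the source.
function securityHeaders(): Record<string, string> {
  return {
    "X-Content-Type-Options": "nosniff",  // disable MIME-type sniffing
    "X-Frame-Options": "DENY",            // forbid framing (clickjacking)
    "X-XSS-Protection": "1; mode=block",  // legacy browser XSS filter
  };
}
```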

Performance

  • Concurrent OCR processing for front and back images
  • Automatic cleanup of uploaded files
  • Processing time monitoring and reporting
  • Confidence scoring for extraction quality
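Concurrent OCR of the two sides usually reduces to a single Promise.all. A sketch with a stand-in `recognize` function (the real service uses Tesseract.js; the function names here are illustrative):

```typescript
// Stand-in for the OCR call; the real service would invoke Tesseract.js here.
type Recognize = (imagePath: string) => Promise<string>;

// Run OCR on the front and back images concurrently rather than sequentially,
// roughly halving wall-clock time when both sides take similar effort.
async function ocrBothSides(
  recognize: Recognize,
  frontPath: string,
  backPath: string
): Promise<{ frontText: string; backText: string }> {
  const [frontText, backText] = await Promise.all([
    recognize(frontPath),
    recognize(backPath),
  ]);
  return { frontText, backText };
}
```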

Error Handling

  • Global error handler with environment-aware error messages
  • Graceful shutdown on SIGTERM/SIGINT
  • Comprehensive error logging
  • Proper HTTP status codes
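Environment-aware error messages come down to one small rule: full detail in development, a generic message in production. A sketch under that assumption; the helper name and the exact generic message are illustrative:

```typescript
// Return the message a client should see for an error, depending on environment.
// In production, internal details are hidden behind a generic message.
function clientErrorMessage(err: Error, nodeEnv: string): string {
  return nodeEnv === "production" ? "Internal server error" : err.message;
}
```

A global Express error handler would call this with `loadConfig`-style `NODE_ENV` before writing the response body.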
