OCR-harness

A flexible OCR demonstration application featuring swappable LLM models for document text extraction and processing. Currently configured with LightOnOCR-1B-1025 from HuggingFace.

Features

  • Model Flexibility: Swap OCR models through YAML configuration without code changes
  • Desktop-First Design: Optimized for desktop/laptop use (blocks mobile devices)
  • File Support: PDF, PNG, JPG, JPEG, WebP, GIF (up to 50MB)
  • Drag-and-Drop: Easy file upload interface
  • Multiple Export Formats: TXT, CSV, XLSX
  • Comprehensive Logging: Built-in error logging and diagnostics
  • One-Command Startup: Launch all services with npm start

Technology Stack

Frontend

  • Next.js 14+ with TypeScript
  • Tailwind CSS for styling
  • React Dropzone for file uploads
  • Lucide React for icons
  • XLSX.js for Excel export

Backend

  • FastAPI (Python)
  • vLLM for model serving
  • pypdfium2 for PDF processing
  • Pillow for image handling

Model

  • LightOnOCR-1B-1025 from HuggingFace
  • Configurable through backend/config/models.yaml

Requirements

  • Node.js 18+
  • Python 3.11+
  • RAM: 8GB minimum, 16GB recommended
  • GPU: Recommended for faster processing (optional)

Quick Start

1. Clone the Repository

git clone <repository-url>
cd OCR-harness

2. Install Dependencies

# Install root dependencies
npm install

# Install frontend dependencies
cd frontend
npm install
cd ..

# Install backend dependencies
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..

3. Configure Environment

# Backend environment
cp backend/.env.example backend/.env

4. Start the Application

# Start all services with one command
npm run start

Or start services individually:

# Terminal 1: Start the vLLM server (the first start takes several minutes)
npm run start:vllm

# Terminal 2: Start backend (after vLLM is ready)
npm run start:backend

# Terminal 3: Start frontend
npm run start:frontend

5. Access the Application

Open the frontend at http://localhost:3000. The backend API runs at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

Usage

  1. Upload a Document: Drag and drop or click to browse for a PDF or image file
  2. Process: Click "Process Document" to extract text using OCR
  3. Export: Choose TXT, CSV, or XLSX format to download results
  4. View Logs: Click "View Logs" for debugging and diagnostics

Configuration

Model Configuration

Edit backend/config/models.yaml to change OCR models:

models:
  default: "lighton-ocr-1b"

  configurations:
    lighton-ocr-1b:
      name: "LightOnOCR-1B-1025"
      display_name: "LightOn OCR 1B (1025)"
      model_path: "lightonai/LightOnOCR-1B-1025"
      server_port: 8001
      parameters:
        temperature: 0.2
        top_p: 0.9
        max_tokens: 6500
        render_dpi: 300
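
As a rough illustration only (not the repository's actual loader), the backend could resolve the default entry from this file with PyYAML along these lines:

# Minimal sketch: read models.yaml and return the default model configuration.
# Path and structure follow the example above; the real backend code may differ.
import yaml

def load_default_model(path: str = "backend/config/models.yaml") -> dict:
    with open(path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    models = config["models"]
    default_key = models["default"]  # e.g. "lighton-ocr-1b"
    return models["configurations"][default_key]

print(load_default_model()["model_path"])  # lightonai/LightOnOCR-1B-1025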

Environment Variables

Frontend (.env.local):

NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_MAX_FILE_SIZE=52428800

Backend (.env):

VLLM_SERVER_URL=http://localhost:8001
MODEL_CONFIG_PATH=./config/models.yaml
CORS_ORIGINS=http://localhost:3000
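
For illustration, these variables could be read in the backend with nothing more than os.getenv; this is a sketch, not the project's actual settings code:

import os

# Fall back to the documented defaults when the variables are unset.
VLLM_SERVER_URL = os.getenv("VLLM_SERVER_URL", "http://localhost:8001")
MODEL_CONFIG_PATH = os.getenv("MODEL_CONFIG_PATH", "./config/models.yaml")
CORS_ORIGINS = os.getenv("CORS_ORIGINS", "http://localhost:3000").split(",")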

Project Structure

OCR-harness/
├── frontend/              # Next.js frontend
│   ├── app/              # Next.js app directory
│   ├── components/       # React components
│   ├── lib/             # Utilities, API client, logger
│   └── package.json
├── backend/              # FastAPI backend
│   ├── app/             # Application code
│   │   ├── routes/      # API endpoints
│   │   ├── services/    # Business logic
│   │   ├── models/      # Data models
│   │   └── utils/       # Utilities
│   ├── config/          # Configuration files
│   │   └── models.yaml  # Model configurations
│   ├── requirements.txt
│   └── start_vllm.py   # vLLM startup script
├── scripts/
│   └── start.js        # Unified startup script
├── docs/               # Project documentation
└── package.json        # Root package file

Development

Frontend Development

cd frontend
npm run dev

Backend Development

cd backend
source venv/bin/activate
uvicorn app.main:app --reload --port 8000

Skip vLLM During Development

npm run start -- --skip-vllm

This is useful when testing frontend/backend changes without needing the model server.

Troubleshooting

vLLM Server Won't Start

  • Check RAM: Ensure you have at least 8GB available
  • Install vLLM: pip install vllm
  • CUDA Issues: If you have a GPU, ensure CUDA is properly configured

Backend Errors

  • Check Python Version: Must be 3.11+
  • Reinstall Dependencies: pip install -r backend/requirements.txt
  • Check Ports: Ensure ports 8000 and 8001 are available

Frontend Build Errors

  • Check Node Version: Must be 18+
  • Clear Cache: rm -rf frontend/.next && cd frontend && npm run dev
  • Reinstall Modules: rm -rf frontend/node_modules && cd frontend && npm install

Model Download Issues

The LightOnOCR model will be downloaded automatically on first run. This can take several minutes depending on your internet connection. The model is cached locally for subsequent runs.
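
If you want to warm the cache before launching the stack, the weights can be pre-downloaded with huggingface_hub. This is an optional sketch; the repo id matches model_path in models.yaml:

from huggingface_hub import snapshot_download

# Download (or reuse) the model weights in the local Hugging Face cache,
# so the first vLLM start does not stall on the download.
snapshot_download(repo_id="lightonai/LightOnOCR-1B-1025")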

API Endpoints

Process Document

POST /api/process
Content-Type: multipart/form-data

Parameters:
- file: File (PDF or image)
- config: Optional JSON string with processing parameters

Returns:
{
  "success": true,
  "text": "extracted text...",
  "metadata": {
    "filename": "document.pdf",
    "pages": 3,
    "model_used": "LightOn OCR 1B (1025)"
  }
}
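
For reference, a minimal Python client for this endpoint could look like the sketch below (using the requests library and assuming the backend runs on http://localhost:8000):

import requests

# Upload a PDF and print the extracted text returned by the backend.
with open("document.pdf", "rb") as fh:
    response = requests.post(
        "http://localhost:8000/api/process",
        files={"file": ("document.pdf", fh, "application/pdf")},
    )
response.raise_for_status()
payload = response.json()
print(payload["metadata"]["pages"], "page(s) processed")
print(payload["text"][:500])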

Get Models

GET /api/models

Returns:
{
  "current": { ... },
  "available": [ ... ]
}

Get Logs

GET /api/logs?level=ERROR&limit=100

Returns:
{
  "logs": [ ... ],
  "total": 42
}
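
The two read-only endpoints can be queried the same way (again a sketch against http://localhost:8000):

import requests

BASE = "http://localhost:8000"

# Current and available model configurations.
models = requests.get(f"{BASE}/api/models").json()
print("current model:", models["current"])

# Most recent error-level log entries.
logs = requests.get(f"{BASE}/api/logs", params={"level": "ERROR", "limit": 100}).json()
print(logs["total"], "log entries")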

See full API documentation at http://localhost:8000/docs when running.

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review logs via the "View Logs" button in the UI
  3. Check the API documentation at http://localhost:8000/docs
  4. Open an issue on GitHub
