A flexible OCR demonstration application featuring swappable LLM models for document text extraction and processing. Currently configured with LightOnOCR-1B-1025 from HuggingFace.
- Model Flexibility: Swap OCR models through YAML configuration without code changes
- Desktop-First Design: Optimized for desktop/laptop use (blocks mobile devices)
- File Support: PDF, PNG, JPG, JPEG, WebP, GIF (up to 50MB)
- Drag-and-Drop: Easy file upload interface
- Multiple Export Formats: TXT, CSV, XLSX
- Comprehensive Logging: Built-in error logging and diagnostics
- One-Command Startup: Launch all services with `npm start`
- Next.js 14+ with TypeScript
- Tailwind CSS for styling
- React Dropzone for file uploads
- Lucide React for icons
- XLSX.js for Excel export
- FastAPI (Python)
- vLLM for model serving
- pypdfium2 for PDF processing (see the page-rendering sketch below)
- Pillow for image handling
- LightOnOCR-1B-1025 from HuggingFace
- Configurable through `backend/config/models.yaml`
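For PDF inputs, the backend presumably uses pypdfium2 and Pillow to rasterize each page before it is sent to the model; the `render_dpi` value in the model configuration (shown further below) controls the resolution. A minimal sketch of that step, under those assumptions rather than the project's actual code:

```python
# Illustrative only: rasterize the first page of a PDF for OCR.
import pypdfium2 as pdfium

RENDER_DPI = 300  # mirrors render_dpi in backend/config/models.yaml

pdf = pdfium.PdfDocument("document.pdf")
page = pdf[0]                                # first page
bitmap = page.render(scale=RENDER_DPI / 72)  # PDF uses 72 points per inch
image = bitmap.to_pil()                      # Pillow image, ready for the model
image.save("page_1.png")
```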
- Node.js 18+
- Python 3.11+
- RAM: 8GB minimum, 16GB recommended
- GPU: Recommended for faster processing (optional)
Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd OCR-harness

# Install root dependencies
npm install

# Install frontend dependencies
cd frontend
npm install
cd ..

# Install backend dependencies
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
cd ..
```

```bash
# Backend environment
cp backend/.env.example backend/.env
```

```bash
# Start all services with one command
npm run start
```

Or start services individually:
```bash
# Terminal 1: Start vLLM server (takes several minutes first time)
npm run start:vllm

# Terminal 2: Start backend (after vLLM is ready)
npm run start:backend

# Terminal 3: Start frontend
npm run start:frontend
```

Once running, the services are available at:

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- vLLM Server: http://localhost:8001
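To confirm all three services are responding, a quick check from Python (a sketch; assumes the default ports above and that your vLLM build exposes the usual `/health` route):

```python
# Quick sanity check that the frontend, backend, and vLLM server are up.
import requests

SERVICES = {
    "Frontend": "http://localhost:3000",
    "Backend": "http://localhost:8000/docs",   # FastAPI interactive docs
    "vLLM": "http://localhost:8001/health",    # vLLM health probe
}

for name, url in SERVICES.items():
    try:
        status = requests.get(url, timeout=5).status_code
        print(f"{name:<9} {url} -> HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name:<9} {url} -> not reachable ({exc})")
```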
- Upload a Document: Drag and drop or click to browse for a PDF or image file
- Process: Click "Process Document" to extract text using OCR
- Export: Choose TXT, CSV, or XLSX format to download results
- View Logs: Click "View Logs" for debugging and diagnostics
Edit `backend/config/models.yaml` to change OCR models:

```yaml
models:
  default: "lighton-ocr-1b"
  configurations:
    lighton-ocr-1b:
      name: "LightOnOCR-1B-1025"
      display_name: "LightOn OCR 1B (1025)"
      model_path: "lightonai/LightOnOCR-1B-1025"
      server_port: 8001
      parameters:
        temperature: 0.2
        top_p: 0.9
        max_tokens: 6500
        render_dpi: 300
```
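The backend reads this file to decide which model to serve. As a rough sketch of how the default configuration can be resolved (assumes PyYAML; the function name and return shape are illustrative, not the project's actual loader):

```python
# Illustrative loader for backend/config/models.yaml.
import yaml

def load_default_model(path: str = "backend/config/models.yaml") -> dict:
    """Return the configuration block for the default model."""
    with open(path) as f:
        models = yaml.safe_load(f)["models"]
    return models["configurations"][models["default"]]

if __name__ == "__main__":
    cfg = load_default_model()
    print(cfg["model_path"], cfg["parameters"]["render_dpi"])
```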
Frontend (`.env.local`):

```env
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_MAX_FILE_SIZE=52428800
```

Backend (`.env`):

```env
VLLM_SERVER_URL=http://localhost:8001
MODEL_CONFIG_PATH=./config/models.yaml
CORS_ORIGINS=http://localhost:3000
```
Project layout:

```text
OCR-harness/
├── frontend/              # Next.js frontend
│   ├── app/               # Next.js app directory
│   ├── components/        # React components
│   ├── lib/               # Utilities, API client, logger
│   └── package.json
├── backend/               # FastAPI backend
│   ├── app/               # Application code
│   │   ├── routes/        # API endpoints
│   │   ├── services/      # Business logic
│   │   ├── models/        # Data models
│   │   └── utils/         # Utilities
│   ├── config/            # Configuration files
│   │   └── models.yaml    # Model configurations
│   ├── requirements.txt
│   └── start_vllm.py      # vLLM startup script
├── scripts/
│   └── start.js           # Unified startup script
├── docs/                  # Project documentation
└── package.json           # Root package file
```
Run the frontend in development mode:

```bash
cd frontend
npm run dev
```

Run the backend in development mode:

```bash
cd backend
source venv/bin/activate
uvicorn app.main:app --reload --port 8000
```

To start everything except the model server:

```bash
npm run start -- --skip-vllm
```

This is useful when testing frontend/backend changes without needing the model server.
- Check RAM: Ensure you have at least 8GB available
- Install vLLM: `pip install vllm`
- CUDA Issues: If you have a GPU, ensure CUDA is properly configured
- Check Python Version: Must be 3.11+
- Reinstall Dependencies: `pip install -r backend/requirements.txt`
- Check Ports: Ensure ports 8000 and 8001 are available
- Check Node Version: Must be 18+
- Clear Cache: `rm -rf frontend/.next && cd frontend && npm run dev`
- Reinstall Modules: `rm -rf frontend/node_modules && cd frontend && npm install`
The LightOnOCR model will be downloaded automatically on first run. This can take several minutes depending on your internet connection. The model is cached locally for subsequent runs.
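If you would rather fetch the weights ahead of time (for example, on a machine with a faster connection), the Hugging Face Hub client can pre-populate the same local cache. A small sketch, assuming `huggingface_hub` is installed:

```python
# Optionally pre-download the OCR model into the standard Hugging Face cache.
from huggingface_hub import snapshot_download

local_path = snapshot_download("lightonai/LightOnOCR-1B-1025")
print(f"Model cached at: {local_path}")
```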
```http
POST /api/process
Content-Type: multipart/form-data
```

Parameters:

- `file`: File (PDF or image)
- `config`: Optional JSON string with processing parameters

Returns:

```json
{
  "success": true,
  "text": "extracted text...",
  "metadata": {
    "filename": "document.pdf",
    "pages": 3,
    "model_used": "LightOn OCR 1B (1025)"
  }
}
```
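A minimal client sketch for this endpoint (assumes the `requests` package; the file name and the `config` overrides are placeholders):

```python
# Send a document to the backend and print the extracted text.
import json
import requests

API_URL = "http://localhost:8000/api/process"

with open("document.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"config": json.dumps({"temperature": 0.2})},  # optional overrides
        timeout=300,  # large PDFs can take a while
    )

response.raise_for_status()
result = response.json()
print(result["metadata"]["pages"], "page(s) processed")
print(result["text"][:500])
```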
```http
GET /api/models
```

Returns:

```json
{
  "current": { ... },
  "available": [ ... ]
}
```
```http
GET /api/logs?level=ERROR&limit=100
```

Returns:

```json
{
  "logs": [ ... ],
  "total": 42
}
```
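Both GET endpoints can be queried the same way; a small sketch (again assuming `requests`):

```python
# List the available models and the most recent error logs.
import requests

BASE = "http://localhost:8000"

models = requests.get(f"{BASE}/api/models", timeout=10).json()
print("Current model:", models["current"])

logs = requests.get(f"{BASE}/api/logs", params={"level": "ERROR", "limit": 100}, timeout=10).json()
print(f"{logs['total']} matching log entries")
```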
See the full interactive API documentation at http://localhost:8000/docs while the backend is running.
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- Check the troubleshooting section above
- Review logs via the "View Logs" button in the UI
- Check the API documentation at http://localhost:8000/docs
- Open an issue on GitHub