Supernote OCR Enhancer

Processes Supernote .note files with Apple's Vision Framework, replacing Supernote's built-in OCR (~27% word error rate) with higher-quality OCR (~5% word error rate) and word-level bounding boxes for accurate search highlighting.

Runs natively on macOS, using launchd for scheduling (Docker is also available as an alternative).

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Your Mac (Apple Silicon)                    │
│                                                                      │
│  launchd (native macOS scheduler)                                   │
│  ├── com.supernote.ocr-api          (always-on OCR service)        │
│  ├── com.supernote.ocr-enhancer.hourly   (runs at :00 each hour)   │
│  └── com.supernote.ocr-enhancer.daily    (runs at 3:30am)          │
│                                                                      │
│  ┌────────────────────────┐      ┌───────────────────────────────┐  │
│  │  OCR Enhancer          │      │  OCR API                      │  │
│  │  (Python + launchd)    │─────▶│  Apple Vision Framework       │  │
│  │                        │      │  localhost:8100               │  │
│  │  - Extracts pages      │      │                               │  │
│  │  - Tracks state (SQLite)│     │  - Native macOS OCR           │  │
│  │  - Injects OCR back    │      │  - Word-level bboxes          │  │
│  └──────────┬─────────────┘      └───────────────────────────────┘  │
│             │                                                        │
│             ▼                                                        │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  Your Supernote data directory                                │   │
│  │  (.note files - synced from Supernote devices)                │   │
│  └──────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Features

High-quality OCR: Replaces Supernote's built-in OCR with Apple Vision Framework (+41.8% more text captured)
Fast processing: 0.8 seconds per page average (up to 150x faster than Qwen2.5-VL, which takes 60-120 seconds per page)
Word-level bounding boxes: Each word gets its own precise bounding box using Vision's boundingBoxForRange API, matching the device's native OCR format for perfect search highlighting
Line break preservation: Detects line structure from Y-coordinates, maintains paragraph formatting
Smart tracking: SQLite database tracks file hashes to avoid reprocessing unchanged files
Backup protection: Creates timestamped backups before modifying any file
Live sync server support: Updates sync database while server runs (no restart needed)
Proper coordinate system: Uses device's native coordinate system (PNG pixels ÷ 11.9) for perfect highlighting
Search-enabled: Injected OCR data is searchable on device (works with any TYPE setting)

Performance

Production test results (100+ files, 300+ pages):

Processing time: ~4 minutes total
Speed: ~0.8 seconds per page average
Success rate: 96%+
Text captured: +41.8% vs Supernote device OCR
vs Qwen2.5-VL 7B: up to 150x faster (a better trade-off for batch processing)

Prerequisites

Apple Silicon Mac (M1/M2/M3/M4) - Required for Apple Vision Framework
macOS 13+ (Ventura or later) - Required for Vision Framework OCR
Python 3.11+ (install via Homebrew or uv; the Python that ships with Apple's developer tools may be older)
Supernote .note files synced to your Mac

Note: Docker Desktop is optional. Native launchd is recommended (simpler, lower overhead).

Quick Start (Native launchd)

1. Clone and Set Up OCR API

git clone https://github.com/liketheduck/supernote-ocr-enhancer.git
cd supernote-ocr-enhancer

# Set up OCR API (see OCR API Setup section for details)
mkdir -p ~/services/ocr-api
cp examples/server.py ~/services/ocr-api/
./scripts/install-launchd.sh  # Installs OCR API as always-on service

2. Install the OCR Enhancer

./scripts/install-ocr-enhancer-launchd.sh

This will:

Create a Python virtual environment
Install dependencies
Create configuration at ~/.supernote-ocr/.env
Install hourly and daily scheduled jobs

3. Configure Your Settings

Edit ~/.supernote-ocr/.env:

# REQUIRED: Path to your Supernote .note files
SUPERNOTE_DATA_PATH=/path/to/your/supernote/data

Optional - If using a self-hosted Supernote Cloud sync server:

STORAGE_MODE=personal_cloud
MYSQL_PASSWORD=your_mysql_password  # Get with: docker exec supernote-mariadb env | grep MYSQL_PASSWORD

4. Test It

# Run once immediately
./scripts/install-ocr-enhancer-launchd.sh --run

# Check status
./scripts/install-ocr-enhancer-launchd.sh --check

# View logs
tail -f data/cron-ocr.log

Scheduled Jobs

Once installed, OCR runs automatically:

Hourly at :00 - Processes new/changed files (skips recently uploaded)
Daily at 3:30 AM - Full run, processes ALL files

Managing the Installation

# Check status
./scripts/install-ocr-enhancer-launchd.sh --check

# Run now
./scripts/install-ocr-enhancer-launchd.sh --run

# Uninstall
./scripts/install-ocr-enhancer-launchd.sh --remove

Quick Start (Docker Alternative)

If you prefer Docker, it's still fully supported:

1. Configure

cp .env.example .env.local
nano .env.local  # Set SUPERNOTE_DATA_PATH

2. Build and Run

docker compose build
docker compose run --rm ocr-enhancer python /app/main.py

3. For Scheduled Runs

docker compose up -d  # Starts container with cron daemon

Note: The sync server does NOT need to be stopped. MariaDB handles concurrent access safely. See Architecture: Why No Server Restart? for details.

OCR API Setup

The OCR API is a separate service that runs natively on macOS (not in Docker) to access Apple's Vision Framework. You must set this up before running the enhancer.

Using uv (recommended)

# Create OCR API directory
mkdir -p ~/services/ocr-api
cd ~/services/ocr-api

# Initialize with uv
uv init --name ocr-api --python 3.11

# Install dependencies
# CRITICAL: ocrmac is required for Apple Vision Framework OCR
uv add ocrmac pillow fastapi uvicorn python-multipart

# Optional: Add mlx-vlm for Qwen2.5-VL OCR (slower but more accurate)
# uv add mlx-vlm

# Copy server.py from this repo
cp /path/to/supernote-ocr-enhancer/examples/server.py .

# Create logs directory
mkdir -p logs

# Start the server
uv run python server.py

What Gets Installed

Package	Purpose	Required?
`ocrmac`	Apple Vision Framework OCR with word-level bounding boxes	Yes
`pillow`	Image processing	Yes
`fastapi`	REST API server	Yes
`uvicorn`	ASGI server	Yes
`python-multipart`	File upload support	Yes
`mlx-vlm`	Qwen2.5-VL models (optional, for `/ocr` endpoint)	No

Verify Installation

# Check if server is running
curl http://localhost:8100/health

# Check available endpoints
curl http://localhost:8100/prompts

The /prompts endpoint will show "vision_available": true if ocrmac is properly installed.
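
The same check can be scripted. This is a minimal Python sketch, not part of the project: it assumes /prompts returns a JSON object with a top-level vision_available key (the exact response shape is an assumption), and vision_available_from and check_vision are hypothetical helper names.

```python
import json
import urllib.request


def vision_available_from(body: str) -> bool:
    """Return True only if the JSON body reports vision_available as true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Assumption: the flag is a top-level key of a JSON object.
    return isinstance(payload, dict) and payload.get("vision_available") is True


def check_vision(base_url: str = "http://localhost:8100", timeout: float = 5.0) -> bool:
    """Fetch /prompts and report whether Vision OCR is available."""
    try:
        with urllib.request.urlopen(f"{base_url}/prompts", timeout=timeout) as resp:
            return vision_available_from(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        # Connection refused, timeout, or HTTP error: treat as unavailable.
        return False
```

Calling check_vision() before a batch run gives an early, explicit failure instead of per-page OCR errors.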

OCR Endpoints

Endpoint	OCR Engine	Speed	Accuracy	Use Case
`/ocr/vision`	Apple Vision Framework	0.8s/page	Good (+41.8% vs Supernote)	Default - batch processing, word-level bboxes
`/ocr`	Qwen2.5-VL 7B (requires mlx-vlm)	60-120s/page	Best (+107% vs Supernote)	Single files needing max accuracy

This project uses /ocr/vision by default for its speed advantage.

Troubleshooting ocrmac

If Vision OCR isn't working:

# Verify ocrmac is installed
uv run python -c "from ocrmac.ocrmac import OCR; print('ocrmac OK')"

# If you get import errors, try reinstalling
uv remove ocrmac && uv add ocrmac

ocrmac requires macOS 10.15+ and works best on macOS 13+ (Ventura).

Keeping the OCR API Running

The OCR API must be running for the enhancer to work. Choose one of these options:

Option 1: Always-On (Recommended)

Install as a macOS LaunchAgent that starts automatically on login and restarts if it crashes:

./scripts/install-launchd.sh

This is the recommended approach because:

Zero maintenance: Starts automatically, restarts on crash
Always ready: Hourly scheduled jobs will always find the API available
Low overhead: ~50-100MB RAM when idle, 0% CPU

To remove:

./scripts/install-launchd.sh --remove

Option 2: Manual Start

Start the OCR API manually when you need it:

./scripts/start-ocr-api.sh

This runs in the foreground (Ctrl+C to stop). Use this if you prefer manual control or want to save the ~50-100MB of RAM when not processing.

Checking Status

# Check if OCR API is running
./scripts/start-ocr-api.sh --check

# Or directly
curl http://localhost:8100/health

Configuration

Variable	Default	Description
`SUPERNOTE_DATA_PATH`	(required)	Path to .note files on host
`OCR_API_URL`	`http://localhost:8100`	OCR API endpoint (Docker uses host.docker.internal)
`PROCESS_INTERVAL`	`0`	Seconds between runs (0 = single run)
`LOG_LEVEL`	`INFO`	Logging level
`WRITE_TO_NOTE`	`true`	Write OCR data back to files
`CREATE_BACKUPS`	`true`	Create backups before modifying
`RESET_DATABASE`	`false`	Clear all history and reprocess every file
`FILE_RECOGN_TYPE`	`keep`	`keep`=preserve existing setting, `0`=no device OCR, `1`=device OCR on (OCR always injected)
`OCR_PDF_LAYERS`	`true`	Extract and OCR embedded images from PDF/custom background layers
`OCR_TXT_EXPORT_ENABLED`	`false`	Export OCR text to local .txt files
`OCR_TXT_EXPORT_PATH`	(none)	Directory for exported .txt files (required if export enabled)
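
To make the defaults in the table concrete, here is a hedged sketch of how these variables could be read and validated in Python. It is not the enhancer's actual loader: Settings, load_settings, and the set of accepted "true" spellings are assumptions; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted spellings of "true"


def _as_bool(value: str) -> bool:
    return value.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    supernote_data_path: str
    ocr_api_url: str
    process_interval: int
    log_level: str
    write_to_note: bool
    create_backups: bool
    reset_database: bool
    file_recogn_type: str
    ocr_pdf_layers: bool
    ocr_txt_export_enabled: bool
    ocr_txt_export_path: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the table's defaults."""
    data_path = env.get("SUPERNOTE_DATA_PATH")
    if not data_path:
        raise ValueError("SUPERNOTE_DATA_PATH is required")
    recogn_type = env.get("FILE_RECOGN_TYPE", "keep")
    if recogn_type not in {"keep", "0", "1"}:
        raise ValueError(f"unsupported FILE_RECOGN_TYPE: {recogn_type!r}")
    export_enabled = _as_bool(env.get("OCR_TXT_EXPORT_ENABLED", "false"))
    export_path = env.get("OCR_TXT_EXPORT_PATH") or None
    if export_enabled and export_path is None:
        raise ValueError("OCR_TXT_EXPORT_PATH is required when export is enabled")
    return Settings(
        supernote_data_path=data_path,
        ocr_api_url=env.get("OCR_API_URL", "http://localhost:8100"),
        process_interval=int(env.get("PROCESS_INTERVAL", "0")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        write_to_note=_as_bool(env.get("WRITE_TO_NOTE", "true")),
        create_backups=_as_bool(env.get("CREATE_BACKUPS", "true")),
        reset_database=_as_bool(env.get("RESET_DATABASE", "false")),
        file_recogn_type=recogn_type,
        ocr_pdf_layers=_as_bool(env.get("OCR_PDF_LAYERS", "true")),
        ocr_txt_export_enabled=export_enabled,
        ocr_txt_export_path=export_path,
    )
```

Passing a plain dict instead of os.environ makes the validation easy to exercise without touching the real environment.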

How It Works

Scan: Finds all .note files in the data directory
Track: Uses SQLite to track file/page hashes and avoid reprocessing
Extract: Converts each page to PNG (1920x2560) using supernotelib
OCR: Sends full-resolution images to Apple Vision Framework via OCR API for word-level text recognition with pixel-accurate bounding boxes
Transform: Converts Vision Framework coordinates (PNG pixels) to Supernote's coordinate system (PNG pixels ÷ 11.9)
Inject: Writes enhanced OCR data into the .note file's RECOGNTEXT block with proper coordinate format
Configure device: Sets FILE_RECOGN_TYPE (default 0) and FILE_RECOGN_LANGUAGE=en_US for search compatibility

PDF Layer OCR

When OCR_PDF_LAYERS=true (default), the enhancer can OCR pages with custom background layers such as:

PDF documents imported directly on the Supernote device
Documents created by external programs with embedded PNG backgrounds
Any page using a user_* style with PNG data in the BGLAYER

Platform limitation: When Supernote imports a PDF, it renders pages as images in the background layer but does not extract or OCR the text. This means imported PDFs are not searchable on the device—a significant limitation for reference documents, manuals, recipes, etc.

How this fixes it: If supernotelib's standard converter fails (e.g., UnknownDecodeProtocol), the enhancer checks if the BGLAYER contains a raw PNG image. If so, it extracts that PNG directly, sends it to the OCR API, and injects the recognized text back into the file. The PDF is now fully searchable on your Supernote device.
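
The fallback can be sketched in a few lines of Python. This is an illustration under assumptions, not the enhancer's code: looks_like_raw_png and page_image are hypothetical names, and the converter is passed in as a plain callable rather than a real supernotelib call. The only hard fact used is the fixed 8-byte signature that every PNG file starts with.

```python
from typing import Callable, Optional

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_raw_png(layer_bytes: bytes) -> bool:
    """Return True if the background layer holds a raw PNG image."""
    return layer_bytes.startswith(PNG_SIGNATURE)


def page_image(convert: Callable[[], bytes], bglayer: bytes) -> Optional[bytes]:
    """Try the standard converter first; fall back to a raw PNG in the BGLAYER.

    `convert` stands in for the normal page-to-PNG conversion (hypothetical).
    Returns None when neither path yields an image.
    """
    try:
        return convert()
    except Exception:
        # Standard conversion failed (e.g. an unknown decode protocol).
        return bglayer if looks_like_raw_png(bglayer) else None
```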

Existing OCR preserved: If a PDF layer file already has OCR data (e.g., from external tools or a previous run), the enhancer will skip it rather than re-OCR. This preserves any existing OCR work and avoids unnecessary processing.

Automatic recovery: If pages lose their OCR data (due to sync conflicts or injection failures), the enhancer automatically detects and re-processes only the affected pages. Pages with empty OCR results (no text found) are recognized as complete and won't be repeatedly re-OCR'd.

Warning (untested): If your PDF imported on the Supernote device preserves clickable links, enabling OCR_PDF_LAYERS may cause those links to be lost when the file is reconstructed. Set OCR_PDF_LAYERS=false if link preservation is critical. Note: this only affects PDFs that have NO existing OCR data—files with OCR are skipped entirely. Feedback welcome.

Text Export to Local Files

You can optionally save the recognized text to local .txt files, preserving your folder structure. This is useful for:

Local full-text search across all your notes
Backup of recognized text
Integration with other tools (grep, Obsidian, etc.)
Accessing note content without the Supernote device

To enable text export, add these to your configuration file (.env.local for Docker, ~/.supernote-ocr/.env for native launchd):

OCR_TXT_EXPORT_ENABLED=true
OCR_TXT_EXPORT_PATH=~/Documents/SupernoteText

How it works:

The folder structure of your notes is preserved
Files have the same name but with .txt extension
Multi-page notes include page separators (e.g., --- Page 1 ---)
Text is exported alongside OCR injection (doesn't affect the normal flow)

Example:

Your notes:
  /Volumes/Data/Supernote/user/Note/Work/Meeting.note
  /Volumes/Data/Supernote/user/Note/Personal/Ideas.note

With SUPERNOTE_DATA_PATH=/Volumes/Data/Supernote
And  OCR_TXT_EXPORT_PATH=~/Documents/SupernoteText

Exported text files:
  ~/Documents/SupernoteText/user/Note/Work/Meeting.txt
  ~/Documents/SupernoteText/user/Note/Personal/Ideas.txt
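
The path mapping in this example can be expressed as a short Python sketch. export_path_for and join_pages are hypothetical helpers, and the exact layout of the separator lines (and the choice to omit them for single-page notes) is an assumption based on the description above.

```python
from pathlib import Path


def export_path_for(note_path: str, data_root: str, export_root: str) -> Path:
    """Mirror the note's location under export_root, swapping .note for .txt."""
    relative = Path(note_path).relative_to(data_root)
    return Path(export_root).expanduser() / relative.with_suffix(".txt")


def join_pages(pages: list[str]) -> str:
    """Join per-page text; multi-page notes get '--- Page N ---' separators."""
    if len(pages) == 1:
        return pages[0] + "\n"
    parts = []
    for number, text in enumerate(pages, start=1):
        parts.append(f"--- Page {number} ---")
        parts.append(text)
    return "\n".join(parts) + "\n"
```

relative_to raises ValueError if the note is not under the data root, which is a useful guard against writing exports to unexpected locations.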

FILE_RECOGN_TYPE: What It Actually Controls

FILE_RECOGN_TYPE controls realtime recognition during writing, NOT search capability:

Setting	Device OCRs While Writing?	Search Works?	Description
`0`	❌ No	✅ Yes (if OCR data exists)	Default - Preserves our Vision OCR
`1`	✅ Yes (realtime OCR)	✅ Yes	Device OCRs new strokes as you write
`keep`	(unchanged)	✅ Yes	Injects OCR but preserves file's existing TYPE setting

Key insight: Files with TYPE='0' are still fully searchable if they have RECOGNTEXT data. The TYPE setting only controls whether the device does realtime OCR while you're writing - it doesn't affect search.
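
The table's semantics fit in two small Python functions. This is a sketch with hypothetical names (resolve_recogn_type, is_searchable), not the enhancer's implementation.

```python
def resolve_recogn_type(setting: str, existing_type: str) -> str:
    """Return the TYPE value to write into the .note header.

    `setting` is the FILE_RECOGN_TYPE configuration value;
    `existing_type` is the value already stored in the file.
    """
    if setting == "keep":
        return existing_type
    if setting in ("0", "1"):
        return setting
    raise ValueError(f"unsupported FILE_RECOGN_TYPE: {setting!r}")


def is_searchable(has_recogntext: bool, file_type: str) -> bool:
    """Search depends only on RECOGNTEXT data; file_type is deliberately ignored."""
    return has_recogntext
```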

Why we use TYPE='0' (default):

Prevents device from overwriting our high-quality Vision OCR
Search still works perfectly (RECOGNTEXT data is preserved)
Reduces unnecessary processing on the device

The workflow:

You write on device → no realtime OCR (TYPE='0')
File syncs to server → our enhancer applies Vision OCR
File syncs back to device → device uses our OCR for search
New edits sync → we OCR them on next hourly run

To enable device OCR (if you want realtime recognition while writing):

# In .env.local
FILE_RECOGN_TYPE=1

Testing notes:

LANG='none' causes "redownload language" prompt (never use)
RECOGNSTATUS=1 (done) doesn't prevent device re-OCR on edits - TYPE controls realtime behavior

Preserved Metadata

The OCR injection process preserves important .note file metadata:

Field	Location	Purpose
`FINALOPERATION_PAGE`	Header	Last viewed page (device resumes here)
`FINALOPERATION_LAYER`	Header	Last active layer
`DIRTY`	Footer	File state tracking (affects page resume behavior)

The standard supernotelib reconstruction loses footer fields like DIRTY. This enhancer uses a custom footer packer to preserve these fields, ensuring the device opens files on the correct page after OCR injection.

Architecture: Why No Server Restart?

Previous versions stopped the sync server during OCR processing. This is no longer necessary because:

1. MariaDB handles concurrent access safely

Row-level locking prevents simultaneous writes to the same record
ACID transactions ensure data consistency
Our UPDATE statements are single-row atomic operations

2. The sync protocol is stateless

Each device sync is a fresh request-response cycle
No long-running transactions span multiple requests
Updated database values are seen immediately on next sync

3. File-level sync uses terminal_file_edit_time

We update size, md5, and bump terminal_file_edit_time by +1 second
Sync protocol: higher terminal_file_edit_time wins (determines upload vs download)
If user hasn't edited: our +1s bump > device's timestamp → device downloads our OCR
If user HAS edited: their new timestamp >> our +1s bump → device uploads (user wins)
This prevents conflicts: there's always a clear winner based on timestamp comparison

4. Graceful no-op for unchanged files

SQLite tracks file hashes locally
Files that haven't changed are skipped in milliseconds
Running hourly adds negligible overhead

5. Age threshold prevents mid-sync processing

Files modified less than 60 seconds ago are skipped
Ensures sync completes before OCR runs

This architecture allows hourly OCR runs without service interruption or database corruption risk.
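
The timestamp rule from point 3 can be modelled as a simple comparison. The Python sketch below is a simplified model, not the sync server's code: sync_direction is a hypothetical name, and treating equal timestamps as "nothing to do" is an assumption.

```python
def sync_direction(server_edit_time: int, device_edit_time: int) -> str:
    """Decide who wins a sync: the side with the higher terminal_file_edit_time."""
    if server_edit_time > device_edit_time:
        return "download"  # device pulls the server's (OCR-enhanced) version
    if device_edit_time > server_edit_time:
        return "upload"    # the user's newer edit wins
    return "none"          # assumption: equal timestamps mean nothing to do
```

With an original timestamp t, the enhancer's bump gives the server t + 1 second: an unedited device (still at t) downloads, while a device edited an hour later uploads.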

Word-Level Bounding Boxes

The Problem: Supernote's device OCR returns individual words with separate bounding boxes. Our original implementation returned entire lines as single "words", which prevented proper search highlighting.

The Solution: We use Vision Framework's boundingBoxForRange API to extract precise bounding boxes for each word within a recognized line. This matches the device's native OCR format exactly.

How it works:

Vision recognizes text line-by-line (e.g., "Hello World")
We split each line into words using regex (\S+)
For each word, we call boundingBoxForRange to get its exact pixel coordinates
Coordinates are converted from Vision's bottom-left origin to Supernote's top-left origin
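
The origin conversion in the last step looks roughly like this in Python. It is a sketch assuming the input box is in Vision's normalized form (values 0 to 1, origin at the bottom-left), which is how Vision reports bounding boxes; vision_to_top_left_pixels is a hypothetical name, and the OCR API may already perform this step before returning results.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def vision_to_top_left_pixels(box: Box, image_width: int, image_height: int) -> Box:
    """Convert a normalized bottom-left-origin (x, y, w, h) box
    to pixel (left, top, right, bottom) with a top-left origin."""
    x, y, w, h = box
    left = x * image_width
    right = (x + w) * image_width
    # Flip the vertical axis: the top edge sits at 1 - (y + h) from the top.
    top = (1.0 - y - h) * image_height
    bottom = (1.0 - y) * image_height
    return (left, top, right, bottom)
```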

Fallback mechanism: If boundingBoxForRange fails for any word (returns None or throws an exception), we fall back to proportional estimation - distributing words across the line based on character count. This is less precise but ensures OCR data is always generated.

Performance: Word-level extraction adds ~10-15ms per page (negligible).
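
The proportional fallback can be illustrated with a short Python sketch. estimate_word_boxes is a hypothetical name and the details (equal width per character, spaces counted as characters) are assumptions about one reasonable way to distribute words by character count.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def estimate_word_boxes(line_text: str, line_box: Box) -> List[Tuple[str, Box]]:
    """Split a line's box among its words in proportion to character positions."""
    x, y, width, height = line_box
    if not line_text:
        return []
    # Assume every character, including spaces, occupies the same width.
    char_width = width / len(line_text)
    boxes = []
    for match in re.finditer(r"\S+", line_text):
        word_x = x + match.start() * char_width
        word_width = (match.end() - match.start()) * char_width
        boxes.append((match.group(), (word_x, y, word_width, height)))
    return boxes
```

For "Hello World" in a 110-unit-wide line, each character gets 10 units, so "Hello" spans 0 to 50 and "World" spans 60 to 110.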

Output format (matches device OCR):

{
  "elements": [{
    "label": "Hello World",
    "words": [
      {"label": "Hello", "bounding-box": {"x": 10.5, "y": 5.2, "width": 15.3, "height": 8.1}},
      {"label": " "},
      {"label": "World", "bounding-box": {"x": 27.1, "y": 5.2, "width": 18.7, "height": 8.1}}
    ]
  }]
}
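
A structure like this can be assembled from a list of words and their boxes. The Python sketch below uses a hypothetical build_element helper and assumes the boxes are already in Supernote coordinates; it inserts a bare {"label": " "} entry between consecutive words, as in the example above.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def build_element(words: List[Tuple[str, Box]]) -> Dict:
    """Build one line element: the line label plus per-word entries."""
    entries = []
    for index, (label, (x, y, width, height)) in enumerate(words):
        if index > 0:
            # Separator entry between words carries no bounding box.
            entries.append({"label": " "})
        entries.append({
            "label": label,
            "bounding-box": {"x": x, "y": y, "width": width, "height": height},
        })
    return {"label": " ".join(label for label, _ in words), "words": entries}
```

Wrapping the result as {"elements": [build_element(...)]} reproduces the shape shown above.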

Critical: Supernote Coordinate System Discovery

The Problem: Search highlighting wasn't working - highlights appeared in wrong positions or not at all.

The Solution: By analyzing a device-generated OCR file, we discovered Supernote uses a scaled coordinate system:

Vision Framework returns: Bounding boxes in PNG pixel coordinates (e.g., x=420.47, y=710.70)
Supernote expects: Coordinates in a scaled system = PNG pixels ÷ 11.9 (e.g., x=35.33, y=59.72)

Why 11.9? Empirically determined by comparing device OCR coordinates to Vision Framework coordinates for the same text. The ratio is consistently ~11.9x.

Example transformation:

# Vision Framework output (pixels)
bbox = [420.47, 710.70, 963.73, 900.47]  # [left, top, right, bottom]

# Convert to Supernote format (divide pixel values by 11.9)
SCALE = 11.9
x = bbox[0] / SCALE                   # 35.33
y = bbox[1] / SCALE                   # 59.72
width = (bbox[2] - bbox[0]) / SCALE   # 45.65
height = (bbox[3] - bbox[1]) / SCALE  # 15.95

# Supernote format
entry = {"bounding-box": {"x": round(x, 2), "y": round(y, 2), "width": round(width, 2), "height": round(height, 2)}, "label": "word"}
# {"bounding-box": {"x": 35.33, "y": 59.72, "width": 45.65, "height": 15.95}, "label": "word"}

This transformation is critical for search highlighting to work correctly on the device.

Storage Mode Options

This tool supports three ways to access your Supernote .note files:

Mode	Description	When to Use
Personal Cloud	Self-hosted Supernote Cloud sync server	Power users with docker-based sync
Mac App	Official Supernote Mac application	Users of the desktop Mac app
Manual	Direct file access (USB, file manager)	Simple setups without sync

Quick Decision Guide

Using the official Supernote Mac app? → Use Mac App Mode
Using a self-hosted sync server? → Use Personal Cloud Mode (default)
Manually copying files via USB? → Use Manual Mode

Supernote Mac App

If you use the official Supernote Partner Mac app to sync your notes, use this mode.

How It Works

The Mac app stores your .note files and sync state locally:

~/Library/Containers/com.ratta.supernote/Data/Library/Application Support/
com.ratta.supernote/<USER_ID>/
├── supernote.db          # SQLite sync database
├── Supernote/            # Your .note files
│   ├── Note/
│   ├── Document/
│   └── ...

Sync mechanism: When we modify a .note file, we update the Mac app's SQLite database to set local_s_h_a (local file hash) to the new hash while keeping server_s_h_a (server hash) unchanged. This signals to the app: "local file changed, server has old version → UPLOAD needed". The app then pushes the OCR-enhanced file to Supernote's cloud instead of downloading the old version.
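
The idea can be demonstrated against a throwaway SQLite database in Python. Only the column names local_s_h_a and server_s_h_a come from the description above; the table name files, the path column, the use of SHA-256, and both function names are assumptions made for illustration and may not match the Mac app's real schema.

```python
import hashlib
import sqlite3
from typing import List


def mark_for_upload(conn: sqlite3.Connection, path: str, new_content: bytes) -> str:
    """Record the new local hash while leaving the server hash untouched."""
    new_hash = hashlib.sha256(new_content).hexdigest()  # assumption: SHA-256
    conn.execute(
        "UPDATE files SET local_s_h_a = ? WHERE path = ?",  # hypothetical table
        (new_hash, path),
    )
    conn.commit()
    return new_hash


def pending_uploads(conn: sqlite3.Connection) -> List[str]:
    """Files whose local hash differs from the server hash need an upload."""
    rows = conn.execute(
        "SELECT path FROM files WHERE local_s_h_a != server_s_h_a ORDER BY path"
    )
    return [row[0] for row in rows]
```

Inspect the real supernote.db (for example with sqlite3 and .schema) before adapting anything like this, and work on a copy.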

Critical: Quit the App During Processing

You MUST quit Supernote Partner before running OCR enhancement. If the app is running:

It may sync mid-processing and download old files from the server
It holds locks on the SQLite database
File changes may not be detected correctly

The run-with-macapp.sh script will prompt you to quit the app. For automated runs, use the cron template, which quits and restarts the app automatically.

Quick Start (Mac App)

Option 1: Auto-Detection (Recommended)

# Auto-detects your Mac app paths - no configuration needed!
./run-with-macapp.sh --auto

Option 2: Manual Configuration

# 1. Find your user ID (long numeric string)
ls ~/Library/Containers/com.ratta.supernote/Data/Library/Application\ Support/com.ratta.supernote/

# 2. Configure .env.local
cat >> .env.local << 'EOF'
STORAGE_MODE=mac_app
MACAPP_NOTES_PATH=~/Library/Containers/com.ratta.supernote/Data/Library/Application Support/com.ratta.supernote/YOUR_USER_ID/Supernote
MACAPP_DATABASE_PATH=~/Library/Containers/com.ratta.supernote/Data/Library/Application Support/com.ratta.supernote/YOUR_USER_ID/supernote.db
EOF

# 3. Run OCR enhancement (quit the app first!)
./run-with-macapp.sh

Mac App Script Options

./run-with-macapp.sh              # Normal run (prompts to quit app)
./run-with-macapp.sh --auto       # Auto-detect paths (no config needed)
./run-with-macapp.sh --dry-run    # Preview what would happen

Scheduling Mac App OCR (Cron Job)

For automatic nightly OCR processing, use the provided cron template. This template automatically quits and restarts Supernote Partner to prevent sync conflicts.

# 1. Copy the template
cp scripts/cron-macapp-template.sh ~/scripts/supernote-ocr-cron.sh

# 2. Edit and set your OCR_ENHANCER_DIR path
nano ~/scripts/supernote-ocr-cron.sh

# 3. Make executable
chmod +x ~/scripts/supernote-ocr-cron.sh

# 4. Add to crontab (runs daily at midnight)
crontab -e

Add this line to your crontab:

0 0 * * * /Users/YOUR_USERNAME/scripts/supernote-ocr-cron.sh >> /tmp/supernote-ocr.log 2>&1

What the cron job does:

Quits Supernote Partner (gracefully, then force-kill if needed)
Waits for the app to fully close
Runs OCR enhancement on all .note files
Updates the database to trigger upload
Restarts Supernote Partner
App syncs enhanced files to Supernote cloud

Important: Mac App cron runs on your Mac (host), not inside Docker. This is completely separate from Personal Cloud cron which runs inside the Docker container.

Notes for Mac App Users

App name: The Mac app is called "Supernote Partner" (not just "Supernote").
Sync behavior: After OCR enhancement, when you open Supernote Partner, it will upload your enhanced files to the cloud (not download old versions).
File tracking: Files are tracked by path and content hash. Files are only re-processed if their content changes.
No Docker orchestration: Unlike Personal Cloud mode, Mac App mode doesn't stop/start Docker services. It only updates the local SQLite database.

Supernote Cloud / Sync Server

If you use a self-hosted Supernote Cloud sync server (like Supernote-Private-Cloud), this mode works seamlessly. The OCR enhancer updates the sync database while the server runs - no restart needed.

Do I need special configuration?

If you manually transfer files (USB, file manager): No sync server needed. Just point SUPERNOTE_DATA_PATH to your .note files.
If you use the Mac app: See Supernote Mac App section above.
If you use a self-hosted sync server: Just run the OCR enhancer - it updates the database automatically.

How It Works

When OCR modifies a .note file, the enhancer updates the sync server's MariaDB database:

Sets new size and md5 hash
Bumps terminal_file_edit_time by +1 second (so server version is "newer")
Updates update_time to current time

This happens atomically via Docker socket access to the MariaDB container. The bumped timestamp makes the server's version win the sync (device downloads), unless the user has edited on the device (their timestamp would be much later, so device uploads).
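
The new column values can be computed as in the Python sketch below. sync_row_update is a hypothetical helper; the timestamp unit is not stated here, so the size of "one second" is a parameter, and its millisecond default is an assumption.

```python
import hashlib
from typing import Dict, Union


def sync_row_update(
    content: bytes,
    terminal_file_edit_time: int,
    now: int,
    one_second: int = 1000,  # assumption: timestamps are in milliseconds
) -> Dict[str, Union[int, str]]:
    """Compute the values written to the sync database after OCR injection."""
    return {
        "size": len(content),
        "md5": hashlib.md5(content).hexdigest(),
        # Bump by one second so the server copy counts as newer.
        "terminal_file_edit_time": terminal_file_edit_time + one_second,
        "update_time": now,
    }
```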

Configuration

Configure in .env.local:

# Enable Personal Cloud sync mode
STORAGE_MODE=personal_cloud

# MySQL password from your sync server's MariaDB container
# Find it with: docker exec supernote-mariadb env | grep MYSQL_PASSWORD
MYSQL_PASSWORD=your_mysql_password_here

Scheduling Personal Cloud OCR (Container Cron)

The container runs cron jobs automatically:

Schedule	Behavior
Every hour (:00)	Skips files uploaded in last 8 hours (conflict prevention)
3:30 AM	Full run - processes ALL files regardless of upload time

The 3:30am run is low-risk for conflicts since you're likely asleep and not editing.

Additional safeguards:

Age threshold: Files modified <60 seconds ago are skipped (prevents processing mid-sync)
Hash comparison: Already-processed files are skipped in milliseconds
Atomic updates: Database updates are safe while sync server runs
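
The skip rules above combine into one small decision function. This Python sketch uses hypothetical names and assumes all times are Unix timestamps in seconds; applying the 60-second age threshold to the 3:30 AM full run as well is also an assumption.

```python
from typing import Optional

AGE_THRESHOLD_SECONDS = 60          # skip files that may still be syncing
RECENT_UPLOAD_SECONDS = 8 * 3600    # hourly runs skip recent uploads


def skip_reason(now: float, modified_at: float, uploaded_at: float, full_run: bool) -> Optional[str]:
    """Return why a file should be skipped, or None if it should be processed."""
    if now - modified_at < AGE_THRESHOLD_SECONDS:
        return "modified less than 60 seconds ago"
    if not full_run and now - uploaded_at < RECENT_UPLOAD_SECONDS:
        return "uploaded in the last 8 hours"
    return None
```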

To use container-based cron:

# Start the container (runs cron daemon)
docker compose up -d

# View logs
docker compose logs -f ocr-enhancer

The cron schedule is in config/crontab (default: hourly at :00, plus the 3:30 AM full run).

Legacy: Manual Sync Control

The run-with-sync-control.sh script is still available for manual runs with explicit sync server control:

./run-with-sync-control.sh           # Stops server, runs OCR, restarts server
./run-with-sync-control.sh --dry-run # Preview what would happen

This is no longer required but may be preferred for initial testing.

Manual File Transfer

If you manually transfer files via USB or a file manager (no sync server or Mac app), use this simple mode.

Quick Start (Manual)

# 1. Configure your data path in .env.local
echo "SUPERNOTE_DATA_PATH=/path/to/your/supernote/files" >> .env.local

# 2. Run OCR enhancement
docker compose run --rm ocr-enhancer python /app/main.py

No database synchronization is needed since there's no sync server to coordinate with.

Processing State & File Tracking

The SQLite database (./data/processing.db) tracks:

note_files: File path, hash, modification time, processing status
page_results: Per-page hash, OCR text, processing time

When Files Are Reprocessed

Files are reprocessed when:

File hash changes (you added new content)
Previous processing failed
File is new (never processed before)

Files are skipped when:

Already successfully processed with same content hash
This prevents wasting time re-OCRing unchanged files
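
These rules reduce to a short check against the stored record. The Python sketch below is illustrative only: needs_processing is a hypothetical name, "completed" is an assumed status value, and SHA-256 is an assumed hash algorithm.

```python
import hashlib
from typing import Optional


def content_hash(data: bytes) -> str:
    """Hash file content (assumption: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def needs_processing(
    current_hash: str,
    stored_hash: Optional[str],
    stored_status: Optional[str],
) -> bool:
    """Decide whether a file must be (re)processed."""
    if stored_hash is None:
        return True   # new file, never processed
    if stored_status != "completed":
        return True   # previous processing failed or did not finish
    return stored_hash != current_hash  # content changed since last run
```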

Device Re-OCR Behavior

Background: Supernote devices have "Real-time Recognition" controlled by FILE_RECOGN_TYPE in the .note file header. This setting controls whether the device performs OCR while you write.

Our approach: We set FILE_RECOGN_TYPE=0 by default to preserve our high-quality Vision OCR:

✅ Search and highlighting work for all OCR text
✅ Device won't overwrite our OCR with lower-quality realtime OCR
✅ New edits are OCR'd by our enhancer on the next hourly sync

Alternative: Set FILE_RECOGN_TYPE=1 in .env.local if you want the device to do realtime OCR as you write. This gives immediate (but lower quality) searchability for new strokes, which our enhancer will improve on next sync.

Performance

Apple Vision Framework (Default)

Speed: ~0.8 seconds per page average
Memory: ~200MB (minimal footprint)
Accuracy: +41.8% more text vs Supernote device OCR

Qwen2.5-VL 7B (Optional, requires mlx-vlm)

First page: ~60-120 seconds (MLX kernel compilation)
Subsequent pages: ~20-100 seconds depending on content
GPU: Metal (Apple Silicon) - CPU will appear idle during processing
Memory: ~8GB for 7B model
Accuracy: Higher accuracy than Vision Framework, but much slower

Troubleshooting

OCR API not available

# Check if running
curl http://localhost:8100/health

# Check logs if using the provided scripts
tail -f ~/services/ocr-api/logs/server.log

Files keep reprocessing

The file hash is recomputed after OCR injection. If you're seeing files reprocessed, check that the hash update is working:

sqlite3 ./data/processing.db "SELECT file_path, file_hash, processing_status FROM note_files;"

Bounding boxes in wrong location

Vision Framework OCR uses full-resolution images (1920x2560) and returns pixel coordinates that are then divided by 11.9 for Supernote's coordinate system. If highlighting is misaligned, verify the coordinate transformation in note_processor.py.

Sync conflicts

Why conflicts occur:

The Supernote sync protocol creates a CONFLICT when both sides have changes:

1. Device uploads file (md5=A)
2. We run OCR → server now has md5=B
3. User edits on device → device now has md5=C
4. Device syncs → Server sees: device changed (A→C) AND server changed (A→B)
5. Both sides changed → CONFLICT (to protect user's work)
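
The scenario above is a three-way comparison against the last version both sides agreed on. The Python sketch below is a simplified model of that logic, not the sync server's implementation; sync_outcome and its return values are hypothetical.

```python
def sync_outcome(base_md5: str, device_md5: str, server_md5: str) -> str:
    """Classify a sync given the last common version and both current versions."""
    device_changed = device_md5 != base_md5
    server_changed = server_md5 != base_md5
    if device_changed and server_changed:
        return "conflict"   # both sides changed: protect the user's work
    if server_changed:
        return "download"   # only the server changed (e.g. OCR was injected)
    if device_changed:
        return "upload"     # only the device changed
    return "in_sync"
```

With the values from the steps above, sync_outcome("A", "C", "B") is a conflict, while sync_outcome("A", "A", "B") is a clean download.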

Why timestamps alone can't fix this:

We bump terminal_file_edit_time by +1 second so server wins when only server changed. But if the user edited on device, their timestamp is hours/days later than our +1 second bump. The sync protocol sees both sides have changes and creates a conflict to prevent data loss.

We could force server to always win (set timestamp to year 2099), but that would overwrite user's handwriting - unacceptable.

Our solution - skip actively-edited files:

Skip recently uploaded files: Files uploaded in the last 8 hours are skipped during hourly runs
3:30am full run: Processes ALL files regardless of upload time (low conflict risk)
Then OCR: Device's local file is "clean" (no pending edits), only server changed
Device downloads: No conflict because only one side changed

If you still see conflicts:

You edited the file on device after OCR ran but before syncing
The 3:30am run should catch most files safely
The conflict file contains our OCR version - you can delete it or keep for reference

Configuration checks:

Verify STORAGE_MODE=personal_cloud is set in .env.local
Verify MYSQL_PASSWORD matches your MariaDB container's password
Check MariaDB is accessible: docker exec supernote-mariadb mysqladmin ping

Project Structure

supernote-ocr-enhancer/
├── .env.example              # Template for local configuration
├── .env.local                # Your local config (git-ignored)
├── Dockerfile                # Container definition (Docker alternative)
├── docker-compose.yml        # Service configuration (Docker alternative)
├── run-with-sync-control.sh  # Personal Cloud sync coordination
├── run-with-macapp.sh        # Mac app mode (auto-detects paths)
├── app/
│   ├── main.py               # Entry point and processing loop
│   ├── database.py           # SQLite state tracking
│   ├── ocr_client.py         # OCR API client
│   ├── note_processor.py     # .note file handling
│   └── sync_handlers.py      # Sync database handlers (Mac app & Personal Cloud)
├── config/
│   ├── crontab               # Cron schedule (Docker only)
│   ├── .env.launchd.example  # Template for native launchd config
│   ├── com.supernote.ocr-api.plist.template         # OCR API LaunchAgent
│   ├── com.supernote.ocr-enhancer.hourly.plist.template  # Hourly job
│   └── com.supernote.ocr-enhancer.daily.plist.template   # Daily job
├── examples/
│   └── server.py             # OCR API server (copy to ~/services/ocr-api/)
├── scripts/
│   ├── install-ocr-enhancer-launchd.sh  # Install enhancer as launchd jobs
│   ├── run-ocr-native.sh         # Native runner script for launchd
│   ├── install-launchd.sh        # Install OCR API as LaunchAgent
│   ├── start-ocr-api.sh          # Start OCR API manually (foreground)
│   ├── cron-ocr-job.sh           # Cron job script (Docker only)
│   ├── cron-macapp-template.sh   # Template for Mac app scheduled OCR
│   ├── compare_ocr.py            # OCR comparison tool
│   └── extract_ocr_text.py       # OCR backup/export
└── data/
    ├── processing.db         # State database (git-ignored)
    └── backups/              # File backups (git-ignored)

License

Apache License 2.0 - See LICENSE file.

This means you can use, modify, and distribute this software, but you must:

Include the original copyright notice
Provide attribution in derivative works
State any changes you made

Acknowledgments

supernotelib - Supernote .note file parsing
ocrmac - Python wrapper for Apple Vision Framework OCR
MLX-VLM - Apple Silicon optimized vision-language models (optional)
Qwen2.5-VL - Alternative OCR model (optional)

Contributing

Contributions welcome! Please open an issue or PR.
