Ghost Writer v2.0 - Multi-Agent Handwritten Note Intelligence

System Overview

Ghost Writer is an OCR and document-processing system for handwritten notes. It transcribes handwritten pages and organizes the recognized content into structured digital documents such as outlines and timelines.

Current Status

  • Unified OCR Pipeline: Qwen2.5-VL (local) + Tesseract + Google Vision + GPT-4 Vision with intelligent routing
  • Handwriting Recognition: Qwen2.5-VL transcribes locally at no per-image cost, with a 2-5s response time
  • Document Processing: Relationship detection, concept clustering, structure generation
  • Test Coverage: 137 tests passing with 68% code coverage
  • Privacy & Cost Controls: Local-first processing with automatic budget management

Architecture

   Handwritten Notes → Hybrid OCR → Relationship Detection → Concept Clustering → Structure Generation
         ↓                ↓              ↓                    ↓                    ↓
   Input Processing   Smart Router    Semantic Analysis    Theme Organization   Document Formats
   Image/PDF         Cost-Optimized   Visual Patterns     Idea Clustering     (Outline/Timeline)

Dual Beachhead Strategy

  1. Privacy-Conscious Professionals: Premium accuracy transcription with local-first processing
  2. Idea Organization for Learning Differences: Transform scattered thoughts into coherent documents

🎯 Core Features

✅ Premium OCR Processing

  • Hybrid Intelligence: Tesseract (local/free) + Google Vision (premium) + GPT-4 Vision (semantic) + Qwen2.5-VL (local vision LLM)
  • Local Vision Models: Qwen2.5-VL 7B via Ollama for superior handwriting transcription (2-4 second response time)
  • Smart Routing: Cost-aware provider selection with confidence thresholds
  • Budget Controls: Daily limits with automatic fallbacks ($5/day default)
  • Quality Modes: Fast, Balanced, Premium processing options
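
The routing and budget behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's `HybridOCR` implementation: the `Provider` and `BudgetedRouter` names are hypothetical, and the policy (try providers from cheapest to most expensive, stop at the first result that meets that provider's confidence threshold, skip any provider that would exceed the daily limit) is an assumption about how such a router could work.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Provider:
    """Hypothetical description of one OCR backend."""
    name: str
    cost_per_image: float        # USD; 0.0 for local providers
    confidence_threshold: float  # accept a result at or above this score (0-100)
    run: Callable[[bytes], Tuple[str, float]]  # returns (text, confidence)


class BudgetedRouter:
    """Try providers from cheapest to most expensive within a daily budget."""

    def __init__(self, providers: List[Provider], daily_limit: float = 5.00):
        self.providers = sorted(providers, key=lambda p: p.cost_per_image)
        self.daily_limit = daily_limit
        self.spent_today = 0.0

    def transcribe(self, image: bytes) -> Optional[Tuple[str, str, float]]:
        """Return (provider_name, text, confidence), or None if nothing ran."""
        best: Optional[Tuple[str, str, float]] = None
        for provider in self.providers:
            if self.spent_today + provider.cost_per_image > self.daily_limit:
                continue  # this provider would exceed the budget: skip it
            text, confidence = provider.run(image)
            self.spent_today += provider.cost_per_image
            if best is None or confidence > best[2]:
                best = (provider.name, text, confidence)
            if confidence >= provider.confidence_threshold:
                return (provider.name, text, confidence)
        # No provider met its threshold: fall back to the best result seen.
        return best
```

With a daily limit of 0, only zero-cost providers run, which mirrors the idea of an automatic fallback to local processing once the budget is exhausted.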

✅ Idea Organization Engine

  • Relationship Detection: Visual arrows, spatial proximity, hierarchical patterns
  • Concept Clustering: Multi-strategy extraction (topics, actions, entities)
  • Structure Generation: Outlines, mind maps, timelines, process flows
  • Confidence Scoring: Quality metrics for all generated structures
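
As an illustration of the spatial-proximity idea with a confidence score attached, here is a small sketch. It is not taken from `relationship_detector.py`: the `TextBox` type, the 100-pixel cutoff, and the linear confidence formula are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import List, Tuple


@dataclass(frozen=True)
class TextBox:
    """Hypothetical OCR text element located by its centre point (pixels)."""
    text: str
    x: float
    y: float


def proximity_links(
    boxes: List[TextBox], max_distance: float = 100.0
) -> List[Tuple[str, str, float]]:
    """Link every pair of boxes closer than max_distance.

    Confidence falls linearly from 1.0 (same position) to 0.0 (at the cutoff).
    """
    links = []
    for a, b in combinations(boxes, 2):
        distance = hypot(a.x - b.x, a.y - b.y)
        if distance <= max_distance:
            confidence = round(1.0 - distance / max_distance, 2)
            links.append((a.text, b.text, confidence))
    # Strongest relationships first.
    return sorted(links, key=lambda link: link[2], reverse=True)
```

A real detector would presumably combine this signal with arrows and hierarchy cues; the sketch covers only the distance-based part.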

✅ Multi-Agent Coordination

  • Document-Based Handoffs: Agents communicate through structured artifacts
  • QA Agent: Testing and integration validation (Gemini 2.5 Pro)
  • Implementation Agent: Code development and optimization (Claude 4 Sonnet)
  • Supervisor Oversight: Task coordination and quality assurance

✅ Privacy & Security

  • Local-First Processing: Tesseract + SQLite for sensitive content
  • Encrypted Storage: Secure local database with audit trails
  • Zero Data Leakage: Optional cloud processing with privacy controls

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Tesseract OCR
  • Ollama (for local Qwen2.5-VL vision model)
  • Optional: Google Cloud Vision API key
  • Optional: OpenAI API key

Installation

# Clone repository
git clone https://github.com/adambalm/ghost-writer.git
cd ghost-writer

# Setup environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install and setup Ollama with Qwen2.5-VL model
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5vl:7b

# Initialize Ghost Writer
python -m src.cli init

# Verify installation
python -m pytest tests/ --tb=short -q

Command Line Usage

Process a single file:

# Process an image file
python -m src.cli process my_notes.png

# Process a Supernote .note file
python -m src.cli process my_notebook.note --format all

# Process with premium quality
python -m src.cli process notes.jpg --quality premium --format pdf

Process a directory:

# Process all images in a directory
ghost-writer process notes_folder/ --output processed_notes/

# Local-only processing (no cloud APIs)
ghost-writer process notes/ --local-only --format markdown

Watch directory for new files:

# Automatically process new files
ghost-writer watch notes_folder/ --format all --interval 5
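
For readers curious what interval-based watching involves, the following is a rough polling sketch. It is not the `ghost-writer watch` implementation: the function names, the `max_polls` parameter, and the set of file extensions are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set

# Assumed set of watched extensions; the real CLI may accept others.
SUPPORTED = {".png", ".jpg", ".jpeg", ".note"}


def find_new_files(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return supported files not seen before, and remember them."""
    new = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED and p not in seen
    )
    seen.update(new)
    return new


def watch(
    folder: Path,
    handle: Callable[[Path], None],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll folder every `interval` seconds and pass new files to `handle`.

    max_polls=None polls forever; a number makes the loop finite (for tests).
    """
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(folder, seen):
            handle(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Polling keeps the example dependency-free; a filesystem-event library would react faster but is outside the scope of this sketch.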

Sync from Supernote Cloud:

# Sync recent notes (requires configuration)
ghost-writer sync --since 2025-01-01 --output supernote_notes/

Check system status:

ghost-writer status

Basic API Usage

from pathlib import Path

from src.cli import process_single_file
from src.utils.concept_clustering import ConceptExtractor, ConceptClusterer
from src.utils.database import DatabaseManager
from src.utils.ocr_providers import HybridOCR
from src.utils.relationship_detector import RelationshipDetector
from src.utils.structure_generator import StructureGenerator

# Initialize components
ocr = HybridOCR()
detector = RelationshipDetector()
extractor = ConceptExtractor()
clusterer = ConceptClusterer()
generator = StructureGenerator()
db = DatabaseManager()

# Process a file
result = process_single_file(
    file_path=Path("my_notes.jpg"),
    ocr_provider=ocr,
    relationship_detector=detector,
    concept_extractor=extractor,
    concept_clusterer=clusterer,
    structure_generator=generator,
    db_manager=db,
    output_dir=Path("output/"),
    output_format="markdown",
    quality="balanced"
)

print(f"Generated: {result}")

🧪 Testing & Quality

Comprehensive Test Suite

# Run full test suite (137 tests)
python -m pytest tests/ -v

# Run specific test categories
python -m pytest tests/test_e2e_integration.py -v    # End-to-end workflows
python -m pytest tests/test_ocr_providers.py -v     # OCR processing
python -m pytest tests/test_concept_clustering.py -v # Idea organization
python -m pytest tests/test_database.py -v          # Data persistence

# Performance testing
python -m pytest tests/test_e2e_integration.py::TestPerformanceAndScaling -v

Quality Metrics

  • Test Success Rate: 100% (137/137 tests passing)
  • Code Coverage: 68% (exceeds 65% requirement)
  • Performance targets: <30s OCR processing per page, <10s idea organization
  • Reliability: Comprehensive error handling and fallback mechanisms

📁 Project Structure

ghost-writer/
├── src/utils/
│   ├── ocr_providers.py         # Hybrid OCR with smart routing
│   ├── relationship_detector.py # Visual and semantic relationships
│   ├── concept_clustering.py    # Multi-strategy concept extraction
│   ├── structure_generator.py   # Document structure generation
│   └── database.py              # SQLite persistence layer
├── tests/
│   ├── test_e2e_integration.py  # Complete workflow testing
│   ├── test_e2e_simple.py       # Simplified integration tests
│   ├── test_ocr_providers.py    # OCR provider testing
│   ├── test_ocr_mocks.py        # Mock-based OCR testing
│   └── test_*.py                # Comprehensive test coverage
├── config/config.yaml           # System configuration
├── CLAUDE.md                    # Multi-agent system protocols
├── AGENT_STATUS.md              # Real-time agent coordination
├── HANDOFF_ARTIFACTS.md         # Inter-agent communication log
├── QUALITY_DASHBOARD.md         # Test results and metrics
└── PRODUCT_SPECIFICATION.md     # Complete product specification

⚙️ Configuration

OCR Provider Setup

ocr:
  providers:
    tesseract:
      confidence_threshold: 60
      preprocessing:
        enhance_contrast: true
        remove_noise: true
    google_vision:
      confidence_threshold: 80
      cost_per_image: 0.0015
    gpt4_vision:
      confidence_threshold: 85
      cost_per_image: 0.01
    hybrid:
      cost_limit_per_day: 5.00
      quality_mode: "balanced"

Environment Variables

# Optional API keys for premium processing
GOOGLE_APPLICATION_CREDENTIALS=path/to/google-credentials.json
OPENAI_API_KEY=your-openai-api-key

# System configuration
GHOST_WRITER_LOG_LEVEL=INFO
GHOST_WRITER_DB_PATH=data/ghost_writer.db
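
A sketch of how these variables might be read at startup is shown below. The `Settings` class and `load_settings` function are hypothetical, and treating the values listed above as defaults is an assumption; the project's actual loading code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the environment-driven settings."""
    log_level: str
    db_path: str
    google_credentials: Optional[str]
    openai_api_key: Optional[str]

    @property
    def cloud_enabled(self) -> bool:
        # Cloud OCR is only possible when at least one API credential is set.
        return bool(self.google_credentials or self.openai_api_key)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to assumed defaults."""
    return Settings(
        log_level=env.get("GHOST_WRITER_LOG_LEVEL", "INFO"),
        db_path=env.get("GHOST_WRITER_DB_PATH", "data/ghost_writer.db"),
        google_credentials=env.get("GOOGLE_APPLICATION_CREDENTIALS"),
        openai_api_key=env.get("OPENAI_API_KEY"),
    )
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.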

📊 Performance Benchmarks

Component               Performance         Status
OCR Processing          <30s per page       Target
Relationship Detection  <10s per page       Target
Concept Clustering      <5s per page        Target
Structure Generation    <5s per page        Target
Database Operations     <100ms              Target
Test Suite Execution    ~101s (137 tests)   Achieved
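
One way to check a run against the targets listed above is to time each stage and compare. The sketch below assumes hypothetical stage keys and helper names (`timed`, `over_target`); only the numeric limits come from the table.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Per-page targets in seconds, taken from the benchmark table.
TARGETS_SECONDS: Dict[str, float] = {
    "ocr": 30.0,
    "relationship_detection": 10.0,
    "concept_clustering": 5.0,
    "structure_generation": 5.0,
    "database": 0.1,
}


@contextmanager
def timed(stage: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start


def over_target(results: Dict[str, float]) -> List[str]:
    """Return the stages whose measured time is not below the target."""
    return sorted(
        stage for stage, seconds in results.items()
        if seconds >= TARGETS_SECONDS[stage]
    )
```

Usage would look like `with timed("ocr", results): run_ocr(page)`, where `run_ocr` stands in for whatever call performs the stage.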

🤖 Multi-Agent System

Agent Architecture

  • Supervisor Agent: Task coordination and quality oversight
  • QA Agent: Testing, validation, and integration verification
  • Implementation Agent: Feature development and optimization
  • Document-Based Communication: Agents coordinate through structured artifacts

Coordination Protocols

  • AGENT_STATUS.md: Real-time agent state tracking
  • HANDOFF_ARTIFACTS.md: Inter-agent communication log
  • QUALITY_DASHBOARD.md: Performance metrics and test results
  • Cost Monitoring: <$25/day budget with automatic controls
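
To make document-based communication concrete, here is a hedged sketch of appending a structured entry to a shared Markdown log. The `Handoff` fields and the entry layout are invented for illustration; the actual format of HANDOFF_ARTIFACTS.md is defined in the repository and is not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Handoff:
    """Hypothetical record of one agent passing work to another."""
    sender: str
    receiver: str
    summary: str
    artifacts: List[str] = field(default_factory=list)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def to_markdown(self) -> str:
        """Render the handoff as a small Markdown section."""
        lines = [
            f"## {self.created_at:%Y-%m-%d %H:%M} UTC: "
            f"{self.sender} -> {self.receiver}",
            "",
            self.summary,
        ]
        if self.artifacts:
            lines.append("")
            lines.extend(f"- {artifact}" for artifact in self.artifacts)
        return "\n".join(lines) + "\n"


def append_handoff(log_path: str, handoff: Handoff) -> None:
    """Append the entry to the log file, creating the file if needed."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("\n" + handoff.to_markdown())
```

Appending rather than rewriting keeps earlier entries intact, which suits a log that several agents read in sequence.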

💰 Pricing & Cost Control

Cost Structure

  • Local Processing: $0 (Tesseract + SQLite)
  • Google Vision: $0.0015/image (premium accuracy)
  • GPT-4 Vision: $0.01/image (semantic understanding)
  • Daily Budget: $5.00 default with automatic fallbacks
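
The per-image prices and the default daily budget translate into concrete volumes. The short calculation below uses only the figures listed above; the dictionary keys and function names are illustrative.

```python
from decimal import Decimal
from typing import Dict, Optional

DAILY_BUDGET = Decimal("5.00")
COST_PER_IMAGE: Dict[str, Decimal] = {
    "tesseract": Decimal("0"),
    "google_vision": Decimal("0.0015"),
    "gpt4_vision": Decimal("0.01"),
}


def images_per_day(provider: str, budget: Decimal = DAILY_BUDGET) -> Optional[int]:
    """Whole images affordable per day; None means no cost-based limit."""
    cost = COST_PER_IMAGE[provider]
    if cost == 0:
        return None
    return int(budget // cost)


def batch_cost(counts: Dict[str, int]) -> Decimal:
    """Total cost in USD of processing `counts[provider]` images each."""
    return sum(
        (COST_PER_IMAGE[name] * n for name, n in counts.items()), Decimal("0")
    )


print(images_per_day("google_vision"))  # 3333
print(images_per_day("gpt4_vision"))    # 500
print(batch_cost({"google_vision": 1000, "gpt4_vision": 100}))  # 2.5000
```

`Decimal` is used instead of `float` so that small per-image prices accumulate without rounding drift.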

Value Proposition

  • Privacy-Conscious: 10x faster than manual transcription, zero privacy risk
  • Idea Organization: Transform scattered thoughts into publishable content
  • ROI: Predictable costs with automatic budget management

🔐 Privacy & Security

Local-First Architecture

  • Tesseract OCR: Complete local processing for sensitive content
  • SQLite Database: Local storage with encrypted data
  • Audit Logging: Full processing history and decision tracking
  • Optional Cloud: Premium features only when explicitly enabled
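
A local audit trail of processing decisions can be kept in SQLite with very little code. The sketch below is an assumption-laden example, not the schema used by `database.py`: the `audit_log` table, its columns, and the helper names are hypothetical, and it does not attempt the encryption mentioned above, which the standard-library `sqlite3` module does not provide.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for recording which provider handled which file and why.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    at       TEXT NOT NULL,
    file     TEXT NOT NULL,
    provider TEXT NOT NULL,
    reason   TEXT NOT NULL,
    cost_usd REAL NOT NULL DEFAULT 0
)
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_decision(
    conn: sqlite3.Connection,
    file: str,
    provider: str,
    reason: str,
    cost_usd: float = 0.0,
) -> None:
    """Insert one audit row; the `with` block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO audit_log (at, file, provider, reason, cost_usd) "
            "VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), file, provider, reason, cost_usd),
        )


def total_spend(conn: sqlite3.Connection) -> float:
    """Sum of all recorded costs in USD."""
    row = conn.execute("SELECT COALESCE(SUM(cost_usd), 0) FROM audit_log").fetchone()
    return float(row[0])
```

The same table can double as the data source for budget checks, since every paid call leaves a row with its cost.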

Security Features

  • Zero data leakage in local mode
  • Comprehensive audit trails
  • API key management with environment isolation
  • Cost controls prevent unexpected charges

🧑‍💻 Development & Contributing

Multi-Agent Development

See CLAUDE.md for complete multi-agent development protocols and architecture details.

Key Development Files

  • CLAUDE.md: Multi-agent system protocols and architecture
  • PRODUCT_SPECIFICATION.md: Complete product requirements and roadmap
  • DECISION_HISTORY.md: Architectural decisions and research findings
  • TESTING_STRATEGY.md: Comprehensive testing approach

Contributing

  1. Review multi-agent protocols in CLAUDE.md
  2. Run comprehensive test suite: python -m pytest tests/ -v
  3. Follow document-based development coordination
  4. Ensure 100% test success rate
  5. Update relevant .md documentation files

Supernote Integration

Supernote Cloud Sync

Ghost Writer includes Supernote integration with API authentication and file synchronization.

Quick Test:

# Test your Supernote Cloud connection
export SUPERNOTE_EMAIL="your.email@example.com"
export SUPERNOTE_PASSWORD="your-password"
python debug_supernote_test.py

Features:

  • ✅ Real API Integration: Authenticated connection to Supernote Cloud
  • ✅ Phone Number Login: Support for phone-based authentication
  • ✅ Secure Authentication: MD5+SHA256 hashing with random salt
  • ✅ File Synchronization: Download .note files directly from cloud
  • ✅ Binary .note Parsing: Extract vector graphics for OCR processing
  • ✅ HTTPS Security: All communication encrypted, no plaintext passwords
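
The "MD5+SHA256 hashing with random salt" item can be illustrated with the standard library. The composition shown here (SHA-256 over the MD5 hex digest of the password concatenated with the salt) is an assumption about how the two hashes are combined; the exact scheme required by Supernote Cloud is not documented in this README, and the function name is hypothetical.

```python
import hashlib
import secrets


def hash_password(password: str, salt: str) -> str:
    """Return SHA-256(MD5(password) + salt) as a hex string.

    Assumed composition for illustration only; check the real API
    before relying on it.
    """
    md5_hex = hashlib.md5(password.encode("utf-8")).hexdigest()
    return hashlib.sha256((md5_hex + salt).encode("utf-8")).hexdigest()


# Example: a fresh random salt yields a different digest on each login,
# so the plaintext password itself is never what gets transmitted.
salt = secrets.token_hex(8)
digest = hash_password("your-password", salt)
```

In a real login flow the salt would typically be issued by the server per request rather than generated by the client; it is generated locally here only to keep the example self-contained.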

Quick Start:

  1. Test Connection: python debug_supernote_test.py
  2. Sync Files: ghost-writer sync --output ~/Downloads/
  3. Process Notes: ghost-writer process downloaded_file.note --format markdown

See QUICK_START.md for detailed setup instructions.

📱 iOS CA Install (One-Tap)

Safari HTTPS-Only Mode requires trusted certificates. Install the Ghost Writer development CA:

One-tap install: ed-dev-root.mobileconfig

Post-install: Settings → General → About → Certificate Trust Settings → enable Full Trust for "Ed Dev Root CA"

📄 License

MIT License - see LICENSE file for details.


Ghost Writer v2.0 - Transform handwritten notes into structured intelligence with multi-agent AI coordination and live Supernote Cloud integration.
