Michael W edited this page Nov 19, 2025 · 2 revisions

Welcome to the sakura-sumi wiki!

Quick Start

For Beginners

What is this? Sakura Sumi converts your code files into compressed PDFs that you can upload to AI models like Google Gemini. This lets you analyze entire codebases that would normally be too large to submit as plain text.

3-Step Setup:

1. Install Python (if you don't have it):

   Download from python.org. Make sure to check "Add Python to PATH" during installation.

2. Get the code:

   git clone https://github.com/yourusername/ocr-compression.git
   cd ocr-compression

3. Set up and run:

   # Create a virtual environment (keeps dependencies organized)
   python3 -m venv venv

   # Activate it
   source venv/bin/activate  # On macOS/Linux
   # OR: venv\Scripts\activate  # On Windows

   # Install required packages
   pip install -r requirements.txt

   # Compress your codebase (replace with your actual path)
   python scripts/compress.py "/path/to/your/codebase" -v

That's it! Your PDFs will be in {your_codebase}_ocr_ready/
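If you would rather drive the compression step from a Python script, the command above can be assembled programmatically. The following is a minimal sketch, not part of Sakura Sumi itself: the helper names `expected_output_dir` and `build_compress_command` are hypothetical, and it assumes the `_ocr_ready` suffix is appended to the codebase folder name, alongside the original folder.

```python
import sys
from pathlib import Path


def expected_output_dir(codebase: str) -> Path:
    """Return the `{your_codebase}_ocr_ready` directory for a codebase path.

    Assumption: the suffix is appended to the codebase folder name and the
    result sits next to the original folder. Check the tool's verbose output
    if your PDFs end up somewhere else.
    """
    path = Path(codebase)
    return path.with_name(path.name + "_ocr_ready")


def build_compress_command(codebase: str, verbose: bool = True) -> list[str]:
    """Build the documented `scripts/compress.py` call as an argument list."""
    # sys.executable points at the interpreter of the active virtual environment.
    cmd = [sys.executable, "scripts/compress.py", codebase]
    if verbose:
        cmd.append("-v")  # the only flag shown in the quick start
    return cmd
```

To run it, pass the list to `subprocess.run(build_compress_command("/path/to/your/codebase"), check=True)` from the repository root with the virtual environment activated.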

Web Portal (Recommended for Beginners)

The web portal provides a user-friendly interface - perfect if you're not comfortable with command-line tools.

Start the web server:

   python scripts/run_web.py

Then open http://localhost:5001 in your browser.

Features:

  • Beautiful sakura (cherry blossom) themed interface
  • Point-and-click file selection
  • Real-time progress tracking
  • Token estimation before compression
  • Job history and results management
  • No command-line knowledge required

For Advanced Users: The web portal exposes all CLI features through a GUI, including parallel processing, resume capability, and OCR compression modes.
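To get a feel for what a token estimate before compression looks like, here is a rough sketch. It is not Sakura Sumi's actual estimator: it assumes the common rule of thumb of about four characters per token, and the function names and the default `.py` suffix filter are hypothetical choices for illustration.

```python
from pathlib import Path

# Assumed rule of thumb; real tokenizers vary by model and by language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Roughly estimate a token count from the character count, rounding up."""
    return -(-len(text) // chars_per_token)


def estimate_codebase_tokens(root: str, suffixes: tuple[str, ...] = (".py",)) -> int:
    """Sum the rough estimates for every matching file under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Ignore undecodable bytes so one odd file does not stop the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total
```

Calling `estimate_codebase_tokens("/path/to/your/codebase")` gives a ballpark figure to compare against the context limit of the model you plan to use. Treat it as an order-of-magnitude check only.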
