A monorepo for processing, organizing, and managing Sabbath School lessons with OCR cleanup, translation workflows, and progress tracking.
├── apps/                        # Future applications
├── packages/                    # Shared packages
├── scripts/                     # Python scripts for processing
│   ├── download-lessons.py      # Download lessons from SSL PDFs
│   ├── update-lessons-json.py   # Update lesson completion status
│   └── list-undone-lessons.py   # Track processing progress
├── data/                        # Data files and processed lessons
│   ├── lessons.json             # Lesson metadata and completion status
│   ├── downloads/               # Downloaded raw text files
│   └── lessons/                 # Processed markdown files
│       └── [DECADE]/[YEAR]/[QUARTER]/[LANGUAGE]/
├── .claude/commands/            # Claude Code commands
│   └── cleanup-lessons.md       # OCR cleanup and formatting command
├── Makefile                     # Build and utility commands
├── package.json                 # Monorepo configuration
└── turbo.json                   # Turbo build configuration
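
To illustrate the `[DECADE]/[YEAR]/[QUARTER]/[LANGUAGE]/` layout under `data/lessons/`, here is a minimal sketch of how a path could be derived from a year, quarter, and language code. The helper name `lesson_dir` is hypothetical, and the decade folder format (`1880s`) is an assumption; the repository may name decade folders differently.

```python
from pathlib import Path


def lesson_dir(year: int, quarter: str, language: str,
               root: Path = Path("data/lessons")) -> Path:
    """Build the output directory for one lesson quarter (hypothetical helper)."""
    if quarter not in {"Q1", "Q2", "Q3", "Q4"}:
        raise ValueError(f"quarter must be Q1-Q4, got {quarter!r}")
    # Assumption: decade folders are named like "1880s".
    decade = f"{year // 10 * 10}s"
    return root / decade / str(year) / quarter / language


print(lesson_dir(1888, "Q1", "en").as_posix())  # data/lessons/1880s/1888/Q1/en
```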
make install-deps
# Download all lessons as .txt files
make download-lessons
# Test with first 10 files
make download-test
# Preview without downloading
make download-dry-run
# Install PDF processing dependencies
make install-pdf-deps
# Download all lessons as .pdf files
make download-pdfs
# Test with first 10 PDF files
make download-pdfs-test
# Convert PDFs to page-by-page text files
make convert-pdfs
# Convert specific PDF
make convert-pdf PDF=path/to/lesson.pdf
# Show completion progress for all languages
make list-progress
# List undone lessons
make list-undone
# List undone English lessons with details
make list-undone-en
# List undone Kiswahili lessons with details
make list-undone-sw
# Process a single lesson (English)
/cleanup-lessons 1888 Q1
# Process with translation to Kiswahili
/cleanup-lessons 1888 Q1 sw
# Process a range of lessons
/cleanup-lessons 1888 Q1 --end-year 1890 --end-quarter Q2
# Redo existing processing
/cleanup-lessons 1888 Q1 --redo
# Process all undone English lessons from first undone (AI does OCR cleanup)
/cleanup en
# Process all undone Kiswahili lessons from first undone (AI translates)
/cleanup sw
# Process up to a specific year with parallel processing
/cleanup en --end-year 1895
# Find first undone lesson with source files available
make first-not-done-en
# Extract lesson from PDF page text files (better OCR control)
/extract-from-pages 1913 Q2
# Extract with translation to Kiswahili
/extract-from-pages 1913 Q2 sw
# Redo extraction with improvements
/extract-from-pages 1913 Q2 --redo
AI-Powered Processing Features:
- AI performs OCR cleanup and markdown formatting directly
- Up to 4 parallel agents for full year processing (Q1-Q4)
- Intelligent source file validation and dependency checking
- Page-by-page processing for superior OCR error handling
- Automatic progress tracking and status updates
- Automatic OCR error correction
- Convert roman numerals to arabic (LESSON IV → Lesson 4)
- Fix ALL CAPS formatting to proper case
- Ensure double spacing between questions/notes for PDF rendering
- Create properly structured markdown files
- Process English lessons first
- Translate to Kiswahili with preserved formatting
- Maintain theological accuracy and cultural appropriateness
- Auto-update completion status
- Track completion by language
- Generate progress reports
- List undone lessons by year range
- Monitor processing status
- Automatic decade/year/quarter structure
- Language-specific folders (en, sw, etc.)
- Consistent file naming conventions
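
The roman numeral rule listed above (LESSON IV → Lesson 4) can be sketched in a few lines of Python. This is only an illustration of the transformation; in this project the cleanup is performed by the AI command, not by this code. The function names are hypothetical, and the pattern assumes headings of the form `LESSON <numeral>` with numerals below 90 (letters I, V, X, L only).

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

# Assumption: lesson headings look like "LESSON IV" in the raw OCR text.
LESSON_RE = re.compile(r"\bLESSON\s+([IVXL]+)\b")


def roman_to_int(numeral: str) -> int:
    """Convert a roman numeral to an integer, reading right to left."""
    total = 0
    largest_seen = 0
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < largest_seen:
            total -= value  # subtractive form, e.g. the I in IV
        else:
            total += value
            largest_seen = value
    return total


def normalize_heading(line: str) -> str:
    """Rewrite 'LESSON IV' as 'Lesson 4'; other text is left untouched."""
    return LESSON_RE.sub(lambda m: f"Lesson {roman_to_int(m.group(1))}", line)


print(normalize_heading("LESSON IV"))  # Lesson 4
```

Converting the rest of an ALL CAPS heading to proper case is deliberately left out, since naive title-casing mishandles small words and proper nouns.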
Individual lesson processing with OCR cleanup and translation.
Syntax:
/cleanup-lessons YEAR QUARTER [LANGUAGE] [OPTIONS]
Examples:
/cleanup-lessons 1888 Q1 # Process 1888 Q1 in English
/cleanup-lessons 1888 Q1 sw # Process 1888 Q1 in Kiswahili
/cleanup-lessons 1888 Q1 --end-year 1890 # Process range
/cleanup-lessons 1888 Q1 --redo # Redo processing
Options:
- `--redo` - Reprocess even if already done
- `--end-year YEAR` - Process range to end year
- `--end-quarter QUARTER` - Process range to end quarter
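
As a rough sketch of what a range such as `1888 Q1 --end-year 1890 --end-quarter Q2` covers, the following hypothetical helper enumerates every quarter from the start to the end, inclusive. Two behaviours are assumptions, not documented facts: with no end options only the starting quarter is processed, and `--end-year` without `--end-quarter` runs through Q4 of that year.

```python
from typing import Iterator, Optional, Tuple

QUARTERS = ("Q1", "Q2", "Q3", "Q4")


def quarter_range(start_year: int, start_quarter: str,
                  end_year: Optional[int] = None,
                  end_quarter: Optional[str] = None) -> Iterator[Tuple[int, str]]:
    """Yield (year, quarter) pairs from start to end, inclusive."""
    if end_year is None and end_quarter is None:
        # Assumption: no end options means a single quarter.
        end_year, end_quarter = start_year, start_quarter
    else:
        end_year = start_year if end_year is None else end_year
        # Assumption: a missing end quarter means the whole end year.
        end_quarter = "Q4" if end_quarter is None else end_quarter
    # Map each quarter to a running index so the range is a simple loop.
    start = start_year * 4 + QUARTERS.index(start_quarter)
    end = end_year * 4 + QUARTERS.index(end_quarter)
    for n in range(start, end + 1):
        yield n // 4, QUARTERS[n % 4]


for year, quarter in quarter_range(1888, "Q1", 1890, "Q2"):
    print(year, quarter)
```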
AI-driven batch processing starting from first undone lesson with source files.
Syntax:
/cleanup LANGUAGE [OPTIONS]
Examples:
/cleanup en # Process all undone English lessons with AI
/cleanup sw # Process Kiswahili translations with AI
/cleanup en --end-year 1895 # Process English up to 1895
Options:
- `--end-year YEAR` - Stop processing at specified year
- `--end-quarter QUARTER` - Stop processing at specified quarter
Extract and organize lesson content from PDF page text files for superior OCR control.
Syntax:
/extract-from-pages YEAR QUARTER [LANGUAGE] [OPTIONS]
Examples:
/extract-from-pages 1913 Q2 # Extract from PDF page files
/extract-from-pages 1913 Q2 sw # Extract and translate to Kiswahili
/extract-from-pages 1913 Q2 --redo # Redo extraction
Options:
- `--redo` - Redo extraction even if already done
Page Processing Features:
- Page-by-Page Control: Process individual pages for better error handling
- Smart Content Reconstruction: Intelligently merge content across pages
- Superior OCR Correction: Fix errors with page-level context
- Content Mapping: Track which content comes from which pages
- Quality Assessment: Monitor OCR quality page by page
AI Features (Both Commands):
- Direct OCR Cleanup: AI performs all text correction and formatting
- Parallel Processing: Up to 4 agents processing quarters simultaneously
- Smart Dependencies: Automatic English version checking for translations
- Source Validation: Only processes lessons with available source files
- Progress Tracking: Automatic status updates and completion tracking
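
The dependency and source checks described above can be pictured as a small predicate. This is a sketch only: the record shape (a `has_source` flag and a `completed` list of language codes) is a hypothetical stand-in, not the actual schema of `data/lessons.json`, and the real checks are carried out by the Claude commands.

```python
def can_process(lesson: dict, language: str, redo: bool = False) -> bool:
    """Decide whether a lesson is eligible for processing in a language.

    `lesson` is a hypothetical record such as
    {"has_source": True, "completed": ["en"]}.
    """
    if not lesson.get("has_source", False):
        return False  # source validation: nothing to process
    done = set(lesson.get("completed", []))
    if language in done and not redo:
        return False  # already done; skipped unless redo is requested
    # Translations depend on a finished English version.
    return language == "en" or "en" in done


print(can_process({"has_source": True, "completed": []}, "sw"))      # False
print(can_process({"has_source": True, "completed": ["en"]}, "sw"))  # True
```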
python scripts/download-lessons.py [--test] [--dry-run]
# Add language completion
python scripts/update-lessons-json.py add 1888 Q1 en
# Remove language completion
python scripts/update-lessons-json.py remove 1888 Q1 sw
# Check lesson status
python scripts/update-lessons-json.py status 1888 Q1
# List all languages
python scripts/update-lessons-json.py languages
# List all undone lessons
python scripts/list-undone-lessons.py
# List undone for specific language
python scripts/list-undone-lessons.py --language sw
# List completed lessons
python scripts/list-undone-lessons.py --completed
# Show progress for all languages
python scripts/list-undone-lessons.py --progress
# Filter by year range
python scripts/list-undone-lessons.py --start-year 1888 --end-year 1895
- All OCR errors corrected
- Roman numerals converted to arabic numbers
- ALL CAPS converted to proper case
- Double spacing between question/note numbers
- Proper markdown syntax
- Original theological content preserved
- Proper decade/year/quarter/language directory structure
- Consistent file naming (front-matter.md, week-01.md, etc.)
- Valid JSON metadata in contents.json
- PDF rendering compatibility
- English version completed first
- Theological accuracy maintained
- Cultural appropriateness for target language
- Consistent terminology usage
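
Two of the standards above, consistent file naming and valid JSON in `contents.json`, lend themselves to a mechanical check. The sketch below is not part of the repository. It assumes `contents.json` sits in the same folder as the lesson markdown files and that only `front-matter.md` and `week-NN.md` names are expected. Since the naming list ends in "etc.", a flagged name may still be legitimate.

```python
import json
import re
from pathlib import Path
from typing import List

WEEK_FILE = re.compile(r"^week-\d{2}\.md$")


def check_lesson_dir(folder: Path) -> List[str]:
    """Return a list of problems found in one processed lesson folder."""
    problems = []
    contents = folder / "contents.json"
    if not contents.is_file():
        problems.append("missing contents.json")
    else:
        try:
            json.loads(contents.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid contents.json: {exc.msg}")
    # Flag markdown files that do not follow the documented names.
    for path in sorted(folder.glob("*.md")):
        if path.name != "front-matter.md" and not WEEK_FILE.match(path.name):
            problems.append(f"unexpected file name: {path.name}")
    return problems
```

An empty list means the folder passed both checks. Content-level standards such as theological accuracy still need human review.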
| Target | Description |
|---|---|
| `help` | Show available targets |
| `download-lessons` | Download all lessons |
| `download-test` | Test with first 10 files |
| `list-progress` | Show completion progress |
| `list-undone` | List undone lessons |
| `list-undone-en` | List undone English lessons |
| `list-undone-sw` | List undone Kiswahili lessons |
| `first-not-done-en` | Show first undone English lesson |
| `first-not-done-sw` | Show first undone Kiswahili lesson |
| `show-languages` | List all available languages |
| `clean` | Remove downloaded files |
- Download raw lessons: `make download-lessons`
- Check progress: `make list-progress`
- Process English: `/cleanup-lessons YEAR QUARTER`
- Process translation: `/cleanup-lessons YEAR QUARTER sw`
- Verify completion: `make list-completed-en`
- Process with validation: `/cleanup-lessons YEAR QUARTER`
- Review generated files: Check markdown structure
- Re-process if needed: `/cleanup-lessons YEAR QUARTER --redo`
- Update metadata: Automatic via Claude command
- Follow the established directory structure
- Ensure double spacing for PDF compatibility
- Preserve original theological content
- Test with small batches before full processing
- Update completion status after processing
This project is for processing historical Sabbath School lessons for preservation and accessibility.