CaptiOCR - A real-time screen text extraction tool using Tesseract OCR. Capture, recognize, and log on-screen text dynamically. Future updates will include on-demand language installation, resizable selection areas, and live text overlays.

🖥️ CaptiOCR - Real-Time Screen Text Extraction

CaptiOCR is an open-source, real-time screen text extraction tool designed to capture and transcribe captions (subtitles) from video conferencing applications such as Microsoft Teams, Zoom, and Google Meet. You select any area of the screen, and CaptiOCR continuously extracts the text it contains in real time.

🚀 Latest Version: 0.12.00 - Now with comprehensive multi-monitor support and DPI awareness!


✨ Key Features

✅ Real-time OCR processing using Tesseract OCR
✅ Multi-language support (English, Italian, French, German, Portuguese)
✅ Multi-monitor support with DPI awareness
✅ Dynamic area selection - drag, resize, and move capture areas during operation
✅ Text processing - automatic duplicate removal and text cleaning
✅ Profile management - save and load different configurations
✅ Hotkey support - Ctrl+Q to stop capture
✅ Export options - save captured text with custom naming
✅ Debug logging for troubleshooting
✅ Modular architecture - clean, maintainable codebase


🛠️ Prerequisites

Before installation, ensure you have:

  • ✅ Python 3.8+ installed
  • ✅ Tesseract OCR installed (see step 3 below)
  • ✅ Windows OS (primary support)

📦 Installation

1️⃣ Clone the Repository

git clone https://github.com/CarloSacchi/CaptiOCR.git
cd CaptiOCR

2️⃣ Install Python Dependencies

pip install -r requirements.txt

3️⃣ Install Tesseract OCR

Windows users:
Download and install Tesseract from the official releases.
The application will automatically detect standard installation paths.
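As a rough illustration of how such detection could work, the sketch below first checks `PATH` and then falls back to a list of common Windows install locations. The candidate paths and the `find_tesseract` helper are assumptions for illustration, not CaptiOCR's actual code.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed common install locations on Windows (hypothetical list);
# CaptiOCR's own "standard installation paths" may differ.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the path of a Tesseract executable, or None if none is found."""
    # 1. Prefer an executable that is already on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Otherwise try each well-known install location in order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_tesseract() or "Tesseract not found; check the installation and PATH")
```

If you use `pytesseract`, a path found this way can be assigned to `pytesseract.pytesseract.tesseract_cmd`.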


🚀 Quick Start

Run the application:

python CaptiOCR.py

Basic Usage:

1️⃣ Select Language - Choose your OCR language from the dropdown
2️⃣ Click "Start (Select Area)" - Open the area selection tool
3️⃣ Drag to Select - Draw a rectangle around the text area you want to capture
4️⃣ Press ENTER - Begin real-time text extraction
5️⃣ Press Ctrl+Q or STOP - End the capture session
6️⃣ Name Your Capture - Save with a custom filename
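The capture session described in these steps boils down to a grab, recognize, repeat loop that ends when Ctrl+Q or STOP is pressed. The sketch below is a hypothetical, library-agnostic version: `grab` and `recognize` are placeholder callables you supply, and a `threading.Event` stands in for the stop signal.

```python
import threading
from typing import Callable, List


def capture_loop(
    grab: Callable[[], object],
    recognize: Callable[[object], str],
    stop_event: threading.Event,
    interval: float = 0.5,
) -> List[str]:
    """Repeatedly grab the selected area and OCR it until stop_event is set."""
    lines: List[str] = []
    while not stop_event.is_set():
        image = grab()                    # screenshot of the selected area
        text = recognize(image).strip()   # OCR result for this frame
        if text:
            lines.append(text)
        # wait() returns early as soon as the stop signal (e.g. Ctrl+Q) sets the event
        stop_event.wait(interval)
    return lines
```

In practice `grab` might wrap a screenshot library and `recognize` might call `pytesseract.image_to_string(image, lang="eng")`; whether CaptiOCR uses these particular libraries is not stated in this README.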

📁 Output: Captured text is saved in the captures/ folder as timestamped .txt files.
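A minimal sketch of writing one session to a timestamped file, assuming a `<name>_<YYYYMMDD_HHMMSS>.txt` naming scheme. The format and the `save_capture` helper are hypothetical; the application's exact file names may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def save_capture(lines: Iterable[str], name: str = "capture",
                 folder: str = "captures", now: Optional[datetime] = None) -> Path:
    """Write captured lines to <folder>/<name>_<YYYYMMDD_HHMMSS>.txt."""
    now = now or datetime.now()
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)   # create captures/ on first use
    # Hypothetical naming scheme: custom name plus a sortable timestamp.
    path = out_dir / f"{name}_{now:%Y%m%d_%H%M%S}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Putting the timestamp in a year-first format keeps files in chronological order when sorted by name.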


🎯 Advanced Features

Multi-Monitor Support

  • Automatic detection of all connected monitors
  • DPI awareness for high-resolution displays
  • Cross-monitor selection - capture areas spanning multiple screens
  • Monitor-specific positioning for consistent setups
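DPI awareness matters because, for a process that is not DPI-aware, Windows scales (virtualizes) coordinates when display scaling is active, so a selection drawn in logical pixels can be offset from the physical pixels in a screenshot. The sketch below shows the two calculations involved under simplified assumptions: a single scale factor per rectangle (for example 1.5 at 150% scaling) and monitors laid out in one virtual desktop. The `Rect`, `to_physical`, and `union` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def to_physical(rect: Rect, scale: float) -> Rect:
    """Convert logical pixels to physical pixels (scale 1.5 = 150%, i.e. 144 DPI)."""
    return Rect(round(rect.left * scale), round(rect.top * scale),
                round(rect.width * scale), round(rect.height * scale))


def union(rects: Iterable[Rect]) -> Rect:
    """Smallest rectangle covering all inputs, e.g. a selection spanning monitors."""
    items = list(rects)
    if not items:
        raise ValueError("union() needs at least one rectangle")
    left = min(r.left for r in items)
    top = min(r.top for r in items)
    right = max(r.right for r in items)
    bottom = max(r.bottom for r in items)
    return Rect(left, top, right - left, bottom - top)
```

Monitors placed to the left of or above the primary display have negative coordinates, which `union` handles without special cases.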

Dynamic Capture Areas

  • Resizable borders - adjust capture area during operation
  • Movable windows - reposition without stopping capture
  • Multiple profiles - save configurations for different applications
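A capture area that can be moved and resized mid-session can be modeled as a small mutable object. This sketch is hypothetical (the `CaptureArea` class is not taken from the CaptiOCR codebase) and enforces the 50×50 pixel minimum recommended in the tips below.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_SIZE = 50  # minimum capture width/height in pixels


@dataclass
class CaptureArea:
    x: int
    y: int
    width: int
    height: int

    def move(self, dx: int, dy: int) -> None:
        """Shift the area without changing its size."""
        self.x += dx
        self.y += dy

    def resize(self, dw: int, dh: int) -> None:
        """Grow or shrink from the bottom-right corner, never below MIN_SIZE."""
        self.width = max(MIN_SIZE, self.width + dw)
        self.height = max(MIN_SIZE, self.height + dh)

    def as_bbox(self) -> Tuple[int, int, int, int]:
        """(left, top, right, bottom), the tuple form PIL's ImageGrab.grab(bbox=...) takes."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)
```

Because the capture loop would read the area on every frame, changes made by `move` or `resize` take effect on the next grab without restarting the session.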

Text Processing

  • Duplicate detection - automatic removal of repeated text
  • Text cleaning - remove artifacts and formatting issues
  • Processed output - clean, readable transcriptions
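One simple way to implement duplicate removal is to remember recently emitted lines and drop any line seen again, since a subtitle usually stays on screen across several consecutive OCR frames. The sketch below assumes that approach, with case-insensitive matching and light cleaning of whitespace and stray edge characters (`|`, `~`, `_`) that OCR can produce. CaptiOCR's actual heuristics may differ.

```python
import re
from collections import deque
from typing import Deque, List


def clean_line(line: str) -> str:
    """Collapse runs of whitespace and trim stray edge characters."""
    line = re.sub(r"\s+", " ", line)
    return line.strip(" |~_")


class Deduplicator:
    """Drop lines already emitted among the last `window` unique lines."""

    def __init__(self, window: int = 20) -> None:
        self.recent: Deque[str] = deque(maxlen=window)

    def feed(self, frame_text: str) -> List[str]:
        """Return only the cleaned lines of this frame that are new."""
        new_lines: List[str] = []
        for raw in frame_text.splitlines():
            line = clean_line(raw)
            key = line.casefold()          # case-insensitive comparison
            if line and key not in self.recent:
                self.recent.append(key)
                new_lines.append(line)
        return new_lines
```

The bounded `deque` keeps memory constant during long sessions, at the cost of letting a line reappear once it has scrolled out of the window.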

Profile Management

  • Save Settings - store optimized configurations
  • Quick Load - switch between saved profiles
  • Application-specific - different settings for Teams, Zoom, Meet
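Named profiles map naturally onto a single JSON file in `config/`. The file name `profiles.json`, the helper functions, and the settings keys used below are assumptions for illustration:

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical location and schema; CaptiOCR's real profile format may differ.
PROFILES_FILE = Path("config") / "profiles.json"


def load_all_profiles(path: Path = PROFILES_FILE) -> Dict[str, Dict[str, Any]]:
    """Return every saved profile, or an empty dict if none exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_profile(name: str, settings: Dict[str, Any], path: Path = PROFILES_FILE) -> None:
    """Add or overwrite one named profile, keeping the others intact."""
    profiles = load_all_profiles(path)
    profiles[name] = settings
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profiles, indent=2), encoding="utf-8")


def load_profile(name: str, path: Path = PROFILES_FILE) -> Dict[str, Any]:
    """Return one profile; raises KeyError if it does not exist."""
    return load_all_profiles(path)[name]
```

For example, a "Teams" profile could store `{"lang": "eng", "area": [0, 900, 1920, 120]}` and a "Zoom" profile a different area, so switching applications is a single `load_profile` call.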

💡 Tips & Best Practices

Optimizing OCR Accuracy

  • Language Selection: Choose the correct language model for best results with accents and special characters
  • Capture Area: Select short, wide rectangles (thin horizontal strips) that cover only the subtitle region
  • Minimum Size: Ensure capture areas are at least 50×50 pixels
  • Stable Areas: Target regions where text appears consistently
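Tesseract selects its language model with a code such as `eng` or `ita`, and several models can be combined with `+` (for example `eng+ita`), at some cost in speed. The sketch below maps the languages listed under Key Features to their Tesseract codes and checks that each `.traineddata` file exists in the `tessdata/` folder. The `tesseract_lang` helper is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Tesseract traineddata codes for the languages listed under Key Features.
LANG_CODES = {
    "English": "eng",
    "Italian": "ita",
    "French": "fra",
    "German": "deu",
    "Portuguese": "por",
}


def tesseract_lang(names: Iterable[str], tessdata_dir: Path) -> str:
    """Build a Tesseract language string such as 'eng+ita'.

    Raises FileNotFoundError if a <code>.traineddata file is missing.
    """
    codes: List[str] = []
    for name in names:
        code = LANG_CODES[name]
        if not (tessdata_dir / f"{code}.traineddata").is_file():
            raise FileNotFoundError(f"{code}.traineddata not found in {tessdata_dir}")
        codes.append(code)
    return "+".join(codes)
```

The resulting string is the kind of value `pytesseract.image_to_string` accepts as its `lang` argument; passing a single code is usually both faster and more accurate than a combination.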

Performance Optimization

  • Close unnecessary applications to reduce system load
  • Use specific language models rather than auto-detection
  • Regularly clean up old capture files and logs
  • Monitor system resources during extended capture sessions
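Periodic cleanup can be as simple as deleting files past a certain age. A minimal sketch, assuming file modification time is a good enough proxy for age; the `cleanup_old_files` helper and its 30-day default are arbitrary choices for illustration:

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(folder: Path, max_age_days: float = 30,
                      pattern: str = "*.txt", now: Optional[float] = None) -> List[Path]:
    """Delete files in `folder` matching `pattern` older than `max_age_days`.

    Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400   # 86400 seconds per day
    removed: List[Path] = []
    for path in folder.glob(pattern):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `cleanup_old_files(Path("logs"), 7, "*.log")` would prune week-old log files, assuming logs use a `.log` extension.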

📁 Project Structure

CaptiOCR/
├── CaptiOCR.py              # Main application entry point
├── captiocr/                # Core application modules
│   ├── config/              # Settings and constants
│   ├── core/                # OCR and capture logic
│   ├── models/              # Data models
│   ├── ui/                  # User interface components
│   └── utils/               # Utilities and helpers
├── captures/                # Saved text outputs
├── config/                  # User preferences
├── tessdata/                # OCR language files
├── logs/                    # Application logs
└── resources/               # Icons and assets

🔧 Configuration

The application uses JSON configuration files stored in config/:

  • User preferences - UI settings, language choices
  • Language data - Available OCR models
  • Capture profiles - Saved area configurations
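Loading preferences is safest when every setting has a default, so a missing or damaged file never prevents startup. A minimal sketch; the keys below (`language`, `debug_logging`, `capture_interval_s`) and the `load_preferences` helper are hypothetical, not CaptiOCR's real schema:

```python
import json
import logging
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; the real preferences schema may differ.
DEFAULT_PREFS: Dict[str, Any] = {
    "language": "eng",
    "debug_logging": False,
    "capture_interval_s": 0.5,
}


def load_preferences(path: Path) -> Dict[str, Any]:
    """Merge user preferences from JSON over the defaults.

    A missing or corrupt file falls back to the defaults instead of crashing.
    """
    prefs = dict(DEFAULT_PREFS)   # copy so the defaults are never mutated
    try:
        prefs.update(json.loads(path.read_text(encoding="utf-8")))
    except FileNotFoundError:
        pass                      # first run: no preferences saved yet
    except json.JSONDecodeError as exc:
        logging.warning("Ignoring corrupt preferences file %s: %s", path, exc)
    return prefs
```

Merging over defaults also means that settings added in a newer version get sensible values even when an older preferences file lacks them.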

📋 System Requirements

  • OS: Windows 10/11 (primary), Linux/macOS (experimental)
  • RAM: 4 GB minimum, 8 GB recommended
  • CPU: Multi-core processor recommended for real-time processing
  • Display: Multiple monitors with different DPI settings are supported
  • Storage: 100 MB or more for application and language files

🐛 Troubleshooting

Common Issues:

  • OCR not working: Verify Tesseract installation and PATH
  • Text not detected: Check language selection and capture area size
  • Performance issues: Close other applications, check system resources
  • Multi-monitor problems: Update display drivers, check DPI settings

Debug Logging:

Enable debug logging in the application settings to capture detailed operation information for troubleshooting.
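If you add logging of your own, a file logger whose verbosity follows a debug switch takes only a few lines with the standard `logging` module. The `setup_logging` helper, the logger name `captiocr`, and the file `logs/captiocr.log` are assumptions; the application's real log setup may differ:

```python
import logging
from pathlib import Path


def setup_logging(log_dir: Path, debug: bool = False) -> logging.Logger:
    """Log to <log_dir>/captiocr.log; DEBUG messages are kept only when debug is on."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("captiocr")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # Close and remove any previous handlers so repeated calls don't duplicate output.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_dir / "captiocr.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling it again with a different `debug` value switches verbosity at runtime, which is how a settings toggle could be wired up.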


🗺️ Roadmap

Current Version (0.12.00)

  • ✅ Multi-monitor support with DPI awareness
  • ✅ Modular, maintainable codebase
  • ✅ Enhanced text processing
  • ✅ Improved error handling

Upcoming Features

  • 🔄 Live translation integration
  • 🔄 Cloud storage synchronization
  • 🔄 Export formats (PDF, HTML, Word)
  • 🔄 API integration for external applications
  • 🔄 Dark mode and theme customization
  • 🔄 Batch processing capabilities

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Follow the coding guidelines in CLAUDE.md
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Development Guidelines

  • Follow PEP 8 Python style guide
  • Use type hints and docstrings
  • Maintain modular architecture
  • Add comprehensive logging
  • Update version numbers for functional changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author & Support

Author: Carlo Sacchi
Website: https://www.captiocr.com
Version: 0.12.00 (August 2025)

For support, feature requests, or bug reports, please open an issue on GitHub.


⭐ If CaptiOCR helps you, please consider giving it a star on GitHub!
