🖥️ CaptiOCR - Real-Time Screen Text Extraction

CaptiOCR is an open-source tool that extracts on-screen text in real time. It is designed to capture and transcribe captions (subtitles) from video conferencing applications such as Microsoft Teams, Zoom, and Google Meet. Through its interface you select any screen area, and CaptiOCR runs OCR on that area continuously.

🚀 Latest Version: 0.12.00 - Now with comprehensive multi-monitor support and DPI awareness!

✨ Key Features

✅ Real-time OCR processing using Tesseract OCR
✅ Multi-language support (English, Italian, French, German, Portuguese)
✅ Multi-monitor support with DPI awareness
✅ Dynamic area selection - drag, resize, and move capture areas during operation
✅ Text processing - automatic duplicate removal and text cleaning
✅ Profile management - save and load different configurations
✅ Hotkey support - Ctrl+Q to stop capture
✅ Export options - save captured text with custom naming
✅ Debug logging for troubleshooting
✅ Modular architecture - clean, maintainable codebase

🛠️ Prerequisites

Before installation, ensure you have:

✅ Python 3.8+ installed
✅ Tesseract OCR installed (see step 3 under Installation)
✅ Windows OS (primary support)

📦 Installation

1️⃣ Clone the Repository

git clone https://github.com/CarloSacchi/CaptiOCR.git
cd CaptiOCR

2️⃣ Install Python Dependencies

pip install -r requirements.txt

3️⃣ Install Tesseract OCR

Windows users:
Download and install Tesseract from the official releases.
The application will automatically detect standard installation paths.
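
How CaptiOCR performs this detection is not described here. The following is a minimal sketch of one plausible approach, assuming the default directories used by common Windows installers. The candidate paths and the function name `find_tesseract` are illustrative and are not taken from the CaptiOCR source.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default install locations; adjust them to match your system.
CANDIDATE_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the assumed standard locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

If the function returns None, Tesseract is probably installed in a non-standard location or not installed at all.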

🚀 Quick Start

Run the application:

python CaptiOCR.py

Basic Usage:

1️⃣ Select Language - Choose your OCR language from the dropdown
2️⃣ Click "Start (Select Area)" - Open the area selection tool
3️⃣ Drag to Select - Draw a rectangle around the text area you want to capture
4️⃣ Press ENTER - Begin real-time text extraction
5️⃣ Press Ctrl+Q or STOP - End the capture session
6️⃣ Name Your Capture - Save with a custom filename

📁 Output: Captured text is saved in the captures/ folder as timestamped .txt files.
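
The exact file naming scheme is not specified above. As a rough sketch, a custom name and a timestamp could be combined as follows. The `name_YYYYMMDD_HHMMSS.txt` pattern and the helper `capture_path` are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def capture_path(name: str, folder: str = "captures",
                 now: Optional[datetime] = None) -> Path:
    """Build a path such as captures/<name>_YYYYMMDD_HHMMSS.txt (assumed format)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Replace characters that are unsafe in file names with underscores.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in name)
    safe = safe.strip("_") or "capture"  # fall back to a default name
    return Path(folder) / f"{safe}_{stamp}.txt"


print(capture_path("Team sync", now=datetime(2025, 8, 1, 9, 30, 0)).name)
# prints Team_sync_20250801_093000.txt
```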

🎯 Advanced Features

Multi-Monitor Support

Automatic detection of all connected monitors
DPI awareness for high-resolution displays
Cross-monitor selection - capture areas spanning multiple screens
Monitor-specific positioning for consistent setups
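
To illustrate what a cross-monitor selection involves, the sketch below works out which monitors a selected rectangle touches. It assumes all rectangles are expressed in one shared virtual-desktop pixel coordinate space. The `Rect` type, the function names, and the monitor layout are hypothetical, and a real application would query the operating system for monitor geometry and DPI scaling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share a non-empty area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def monitors_for_selection(selection: Rect, monitors: List[Rect]) -> List[int]:
    """Return the indexes of the monitors that the selection touches."""
    return [i for i, m in enumerate(monitors) if overlaps(selection, m)]


# Hypothetical layout: a 1920x1080 primary with a 2560x1440 monitor to its right.
monitors = [Rect(0, 0, 1920, 1080), Rect(1920, 0, 2560, 1440)]
selection = Rect(1700, 900, 600, 100)  # straddles the boundary at x=1920
print(monitors_for_selection(selection, monitors))  # prints [0, 1]
```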

Dynamic Capture Areas

Resizable borders - adjust capture area during operation
Movable windows - reposition without stopping capture
Multiple profiles - save configurations for different applications

Text Processing

Duplicate detection - automatic removal of repeated text
Text cleaning - remove artifacts and formatting issues
Processed output - clean, readable transcriptions
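
Because the same caption stays on screen across many OCR passes, consecutive results tend to repeat. The sketch below shows one way to handle this: normalize whitespace, then drop a line when it is nearly identical to the last line kept. It uses `difflib` from the standard library. The 0.9 similarity threshold and the function names are assumptions, not CaptiOCR's actual algorithm.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def dedupe(frames: Iterable[str], threshold: float = 0.9) -> List[str]:
    """Keep each line unless it closely matches the previously kept line."""
    kept: List[str] = []
    for raw in frames:
        line = clean(raw)
        if not line:
            continue  # skip empty OCR results
        if kept:
            ratio = SequenceMatcher(None, kept[-1].lower(), line.lower()).ratio()
            if ratio >= threshold:
                continue  # near-duplicate of the last kept line
        kept.append(line)
    return kept
```

A lower threshold removes more near-duplicates but risks discarding genuinely new captions that happen to resemble the previous one.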

Profile Management

Save Settings - store optimized configurations
Quick Load - switch between saved profiles
Application-specific - different settings for Teams, Zoom, Meet

💡 Tips & Best Practices

Optimizing OCR Accuracy

Language Selection: Choose the correct language model for best results with accents and special characters
Capture Area: Select short, wide rectangles that tightly frame the subtitle region
Minimum Size: Ensure capture areas are at least 50×50 pixels
Stable Areas: Target regions where text appears consistently
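
The minimum-size guideline above can be checked before a capture starts. This is a small sketch under the assumption that a selection is given by two corner points in pixels. The helper `is_valid_area` is hypothetical.

```python
MIN_SIZE = 50  # pixels, taken from the guideline above


def is_valid_area(x1: int, y1: int, x2: int, y2: int,
                  min_size: int = MIN_SIZE) -> bool:
    """Return True when the dragged rectangle is at least min_size on each side."""
    # abs() handles drags made from right to left or bottom to top.
    width = abs(x2 - x1)
    height = abs(y2 - y1)
    return width >= min_size and height >= min_size
```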

Performance Optimization

Close unnecessary applications to reduce system load
Use specific language models rather than auto-detection
Regularly clean up old capture files and logs
Monitor system resources during extended capture sessions

📁 Project Structure

CaptiOCR/
├── CaptiOCR.py              # Main application entry point
├── captiocr/                # Core application modules
│   ├── config/              # Settings and constants
│   ├── core/                # OCR and capture logic
│   ├── models/              # Data models
│   ├── ui/                  # User interface components
│   └── utils/               # Utilities and helpers
├── captures/                # Saved text outputs
├── config/                  # User preferences
├── tessdata/                # OCR language files
├── logs/                    # Application logs
└── resources/               # Icons and assets

🔧 Configuration

The application uses JSON configuration files stored in config/:

User preferences - UI settings, language choices
Language data - Available OCR models
Capture profiles - Saved area configurations
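
The schema of these files is not documented here. As an illustration only, a capture profile could be stored and reloaded as JSON like this. The field names, the `CaptureProfile` class, and the one-file-per-profile layout are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class CaptureProfile:
    # Hypothetical fields; the real profile format may differ.
    name: str
    language: str
    left: int
    top: int
    width: int
    height: int


def save_profile(profile: CaptureProfile, folder: Path) -> Path:
    """Write the profile to <folder>/<name>.json and return the file path."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{profile.name}.json"
    path.write_text(json.dumps(asdict(profile), indent=2), encoding="utf-8")
    return path


def load_profile(path: Path) -> CaptureProfile:
    """Read a profile previously written by save_profile."""
    return CaptureProfile(**json.loads(path.read_text(encoding="utf-8")))
```

For example, `save_profile(CaptureProfile("teams", "eng", 200, 900, 1200, 120), Path("config"))` would write `config/teams.json`.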

📋 System Requirements

OS: Windows 10/11 (primary), Linux/macOS (experimental)
RAM: 4GB minimum, 8GB recommended
CPU: Multi-core processor recommended for real-time processing
Display: Support for multiple monitors with varying DPI
Storage: 100MB+ for application and language files

🐛 Troubleshooting

Common Issues:

OCR not working: Verify Tesseract installation and PATH
Text not detected: Check language selection and capture area size
Performance issues: Close other applications, check system resources
Multi-monitor problems: Update display drivers, check DPI settings
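
For the first issue, a quick way to confirm that Tesseract is reachable through PATH is to run its `--version` flag. The sketch below wraps that check in Python. The function name `tesseract_version` is illustrative.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it cannot be found."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Check both streams, since some builds may print the version to stderr.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else None


print(tesseract_version() or "Tesseract not found on PATH")
```

Running `tesseract --list-langs` in a terminal additionally shows which language models are installed.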

Debug Logging:

Enable debug logging in the application settings to capture detailed operation information for troubleshooting.

🗺️ Roadmap

Current Version (0.12.00)

✅ Multi-monitor support with DPI awareness
✅ Modular, maintainable codebase
✅ Enhanced text processing
✅ Improved error handling

Upcoming Features

🔄 Live translation integration
🔄 Cloud storage synchronization
🔄 Export formats (PDF, HTML, Word)
🔄 API integration for external applications
🔄 Dark mode and theme customization
🔄 Batch processing capabilities

🤝 Contributing

We welcome contributions! Here's how to get started:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Follow the coding guidelines in CLAUDE.md
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 Python style guide
Use type hints and docstrings
Maintain modular architecture
Add comprehensive logging
Update version numbers for functional changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author & Support

Author: Carlo Sacchi
Website: https://www.captiocr.com
Version: 0.12.00 (August 2025)

For support, feature requests, or bug reports, please open an issue on GitHub.

⭐ If CaptiOCR helps you, please consider giving it a star on GitHub!
