CaptiOCR is an open-source, real-time screen text extraction tool designed to capture and transcribe captions (subtitles) from video conferencing applications such as Microsoft Teams, Zoom, and Google Meet. Through its intuitive interface you can select any area of the screen, and CaptiOCR continuously extracts the text that appears there in real time.
Latest Version: 0.12.00 - Now with comprehensive multi-monitor support and DPI awareness!
- Real-time OCR processing using Tesseract OCR
- Multi-language support (English, Italian, French, German, Portuguese)
- Multi-monitor support with DPI awareness
- Dynamic area selection - drag, resize, and move capture areas during operation
- Text processing - automatic duplicate removal and text cleaning
- Profile management - save and load different configurations
- Hotkey support - Ctrl+Q to stop capture
- Export options - save captured text with custom naming
- Debug logging for troubleshooting
- Modular architecture - clean, maintainable codebase
Before installation, ensure you have:
- Python 3.8+ installed
- Tesseract OCR installed
- Windows OS (primary support)
git clone https://github.com/CarloSacchi/CaptiOCR.git
cd CaptiOCR
pip install -r requirements.txt
Windows users:
Download and install Tesseract from the official releases.
The application will automatically detect standard installation paths.
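Automatic detection of this kind usually checks `PATH` first and then a few well-known install folders. The sketch below illustrates that idea only; it is not CaptiOCR's actual code, the helper `find_tesseract` is hypothetical, and the candidate paths in `DEFAULT_WINDOWS_PATHS` are assumptions based on the location the Windows installers commonly use by default.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed candidate locations; C:\Program Files\Tesseract-OCR is the usual
# default of the Windows installers, but CaptiOCR's own list may differ.
DEFAULT_WINDOWS_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_WINDOWS_PATHS) -> Optional[str]:
    """Return the path of a tesseract executable, or None if none is found."""
    # 1. Prefer an executable that is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Fall back to well-known installation directories.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

If a path is found and the `pytesseract` wrapper is in use (an assumption, not something this README states), it can be passed on with `pytesseract.pytesseract.tesseract_cmd = path`.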
Run the application:
python CaptiOCR.py
1. Select Language - Choose your OCR language from the dropdown
2. Click "Start (Select Area)" - Open the area selection tool
3. Drag to Select - Draw a rectangle around the text area you want to capture
4. Press ENTER - Begin real-time text extraction
5. Press Ctrl+Q or STOP - End the capture session
6. Name Your Capture - Save with a custom filename
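Steps 4 and 5 amount to a polling loop that keeps reading the selected area until a stop signal arrives. The following is a minimal sketch under assumptions: `capture_loop` is a hypothetical name, the OCR step is injected as a `grab_text` callable, and a `threading.Event` stands in for the Ctrl+Q / STOP handler. In a real setup `grab_text` might combine `PIL.ImageGrab.grab(bbox=area)` with `pytesseract.image_to_string(image, lang="eng")`, but that pairing is an assumption, not a description of CaptiOCR internals.

```python
import threading
from typing import Callable, List


def capture_loop(grab_text: Callable[[], str],
                 stop_event: threading.Event,
                 interval: float = 0.5) -> List[str]:
    """Poll grab_text() until stop_event is set; keep non-empty, changed results."""
    lines: List[str] = []
    last = ""
    while not stop_event.is_set():
        text = grab_text().strip()
        # Skip empty frames and frames identical to the previous result.
        if text and text != last:
            lines.append(text)
            last = text
        # wait() returns early once the event is set, so stopping feels immediate.
        stop_event.wait(interval)
    return lines
```

Because `stop_event.wait(interval)` returns as soon as the event is set, pressing the stop hotkey ends the loop without waiting out the full polling interval.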
Output: Captured text is saved in the captures/ folder as timestamped .txt files.
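A timestamped file name could be built as follows. The `<name>_<YYYYMMDD>_<HHMMSS>.txt` pattern and the `capture_path` helper are hypothetical; the application's real naming scheme may differ.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def capture_path(name: str, folder: Path = Path("captures"),
                 when: Optional[datetime] = None) -> Path:
    """Build a hypothetical <name>_<YYYYMMDD>_<HHMMSS>.txt path in the captures folder."""
    when = when or datetime.now()
    # Replace anything other than letters, digits, "_" and "-" so the result is a safe file name.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_") or "capture"
    return folder / f"{safe}_{when:%Y%m%d_%H%M%S}.txt"
```

For example, a capture named "Team meeting" saved on 1 August 2025 at 09:30:05 would become `captures/Team_meeting_20250801_093005.txt`.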
- Automatic detection of all connected monitors
- DPI awareness for high-resolution displays
- Cross-monitor selection - capture areas spanning multiple screens
- Monitor-specific positioning for consistent setups
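Cross-monitor selection boils down to rectangle intersection in virtual-desktop coordinates, and DPI awareness to scaling logical sizes by the monitor's DPI relative to 96 (100% scaling on Windows). The sketch below shows both ideas with made-up names (`Monitor`, `overlapping_monitors`, `to_physical`); it does not query real monitors, and actually enabling DPI awareness (on Windows, through the Win32 `SetProcessDpiAwareness` family of calls) is out of its scope.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in virtual-desktop pixels; right/bottom are exclusive.
Rect = Tuple[int, int, int, int]


@dataclass
class Monitor:
    name: str
    rect: Rect
    dpi: int = 96  # 96 DPI corresponds to 100% scaling on Windows


def overlapping_monitors(area: Rect, monitors: List[Monitor]) -> List[Monitor]:
    """Return every monitor that the selection rectangle touches."""
    left, top, right, bottom = area
    hits = []
    for m in monitors:
        m_left, m_top, m_right, m_bottom = m.rect
        # Rectangles intersect only if they overlap on both the x and the y axis.
        if left < m_right and right > m_left and top < m_bottom and bottom > m_top:
            hits.append(m)
    return hits


def to_physical(length: int, monitor: Monitor) -> int:
    """Convert a logical (100%-scale) length to physical pixels on the given monitor."""
    return round(length * monitor.dpi / 96)
```

With a 1920×1080 monitor at 100% next to a 150% (144 DPI) monitor, a selection straddling the boundary is reported on both, and 100 logical pixels on the second screen map to 150 physical ones.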
- Resizable borders - adjust capture area during operation
- Movable windows - reposition without stopping capture
- Multiple profiles - save configurations for different applications
- Duplicate detection - automatic removal of repeated text
- Text cleaning - remove artifacts and formatting issues
- Processed output - clean, readable transcriptions
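Caption panels typically redraw the same lines across several frames, so duplicate removal can be as simple as skipping lines already seen recently. This is a hedged sketch of that approach, not the algorithm CaptiOCR ships: `clean_text`, `merge_new_lines`, and the 10-line `window` are illustrative choices.

```python
import re
from typing import List


def clean_text(raw: str) -> List[str]:
    """Split OCR output into lines, collapse whitespace and drop pure-noise lines."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        # Keep only lines with at least one letter or digit; this drops
        # stray OCR artifacts such as "|" or "~~".
        if re.search(r"[^\W_]", line):
            lines.append(line)
    return lines


def merge_new_lines(transcript: List[str], frame_lines: List[str],
                    window: int = 10) -> List[str]:
    """Append frame lines that are not among the last `window` transcript lines."""
    recent = transcript[-window:]
    for line in frame_lines:
        if line not in recent:
            transcript.append(line)
            recent.append(line)
    return transcript
```

Exact matching keeps the sketch short; OCR jitter (a single misread character) would defeat it, which is why a fuzzy comparison such as `difflib.SequenceMatcher` may be preferable in practice.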
- Save Settings - store optimized configurations
- Quick Load - switch between saved profiles
- Application-specific - different settings for Teams, Zoom, Meet
- Language Selection: Choose the correct language model for best results with accents and special characters
- Capture Area: Select wide, shallow rectangles that cover only the subtitle region
- Minimum Size: Ensure capture areas are at least 50×50 pixels
- Stable Areas: Target regions where text appears consistently
- Close unnecessary applications to reduce system load
- Use specific language models rather than auto-detection
- Regularly clean up old capture files and logs
- Monitor system resources during extended capture sessions
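The 50×50 pixel minimum from the guidelines above can be enforced when the user releases the mouse. A minimal sketch, assuming a hypothetical `normalize_selection` helper that receives the two drag corners in screen pixels:

```python
from typing import Tuple

MIN_SIZE = 50  # minimum width and height in pixels, per the guideline above


def normalize_selection(x1: int, y1: int, x2: int, y2: int) -> Tuple[int, int, int, int]:
    """Turn two drag corners into (left, top, right, bottom); reject areas under 50x50."""
    # Sorting lets the user drag in any direction.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    if right - left < MIN_SIZE or bottom - top < MIN_SIZE:
        raise ValueError(f"capture area must be at least {MIN_SIZE}x{MIN_SIZE} pixels")
    return left, top, right, bottom
```

A wide, shallow subtitle strip such as 300×80 pixels passes, while a 300×40 strip is rejected because its height is below the minimum.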
CaptiOCR/
├── CaptiOCR.py # Main application entry point
├── captiocr/ # Core application modules
│   ├── config/ # Settings and constants
│   ├── core/ # OCR and capture logic
│   ├── models/ # Data models
│   ├── ui/ # User interface components
│   └── utils/ # Utilities and helpers
├── captures/ # Saved text outputs
├── config/ # User preferences
├── tessdata/ # OCR language files
├── logs/ # Application logs
└── resources/ # Icons and assets
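Tesseract loads each language from a `<code>.traineddata` file in its `tessdata` folder, and the five supported languages correspond to the Tesseract codes `eng`, `ita`, `fra`, `deu` and `por`. The sketch below (with hypothetical names `LANGUAGE_CODES` and `available_languages`) offers only the languages whose model file is actually present; it is an illustration, not CaptiOCR's loader.

```python
from pathlib import Path
from typing import Dict

# Display name -> Tesseract language code (codes match the traineddata file names).
LANGUAGE_CODES: Dict[str, str] = {
    "English": "eng",
    "Italian": "ita",
    "French": "fra",
    "German": "deu",
    "Portuguese": "por",
}


def available_languages(tessdata: Path) -> Dict[str, str]:
    """Return the entries of LANGUAGE_CODES whose <code>.traineddata file exists."""
    return {
        name: code
        for name, code in LANGUAGE_CODES.items()
        if (tessdata / f"{code}.traineddata").is_file()
    }
```

The resulting code is what Tesseract expects for its language option, for example `tesseract image.png out -l ita`.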
The application uses JSON configuration files stored in config/:
- User preferences - UI settings, language choices
- Language data - Available OCR models
- Capture profiles - Saved area configurations
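A capture profile could round-trip through JSON like this. The `CaptureProfile` fields and the `profile_<name>.json` file name are assumptions made for illustration; the real files in `config/` may use different keys.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Tuple


@dataclass
class CaptureProfile:
    """Hypothetical profile layout; the real JSON keys may differ."""
    name: str
    language: str                    # Tesseract code, e.g. "eng"
    area: Tuple[int, int, int, int]  # (left, top, right, bottom)


def save_profile(profile: CaptureProfile, config_dir: Path) -> Path:
    """Write the profile as pretty-printed JSON and return the file path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"profile_{profile.name}.json"
    path.write_text(json.dumps(asdict(profile), indent=2), encoding="utf-8")
    return path


def load_profile(path: Path) -> CaptureProfile:
    """Read a profile written by save_profile."""
    data = json.loads(path.read_text(encoding="utf-8"))
    # JSON has no tuple type, so the area comes back as a list.
    data["area"] = tuple(data["area"])
    return CaptureProfile(**data)
```

Keeping one file per profile makes switching between, say, a Teams and a Zoom layout a matter of loading a different file.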
- OS: Windows 10/11 (primary), Linux/macOS (experimental)
- RAM: 4GB minimum, 8GB recommended
- CPU: Multi-core processor recommended for real-time processing
- Display: Support for multiple monitors with varying DPI
- Storage: 100MB+ for application and language files
- OCR not working: Verify Tesseract installation and PATH
- Text not detected: Check language selection and capture area size
- Performance issues: Close other applications, check system resources
- Multi-monitor problems: Update display drivers, check DPI settings
Enable debug logging in the application settings to capture detailed operation information for troubleshooting.
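For contributors, a debug log of this kind can be set up with Python's standard `logging` module. The sketch assumes a `captiocr` logger name and a `logs/captiocr.log` file, both hypothetical; the application's actual log file names and format are not documented here.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: Path, debug: bool = False) -> logging.Logger:
    """Configure a 'captiocr' logger writing to <log_dir>/captiocr.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("captiocr")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # Remove handlers from earlier calls so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_dir / "captiocr.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `debug=True`, calls such as `logger.debug("capture area %s", area)` end up in the log file; with the default `INFO` level they are filtered out.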
Completed:
- Multi-monitor support with DPI awareness
- Modular, maintainable codebase
- Enhanced text processing
- Improved error handling
Planned:
- Live translation integration
- Cloud storage synchronization
- Export formats (PDF, HTML, Word)
- API integration for external applications
- Dark mode and theme customization
- Batch processing capabilities
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Follow the coding guidelines in CLAUDE.md
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 Python style guide
- Use type hints and docstrings
- Maintain modular architecture
- Add comprehensive logging
- Update version numbers for functional changes
This project is licensed under the MIT License - see the LICENSE file for details.
Author: Carlo Sacchi
Website: https://www.captiocr.com
Version: 0.12.00 (August 2025)
For support, feature requests, or bug reports, please open an issue on GitHub.
If CaptiOCR helps you, please consider giving it a star on GitHub!