A high-performance text extraction tool that leverages GPU acceleration and the Gemma 3:27B model through Ollama for efficient image-to-text conversion.
- 🖼️ Image Processing: Supports PNG, JPG, JPEG, and WebP formats
- 🤖 AI-Powered: Utilizes Gemma 3:27B model for accurate text extraction
- 📊 Performance Metrics: Detailed GPU and processing statistics
- 🛠️ System Monitoring: GPU utilization, memory usage, and temperature tracking
- 🔍 Health Checks: Automatic service monitoring and error handling
- NVIDIA GPU (for optimal performance) or compatible system
- Ollama framework installed
- Gemma 3:27B model available in Ollama
- Python 3.x
- Required Python packages:
- ollama
- Pillow (PIL)
-
Install Ollama and the Gemma 3:27B model:
ollama pull gemma3:27b
-
Install Python dependencies:
pip install ollama pillow
-
Create a
samplesdirectory and add your images
- Place images in the
samplesdirectory - Run the script:
python test.py
The script will:
- Process all images in the samples directory
- Sort images by size (smallest to largest)
- Extract text content
- Display detailed processing metrics
- Show GPU utilization statistics
The script provides comprehensive output including:
- Image dimensions and file size
- Processing duration and token counts
- GPU utilization metrics
- Extracted text content
- Performance statistics
- Images are processed in order of size (smallest to largest)
- The model is configured to extract only text content, ignoring visual elements
- Processing time varies based on image size and complexity
- GPU monitoring is available for NVIDIA GPUs and macOS systems