Flexible and powerful image analysis
A powerful command-line interface for AI-powered image analysis and understanding. Built with PyTorch and Transformers, this tool provides easy-to-use commands for analyzing images with state-of-the-art vision-language models.
- Single Image Analysis - Analyze individual images with AI descriptions
- Batch Processing - Process entire directories of images at once
- Web Interface - Launch a Streamlit web app for interactive use
- Multiple Model Support - Use different pre-trained models
- Export Results - Save analysis results to JSON files
- Cross-Platform - Works on Windows, macOS, and Linux
Linux/macOS:

```bash
chmod +x install.sh
./install.sh
```

Windows:

```bash
install.bat
```

Manual installation:

```bash
# Clone the repository
git clone https://github.com/yourusername/image-understanding-cli.git
cd image-understanding-cli
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install CLI tool
pip install -e .
```

Or install everything via the Makefile:

```bash
make install
```

Quick start:

```bash
# Show help
image-cli --help
# Analyze a single image
image-cli single path/to/image.jpg
# Analyze with custom text prompt
image-cli single image.jpg --text "What's in this image?"
# Process all images in a directory
image-cli batch ./my_images/
# Save results to file
image-cli batch ./my_images/ --output results.json
# Launch web interface
image-cli streamlit
# Show system information
image-cli info
```

Analyze a single image and get an AI-generated description.
```bash
image-cli single <image_path> [options]
```

Options:

- `--text, -t` - Add text prompt for guided analysis
- `--model, -m` - Specify model to use (default: `microsoft/git-base`)
Examples:

```bash
# Basic analysis
image-cli single photo.jpg
# With text prompt
image-cli single photo.jpg --text "Describe the colors in this image"
# Using different model
image-cli single photo.jpg --model "Salesforce/blip-image-captioning-base"
```
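For context, the captioning behind `image-cli single` can be reproduced directly with the Transformers `image-to-text` pipeline. This is a minimal sketch, not the CLI's actual implementation:

```python
# Minimal image-captioning sketch with Hugging Face Transformers.
# Illustrative only; the CLI's cli.py may wire this up differently.
from transformers import pipeline

# "image-to-text" wraps captioning checkpoints such as microsoft/git-base.
captioner = pipeline("image-to-text", model="microsoft/git-base")

# Accepts a local path, URL, or PIL.Image; returns a list of dicts.
result = captioner("path/to/image.jpg")
print(result[0]["generated_text"])
```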
Process all images in a directory and optionally save results.

```bash
image-cli batch <directory> [options]
```

Options:

- `--output, -o` - Output file for results (JSON format)
- `--model, -m` - Specify model to use
Examples:

```bash
# Process directory
image-cli batch ./photos/
# Save results to file
image-cli batch ./photos/ --output analysis_results.json
```
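Conceptually, batch mode walks a directory and collects one result per image. A sketch of that idea is below; the JSON layout shown is illustrative, not necessarily the CLI's exact output schema:

```python
# Illustrative directory batch-captioning; not the CLI's actual implementation.
import json
from pathlib import Path

from transformers import pipeline

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".webp"}

captioner = pipeline("image-to-text", model="microsoft/git-base")

results = []
for path in sorted(Path("./my_images").iterdir()):
    if path.suffix.lower() not in SUPPORTED:
        continue
    caption = captioner(str(path))[0]["generated_text"]
    results.append({"file": path.name, "caption": caption})  # hypothetical schema

Path("results.json").write_text(json.dumps(results, indent=2))
```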
Launch an interactive web interface for image analysis.

```bash
image-cli streamlit [options]
```

Options:

- `--port, -p` - Port number (default: 8501)
Example:

```bash
# Launch on default port
image-cli streamlit
# Launch on custom port
image-cli streamlit --port 8080
```
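`image-cli streamlit` launches the bundled app.py. A stripped-down Streamlit app for the same task looks roughly like this (illustrative, not the repository's actual app):

```python
# Minimal Streamlit captioning app; illustrative, not the repository's app.py.
import streamlit as st
from PIL import Image
from transformers import pipeline

@st.cache_resource  # load the model once per session
def load_captioner():
    return pipeline("image-to-text", model="microsoft/git-base")

st.title("Image Understanding")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png", "bmp", "webp"])

if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Input image")
    with st.spinner("Analyzing..."):
        caption = load_captioner()(image)[0]["generated_text"]
    st.write(caption)
```

A standalone file like this would be launched with `streamlit run app.py --server.port 8501`.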
Display system information and check dependencies.

```bash
image-cli info
```

Supported image formats:

- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- TIFF (.tiff)
- WebP (.webp)
The CLI supports various pre-trained models:
- `microsoft/git-base` (default) - General image captioning
- `Salesforce/blip-image-captioning-base` - BLIP model for captioning
- `microsoft/git-large` - Larger version with better accuracy
- `Salesforce/blip-image-captioning-large` - Large BLIP model
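Any of these checkpoints can also be loaded directly from Transformers. The sketch below shows the BLIP base model, including a text prompt similar in spirit to the `--text` option (this is plain library usage, not the CLI's own code):

```python
# Loading a supported checkpoint directly (BLIP shown); illustrative only.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name)

image = Image.open("photo.jpg").convert("RGB")

# Unconditional caption
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Prompt-guided caption (the text seeds the generated description)
inputs = processor(images=image, text="a photo of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```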
You can set these environment variables to customize behavior:

```bash
export IMAGE_CLI_MODEL="microsoft/git-base" # Default model
export IMAGE_CLI_DEVICE="cuda" # Force device (cuda/cpu)
export IMAGE_CLI_LOG_LEVEL="INFO"     # Logging level
```
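If you want the same variables in your own scripts, they can be read with plain `os.environ` lookups; resolving an "auto" device with PyTorch might look like this (a sketch with assumed defaults, not the CLI's internal logic):

```python
# Reading the documented environment variables; the defaults here are assumptions.
import os
import torch

model_name = os.environ.get("IMAGE_CLI_MODEL", "microsoft/git-base")
log_level = os.environ.get("IMAGE_CLI_LOG_LEVEL", "INFO")

device = os.environ.get("IMAGE_CLI_DEVICE", "auto")
if device == "auto":  # fall back to GPU when available, otherwise CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"

print(model_name, device, log_level)
```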
Create a `config.yaml` file in your project directory:

```yaml
model:
  name: "microsoft/git-base"
  device: "auto"        # auto, cuda, cpu

output:
  format: "json"        # json, csv, txt
  timestamp: true

logging:
  level: "INFO"
  file: "image_cli.log"
```
Example use cases:

```bash
# Process all family photos and save results
image-cli batch ~/Pictures/Family/ --output family_analysis.json
# View specific photo
image-cli single ~/Pictures/vacation.jpg --text "What activities are shown?"

# Analyze images for content moderation
image-cli batch ./uploads/ --text "Describe any inappropriate content" --output moderation.json

# Analyze product images
image-cli batch ./products/ --text "Describe this product" --output product_descriptions.json
```

Development setup:

```bash
# Clone repository
git clone https://github.com/yourusername/image-understanding-cli.git
cd image-understanding-cli
# Install in development mode
make dev
# Run tests
make test
# Format code
make format
# Run linter
make lint
```

Project structure:

```text
image-understanding-cli/
├── cli.py              # Main CLI script
├── app.py              # Streamlit web app
├── requirements.txt    # Python dependencies
├── setup.py            # Package setup
├── Makefile            # Build commands
├── install.sh          # Linux/macOS installer
├── install.bat         # Windows installer
├── README.md           # This file
└── tests/              # Test files
    └── test_cli.py
```
To contribute:

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes
- Add tests for new functionality
- Run tests: `make test`
- Submit a pull request
"No module named 'transformers'"
pip install transformersCUDA out of memory
# Force CPU usage
export IMAGE_CLI_DEVICE="cpu"
image-cli single image.jpgPermission denied on install.sh
chmod +x install.sh
./install.shStreamlit not found
pip install streamlit- Use GPU if available for faster processing
- Process images in smaller batches for large datasets
- Use smaller models for faster inference
- Resize large images before processing (see the sketch below)
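The first and last tips can be combined in a small preprocessing helper: pick the device once and shrink oversized images before they reach the model (a sketch using Pillow and PyTorch; the 1024-pixel threshold is an arbitrary choice):

```python
# Illustrative preprocessing: device selection plus image downscaling.
import torch
from PIL import Image
from transformers import pipeline

def prepare_image(path, max_side=1024):
    """Open an image and shrink it so its longest side is at most max_side."""
    image = Image.open(path).convert("RGB")
    image.thumbnail((max_side, max_side))  # in-place, preserves aspect ratio
    return image

device = 0 if torch.cuda.is_available() else -1  # pipeline: 0 = first GPU, -1 = CPU
captioner = pipeline("image-to-text", model="microsoft/git-base", device=device)
print(captioner(prepare_image("large_photo.jpg"))[0]["generated_text"])
```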
Contributions are welcome! Please read our contributing guidelines and submit pull requests for any improvements.
- Additional model support
- Performance optimizations
- New output formats
- Better error handling
- Documentation improvements
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Transformers by Hugging Face
- Uses PyTorch for deep learning
- Web interface powered by Streamlit
- Inspired by the amazing work of the computer vision community
- Create an issue on GitHub for bug reports
- Star the repository if you find it useful
- Share with others who might benefit
Happy analyzing!