A Python script that uses a local Ollama multimodal model to generate captions for your images in bulk. You can customize the prompt to guide the vision model, for example to include certain keywords or to describe a specific person by name. It features a rich, interactive terminal user interface (TUI) for easy operation, configuration, and live progress tracking. This is primarily a helper tool for preparing image datasets for training with FLUX, which, unlike Stable Diffusion, relies on natural-language captions rather than keyword lists.
- Interactive TUI: A user-friendly, menu-driven interface built with Rich and Gum. No need to edit the script to change settings!
- Flexible Image Selection: Process an entire directory of images or use the file picker to select specific images.
- Live Progress Logging: A beautiful, real-time table shows you which files are being processed, their status, and a preview of the generated caption.
- Smart Feedback: Uses emojis and colors to clearly indicate successes, skips, failures, and warnings for low-quality (e.g., single-word) captions.
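The low-quality warning described above boils down to a simple word-count check. A minimal sketch (the helper name and threshold are illustrative, not taken from the script):

```python
def caption_quality(caption: str, min_words: int = 2) -> str:
    """Classify a generated caption: 'fail' for empty output,
    'warn' for suspiciously short (e.g., single-word) output, else 'ok'.
    Hypothetical helper; the real script's names may differ."""
    words = caption.split()
    if not words:
        return "fail"
    if len(words) < min_words:
        return "warn"
    return "ok"
```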
- Persistent Configuration: Your last-used settings (model, prompt, image source) are automatically saved to a config.json file for your next session.
- Cross-Platform: Built with Python, it's designed to be compatible with macOS, Linux, and Windows.
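Persistence like this is typically a small JSON round-trip with safe fallbacks. A minimal sketch, assuming illustrative key names rather than the script's actual schema:

```python
import json
from pathlib import Path

CONFIG_PATH = Path("config.json")
# Illustrative defaults; the real script's keys may differ.
DEFAULTS = {"model": "moondream", "prompt": "Describe this image.", "image_source": "."}

def load_config() -> dict:
    """Return saved settings merged over defaults; fall back to defaults
    if the file is missing or corrupt."""
    try:
        return {**DEFAULTS, **json.loads(CONFIG_PATH.read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)

def save_config(cfg: dict) -> None:
    """Write settings back to config.json for the next session."""
    CONFIG_PATH.write_text(json.dumps(cfg, indent=2))
```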
Before you begin, ensure you have the following installed and running:
- Python 3.x
- Ollama: The script requires a running Ollama instance.
- A Multimodal Ollama Model: You need a model capable of processing images, such as moondream.
ollama pull moondream
- Rich: A Python library for rich text and beautiful formatting in the terminal.
pip install rich
- Gum: A tool for glamorous shell scripts, used for the interactive menus.
- macOS:
brew install gum
- Other Systems: See the official Gum installation guide.
- Install Dependencies: Make sure you have installed Python, Rich, and Gum as listed in the requirements section.
- Start Ollama: Ensure the Ollama application is running and the server is active.
- Run the Script: Save the code as ollama_captionizer.py and run it from your terminal:
python3 ollama_captionizer.py
- Use the Menu: You will be greeted by the main menu, where you can:
- Set Image Source: Choose a directory or select specific image files.
- Edit Prompt: Customize the prompt sent to the model.
- Start Captioning: Begin the process.
Captions will be saved as .txt files with the same name as the original image (e.g., my_photo.jpg -> my_photo.txt).
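Deriving the sidecar caption path is a one-liner with pathlib; a minimal sketch:

```python
from pathlib import Path

def caption_path(image_path: str) -> Path:
    """Return the .txt path next to an image, e.g. my_photo.jpg -> my_photo.txt."""
    return Path(image_path).with_suffix(".txt")
```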
This script is written in Python and is designed to be cross-platform. It should work on macOS, Linux, and Windows provided the dependencies are met.
A key feature is that it communicates with the Ollama server over its network API (e.g., http://localhost:11434). This means you do not need to modify the script to handle different executable names like ollama.exe on Windows.
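Concretely, captioning one image amounts to a single POST to Ollama's /api/generate endpoint with the image base64-encoded. A stdlib-only sketch (the payload shape follows Ollama's documented API; function names and error handling are illustrative, and the call requires a running Ollama server with a multimodal model pulled):

```python
import base64
import json
import urllib.request

def build_request(image_bytes: bytes, model: str = "moondream",
                  prompt: str = "Describe this image.") -> dict:
    """Build the JSON payload Ollama's /api/generate expects for a vision model."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one complete JSON response instead of a token stream
    }

def caption_image(path: str, host: str = "http://localhost:11434") -> str:
    """Send one image to the local Ollama server and return the generated caption."""
    with open(path, "rb") as f:
        payload = build_request(f.read())
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

Because only the HTTP endpoint matters, the same code works whether the server binary is ollama or ollama.exe.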
The primary consideration for cross-platform use is ensuring that the gum command-line tool is properly installed and accessible in your system's PATH.
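A startup check with shutil.which turns a missing gum into a clear error instead of a confusing mid-run failure; a minimal sketch (the message text is illustrative):

```python
import shutil
import sys

def require_tool(name: str) -> bool:
    """Return True if `name` is on PATH; shutil.which also resolves
    executable extensions like .exe on Windows."""
    return shutil.which(name) is not None

if __name__ == "__main__":
    if not require_tool("gum"):
        sys.exit("Error: 'gum' not found on PATH. See the Gum installation guide.")
```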
