A distributed system for generating and comparing image captions using state-of-the-art Hugging Face models. It runs multiple models simultaneously, scales across workers, and provides a web interface for managing workers and visualizing results.
- Image captioning using Hugging Face models
- Support for image-to-text and image-text-to-text models
- Fast and asynchronous processing using aio_pika and asyncio
- Dynamic model downloading, unloading, and deletion per worker
- Support for custom user-defined models
- Task handling with routing keys per model
- RESTful API to manage workers, models, and image tasks
- Handles file uploads, model lifecycle, and response collection
- Keeps track of available workers and cached/loaded models
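Per-model routing keys mean the backend publishes each task to a key derived from the model, and workers bind queues for the models they serve. The sketch below is illustrative, not the project's actual convention: the `caption.` prefix, the `caption_tasks` exchange name, and the key-derivation rule are all assumptions.

```python
import re


def routing_key_for(model_id: str) -> str:
    """Derive an AMQP routing key from a Hugging Face model ID.

    Hypothetical scheme: lowercase the ID and replace '/' and whitespace
    (awkward in routing keys) with dots.
    """
    return "caption." + re.sub(r"[/\s]+", ".", model_id.strip().lower())


async def publish_task(connection, model_id: str, payload: bytes) -> None:
    """Publish an image task to the queue bound for the given model.

    `connection` is an open aio_pika connection; the exchange name here
    is an assumption made for this sketch.
    """
    import aio_pika  # imported lazily so routing_key_for stays dependency-free

    channel = await connection.channel()
    exchange = await channel.declare_exchange(
        "caption_tasks", aio_pika.ExchangeType.DIRECT
    )
    await exchange.publish(
        aio_pika.Message(body=payload),
        routing_key=routing_key_for(model_id),
    )
```

With a direct exchange, a worker that loads a model simply binds a queue with the same derived key, so tasks fan out only to workers that actually serve that model.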
- View and manage available workers
- Upload one or more images
- Select target models
- View generated captions in a clean UI
Results of Model Testing
Smallest tested model (34.2M params). Extremely fast, though the captions often don't make sense.
Good results in most cases, but occasionally stops mid-sentence.
Efficient and compact. Produces short but relatively expressive descriptions.
Generates short and very generic captions that often lack detail.
Captures more detail than the smaller version, but still produces short captions.
Generates fairly detailed captions, but often hallucinates elements loosely related to the image.
Generates long, rich descriptions with impressive vocabulary and good speed, but often includes fabricated details and occasionally leaves captions unfinished.
Client: TypeScript, React
Server: Python, FastAPI, aio_pika
Worker: Python, aio_pika, transformers, prometheus_client
- Docker and Docker Compose installed
- A running RabbitMQ instance - either:
- Locally, using Docker or a native installation, or
- In the cloud, accessible to both the backend and all workers
You need to run RabbitMQ, the backend + frontend, and one or more workers.
docker-compose up
This will start:
- FastAPI backend at http://localhost:8000
- Frontend (React) at http://localhost:3333
In a separate terminal:
docker-compose -f docker-compose.worker.yml up
This command starts both the worker service and Prometheus monitoring service as defined in the docker-compose.worker.yml file.
You can run multiple worker instances if needed, even on different machines, by connecting them all to a shared RabbitMQ instance in the cloud.
Make sure that the backend service and all workers use the same RabbitMQ URL. Example .env file:
RABBITMQ_URL=amqp://guest:guest@rabbitmq:5672/
You can also customize the backend's response queue name (useful if multiple backends share the same broker):
SERVER_QUEUE=example_queue
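For reference, both services can pick these variables up with plain `os.environ` lookups; the fallback values below are illustrative assumptions, not the project's actual defaults.

```python
import os

# Fallbacks are illustrative only; in deployment, set these via the .env file.
RABBITMQ_URL = os.environ.get("RABBITMQ_URL", "amqp://guest:guest@localhost:5672/")
SERVER_QUEUE = os.environ.get("SERVER_QUEUE", "server_responses")
```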
To add your own custom image captioning model to the system, you need to implement the following interface:
```python
from abc import ABC, abstractmethod
from PIL import Image


class CustomModel(ABC):
    @abstractmethod
    def load(self) -> None:
        """
        Load the custom model.

        This method should handle any model initialization and loading
        from disk or remote sources.
        """
        pass

    @abstractmethod
    def infer(self, image: Image.Image) -> str:
        """
        Run inference on the provided image and return a caption as a string.

        Args:
            image (PIL.Image.Image): The input image to caption.

        Returns:
            str: The generated caption for the image.
        """
        pass
```

- Implement your custom model class by inheriting from `CustomModel`.
- Place your implementation in the `custom_infer` directory.
- Alternatively, you can add and manage custom models dynamically through the frontend management interface, which allows uploading and configuring models without restarting workers.
- Once loaded, the system will call `load()` to initialize your model and `infer()` to generate captions for input images.
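A minimal sketch of the interface in use. `EchoSizeModel` is a toy stand-in, not a real captioner: a real implementation would initialize Hugging Face weights in `load()` and run them in `infer()`. The interface is repeated here only so the sketch is self-contained.

```python
from abc import ABC, abstractmethod
from PIL import Image


class CustomModel(ABC):
    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def infer(self, image: Image.Image) -> str: ...


class EchoSizeModel(CustomModel):
    """Toy implementation: 'captions' an image with its dimensions.

    A real model would load transformers weights in load() (for example
    via a Hugging Face image-to-text pipeline) and call them in infer().
    """

    def load(self) -> None:
        # Nothing to fetch for this toy model; a real one would download
        # or read weights here.
        self.loaded = True

    def infer(self, image: Image.Image) -> str:
        w, h = image.size
        return f"an image of size {w}x{h}"
```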
Each worker exposes Prometheus-compatible metrics on port 8001 at the /metrics endpoint. The following key metrics are available:
- `inference_duration_seconds` - Inference duration per model
- `processed_messages_total` - Total number of processed messages per model
- `processing_errors_total` - Total number of processing errors per model
- `worker_cpu_usage_percent` - CPU usage percent of the worker process
- `worker_ram_usage_percent` - RAM usage percent of the worker process
These metrics can be collected by Prometheus and visualized with Grafana.
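The metric names above can be declared with `prometheus_client` roughly as follows; the metric types, the `model` label, and the `record_inference` helper are assumptions for illustration, not the worker's actual code.

```python
from prometheus_client import Counter, Gauge, Histogram

# Names mirror the metrics listed above; the 'model' label is an assumption.
INFERENCE_DURATION = Histogram(
    "inference_duration_seconds", "Inference duration per model", ["model"]
)
PROCESSED_MESSAGES = Counter(
    "processed_messages_total", "Total processed messages per model", ["model"]
)
PROCESSING_ERRORS = Counter(
    "processing_errors_total", "Total processing errors per model", ["model"]
)
CPU_USAGE = Gauge("worker_cpu_usage_percent", "CPU usage percent of the worker process")
RAM_USAGE = Gauge("worker_ram_usage_percent", "RAM usage percent of the worker process")


def record_inference(model: str, seconds: float) -> None:
    """Record one successful inference for the given model."""
    INFERENCE_DURATION.labels(model=model).observe(seconds)
    PROCESSED_MESSAGES.labels(model=model).inc()
```

Serving them on port 8001 is then a matter of calling `prometheus_client.start_http_server(8001)` in the worker process.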
