AI Illustrator is a powerful tool designed to automatically generate consistent, high-quality illustrations for stories using Google's Gemini models for both text analysis and image generation. It processes a story text file, analyzes it to understand the visual style, characters, and locations, and then generates a sequence of cinematic illustrations.
- Automatic Style Detection: Analyzes the story text to determine the most appropriate art style and generates consistent illustrations based on that style.
- Character Consistency:
  - Extracts character descriptions and generates reference character images (full body, 16:9).
  - Maintains a persistent catalog of characters in `output/data.json` to ensure the same character looks consistent throughout the story.
  - Uses reference images (multimodal generation) to keep character appearance stable across different scenes.
- Location Consistency:
  - Generates and caches location reference images (16:9 cinematic shots).
  - Maintains a location catalog in `output/data.json` to reuse settings.
- Cinematic Scene Generation:
  - Splits the story into logical scenes.
  - Generates a single, cohesive cinematic frame for each scene (16:9 aspect ratio).
  - Enforces strict negative constraints to prevent comic-book layouts, text, or split screens.
  - Uses full-body character references to maintain consistency across scenes.
- Docker Support: Fully containerized for easy deployment and execution.
- Comprehensive Testing: Includes a full suite of unit and integration-like tests using `pytest`.
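The negative constraints mentioned above are typically enforced at the prompt level. A minimal sketch of how such a scene prompt could be assembled — the function name and wording here are illustrative, not the tool's actual prompts:

```python
def build_scene_prompt(style: str, location: str, characters: list[str]) -> str:
    """Assemble a single-frame cinematic prompt with explicit negative constraints."""
    positive = (
        f"A single cohesive cinematic frame, 16:9 aspect ratio, in this style: {style}. "
        f"Setting: {location}. "
        f"Characters present: {', '.join(characters)}."
    )
    # Explicitly forbid layouts that image models tend to produce for "story" prompts.
    negative = (
        "Do NOT produce: comic-book panels, split screens, collages, "
        "speech bubbles, captions, or any text in the image."
    )
    return f"{positive}\n{negative}"

prompt = build_scene_prompt("soft watercolor", "a sunny park", ["Mia", "Grandpa Joe"])
```

Combining a positive description with an explicit "do not" clause in one prompt is a common way to discourage multi-panel output from a single-image generation call.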
- Python 3.10+ (if running locally)
- Docker & Docker Compose (recommended for isolation)
- Google Cloud API Key with access to Gemini models (including image generation capabilities).
```bash
git clone <repository-url>
cd ai-illustrator
```

Copy the example environment file and add your API key:

```bash
cp .env.example .env
```

Open `.env` and set your variables:
```
GEMINI_API_KEY=your_api_key_here
TEXT_MODEL_NAME=gemini-3-pro-preview        # or compatible
IMAGE_MODEL_NAME=gemini-3-pro-image-preview # or a specific Imagen model
```

Build the Docker image:

```bash
docker-compose build
```

Run the generator:
- Place your story text file in the `data/` directory (e.g., `data/my_story.txt`).
- Execute the container:

```bash
docker-compose run app --text-file data/my_story.txt --output-dir output/my_project_name
```

Note: The `output` directory will be populated with the results on your host machine.
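The variables set in `.env` are read by the application at startup. A minimal sketch of how such configuration loading typically works — the names here are illustrative; the real logic lives in `app/config.py`:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative settings container; the project's actual config class may differ."""
    api_key: str
    text_model: str
    image_model: str

def load_settings() -> Settings:
    """Read configuration from the environment, failing fast if the key is missing."""
    api_key = os.environ.get("GEMINI_API_KEY", "")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; check your .env file")
    return Settings(
        api_key=api_key,
        text_model=os.environ.get("TEXT_MODEL_NAME", "gemini-3-pro-preview"),
        image_model=os.environ.get("IMAGE_MODEL_NAME", "gemini-3-pro-image-preview"),
    )
```

Failing fast on a missing API key gives a clear error before any generation work starts.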
Create a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the application:

```bash
python main.py --text-file data/my_story.txt --output-dir output/my_project_name
```

- `--text-file`: (Required) Path to the input text file containing the story.
- `--output-dir`: Directory to save generated assets and illustrations (default: `output`).
- `--style-prompt`: Optional prompt to guide the initial style detection (e.g., "Cyberpunk anime", "Oil painting").
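These flags map naturally onto `argparse`. A hedged sketch of the CLI surface — the actual parser in `main.py` may differ in details:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Generate story illustrations.")
    parser.add_argument("--text-file", required=True,
                        help="Path to the input story text file.")
    parser.add_argument("--output-dir", default="output",
                        help="Directory for generated assets (default: output).")
    parser.add_argument("--style-prompt", default=None,
                        help='Optional style hint, e.g. "Cyberpunk anime".')
    return parser

args = build_parser().parse_args(
    ["--text-file", "data/my_story.txt", "--output-dir", "output/my_project_name"]
)
```

Note that `argparse` converts `--text-file` to the attribute `args.text_file` automatically.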
The tool creates an organized, flat output directory:
```
output/
├── characters/      # Character assets
│   └── 1_character_name.jpeg
├── locations/       # Location assets
│   └── 1_location_name.jpeg
├── illustrations/   # Final scene illustrations
│   └── 1_sunny_park_scene.jpeg
├── data.json        # Unified manifest (style, characters, locations, illustrations)
└── style_templates/ # Generated style base images
    ├── bg_fullbody.jpg              # 16:9 solid background for characters
    ├── style_reference_fullbody.jpg # 16:9 character style reference
    └── bg_location_16_9.jpg         # 16:9 neutral background for locations
```
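The `<id>_<snake_case_name>.jpeg` pattern visible in the tree can be produced with a small helper. A sketch of the assumed convention, inferred from the example filenames rather than taken from the source:

```python
import re

def asset_filename(asset_id: int, name: str, ext: str = "jpeg") -> str:
    """Build an '<id>_<slug>.<ext>' filename from a display name."""
    # Collapse any run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"{asset_id}_{slug}.{ext}"
```

Stable, id-prefixed filenames make it easy for the manifest to reference assets unambiguously even when two characters share a name.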
The `data.json` file serves as the central manifest for the project:
```json
{
  "style_prompt": "Description of the visual style...",
  "characters": [
    {
      "id": 1,
      "name": "Character Name",
      "original_name": "Original Name from Text",
      "description": "Visual description...",
      "full_body_path": "output/characters/1_character_name.jpeg",
      "generation_prompt": "Full generation prompt used..."
    }
  ],
  "locations": [
    {
      "id": 1,
      "name": "Location Name",
      "original_name": "Original Name from Text",
      "description": "Visual description...",
      "reference_image_path": "output/locations/1_location_name.jpeg",
      "generation_prompt": "Full generation prompt used..."
    }
  ],
  "illustrations": [
    {
      "scene_id": 1,
      "story_segment": "Original text of the scene...",
      "name": "sunny_park_scene",
      "location": {
        "id": 1,
        "name": "Location Name"
      },
      "characters": [
        {
          "id": 1,
          "name": "Character Name",
          "full_body_path": "output/characters/1_character_name.jpeg"
        }
      ],
      "illustration_path": "output/illustrations/1_sunny_park_scene.jpeg",
      "generation_prompt": "Full generation prompt used..."
    }
  ]
}
```

This project uses `pytest` for testing. The test suite covers models, configuration, asset management, and the AI client wrapper.
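A representative test shape, with the GenAI client replaced by a mock — the names here are hypothetical; the real tests live in `tests/`:

```python
import json
from unittest.mock import MagicMock

def test_character_extraction_with_mocked_client():
    """Typical test pattern: the GenAI wrapper is replaced by a MagicMock."""
    client = MagicMock()
    client.generate_text.return_value = json.dumps(
        {"characters": [{"name": "Mia", "description": "a curious girl"}]}
    )

    # In the real suite this would call analyzer code that uses `client`;
    # here we exercise the mock directly to show the pattern.
    payload = json.loads(client.generate_text("Extract characters from: ..."))

    assert payload["characters"][0]["name"] == "Mia"
    client.generate_text.assert_called_once()

test_character_extraction_with_mocked_client()
```

Because the mock returns a canned JSON string, the test exercises parsing logic without a network call or API quota.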
To run tests:
```bash
# Activate your virtual environment first
source venv/bin/activate

# Run all tests
pytest tests

# Run with verbose output
pytest -v tests
```

The tests use `unittest.mock` and `pytest-mock` to simulate Google GenAI API responses and filesystem operations, ensuring that tests are fast and do not consume API quota.
- `main.py`: Entry point and orchestration logic.
- `app/`: Core package.
  - `config.py`: Configuration and environment management.
  - `core/`: Key logic modules.
    - `ai_client.py`: Wrapper for the Google GenAI SDK.
    - `analyzer.py`: Story analysis (scene/character/location extraction).
    - `asset_manager.py`: Manages creation and cataloging of reference assets.
    - `illustrator.py`: Generates the final scene illustrations.
    - `models.py`: Pydantic data models.
- `tests/`: Test suite.
This project is licensed under the MIT License - see the LICENSE file for details.