Image_Generator_SD is a web application that allows you to generate or modify images using Stable Diffusion models through a user-friendly Gradio interface. This project combines three core functionalities:
----> Web-UI Access
- Txt2Img: Generate images from scratch using text prompts.
- Img2Img: Transform an existing image guided by text prompts.
- Inpainting: Modify or fill specific regions of an image using masks and text prompts.
Below is a personal demonstration video showcasing how to use and interact with the application:
Gradio is an open-source Python library that enables developers to create customizable web interfaces for machine learning models with minimal code. Founded in 2019 by Abubakar Abid and colleagues, Gradio was designed to make machine learning models more accessible and interactive for users without requiring specialized software or expertise.
In 2021, Gradio was acquired by Hugging Face.
Gradio interfaces are practical for developers and data scientists during the development and testing phases and are also highly valuable for showcasing models to stakeholders, clients, or users. By providing an interactive and user-friendly interface, Gradio allows non-technical users to quickly understand and interact with the underlying machine learning models, fostering collaboration and feedback.
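To illustrate the "minimal code" point, here is a small self-contained Gradio sketch (not taken from this repo): a plain Python function wrapped in a `gr.Interface` with text input and output.

```python
# A minimal Gradio demo of the kind described above: any Python
# function plus an input/output spec becomes a web interface.

def greet(name: str) -> str:
    return f"Hello, {name}!"

if __name__ == "__main__":
    import gradio as gr
    # fn is the function to expose; inputs/outputs describe the widgets.
    demo = gr.Interface(fn=greet, inputs="text", outputs="text")
    demo.launch()  # serves at http://127.0.0.1:7860 by default
```

The same pattern scales up: this project's tabs wrap the Txt2Img, Img2Img, and Inpainting functions in richer components (image editors, sliders) instead of plain text boxes.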
Stable Diffusion is a deep learning text-to-image model released in 2022, leveraging diffusion technology and latent space for efficient processing. This significantly reduces hardware requirements, allowing it to run on consumer GPUs.
This model has inspired major open-source projects like ControlNet, which enables fine control over image generation using depth maps, pose estimation, and edge detection. ComfyUI offers a node-based, visual workflow to build complex Stable Diffusion pipelines without coding. AnimateDiff brings AI-driven animations to life by applying Stable Diffusion consistently across frames.
This project is organized around three major classes to handle different image generation workflows: Txt2Img, Img2Img, and Inpainting. Each class is located in model.py and is utilized within different Gradio interfaces in app.py.
Code Explanation
- Location: `class Txt2Img` in `model.py`
- Purpose: Generate images from textual prompts (positive and optionally negative).
- Key Steps:
  - Load a `StableDiffusionPipeline` (default: `"CompVis/stable-diffusion-v1-4"`).
  - Move the pipeline to the available device (CUDA if available).
  - Call `txt2img()` with the following parameters:
    - `pos_prompt` (required)
    - `neg_prompt` (optional negative prompt)
    - `guidance_scale`
    - `steps` (number of inference steps)
    - `width` and `height` (image dimensions)
In app.py, the function generate_img_from_txt(...) orchestrates this process and returns the generated image to the user interface.
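The steps above can be sketched as a minimal wrapper class. This is an assumed structure for illustration (the real `class Txt2Img` in `model.py` may differ); the model id and parameter names follow the description above, and the heavy imports are kept inside the load step so the sketch can be read without downloading weights.

```python
# Hypothetical sketch of a Txt2Img wrapper around diffusers'
# StableDiffusionPipeline; not the repo's actual model.py.

class Txt2Img:
    def __init__(self, model_id="CompVis/stable-diffusion-v1-4"):
        self.model_id = model_id
        self.pipe = None  # loaded lazily on first use

    def _load(self):
        # Local imports: loading pulls in torch/diffusers and downloads weights.
        import torch
        from diffusers import StableDiffusionPipeline
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipe = StableDiffusionPipeline.from_pretrained(self.model_id).to(device)

    def txt2img(self, pos_prompt, neg_prompt="", guidance_scale=7.5,
                steps=50, width=512, height=512):
        if self.pipe is None:
            self._load()
        result = self.pipe(
            prompt=pos_prompt,
            negative_prompt=neg_prompt or None,
            guidance_scale=guidance_scale,
            num_inference_steps=steps,
            width=width,
            height=height,
        )
        return result.images[0]  # a PIL image
```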
Code Explanation
- Location: `class Img2Img` in `model.py`
- Purpose: Transform an existing image based on new prompts and parameters.
- Key Steps:
  - Load a `StableDiffusionImg2ImgPipeline` (default: `"runwayml/stable-diffusion-v1-5"`).
  - Resize the input image if necessary (via `resize_image()` in `imageProcess.py`).
  - Call `img2img()` with parameters:
    - `image` (the original image file path)
    - `pos_prompt` & `neg_prompt`
    - `strength` (how strongly the prompt influences the final image)
    - `guidance_scale`
    - `steps` (number of inference steps)
In app.py, the function generate_img_from_img(...) is triggered upon user interaction in the Img2Img tab.
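The resize step matters because Stable Diffusion pipelines expect dimensions that are multiples of 8. The helper below is a plausible sketch of what `resize_image()` in `imageProcess.py` might do, not the repo's actual implementation: cap the longest side and snap both dimensions down to multiples of 8.

```python
# Hypothetical resize helper; the real resize_image() in
# imageProcess.py may use different limits or rounding.
from PIL import Image

def resize_image(img, max_side=768):
    # Scale down so the longest side is <= max_side, then snap both
    # dimensions to multiples of 8 (required by Stable Diffusion).
    w, h = img.size
    scale = min(1.0, max_side / max(w, h))
    w = max(8, int(w * scale) // 8 * 8)
    h = max(8, int(h * scale) // 8 * 8)
    return img.resize((w, h))
```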
Code Explanation
- Location: `class Inpainting` in `model.py`
- Purpose: Fill or modify specific regions of an image using a mask.
- Key Steps:
  - Load a `StableDiffusionInpaintPipeline` (default: `"runwayml/stable-diffusion-inpainting"`).
  - Convert and prepare the mask from the user's edits. This involves converting the alpha channel to a binary mask.
  - Call `inpainting()` with parameters:
    - `image` (the original image)
    - `mask` (the area to modify)
    - `pos_prompt` & `neg_prompt`
    - `guidance_scale`
    - `steps` (inference steps)
    - `strength` (influence of the prompt on changes)
In app.py, the function generate_image_from_paint(...) processes the edited image and mask from the Gradio ImageEditor component, then performs the inpainting.
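The alpha-to-mask conversion described above can be sketched as follows. This is an assumed implementation (the repo's actual helper is not shown): pixels the user painted in the `ImageEditor` carry a nonzero alpha, and the inpainting pipeline expects a black/white mask where white marks the region to repaint.

```python
# Hypothetical mask-preparation helper; the real code in this
# project may threshold or post-process differently.
import numpy as np
from PIL import Image

def mask_from_alpha(edited_rgba, threshold=0):
    # Extract the alpha channel from the user's edited layer and
    # binarize it: painted pixels (alpha > threshold) become white (255),
    # untouched pixels become black (0).
    alpha = np.array(edited_rgba.convert("RGBA"))[:, :, 3]
    binary = (alpha > threshold).astype(np.uint8) * 255
    return Image.fromarray(binary, mode="L")
```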
To run this application on a Linux or Windows environment, follow these steps:
Linux:

```
git clone https://github.com/YourUsername/Image_Generator_SD.git
cd Image_Generator_SD
./setup-linux.sh
```

Windows:

```
git clone https://github.com/YourUsername/Image_Generator_SD.git
cd Image_Generator_SD
.\setup-windows.bat
```

Once installed, open your browser and go to http://127.0.0.1:7860
