TL;DR: A Python script that translates text in images, using your favorite generative AI model (Stable Diffusion, Midjourney, DALL·E) for inpainting.
| Before | After (Hungarian translation) | With font detection (`-fd`) |
|---|---|---|
| ![]() | ![]() | ![]() |
InpaintTranslate runs text detection on your image, masks the text boxes, and in-paints the masked regions
until your image is text-free.
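The masking step can be sketched as follows. This is a minimal illustration with Pillow, not the project's actual code: it assumes detected text boxes arrive as `(left, top, right, bottom)` tuples, and the `build_mask` helper is hypothetical.

```python
from PIL import Image, ImageDraw


def build_mask(image_size: tuple[int, int],
               text_boxes: list[tuple[int, int, int, int]]) -> Image.Image:
    """Build a grayscale inpainting mask: white where text should be removed."""
    mask = Image.new("L", image_size, 0)  # start fully black (keep everything)
    draw = ImageDraw.Draw(mask)
    for box in text_boxes:
        draw.rectangle(box, fill=255)  # white = region to in-paint
    return mask


# Two detected text boxes on a 640x480 image:
mask = build_mask((640, 480), [(10, 10, 200, 50), (30, 100, 300, 140)])
```

The inpainter then repaints only the white regions, leaving the rest of the image untouched.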
InpaintTranslate can run entirely on your local machine using:

- Tesseract or PaddleOCR for text detection, and
- Stable Diffusion for in-painting,

or it can call existing APIs instead.
You can translate the text in your image in just a few lines:

```python
import os

from deep_translator import MyMemoryTranslator

from inpaint_translate.text_detector import PaddleTextDetector
from inpaint_translate.inpainter import LocalSDInpainter
from inpaint_translate.inpaint_translator import InpaintTranslator
from inpaint_translate.font_detector import WhatFontIsAPI

text_detector = PaddleTextDetector()
inpainter = LocalSDInpainter()
translator = MyMemoryTranslator(source="en-US", target="hu-HU")
font_detector = WhatFontIsAPI(os.environ["WHATFONTIS_API_KEY"])

inpaint_translator = InpaintTranslator(text_detector, inpainter, translator, font_detector)
inpaint_translator.inpaint_translate("/my/input/image/path.png", "/my/output/image/path.png")
```

or through the handy `run.py` script:

```bash
python run.py "/my/input/image/path.png" -o "/my/output/image/path.png"
```
Use verbose mode to create intermediary images in a "debug" folder, either via the `-v` flag of `run.py` or by setting the `logging` library logger to debug level.
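From Python code, enabling debug logging might look like the following sketch. The logger name `"inpaint_translate"` is an assumption based on the package name; adjust it if the project logs under a different name.

```python
import logging

# Show debug output from all loggers, including the package's
# (assumed) "inpaint_translate" logger.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("inpaint_translate").setLevel(logging.DEBUG)
```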
We provide multiple implementations for text detection and in-painting (both local and API-based), and you are also free to add your own.
`TesseractTextDetector` (based on Tesseract) runs locally. Follow this guide to install the `tesseract` library locally. On Ubuntu:

```bash
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```

To find the path where it was installed (and pass it to the `TesseractTextDetector` constructor):

```bash
whereis tesseract
```
`AzureTextDetector` calls a computer vision API from Microsoft Azure. You will first need to create a Computer Vision resource via the Azure portal. Once created, take note of the endpoint and the key:

```python
AZURE_CV_ENDPOINT = "https://your-endpoint.cognitiveservices.azure.com"
AZURE_CV_KEY = "your-azure-key"

text_detector = AzureTextDetector(AZURE_CV_ENDPOINT, AZURE_CV_KEY)
```

Our evaluation shows that the two text detectors produce comparable results.
`PaddleTextDetector` (based on PaddleOCR) runs locally. Follow this guide to install the `paddlepaddle` library locally, or just use:

```bash
pip install -r requirements_paddleocr.txt
```
- `LocalSDInpainter` (implemented via Huggingface's `diffusers` library) runs locally and requires a GPU. Defaults to Stable Diffusion v2 for in-painting.
- `ReplicateSDInpainter` calls the Replicate API. Defaults to Stable Diffusion v2 for in-painting (and requires an API key).
- `DalleInpainter` calls the DALL·E 2 API from OpenAI (and requires an API key).

```python
# You only need to instantiate one of the following:
local_inpainter = LocalSDInpainter()
replicate_inpainter = ReplicateSDInpainter("your-replicate-key")
dalle_inpainter = DalleInpainter("your-openai-key")
```

Translations are provided with deep-translator.
Keep in mind that different providers have different supported languages and restrictions on usage.
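For example, some deep-translator backends cap the size of a single request (MyMemoryTranslator rejects texts over roughly 500 characters). A simple hedged workaround, not part of the project's API, is to split long text into word-boundary chunks and translate each chunk separately:

```python
def chunk_text(text: str, limit: int = 500) -> list[str]:
    """Split text into chunks of at most `limit` characters, on word boundaries.

    Note: a single word longer than `limit` still becomes its own
    (oversized) chunk; handle that case separately if it can occur.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Each chunk stays under the provider's limit and can be translated on its own:
parts = chunk_text("lorem ipsum " * 100, limit=50)
```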
Font detection is provided by WhatFontIs. To use it you need to:

- Create an account
- Request an API key
- Set the environment variable `WHATFONTIS_API_KEY` to the value of your API key
Keep in mind that the free account provides 200 requests per day.
This project was based on detexify by Mihail Eric and Julia Turc.
Created by Dolers for his own amusement.