A powerful AI-based system that converts text, image, and audio inputs into high-quality, structured prompts for generative AI models like Stable Diffusion, Midjourney, and DALL·E.
- ✍️ Text → Prompt: Refines and extends simple prompts into detailed, high-quality prompts.
- 🖼️ Image + Text → Prompt: Understands an image and the user's intent to generate a descriptive prompt.
- 🎧 Audio → Prompt: Converts speech into text and then generates a refined prompt.
- 🧠 Multimodal AI (Janus-Pro-1B): Uses a vision-language model for intelligent prompt generation.
- 🎨 Gradio UI: Interactive web interface for easy usage.
Input (Text / Image / Audio)
↓
Preprocessing Layer
(Whisper for audio)
↓
Instruction Builder (Prompt Engineering)
↓
Janus-Pro-1B Model
↓
Post-processing (clean output)
↓
Final AI Prompt
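
The flow above can be sketched as a chain of small functions. This is a minimal illustration, not the project's actual code: `UserInput`, `preprocess`, `build_instruction`, `postprocess`, and `run_pipeline` are hypothetical names, and the Whisper and Janus-Pro-1B calls are replaced by plain callables (`fake_transcribe`, `fake_generate`) so the example runs without model weights.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserInput:
    """One request; any field may be left empty (hypothetical container)."""
    text: Optional[str] = None
    image_path: Optional[str] = None
    audio_path: Optional[str] = None


def preprocess(user_input: UserInput, transcribe: Callable[[str], str]) -> str:
    """Preprocessing layer: audio is transcribed, text is passed through."""
    if user_input.audio_path is not None:
        return transcribe(user_input.audio_path).strip()
    return (user_input.text or "").strip()


def build_instruction(idea: str, has_image: bool) -> str:
    """Instruction builder: wrap the short idea in a task description."""
    source = "the attached image and the idea" if has_image else "the idea"
    return (
        f"Write one detailed image-generation prompt based on {source}: "
        f"'{idea}'. Describe subject, setting, lighting, and style."
    )


def postprocess(raw: str) -> str:
    """Post-processing: collapse newlines and repeated spaces."""
    return " ".join(raw.split())


def run_pipeline(
    user_input: UserInput,
    transcribe: Callable[[str], str],
    generate: Callable[[str, Optional[str]], str],
) -> str:
    idea = preprocess(user_input, transcribe)
    if not idea and user_input.image_path is None:
        raise ValueError("no usable input")
    instruction = build_instruction(idea, user_input.image_path is not None)
    raw = generate(instruction, user_input.image_path)
    return postprocess(raw)


# Stand-ins for the speech-to-text and vision-language models (assumptions).
def fake_transcribe(audio_path: str) -> str:
    return "  boy in forest "


def fake_generate(instruction: str, image_path: Optional[str]) -> str:
    return "A cinematic scene\n  of a young boy in a dense forest"


result = run_pipeline(UserInput(audio_path="clip.wav"), fake_transcribe, fake_generate)
print(result)
```

Passing the models in as callables keeps each stage testable on its own; in a real setup the two stand-ins would be swapped for the Whisper and Janus-Pro-1B wrappers.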
- Python
- HuggingFace Transformers
- DeepSeek Janus-Pro-1B
- OpenAI Whisper (Speech-to-Text)
- Gradio (UI)
- PyTorch
git clone https://github.com/your-username/prompt-generator.git
cd prompt-generator
pip install -r requirements.txt
python app.py
- Open the Gradio UI in your browser
- Select input type:
  - Text
  - Image + Text
  - Audio
- Provide input
- Click Generate Prompt 🚀
- Get your refined AI prompt
Input: boy in forest
Output: A cinematic scene of a young boy standing in a dense forest, soft sunlight filtering through tall trees, atmospheric fog, ultra-detailed, 4k, depth of field, masterpiece
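
Raw model output often needs tidying before it looks like the example above. The sketch below shows one possible "clean output" step; `clean_prompt` is a hypothetical helper, and the specific rules (dropping a leading `Prompt:` label, stripping quotes, removing duplicate tags) are assumptions about what cleaning might involve, not the project's confirmed behavior.

```python
import re


def clean_prompt(raw: str) -> str:
    """Tidy a generated prompt (illustrative rules only)."""
    text = raw.strip()
    # Drop a leading label such as "Prompt:" or "Output:".
    text = re.sub(r"^(?:prompt|output)\s*:\s*", "", text, flags=re.IGNORECASE)
    # Remove wrapping quotes or backticks and stray whitespace.
    text = text.strip("\"'` \n")
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Drop empty and case-insensitively duplicated comma-separated parts.
    seen = set()
    parts = []
    for part in text.split(","):
        part = part.strip()
        key = part.lower()
        if part and key not in seen:
            seen.add(key)
            parts.append(part)
    return ", ".join(parts)


cleaned = clean_prompt('Prompt: "A young boy in a forest,  soft light, 4k, 4K, "')
print(cleaned)  # A young boy in a forest, soft light, 4k
```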
project/
│
├── app.py
├── requirements.txt
└── README.md
- text_to_prompt()
- image_text_to_prompt()
- audio_to_prompt()
- generate_universal_prompt()
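
One plausible way these functions relate is that `generate_universal_prompt()` routes each request to the matching converter. The sketch below is an assumption about that routing, not the code in `app.py`: the parameter names and the routing order are hypothetical, and the three converters are stubs that return labelled strings instead of calling a model.

```python
from typing import Optional


# Stub converters: the real ones would call Whisper and Janus-Pro-1B.
def text_to_prompt(text: str) -> str:
    return f"[text] {text}"


def image_text_to_prompt(image_path: str, text: str) -> str:
    return f"[image+text] {image_path} | {text}"


def audio_to_prompt(audio_path: str) -> str:
    return f"[audio] {audio_path}"


def generate_universal_prompt(
    text: Optional[str] = None,
    image_path: Optional[str] = None,
    audio_path: Optional[str] = None,
) -> str:
    """Route to a converter based on which inputs were provided."""
    if audio_path is not None:
        return audio_to_prompt(audio_path)
    if image_path is not None:
        return image_text_to_prompt(image_path, text or "")
    if text and text.strip():
        return text_to_prompt(text.strip())
    raise ValueError("provide text, an image, or an audio file")


print(generate_universal_prompt(text="boy in forest"))
print(generate_universal_prompt(text="make it moody", image_path="photo.png"))
```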
- Requires GPU for best performance
- Video input not supported (yet)
- Output quality depends on the instruction passed to the model
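
Because a GPU is recommended but may not be present, a common pattern is to pick the device at startup and fall back to CPU. This is a small sketch assuming PyTorch's `torch.cuda.is_available()`; `pick_device` is a hypothetical helper, and whether `app.py` selects its device this way is not confirmed here.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```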
- 🎥 Video input support
- 🎨 Style selection (anime, cinematic, realistic)
- 📊 Prompt scoring system
- ☁️ Deployment on HuggingFace Spaces
Pull requests are welcome! For major changes, please open an issue first.
This project is open-source under the MIT License.
Anshu Singh
Give it a ⭐ on GitHub!