🔗 Live Demo: genai-project-6b5bre75bfmupqpa8npwz5.streamlit.app
This project converts images into engaging audio stories using image captioning, text generation, and browser-based speech synthesis.
The app performs the following steps:
- 🖼️ Image-to-Text: Captions images using Salesforce/blip-image-captioning-base.
- 📜 Text-to-Story: Expands captions into stories using GPT-2.
- 🔊 Text-to-Speech: Converts stories into audio using the browser's built-in SpeechSynthesis API.
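The first two steps above can be sketched as calls to the hosted Hugging Face Inference API. This is a minimal sketch, not the app's actual code: the endpoint URLs follow the public `api-inference.huggingface.co` convention, and the prompt template in `build_story_prompt` is an assumption. Step 3 (speech) runs in the browser, so it is not shown here.

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"  # hosted Inference API (needs an HF token)

def _post(url: str, token: str, data: bytes, content_type: str) -> list:
    """POST a payload to an Inference API endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())

def caption_image(image_bytes: bytes, token: str) -> str:
    """Step 1: image-to-text with BLIP."""
    out = _post(f"{API_BASE}/Salesforce/blip-image-captioning-base",
                token, image_bytes, "application/octet-stream")
    return out[0]["generated_text"]

def build_story_prompt(caption: str) -> str:
    """Turn the caption into a seed prompt for GPT-2 (template is hypothetical)."""
    return f"Write a short story about the following scene: {caption}\n\nOnce upon a time,"

def generate_story(caption: str, token: str) -> str:
    """Step 2: text-to-story with GPT-2 text generation."""
    payload = json.dumps({"inputs": build_story_prompt(caption),
                          "parameters": {"max_new_tokens": 120}}).encode()
    out = _post(f"{API_BASE}/gpt2", token, payload, "application/json")
    return out[0]["generated_text"]
```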
- Python: Core backend processing
- Streamlit: Frontend UI framework
- Hugging Face Transformers: Access to GPT-2 and image models
- Hugging Face Inference API: For accessing models like BLIP and GPT-2
- BLIP (Salesforce): Image captioning model
- Browser SpeechSynthesis: In-browser TTS using JavaScript (no external TTS API needed)
```bash
git clone https://github.com/fahad10inb/GenAI-Project.git
cd GenAI-Project
pip install -r requirements.txt
streamlit run app.py
```

- Upload an image.
- View the AI-generated caption.
- Generate a story from the caption.
- Listen to the story using browser-based audio playback.
No external TTS models or installations are needed — audio is generated with the browser's built-in SpeechSynthesis API.
💡 Works out of the box on Chrome, Edge, and Firefox with natural voices.
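The in-browser playback can be sketched as a small HTML/JS snippet assembled in Python. This is a minimal sketch under assumptions (the helper name `tts_html` is hypothetical); in a Streamlit app the returned string would be rendered with `streamlit.components.v1.html(...)` so the JavaScript executes in the visitor's browser:

```python
import json

def tts_html(story: str, rate: float = 1.0) -> str:
    """Build an HTML snippet whose JavaScript reads `story` aloud via the
    browser's built-in SpeechSynthesis API (no server-side TTS)."""
    # json.dumps safely embeds the story as a JS string literal
    text_js = json.dumps(story)
    return f"""
    <button onclick="speak()">🔊 Play story</button>
    <script>
      function speak() {{
        const u = new SpeechSynthesisUtterance({text_js});
        u.rate = {rate};
        window.speechSynthesis.speak(u);
      }}
    </script>
    """
```

Because `SpeechSynthesisUtterance` is a standard Web Speech API interface, this works in Chrome, Edge, and Firefox without any extra dependency.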
- Improve caption-to-story creativity with fine-tuned LLMs.
- Add multilingual support for narration.
- Allow custom voice selection and speech rate control.
- Optional export of audio to a downloadable `.wav` file using ESPnet locally.
- GitHub: Fahad10inb
- Email: fahadrahiman10@gmail.com