Skip to content

bilsimaging/soundsnapper

Repository files navigation

🎵 SoundSnapper: AI-Powered Reality Remix

Transform your camera captures into immersive audio-visual experiences using cutting-edge AI

Nano Banana Hackathon Gemini 2.5 Fal AI ElevenLabs

Transform your camera captures — AI banner


❓ The Problem

Creating engaging audio-visual content typically requires expensive software, technical skills, and hours of editing. Most people can't instantly transform everyday objects into creative, shareable experiences.


💡 Our Solution

SoundSnapper makes creativity one-tap simple:
📷 Snap → 🧠 Analyze → 🎨 Transform → 🎵 Generate → ✨ Share

A seamless fusion of reality and AI-powered imagination.


🌟 Key Features

  • 📸 Instant Camera Capture - Intuitive mobile-first interface
  • 🧠 AI Scene Intelligence - Gemini 2.5 Flash understands your photos
  • 🎨 Artistic Transformations - Anime, Cyberpunk, Watercolor & more
  • 🎵 Immersive Soundscapes - ElevenLabs generates matching audio
  • 🔊 Interactive Controls - Volume, zoom, and playback options
  • 📱 Responsive Design - Works perfectly on any device
  • No Setup Required - Try instantly without API keys

🎯 Who It's For

🎬 Content Creators - Turn mundane objects into viral TikTok moments
📚 Educators - Help kids discover the "sounds" of everyday items
🎶 Musicians - Find inspiration in unexpected visual-audio combinations
🏢 Brands - Create interactive campaigns with object-to-sound experiences


🚀 Real-World Examples

  • 📱 Social Media: Snap your coffee → Get cyberpunk visuals + café ambiance
  • 🎓 Education: Kids explore how different materials "sound" in their imagination
  • 🎵 Music Production: Random objects spark new ambient textures
  • 🛍️ Marketing: Product scans generate branded soundscapes

🎥 Live Demo

🌐 Try SoundSnapper Now (No Setup Required)

🎬 Watch Demo Video
SoundSnapper Demo


🔮 Roadmap

  • 📱 TikTok/Reels Export - Vertical video output with audio sync
  • 🎯 Multi-Object Mode - Layer multiple items for complex soundscapes
  • 🎭 Style Packs - Premium themes (Retro, Minimal, Sci-Fi)
  • 🗂️ Personal Gallery - Save and revisit your creations
  • 🌍 Community Hub - Share and remix with others
  • 🛡️ Privacy-First - Zero data retention, ephemeral processing

🛠️ Tech Stack

Frontend: React 19 + TypeScript + Vite
AI Vision: Google Gemini 2.5 Transformations: Fal AI (gemini-25-flash-image/edit)
Audio Generation: ElevenLabs API
UI/UX: Custom CSS with Glassmorphism
Deployment: Vercel + Serverless Functions


⚡ Quick Start

Prerequisites

Setup

# Clone & Install
git clone https://github.com/bilsimaging/soundsnapper.git
cd soundsnapper
npm install

# Configure Environment
cp .env.example .env.local
# Add your API keys to .env.local

# Launch
npm run dev
# Open http://localhost:5173

⚠️ Security Note: Use serverless functions to proxy API calls and protect keys.


🎮 How to Use

  1. 📷 Grant camera access when prompted
  2. 📸 Snap a photo of any object
  3. ⏳ Wait for AI magic (analysis + audio generation)
  4. 🎨 Choose your style (Anime, Cyberpunk, etc.)
  5. ✨ Apply transformation and enjoy the result
  6. 🔊 Adjust volume or zoom to view full-size
  7. 📤 Share your creation with the world

🏆 Competition Entry - Google Nano Banana Hackathon 2025 🍌

🎯 Judging Criteria Alignment

✨ Innovation & "Wow" Factor (40%)
SoundSnapper pioneers a new creative medium: instant reality-to-art transformation with synchronized soundscapes. This multi-modal AI pipeline (vision → transformation → audio) creates magical experiences impossible before Gemini 2.5 Flash.

⚙️ Technical Excellence (30%)
Modern React 19 architecture with TypeScript, secure serverless API proxying, mobile-optimized responsive design, and seamless integration of three AI services.

🌍 Real Impact (20%)
Democratizes creative content creation for millions - from TikTok creators to classroom teachers to music producers. Removes technical barriers to artistic expression.

🎥 Presentation Quality (10%)
Professional live demo, clear documentation, and engaging video showcase demonstrate the full potential.


🧠 Gemini 2.5 Flash Integration

Gemini 2.5 Flash Image ("nano banana" technology) is SoundSnapper's intelligent core, accessed via Fal AI's fal-ai/gemini-25-flash-image/edit endpoint.

Core Capabilities:

  • 🔍 Scene Understanding - Recognizes objects, materials, environments, and context
  • 🎨 Style Generation - Creates artistic transformations (Anime, Cyberpunk, Watercolor)
  • 🧠 Smart Context - Provides rich descriptions for audio generation

The Magic Flow:

  1. Photo captured → Gemini analyzes visual elements
  2. Gemini generates artistic style variants via Fal AI
  3. Scene understanding informs ElevenLabs audio creation
  4. Result: Perfectly matched visual + audio experience

Gemini 2.5 Flash is the "brain" that makes everything possible - understanding your photos and transforming them into creative art while providing context for matching soundscapes. Without nano banana technology, SoundSnapper couldn't bridge the gap between visual input and meaningful audio-visual output.


🤝 Contributing

While this is a hackathon project, contributions are welcome:

  • 🐛 Report bugs via GitHub Issues
  • 💡 Suggest features for future versions
  • Star the repo if you love the concept!

📄 License

MIT License

Copyright (c) 2025 Bilsimaging


🙏 Acknowledgments

  • Google for Gemini 2.5 Flash Image technology
  • Fal for providing seamless API access
  • ElevenLabs for revolutionary audio generation
  • Nano Banana Hackathon organizers for this amazing opportunity

Made with ❤️ by Bilsimaging for the Nano Banana Hackathon 2025 🍌

About

Transform your camera captures into immersive audio-visual experiences using cutting-edge AI

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published