Transform any photo into an animated talking cartoon with synchronized audio!
- 🎨 Cartoonification - Convert photos to cartoon style using advanced OpenCV filters
- 🎬 Animation - Animate cartoons using First Order Motion Model
- 🔊 Audio Sync - Automatically sync audio from reference video
- 🌐 Web Interface - Easy-to-use browser interface
- ⚡ GPU Accelerated - Fast processing with CUDA support
```bash
pip install -r requirements.txt
```

Download the VoxCeleb checkpoint:
- URL: https://drive.google.com/open?id=1PyQJmkdCsAkOYwUyaj_l-l0as-iLDgeH
- Save as: `checkpoints/vox-cpk.pth.tar`
Or use the download script:
```bash
python download_checkpoint.py
```

Web Interface (easiest):

```bash
python app.py
# Open http://localhost:5000
```

Command Line:
```bash
python cartoon_animator.py photo.jpg talking_video.mp4 output.mp4
```

How it works:
- Cartoonify - Your uploaded image is converted to cartoon style
- Detect Keypoints - AI detects facial/body keypoints in both cartoon and video
- Generate Motion - Motion from video is transferred to cartoon
- Sync Audio - Audio from reference video is added to final output
- Start the server: `python app.py`
- Open http://localhost:5000
- Upload your image and reference video
- Choose cartoon style
- Click "Create Animated Cartoon"
- Download your result!
```bash
python cartoon_animator.py <source_image> <driving_video> <output_video>
```

Example:

```bash
python cartoon_animator.py selfie.jpg talking.mp4 animated_selfie.mp4
```

Cartoon styles:
- Advanced (recommended) - Best quality, with color quantization
- Bilateral - Smooth colors with sharp edges
- Pencil - Pencil sketch effect
- Stylization - Artistic stylization
```bash
python cartoon_animator.py portrait.jpg speech_video.mp4 talking_portrait.mp4
```

Input: a portrait photo + a video of someone talking. Output: a cartoon portrait that talks with synced audio.
```bash
python cartoon_animator.py character.png singing_video.mp4 singing_character.mp4
```

Input: a character image + a singing video. Output: an animated singing cartoon.
- Bilateral filtering for color smoothing
- Adaptive thresholding for edge detection
- Color quantization (K-means clustering)
- HSV enhancement for vibrant colors
- First Order Motion Model
- 10-keypoint detection
- Dense motion field generation
- Occlusion-aware rendering
- Automatic audio extraction from reference video
- Frame-perfect synchronization
- AAC audio codec
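The audio-sync step amounts to copying the rendered video stream and the reference clip's audio stream into one file. A sketch as an ffmpeg invocation — `build_mux_command` is a hypothetical helper, and the project may implement this differently (e.g. with moviepy):

```python
def build_mux_command(animated_video, reference_video, output_path):
    """Build an ffmpeg command that takes video from the rendered
    animation and audio from the reference clip, encoding audio as AAC."""
    return [
        "ffmpeg", "-y",
        "-i", animated_video,    # rendered cartoon animation (silent)
        "-i", reference_video,   # driving video carrying the audio
        "-map", "0:v:0",         # video stream from the first input
        "-map", "1:a:0",         # audio stream from the second input
        "-c:v", "copy",          # keep frames as-is, no re-encode
        "-c:a", "aac",           # encode audio as AAC
        "-shortest",             # stop when the shorter stream ends
        output_path,
    ]
```

Pass the returned list to `subprocess.run(cmd, check=True)` to perform the mux.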
| Hardware | Speed | Time for 30s video |
|---|---|---|
| RTX 3080 | ~30 FPS | ~1 minute |
| RTX 2060 | ~20 FPS | ~1.5 minutes |
| CPU (i7) | ~3 FPS | ~10 minutes |
- Python 3.7+
- PyTorch 1.8+
- OpenCV 4.5+
- 4-6 GB RAM
- 2-3 GB GPU memory (optional)
```
midnightproject/
├── cartoonify.py          # Cartoonification module
├── cartoon_animator.py    # Main pipeline
├── app.py                 # Web interface
├── modules/               # Animation models
│   ├── util.py
│   ├── keypoint_detector.py
│   ├── dense_motion.py
│   └── generator.py
├── config/                # Model configuration
│   └── vox-256.yaml
├── checkpoints/           # Pre-trained models
└── requirements.txt       # Dependencies
```
Cartoonification is based on standard OpenCV image-processing techniques.
Animation is based on the First Order Motion Model:
- Paper: "First Order Motion Model for Image Animation" (NeurIPS 2019)
- Authors: Aliaksandr Siarohin et al.
- Repository: https://github.com/AliaksandrSiarohin/first-order-model
If PyTorch is missing:

```bash
pip install torch torchvision
```

If the checkpoint is missing, download the checkpoint file and place it at `checkpoints/vox-cpk.pth.tar`.
The system will automatically fall back to CPU if GPU memory is insufficient.
If the result looks poor, try a different cartoon style (advanced, bilateral, pencil, stylization).
If the output is silent, ensure the reference video has audio and is in a supported format (MP4 or AVI).
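The CPU fallback mentioned above could look like this — a sketch only; the 2 GB threshold is an assumption taken from the stated GPU memory requirement:

```python
import torch

def pick_device(min_free_gb=2.0):
    """Choose CUDA when enough GPU memory is free, else fall back to CPU.
    The min_free_gb threshold is an assumption, not the project's setting."""
    if torch.cuda.is_available():
        # mem_get_info() reports (free, total) bytes on the current device
        free_bytes, _total = torch.cuda.mem_get_info()
        if free_bytes / 1024 ** 3 >= min_free_gb:
            return torch.device("cuda")
    return torch.device("cpu")
```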
For educational and research purposes. See original repositories for licensing details.
Create amazing animated cartoons and share them with the world! 🎭✨