Media Processing REST API

Overview

This project implements a REST API service for generating media combinations. The service receives JSON input containing video blocks, audio blocks, and text-to-speech instructions, then generates all possible combinations of videos with randomized background audio and TTS overlays. The final media outputs are stored in Google Drive.

Features

POST /process_media endpoint:
- Accepts JSON payload with video blocks, audio blocks, and text-to-speech instructions
- Queues tasks asynchronously; does not block new requests while processing
- Logs the execution time of each process and, at the end, the total time and the number of successful and failed executions across all processes
Generates all possible combinations of videos across blocks
Randomly applies:
- Background audio (looped and volume normalized)
- Text-to-speech voice overlays (using ElevenLabs.io)
Saves generated .mp4 files in a folder named after task_name in GCS/Google Drive
Structured logging for observability
Input validation using Pydantic
Metrics include:
- Time to generate each video
- Total task execution time
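The per-process timing and the final success/failure summary described above can be sketched in plain Python. This is a minimal, sequential, in-process sketch that assumes each process is a zero-argument callable; `run_all` and the logger name are hypothetical, and in the actual service the work is spread across Celery tasks rather than run in a single loop.

```python
from __future__ import annotations

import logging
import time
from typing import Callable, Iterable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("process_media")  # hypothetical logger name


def run_all(jobs: Iterable[tuple[str, Callable[[], object]]]) -> dict[str, float]:
    """Run each (name, job) pair, log its duration, then log an overall summary."""
    succeeded = failed = 0
    total_start = time.perf_counter()
    for name, job in jobs:
        start = time.perf_counter()
        try:
            job()
            succeeded += 1
            status = "ok"
        except Exception:
            failed += 1
            status = "failed"
            logger.exception("Process %s raised an error", name)
        # Per-process metric: time to generate one video.
        logger.info("Process %s %s in %.2f s", name, status, time.perf_counter() - start)
    total = time.perf_counter() - total_start
    # Task-level metrics: total time plus success/failure counts.
    logger.info("All processes done in %.2f s: %d succeeded, %d failed", total, succeeded, failed)
    return {"total_seconds": total, "succeeded": succeeded, "failed": failed}
```

For example, `run_all([("combo_1", lambda: None), ("combo_2", lambda: 1 / 0)])` logs one success, one failure with its traceback, and a summary line.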

Prerequisites and Installation

Prerequisites

Docker
Google Drive Client credentials
ElevenLabs API key (for text-to-speech)

Installation

Clone repository: git clone https://github.com/MasterpieceElbow/renesandro-test-task.git
Create a virtual environment (Python 3.11 must already be installed on your machine): python3.11 -m venv venv
Activate the virtual environment (macOS/Linux): source venv/bin/activate
Install requirements: pip3.11 install -r requirements.txt
Copy .env.example to .env and fill in your parameters
Build the Docker image: docker-compose build
Run the docker-compose services: docker-compose up
Access Swagger: http://127.0.0.1:8000/docs

API Usage

Endpoint: POST /process_media

Request Body Example:

{
  "task_name": "test_task_2blocks_with_audio",
  "video_blocks": {
    "block1": ["https://storage.googleapis.com/video1.mp4", "https://storage.googleapis.com/video2.mp4"],
    "block2": ["https://storage.googleapis.com/video3.mp4", "https://storage.googleapis.com/video4.mp4"]
  },
  "audio_blocks": {
    "audio1": ["https://storage.googleapis.com/audio1.mp3", "https://storage.googleapis.com/audio2.mp3"]
  },
  "text_to_speach": [
    {"text": "Hello world", "voice": "Sarah"}
  ]
}
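A minimal sketch of submitting this request body from Python with only the standard library, assuming the service runs locally on the same address as the Swagger URL from the installation steps. `build_process_media_request` and `submit` are hypothetical helper names; the response format is not specified here, so `submit` simply returns the raw body.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local address, as in the Swagger URL

payload = {
    "task_name": "test_task_2blocks_with_audio",
    "video_blocks": {
        "block1": ["https://storage.googleapis.com/video1.mp4", "https://storage.googleapis.com/video2.mp4"],
        "block2": ["https://storage.googleapis.com/video3.mp4", "https://storage.googleapis.com/video4.mp4"],
    },
    "audio_blocks": {
        "audio1": ["https://storage.googleapis.com/audio1.mp3", "https://storage.googleapis.com/audio2.mp3"],
    },
    "text_to_speach": [{"text": "Hello world", "voice": "Sarah"}],  # field name as in the example
}


def build_process_media_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to /process_media."""
    return urllib.request.Request(
        url=f"{base_url}/process_media",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(body: dict, base_url: str = BASE_URL) -> str:
    """Send the request; requires the docker-compose services to be running."""
    with urllib.request.urlopen(build_process_media_request(body, base_url)) as response:
        return response.read().decode("utf-8")
```

Calling `submit(payload)` against a running instance queues the task; building the request separately keeps it easy to inspect without a live server.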

Behavior:

Generates all combinations of videos across blocks
Randomly adds background audio and TTS overlay
Creates a Celery task for each video combination and distributes the tasks across the available Celery workers
Saves the results in the task_name/ folder in Google Drive
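Generating "all combinations of videos across blocks" maps naturally onto a Cartesian product: one video from each block, in block order. The sketch below assumes that all audio URLs form a single pool and that each combination gets one randomly chosen audio track and one randomly chosen TTS item; the service's actual selection rules may differ, and `build_combinations` is a hypothetical name.

```python
from __future__ import annotations

import itertools
import random


def build_combinations(
    video_blocks: dict[str, list[str]],
    audio_blocks: dict[str, list[str]],
    tts_items: list[dict[str, str]],
    rng: random.Random | None = None,
) -> list[dict]:
    """One combination per element of the Cartesian product of the video blocks."""
    if rng is None:
        rng = random.Random()
    # Assumption: every audio URL from every audio block goes into one shared pool.
    audio_pool = [url for urls in audio_blocks.values() for url in urls]
    combinations = []
    # itertools.product picks one video from each block, preserving block order.
    for videos in itertools.product(*video_blocks.values()):
        combinations.append(
            {
                "videos": list(videos),
                "audio": rng.choice(audio_pool) if audio_pool else None,
                "tts": rng.choice(tts_items) if tts_items else None,
            }
        )
    return combinations
```

With the request body example (two blocks of two videos each), this yields 2 × 2 = 4 combinations, each of which would correspond to one Celery task. Passing a seeded `random.Random` makes the random choices reproducible in tests.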

Tools & Technologies

Python 3.11

FastAPI

Celery + Redis (for task queue)

MoviePy (video/audio processing)

Pydantic (data validation)

Docker & Docker Compose

Google Drive API

ElevenLabs API (text-to-speech)

Lessons Learned / New Tools

ElevenLabs TTS API – integrated programmatically for dynamic voice overlays

MoviePy Python integration – working with a video's audio track

Celery chord – a callback task that runs once all tasks are done
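In Celery, a chord is written as `chord(header)(callback)`: the header tasks run in parallel, and the callback receives the list of their results once all of them have finished. To show those semantics without a broker, the sketch below swaps in a standard-library analogue based on `concurrent.futures.ThreadPoolExecutor`; it is not Celery, and `run_chord` is a hypothetical name.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def run_chord(
    header: Sequence[Callable[[], Any]],
    callback: Callable[[list], Any],
    max_workers: int = 4,
) -> Any:
    """Run every header task in parallel, then call `callback` once with all results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task) for task in header]
        # Collect results in header order; .result() re-raises a task's exception,
        # so the callback only runs if every header task succeeded.
        results = [future.result() for future in futures]
    return callback(results)
```

In this project the header would be one task per video combination and the callback the step that logs total time and success/failure counts.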

Bottlenecks

The number of videos processed in parallel depends on the number of Celery workers, which can be scaled horizontally to process more tasks concurrently
Video download and encoding (MoviePy encoding is CPU-intensive). Optimization: parallel downloading, caching, or faster storage
TTS generation (network latency of the ElevenLabs API). Optimization: batch requests where possible
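Because the same source videos appear in many combinations, downloading each distinct URL once, in parallel, avoids most redundant transfers. A minimal sketch, assuming a hypothetical `fetch(url) -> bytes` callable (for example one built on `urllib.request.urlopen`); threads fit here because downloads are I/O-bound, whereas MoviePy encoding is CPU-bound and is better scaled with more Celery workers.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> dict[str, bytes]:
    """Download every distinct URL exactly once, in parallel threads."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(urls))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `unique`.
        return dict(zip(unique, pool.map(fetch, unique)))
```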

Future Improvements

Implement caching for downloaded video/audio to reduce redundant downloads
Implement caching for ElevenLabs TTS, since some voiceovers are reused across multiple video combinations. This will reduce token usage
Add retry mechanism for failed TTS or media processing
Expose task status endpoint for progress tracking
Add more metrics to monitor queue performance and worker utilization
Currently this only works for videos of the same format and resolution. Add support for mixed formats (.mp4 and .mov) and different resolutions
Consider other tools for video processing; plain FFmpeg may give better performance
On a low-performance VPS, frames may be dropped. Processing needs to be adapted for lightweight servers
Add tests
Consider using taskiq instead of Celery, as taskiq is asynchronous
Spin up an LGTM container, send metrics to it, and visualize them
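The TTS caching and retry items could be combined in one wrapper: retry transient failures with exponential backoff, and cache successful results per `(text, voice)` pair so that a voiceover reused across combinations is generated only once. A minimal sketch, assuming a hypothetical `synthesize(text, voice) -> bytes` callable standing in for the ElevenLabs request; the cache lives in a single process, so sharing it across Celery workers would need an external store such as Redis.

```python
from __future__ import annotations

import functools
import time
from typing import Callable


def with_retry(attempts: int = 3, base_delay: float = 0.5):
    """Retry the wrapped call; the delay doubles after each failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


def make_cached_tts(
    synthesize: Callable[[str, str], bytes],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Callable[[str, str], bytes]:
    """Return a TTS function that retries failures and caches results per (text, voice)."""
    # lru_cache is outermost, so cache hits skip the network call and retries;
    # exceptions are never cached, so a failed call is retried on the next request.
    @functools.lru_cache(maxsize=None)
    @with_retry(attempts=attempts, base_delay=base_delay)
    def tts(text: str, voice: str) -> bytes:
        return synthesize(text, voice)

    return tts
```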

Author

Oleksii Proshchenko - Python Engineer
