Pisces is a video advertisement generator that uses Cohere (chat API), Hugging Face (video generation), Google (text-to-speech), and a background-music generation model built on Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) units. The music model learns from a large database of MIDI files and generates soothing background music tailored to text prompts and advertisements.
- User Input: The user provides a text prompt describing the desired advertisement. This could include product details, target audience, desired tone, and other relevant information.
- Cohere Chat API: The text prompt is processed using Cohere's chat API, which generates a coherent and engaging ad script based on the provided input.
- Video Generation: The generated ad script is then fed into Hugging Face's video generation model. This model creates a visual representation of the script, producing a video that aligns with the described advertisement.
- Voice Over: Google’s text-to-speech API is used to convert the ad script into natural-sounding speech. This voiceover is synchronized with the video to provide an engaging auditory experience.
- Background Music Generation (LSTM RNNs): The music model, trained on a large database of MIDI files, uses LSTM units, which are effective at capturing temporal dependencies in music sequences. Given the ad script, it generates soothing background music that complements the tone and mood of the advertisement and flows naturally with the video's narrative.
- Final Output: The generated video, synchronized voiceover, and background music are integrated to produce the final advertisement. The result is a cohesive video ad with engaging visuals, clear narration, and soothing background music tailored to the content.
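The pipeline above can be sketched as a simple orchestration function. This is an illustrative outline, not the project's actual code: every function name below is a placeholder standing in for the real Cohere chat call, Hugging Face text-to-video model, Google text-to-speech call, and LSTM music model.

```python
# Minimal sketch of the Pisces pipeline with stubbed-out stages.
# All names are illustrative placeholders, not the project's API.

def generate_script(prompt: str) -> str:
    """Stub for the Cohere chat call that writes the ad script."""
    return f"Ad script for: {prompt}"

def generate_video(script: str) -> str:
    """Stub for the Hugging Face text-to-video step; returns a file path."""
    return "video.mp4"

def generate_voiceover(script: str) -> str:
    """Stub for the Google text-to-speech step; returns a file path."""
    return "voiceover.mp3"

def generate_music(script: str) -> str:
    """Stub for the LSTM background-music model; returns a file path."""
    return "music.mid"

def make_ad(prompt: str) -> dict:
    """Run the stages in order and bundle the assets for final muxing."""
    script = generate_script(prompt)
    return {
        "script": script,
        "video": generate_video(script),
        "voiceover": generate_voiceover(script),
        "music": generate_music(script),
    }

ad = make_ad("eco-friendly water bottle, upbeat tone")
print(ad["script"])  # Ad script for: eco-friendly water bottle, upbeat tone
```

Note that every downstream stage consumes the generated script, which is what lets the visuals, narration, and music stay consistent with one another.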
Sample outputs:
- `output.mp4`
- `celeste_piano.mp4`
- `music_hype.mp4`
The architecture of Pisces draws inspiration from the advancements in LSTM-based music generation models.
To use Pisces and generate background music from text prompts, start from the repository. Inside it, you will find:
- Model Definition: Details regarding the architecture of Audiogenesis, including LSTM units and text prompt embedding techniques.
- Training Scripts: Scripts and code for training the Audiogenesis model using your own MIDI dataset or pre-existing data.
- Inference Code: Code for generating background music using the trained model. You can input your text prompts to receive musical compositions.
- Evaluation Techniques: Techniques for evaluating the quality of generated music, ensuring coherence and relevance to the given text prompts.
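The LSTM unit at the heart of the music model can be illustrated with a single cell in NumPy. This is a toy sketch, not the Audiogenesis implementation: it shows how the gates carry cell state across time steps, which is the "temporal dependencies" property mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A single LSTM cell; illustrative only, not the Audiogenesis code."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        z = input_size + hidden_size
        # One weight matrix and bias per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {g: rng.standard_normal((hidden_size, z)) * 0.1 for g in "ifog"}
        self.b = {g: np.zeros(hidden_size) for g in "ifog"}

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(self.W["i"] @ z + self.b["i"])
        f = sigmoid(self.W["f"] @ z + self.b["f"])
        o = sigmoid(self.W["o"] @ z + self.b["o"])
        g = np.tanh(self.W["g"] @ z + self.b["g"])
        c = f * c + i * g   # cell state carries long-range information
        h = o * np.tanh(c)  # hidden state is the per-step output
        return h, c

# Run a toy sequence of 16 "note" vectors through the cell.
cell = LSTMCell(input_size=8, hidden_size=32)
h, c = np.zeros(32), np.zeros(32)
for x in np.random.default_rng(1).standard_normal((16, 8)):
    h, c = cell.step(x, h, c)
print(h.shape)  # (32,)
```

In a music-generation setting, each input vector would encode a MIDI event (pitch, duration, velocity) plus a text-prompt embedding, and the hidden state at each step would be projected to a distribution over the next note.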
The repository includes MIDI files alongside corresponding text prompts for training and evaluation purposes.
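One simple way such a paired dataset can be loaded is by matching file stems. The matching-stem convention below (one `.txt` prompt per `.mid` file) is an assumption for illustration, not the repository's documented layout.

```python
import tempfile
from pathlib import Path

def pair_midi_with_prompts(data_dir: str) -> list:
    """Pair each .mid file with the prompt in the .txt file of the same stem.
    The matching-stem naming convention is assumed for illustration."""
    root = Path(data_dir)
    pairs = []
    for midi in sorted(root.glob("*.mid")):
        prompt_file = midi.with_suffix(".txt")
        if prompt_file.exists():
            pairs.append((midi, prompt_file.read_text().strip()))
    return pairs

# Demo with a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "calm_piano.mid").write_bytes(b"MThd")  # minimal MIDI header bytes
    (Path(d) / "calm_piano.txt").write_text("soothing piano for a spa ad")
    pairs = pair_midi_with_prompts(d)
    print(pairs[0][1])  # soothing piano for a spa ad
```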