Paraphrased from this source: https://www.datadriveninvestor.com/2023/08/09/musicgens-parameters-and-what-they-mean/
Prompts:
- for tempo, include BPM
- for audio quality, include “kbps” (kilobits per second) or “kHz” (kilohertz). "Higher values for both kbps and kHz ensure superior recording quality, reducing unwanted background noise and expanding the sound range. MusicGen’s default settings include 320kbps and 48kHz, which are relatively high for MP3 recordings, ensuring a rich and clear sound. However, it is crucial to consider the genre and style of music when choosing audio quality; for instance, some songs intentionally incorporate lower audio quality (e.g., 64kbps and 16kHz) to achieve a specific vintage or nostalgic vibe."
- include time signature
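A quick sketch of assembling a prompt from the three tips above. The helper `build_prompt` and the example description are made up for illustration; nothing here is MusicGen API, and how strongly the model responds to each term is not guaranteed.

```python
def build_prompt(description, bpm=None, time_signature=None,
                 bitrate_kbps=None, sample_rate_khz=None):
    """Join a free-text description with optional tempo, meter and quality hints."""
    parts = [description]
    if bpm is not None:
        parts.append(f"{bpm} BPM")
    if time_signature is not None:
        parts.append(f"{time_signature} time signature")
    if bitrate_kbps is not None:
        parts.append(f"{bitrate_kbps}kbps")
    if sample_rate_khz is not None:
        parts.append(f"{sample_rate_khz}kHz")
    return ", ".join(parts)


# Hypothetical example prompt
prompt = build_prompt(
    "lo-fi hip hop with mellow piano",
    bpm=85,
    time_signature="4/4",
    bitrate_kbps=320,
    sample_rate_khz=48,
)
print(prompt)
```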
Top-k: "Top-k is a crucial parameter in text and music generation models, including MusicGen. It controls the number of most probable next tokens considered during the generation process. The model ranks all potential tokens based on their predicted probabilities and selects the top-k tokens from this list. A smaller value of k results in a more focused and deterministic output, while a larger k value allows for greater diversity in the generated music. Adjusting the top-k parameter enables users to fine-tune the balance between repetition and creativity in the music generated by MusicGen."
- James H research: define token in the context of audio (sample of 'n' ms?)
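To make the top-k description concrete, here is a minimal sketch in plain Python. The four-entry logit list is a toy stand-in for the model's scores over its token vocabulary, and the function names are my own, not audiocraft's.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k highest-scoring tokens; mask all others with -inf."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [x if i in keep else float("-inf") for i, x in enumerate(logits)]


def softmax(logits):
    """Turn logits into probabilities; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for a 4-token vocabulary (made-up numbers)
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(top_k_filter(logits, k=2))

# Sample the next token from the two surviving candidates only
rng = random.Random(0)
next_token = rng.choices(range(len(probs)), weights=probs)[0]
```

With k=1 this always picks the single most likely token; raising k lets less likely tokens through.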
Top-p (Nucleus Sampling): "Top-p, also known as nucleus sampling or probabilistic sampling, is another important method used during text and music generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (often denoted as p). This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for varying the number of tokens considered based on their probabilities. Implementing top-p in MusicGen allows for more controlled and nuanced music generation."
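A small sketch of the nucleus idea described above, in plain Python with toy probabilities. The function name is my own, and I assume the cut-off is "cumulative probability reaches or exceeds p"; real implementations may differ on that boundary case.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches p, then renormalise so the kept ones sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


# Toy distribution over a 4-token vocabulary (made-up numbers)
probs = [0.5, 0.3, 0.15, 0.05]
narrow = top_p_filter(probs, p=0.7)  # two tokens survive
wide = top_p_filter(probs, p=0.9)    # three tokens survive
```

Unlike top-k, the number of surviving tokens changes with the shape of the distribution: a peaked distribution keeps few, a flat one keeps many.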
Temperature: "The temperature parameter is a key factor in controlling the randomness and creativity of the generated music. During the sampling process, a higher temperature value results in more random and diverse outputs, introducing variability and unpredictability to the music generated by MusicGen. Conversely, a lower temperature value produces more focused and deterministic outputs, potentially resulting in repetitive but structured compositions. Adjusting the temperature parameter allows users to tailor the level of creativity and coherence they desire in the generated music."
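Temperature is commonly applied by dividing the logits by the temperature value before the softmax. A minimal sketch with toy numbers (the function name is my own):

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for a 3-token vocabulary (made-up numbers)
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.5)  # sharper: top token dominates
base = softmax_with_temperature(logits, 1.0)  # unchanged distribution
hot = softmax_with_temperature(logits, 2.0)   # flatter: closer to uniform
```

Low temperature concentrates probability on the top token (more repetitive); high temperature spreads it out (more varied).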
Classifier-Free Guidance: an advanced technique used in some generation models, including MusicGen. As the name says, it does not train a separate classifier network; the source article describes the older classifier-guidance idea instead. The model itself is trained with the text conditioning randomly dropped, so the same network can predict the next token both with and without the prompt.
During generation, the two predictions are combined so that the output is pushed toward the prompt-conditioned one. A larger guidance scale makes the music follow the prompt more closely, usually at some cost in diversity. In audiocraft this is the cfg_coef generation parameter.
- James H research: how to train a classifier network; data sources? how much data needed? can this be done with 10 labeled samples?
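Classifier-free guidance is usually formulated without any separate classifier: the model is run once with the prompt and once without, and the two sets of logits are mixed. A minimal sketch of that mixing step, with toy numbers and a function name of my own:

```python
def cfg_logits(cond_logits, uncond_logits, scale):
    """Blend prompt-conditioned and unconditioned logits.

    scale = 0 ignores the prompt, scale = 1 gives the conditioned logits
    unchanged, and scale > 1 exaggerates whatever the prompt changed.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Toy logits for a 2-token vocabulary (made-up numbers)
cond = [2.0, 0.0]    # model output given the text prompt
uncond = [1.0, 1.0]  # model output with the prompt dropped

plain = cfg_logits(cond, uncond, 1.0)   # [2.0, 0.0]
guided = cfg_logits(cond, uncond, 3.0)  # [4.0, -2.0]
```

The guided logits then go through the usual temperature and top-k or top-p sampling.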
"With just 9-10 tracks, you can fine-tune MusicGen to emulate your chosen musical style. Ensure each track exceeds 30 seconds, and the training script will seamlessly handle the rest, automatically dividing lengthy audio files into 30-second chunks."
- possibly this: https://musicgenai.org/musicgen-fine-tune/
- possibly this: https://github.com/chavinlo/musicgen_trainer (broken and no longer maintained)
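A rough sketch of what "dividing lengthy audio files into 30-second chunks" could look like, working on durations only. This is my own illustration, not the code of either trainer linked above; in particular, dropping the short leftover at the end is an assumption.

```python
def chunk_bounds(duration_s, chunk_s=30.0, keep_remainder=False):
    """Return (start, end) times in seconds for consecutive fixed-length chunks."""
    bounds = []
    start = 0.0
    while start + chunk_s <= duration_s:
        bounds.append((start, start + chunk_s))
        start += chunk_s
    # Assumption: a leftover shorter than chunk_s is dropped unless requested
    if keep_remainder and start < duration_s:
        bounds.append((start, duration_s))
    return bounds


# A hypothetical 95-second track gives three full chunks; 5 seconds are left over
chunks = chunk_bounds(95.0)
```

A track shorter than 30 seconds yields no full chunk under this scheme, which fits the advice that each track should exceed 30 seconds.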
Model explainer: https://musicgenai.org/musicgen-models/
- what is a transformer?
- what are codebooks?
- define overfitting
MusicGen, Meta's text-to-music model: https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md
How to install and run MusicGen locally on a Mac: https://medium.com/@woyera/how-to-install-and-run-facebook-audiocrafts-musicgen-locally-297f053a4fdc
Guide to MusicGen in Colab: https://vidyabhandary.github.io/blog/gpt,/ai,/text-to-music/prompts/2023/06/16/ComposerQuill.html
what was MusicGen trained on?
- https://music3point0.com/2023/06/14/metas-musicgen-trained-on-20000-hours-of-licensed-music-but-is-it-any-good/
- and another article: https://musically.com/2023/06/12/metas-new-musicgen-ai-was-trained-on-20k-hours-of-licensed-music/
- and the original source is the paper from the creators of MusicGen: https://arxiv.org/pdf/2306.05284
MusicGen page on HuggingFace: https://huggingface.co/facebook/musicgen-large and the GitHub repo is here: https://github.com/facebookresearch/audiocraft
Competitor to MusicGen is MusicLM: https://musiclm.com/
- MusicLM examples page (hosted on GitHub Pages): https://google-research.github.io/seanet/musiclm/examples/
- MusicCaps, the dataset released alongside MusicLM: https://www.kaggle.com/datasets/googleai/musiccaps
- test implementation: https://aitestkitchen.withgoogle.com/tools/music-fx
Holly Herndon and Mat Dryhurst's The Call, reviewed on e-flux: https://www.e-flux.com/criticism/641034/holly-herndon-and-mat-dryhurst-s-the-call
Interdependence podcast from Herndon and Dryhurst: https://interdependence.fm/